qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC PATCH v2 0/9] target/ppc: convert VMX instructions to use TCG vector operations
@ 2018-12-17 12:23 Mark Cave-Ayland
  2018-12-17 12:23 ` [Qemu-devel] [RFC PATCH v2 1/9] target/ppc: introduce get_fpr() and set_fpr() helpers for FP register access Mark Cave-Ayland
                   ` (9 more replies)
  0 siblings, 10 replies; 19+ messages in thread
From: Mark Cave-Ayland @ 2018-12-17 12:23 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc, david, richard.henderson, lvivier

This patchset is an attempt at trying to improve the VMX (Altivec) instruction
performance by making use of the new TCG vector operations where possible.

In order to use TCG vector operations, the registers must be accessible from cpu_env
whilst currently they are accessed via arrays of static TCG globals. Patches 1-3
are therefore mechanical patches which introduce access helpers for FPR, AVR and VSR
registers using the supplied TCGv_i64 parameter. Meanwhile patch 4 fixes a minor issue
spotted by Richard during review to ensure that AVR registers are not modified until
after exceptions are processing during register load.

Once this is done, patch 5 enables us to remove the static TCG global arrays and updates
the access helpers to read/write to the relevant fields in cpu_env directly.

Patches 6 and 7 perform the legwork required to enable VSX instructions to be converted
to use TCG vector operations in future by rearranging the FP, VMX and VSX registers into
a single aligned VSR register array (the scope of this patchset is VMX only).

The final patches 8 and 9 convert the VMX logical instructions and addition/subtraction
instructions respectively over to the TCG vector operations.

NOTE: there are a lot of instructions that cannot (yet) be optimised to use TCG vector
operations, however it struck me that there may be some potential for converting
saturating add/sub and cmp instructions if there were a mechanism to return a set of
flags indicating the result of the saturation/comparison.

Finally thanks to Richard for taking the time to answer some of my (mostly beginner)
questions related to TCG.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>


v2:
- Rebase onto master
- Add comment explaining rationale for FPR helpers in description for patch 1
- Add R-B tags from Richard
- Add patch 3 to delay AVR register writeback as spotted by Richard
- Add patches 6 and 7 to merge FPR, VMX and VSX registers into the vsr array
  to facilitate conversion of VSX instructions to vector operations later
- Fix accidental bug whereby the conversion of get_vsr()/set_vsr() to access
  data from cpu_env was incorrectly squashed into patch 3
- Move set_fpr() further down in gen_fsqrts() and gen_frsqrtes() in patch 1


Mark Cave-Ayland (9):
  target/ppc: introduce get_fpr() and set_fpr() helpers for FP register
    access
  target/ppc: introduce get_avr64() and set_avr64() helpers for VMX
    register access
  target/ppc: introduce get_cpu_vsr{l,h}() and set_cpu_vsr{l,h}()
    helpers for VSR register access
  target/ppc: delay writeback of avr{l,h} during lvx instruction
  target/ppc: switch FPR, VMX and VSX helpers to access data directly
    from cpu_env
  target/ppc: merge ppc_vsr_t and ppc_avr_t union types
  target/ppc: move FP and VMX registers into aligned vsr register array
  target/ppc: convert VMX logical instructions to use vector operations
  target/ppc: convert vaddu[b,h,w,d] and vsubu[b,h,w,d] over to use
    vector operations

 linux-user/ppc/signal.c             |  24 +-
 target/ppc/arch_dump.c              |  12 +-
 target/ppc/cpu.h                    |  26 +-
 target/ppc/gdbstub.c                |   8 +-
 target/ppc/helper.h                 |   8 -
 target/ppc/int_helper.c             |  63 ++-
 target/ppc/internal.h               |  29 +-
 target/ppc/machine.c                |  72 +++-
 target/ppc/monitor.c                |   4 +-
 target/ppc/translate.c              |  74 ++--
 target/ppc/translate/dfp-impl.inc.c |   2 +-
 target/ppc/translate/fp-impl.inc.c  | 490 +++++++++++++++++-----
 target/ppc/translate/vmx-impl.inc.c | 186 ++++++---
 target/ppc/translate/vsx-impl.inc.c | 782 ++++++++++++++++++++++++++----------
 target/ppc/translate_init.inc.c     |  24 +-
 15 files changed, 1262 insertions(+), 542 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Qemu-devel] [RFC PATCH v2 1/9] target/ppc: introduce get_fpr() and set_fpr() helpers for FP register access
  2018-12-17 12:23 [Qemu-devel] [RFC PATCH v2 0/9] target/ppc: convert VMX instructions to use TCG vector operations Mark Cave-Ayland
@ 2018-12-17 12:23 ` Mark Cave-Ayland
  2018-12-17 16:40   ` Richard Henderson
  2018-12-17 12:23 ` [Qemu-devel] [RFC PATCH v2 2/9] target/ppc: introduce get_avr64() and set_avr64() helpers for VMX " Mark Cave-Ayland
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: Mark Cave-Ayland @ 2018-12-17 12:23 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc, david, richard.henderson, lvivier

These helpers allow us to move FP register values to/from the specified TCGv_i64
argument in the VSR helpers to be introduced shortly.

To prevent FP helpers accessing the cpu_fpr array directly, add extra TCG
temporaries as required.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
---
 target/ppc/translate.c             |  10 +
 target/ppc/translate/fp-impl.inc.c | 490 ++++++++++++++++++++++++++++---------
 2 files changed, 390 insertions(+), 110 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 2b37910248..1d4bf624a3 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -6694,6 +6694,16 @@ static inline void gen_##name(DisasContext *ctx)               \
 GEN_TM_PRIV_NOOP(treclaim);
 GEN_TM_PRIV_NOOP(trechkpt);
 
+static inline void get_fpr(TCGv_i64 dst, int regno)
+{
+    tcg_gen_mov_i64(dst, cpu_fpr[regno]);
+}
+
+static inline void set_fpr(int regno, TCGv_i64 src)
+{
+    tcg_gen_mov_i64(cpu_fpr[regno], src);
+}
+
 #include "translate/fp-impl.inc.c"
 
 #include "translate/vmx-impl.inc.c"
diff --git a/target/ppc/translate/fp-impl.inc.c b/target/ppc/translate/fp-impl.inc.c
index 08770ba9f5..04b8733055 100644
--- a/target/ppc/translate/fp-impl.inc.c
+++ b/target/ppc/translate/fp-impl.inc.c
@@ -34,24 +34,38 @@ static void gen_set_cr1_from_fpscr(DisasContext *ctx)
 #define _GEN_FLOAT_ACB(name, op, op1, op2, isfloat, set_fprf, type)           \
 static void gen_f##name(DisasContext *ctx)                                    \
 {                                                                             \
+    TCGv_i64 t0;                                                              \
+    TCGv_i64 t1;                                                              \
+    TCGv_i64 t2;                                                              \
+    TCGv_i64 t3;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
+    t0 = tcg_temp_new_i64();                                                  \
+    t1 = tcg_temp_new_i64();                                                  \
+    t2 = tcg_temp_new_i64();                                                  \
+    t3 = tcg_temp_new_i64();                                                  \
     gen_reset_fpstatus();                                                     \
-    gen_helper_f##op(cpu_fpr[rD(ctx->opcode)], cpu_env,                       \
-                     cpu_fpr[rA(ctx->opcode)],                                \
-                     cpu_fpr[rC(ctx->opcode)], cpu_fpr[rB(ctx->opcode)]);     \
+    get_fpr(t0, rA(ctx->opcode));                                             \
+    get_fpr(t1, rC(ctx->opcode));                                             \
+    get_fpr(t2, rB(ctx->opcode));                                             \
+    gen_helper_f##op(t3, cpu_env, t0, t1, t2);                                \
     if (isfloat) {                                                            \
-        gen_helper_frsp(cpu_fpr[rD(ctx->opcode)], cpu_env,                    \
-                        cpu_fpr[rD(ctx->opcode)]);                            \
+        get_fpr(t0, rD(ctx->opcode));                                         \
+        gen_helper_frsp(t3, cpu_env, t0);                                     \
     }                                                                         \
+    set_fpr(rD(ctx->opcode), t3);                                             \
     if (set_fprf) {                                                           \
-        gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);                   \
+        gen_compute_fprf_float64(t3);                                         \
     }                                                                         \
     if (unlikely(Rc(ctx->opcode) != 0)) {                                     \
         gen_set_cr1_from_fpscr(ctx);                                          \
     }                                                                         \
+    tcg_temp_free_i64(t0);                                                    \
+    tcg_temp_free_i64(t1);                                                    \
+    tcg_temp_free_i64(t2);                                                    \
+    tcg_temp_free_i64(t3);                                                    \
 }
 
 #define GEN_FLOAT_ACB(name, op2, set_fprf, type)                              \
@@ -61,24 +75,34 @@ _GEN_FLOAT_ACB(name##s, name, 0x3B, op2, 1, set_fprf, type);
 #define _GEN_FLOAT_AB(name, op, op1, op2, inval, isfloat, set_fprf, type)     \
 static void gen_f##name(DisasContext *ctx)                                    \
 {                                                                             \
+    TCGv_i64 t0;                                                              \
+    TCGv_i64 t1;                                                              \
+    TCGv_i64 t2;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
+    t0 = tcg_temp_new_i64();                                                  \
+    t1 = tcg_temp_new_i64();                                                  \
+    t2 = tcg_temp_new_i64();                                                  \
     gen_reset_fpstatus();                                                     \
-    gen_helper_f##op(cpu_fpr[rD(ctx->opcode)], cpu_env,                       \
-                     cpu_fpr[rA(ctx->opcode)],                                \
-                     cpu_fpr[rB(ctx->opcode)]);                               \
+    get_fpr(t0, rA(ctx->opcode));                                             \
+    get_fpr(t1, rB(ctx->opcode));                                             \
+    gen_helper_f##op(t2, cpu_env, t0, t1);                                    \
     if (isfloat) {                                                            \
-        gen_helper_frsp(cpu_fpr[rD(ctx->opcode)], cpu_env,                    \
-                        cpu_fpr[rD(ctx->opcode)]);                            \
+        get_fpr(t0, rD(ctx->opcode));                                         \
+        gen_helper_frsp(t2, cpu_env, t0);                                     \
     }                                                                         \
+    set_fpr(rD(ctx->opcode), t2);                                             \
     if (set_fprf) {                                                           \
-        gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);                   \
+        gen_compute_fprf_float64(t2);                                         \
     }                                                                         \
     if (unlikely(Rc(ctx->opcode) != 0)) {                                     \
         gen_set_cr1_from_fpscr(ctx);                                          \
     }                                                                         \
+    tcg_temp_free_i64(t0);                                                    \
+    tcg_temp_free_i64(t1);                                                    \
+    tcg_temp_free_i64(t2);                                                    \
 }
 #define GEN_FLOAT_AB(name, op2, inval, set_fprf, type)                        \
 _GEN_FLOAT_AB(name, name, 0x3F, op2, inval, 0, set_fprf, type);               \
@@ -87,24 +111,35 @@ _GEN_FLOAT_AB(name##s, name, 0x3B, op2, inval, 1, set_fprf, type);
 #define _GEN_FLOAT_AC(name, op, op1, op2, inval, isfloat, set_fprf, type)     \
 static void gen_f##name(DisasContext *ctx)                                    \
 {                                                                             \
+    TCGv_i64 t0;                                                              \
+    TCGv_i64 t1;                                                              \
+    TCGv_i64 t2;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
+    t0 = tcg_temp_new_i64();                                                  \
+    t1 = tcg_temp_new_i64();                                                  \
+    t2 = tcg_temp_new_i64();                                                  \
     gen_reset_fpstatus();                                                     \
-    gen_helper_f##op(cpu_fpr[rD(ctx->opcode)], cpu_env,                       \
-                     cpu_fpr[rA(ctx->opcode)],                                \
-                     cpu_fpr[rC(ctx->opcode)]);                               \
+    get_fpr(t0, rA(ctx->opcode));                                             \
+    get_fpr(t1, rC(ctx->opcode));                                             \
+    gen_helper_f##op(t2, cpu_env, t0, t1);                                    \
+    set_fpr(rD(ctx->opcode), t2);                                             \
     if (isfloat) {                                                            \
-        gen_helper_frsp(cpu_fpr[rD(ctx->opcode)], cpu_env,                    \
-                        cpu_fpr[rD(ctx->opcode)]);                            \
+        get_fpr(t0, rD(ctx->opcode));                                         \
+        gen_helper_frsp(t2, cpu_env, t0);                                     \
+        set_fpr(rD(ctx->opcode), t2);                                         \
     }                                                                         \
     if (set_fprf) {                                                           \
-        gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);                   \
+        gen_compute_fprf_float64(t2);                                         \
     }                                                                         \
     if (unlikely(Rc(ctx->opcode) != 0)) {                                     \
         gen_set_cr1_from_fpscr(ctx);                                          \
     }                                                                         \
+    tcg_temp_free_i64(t0);                                                    \
+    tcg_temp_free_i64(t1);                                                    \
+    tcg_temp_free_i64(t2);                                                    \
 }
 #define GEN_FLOAT_AC(name, op2, inval, set_fprf, type)                        \
 _GEN_FLOAT_AC(name, name, 0x3F, op2, inval, 0, set_fprf, type);               \
@@ -113,37 +148,51 @@ _GEN_FLOAT_AC(name##s, name, 0x3B, op2, inval, 1, set_fprf, type);
 #define GEN_FLOAT_B(name, op2, op3, set_fprf, type)                           \
 static void gen_f##name(DisasContext *ctx)                                    \
 {                                                                             \
+    TCGv_i64 t0;                                                              \
+    TCGv_i64 t1;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
+    t0 = tcg_temp_new_i64();                                                  \
+    t1 = tcg_temp_new_i64();                                                  \
     gen_reset_fpstatus();                                                     \
-    gen_helper_f##name(cpu_fpr[rD(ctx->opcode)], cpu_env,                     \
-                       cpu_fpr[rB(ctx->opcode)]);                             \
+    get_fpr(t0, rB(ctx->opcode));                                             \
+    gen_helper_f##name(t1, cpu_env, t0);                                      \
+    set_fpr(rD(ctx->opcode), t1);                                             \
     if (set_fprf) {                                                           \
-        gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);                   \
+        gen_compute_fprf_float64(t1);                                         \
     }                                                                         \
     if (unlikely(Rc(ctx->opcode) != 0)) {                                     \
         gen_set_cr1_from_fpscr(ctx);                                          \
     }                                                                         \
+    tcg_temp_free_i64(t0);                                                    \
+    tcg_temp_free_i64(t1);                                                    \
 }
 
 #define GEN_FLOAT_BS(name, op1, op2, set_fprf, type)                          \
 static void gen_f##name(DisasContext *ctx)                                    \
 {                                                                             \
+    TCGv_i64 t0;                                                              \
+    TCGv_i64 t1;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
+    t0 = tcg_temp_new_i64();                                                  \
+    t1 = tcg_temp_new_i64();                                                  \
     gen_reset_fpstatus();                                                     \
-    gen_helper_f##name(cpu_fpr[rD(ctx->opcode)], cpu_env,                     \
-                       cpu_fpr[rB(ctx->opcode)]);                             \
+    get_fpr(t0, rB(ctx->opcode));                                             \
+    gen_helper_f##name(t1, cpu_env, t0);                                      \
+    set_fpr(rD(ctx->opcode), t1);                                             \
     if (set_fprf) {                                                           \
-        gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);                   \
+        gen_compute_fprf_float64(t1);                                         \
     }                                                                         \
     if (unlikely(Rc(ctx->opcode) != 0)) {                                     \
         gen_set_cr1_from_fpscr(ctx);                                          \
     }                                                                         \
+    tcg_temp_free_i64(t0);                                                    \
+    tcg_temp_free_i64(t1);                                                    \
 }
 
 /* fadd - fadds */
@@ -165,19 +214,25 @@ GEN_FLOAT_BS(rsqrte, 0x3F, 0x1A, 1, PPC_FLOAT_FRSQRTE);
 /* frsqrtes */
 static void gen_frsqrtes(DisasContext *ctx)
 {
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
     gen_reset_fpstatus();
-    gen_helper_frsqrte(cpu_fpr[rD(ctx->opcode)], cpu_env,
-                       cpu_fpr[rB(ctx->opcode)]);
-    gen_helper_frsp(cpu_fpr[rD(ctx->opcode)], cpu_env,
-                    cpu_fpr[rD(ctx->opcode)]);
-    gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);
+    get_fpr(t0, rB(ctx->opcode));
+    gen_helper_frsqrte(t1, cpu_env, t0);
+    gen_helper_frsp(t1, cpu_env, t1);
+    set_fpr(rD(ctx->opcode), t1);
+    gen_compute_fprf_float64(t1);
     if (unlikely(Rc(ctx->opcode) != 0)) {
         gen_set_cr1_from_fpscr(ctx);
     }
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /* fsel */
@@ -189,34 +244,47 @@ GEN_FLOAT_AB(sub, 0x14, 0x000007C0, 1, PPC_FLOAT);
 /* fsqrt */
 static void gen_fsqrt(DisasContext *ctx)
 {
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
     gen_reset_fpstatus();
-    gen_helper_fsqrt(cpu_fpr[rD(ctx->opcode)], cpu_env,
-                     cpu_fpr[rB(ctx->opcode)]);
-    gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);
+    get_fpr(t0, rB(ctx->opcode));
+    gen_helper_fsqrt(t1, cpu_env, t0);
+    set_fpr(rD(ctx->opcode), t1);
+    gen_compute_fprf_float64(t1);
     if (unlikely(Rc(ctx->opcode) != 0)) {
         gen_set_cr1_from_fpscr(ctx);
     }
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 static void gen_fsqrts(DisasContext *ctx)
 {
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
     gen_reset_fpstatus();
-    gen_helper_fsqrt(cpu_fpr[rD(ctx->opcode)], cpu_env,
-                     cpu_fpr[rB(ctx->opcode)]);
-    gen_helper_frsp(cpu_fpr[rD(ctx->opcode)], cpu_env,
-                    cpu_fpr[rD(ctx->opcode)]);
-    gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);
+    get_fpr(t0, rB(ctx->opcode));
+    gen_helper_fsqrt(t1, cpu_env, t0);
+    gen_helper_frsp(t1, cpu_env, t1);
+    set_fpr(rD(ctx->opcode), t1);
+    gen_compute_fprf_float64(t1);
     if (unlikely(Rc(ctx->opcode) != 0)) {
         gen_set_cr1_from_fpscr(ctx);
     }
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /***                     Floating-Point multiply-and-add                   ***/
@@ -268,21 +336,32 @@ GEN_FLOAT_B(rim, 0x08, 0x0F, 1, PPC_FLOAT_EXT);
 
 static void gen_ftdiv(DisasContext *ctx)
 {
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
-    gen_helper_ftdiv(cpu_crf[crfD(ctx->opcode)], cpu_fpr[rA(ctx->opcode)],
-                     cpu_fpr[rB(ctx->opcode)]);
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    get_fpr(t0, rA(ctx->opcode));
+    get_fpr(t1, rB(ctx->opcode));
+    gen_helper_ftdiv(cpu_crf[crfD(ctx->opcode)], t0, t1);
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 static void gen_ftsqrt(DisasContext *ctx)
 {
+    TCGv_i64 t0;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
-    gen_helper_ftsqrt(cpu_crf[crfD(ctx->opcode)], cpu_fpr[rB(ctx->opcode)]);
+    t0 = tcg_temp_new_i64();
+    get_fpr(t0, rB(ctx->opcode));
+    gen_helper_ftsqrt(cpu_crf[crfD(ctx->opcode)], t0);
+    tcg_temp_free_i64(t0);
 }
 
 
@@ -293,32 +372,46 @@ static void gen_ftsqrt(DisasContext *ctx)
 static void gen_fcmpo(DisasContext *ctx)
 {
     TCGv_i32 crf;
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
     gen_reset_fpstatus();
     crf = tcg_const_i32(crfD(ctx->opcode));
-    gen_helper_fcmpo(cpu_env, cpu_fpr[rA(ctx->opcode)],
-                     cpu_fpr[rB(ctx->opcode)], crf);
+    get_fpr(t0, rA(ctx->opcode));
+    get_fpr(t1, rB(ctx->opcode));
+    gen_helper_fcmpo(cpu_env, t0, t1, crf);
     tcg_temp_free_i32(crf);
     gen_helper_float_check_status(cpu_env);
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /* fcmpu */
 static void gen_fcmpu(DisasContext *ctx)
 {
     TCGv_i32 crf;
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
     gen_reset_fpstatus();
     crf = tcg_const_i32(crfD(ctx->opcode));
-    gen_helper_fcmpu(cpu_env, cpu_fpr[rA(ctx->opcode)],
-                     cpu_fpr[rB(ctx->opcode)], crf);
+    get_fpr(t0, rA(ctx->opcode));
+    get_fpr(t1, rB(ctx->opcode));
+    gen_helper_fcmpu(cpu_env, t0, t1, crf);
     tcg_temp_free_i32(crf);
     gen_helper_float_check_status(cpu_env);
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /***                         Floating-point move                           ***/
@@ -326,100 +419,153 @@ static void gen_fcmpu(DisasContext *ctx)
 /* XXX: beware that fabs never checks for NaNs nor update FPSCR */
 static void gen_fabs(DisasContext *ctx)
 {
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
-    tcg_gen_andi_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rB(ctx->opcode)],
-                     ~(1ULL << 63));
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    get_fpr(t0, rB(ctx->opcode));
+    tcg_gen_andi_i64(t1, t0, ~(1ULL << 63));
+    set_fpr(rD(ctx->opcode), t1);
     if (unlikely(Rc(ctx->opcode))) {
         gen_set_cr1_from_fpscr(ctx);
     }
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /* fmr  - fmr. */
 /* XXX: beware that fmr never checks for NaNs nor update FPSCR */
 static void gen_fmr(DisasContext *ctx)
 {
+    TCGv_i64 t0;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
-    tcg_gen_mov_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rB(ctx->opcode)]);
+    t0 = tcg_temp_new_i64();
+    get_fpr(t0, rB(ctx->opcode));
+    set_fpr(rD(ctx->opcode), t0);
     if (unlikely(Rc(ctx->opcode))) {
         gen_set_cr1_from_fpscr(ctx);
     }
+    tcg_temp_free_i64(t0);
 }
 
 /* fnabs */
 /* XXX: beware that fnabs never checks for NaNs nor update FPSCR */
 static void gen_fnabs(DisasContext *ctx)
 {
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
-    tcg_gen_ori_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rB(ctx->opcode)],
-                    1ULL << 63);
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    get_fpr(t0, rB(ctx->opcode));
+    tcg_gen_ori_i64(t1, t0, 1ULL << 63);
+    set_fpr(rD(ctx->opcode), t1);
     if (unlikely(Rc(ctx->opcode))) {
         gen_set_cr1_from_fpscr(ctx);
     }
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /* fneg */
 /* XXX: beware that fneg never checks for NaNs nor update FPSCR */
 static void gen_fneg(DisasContext *ctx)
 {
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
-    tcg_gen_xori_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rB(ctx->opcode)],
-                     1ULL << 63);
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    get_fpr(t0, rB(ctx->opcode));
+    tcg_gen_xori_i64(t1, t0, 1ULL << 63);
+    set_fpr(rD(ctx->opcode), t1);
     if (unlikely(Rc(ctx->opcode))) {
         gen_set_cr1_from_fpscr(ctx);
     }
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /* fcpsgn: PowerPC 2.05 specification */
 /* XXX: beware that fcpsgn never checks for NaNs nor update FPSCR */
 static void gen_fcpsgn(DisasContext *ctx)
 {
+    TCGv_i64 t0;
+    TCGv_i64 t1;
+    TCGv_i64 t2;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
-    tcg_gen_deposit_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rA(ctx->opcode)],
-                        cpu_fpr[rB(ctx->opcode)], 0, 63);
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    t2 = tcg_temp_new_i64();
+    get_fpr(t0, rA(ctx->opcode));
+    get_fpr(t1, rB(ctx->opcode));
+    tcg_gen_deposit_i64(t2, t0, t1, 0, 63);
+    set_fpr(rD(ctx->opcode), t2);
     if (unlikely(Rc(ctx->opcode))) {
         gen_set_cr1_from_fpscr(ctx);
     }
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
 }
 
 static void gen_fmrgew(DisasContext *ctx)
 {
     TCGv_i64 b0;
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
     b0 = tcg_temp_new_i64();
-    tcg_gen_shri_i64(b0, cpu_fpr[rB(ctx->opcode)], 32);
-    tcg_gen_deposit_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpr[rA(ctx->opcode)],
-                        b0, 0, 32);
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    get_fpr(t0, rB(ctx->opcode));
+    tcg_gen_shri_i64(b0, t0, 32);
+    get_fpr(t0, rA(ctx->opcode));
+    tcg_gen_deposit_i64(t1, t0, b0, 0, 32);
+    set_fpr(rD(ctx->opcode), t1);
     tcg_temp_free_i64(b0);
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 static void gen_fmrgow(DisasContext *ctx)
 {
+    TCGv_i64 t0;
+    TCGv_i64 t1;
+    TCGv_i64 t2;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
-    tcg_gen_deposit_i64(cpu_fpr[rD(ctx->opcode)],
-                        cpu_fpr[rB(ctx->opcode)],
-                        cpu_fpr[rA(ctx->opcode)],
-                        32, 32);
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    t2 = tcg_temp_new_i64();
+    get_fpr(t0, rB(ctx->opcode));
+    get_fpr(t1, rA(ctx->opcode));
+    tcg_gen_deposit_i64(t2, t0, t1, 32, 32);
+    set_fpr(rD(ctx->opcode), t2);
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
 }
 
 /***                  Floating-Point status & ctrl register                ***/
@@ -458,15 +604,19 @@ static void gen_mcrfs(DisasContext *ctx)
 /* mffs */
 static void gen_mffs(DisasContext *ctx)
 {
+    TCGv_i64 t0;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
+    t0 = tcg_temp_new_i64();
     gen_reset_fpstatus();
-    tcg_gen_extu_tl_i64(cpu_fpr[rD(ctx->opcode)], cpu_fpscr);
+    tcg_gen_extu_tl_i64(t0, cpu_fpscr);
+    set_fpr(rD(ctx->opcode), t0);
     if (unlikely(Rc(ctx->opcode))) {
         gen_set_cr1_from_fpscr(ctx);
     }
+    tcg_temp_free_i64(t0);
 }
 
 /* mtfsb0 */
@@ -522,6 +672,7 @@ static void gen_mtfsb1(DisasContext *ctx)
 static void gen_mtfsf(DisasContext *ctx)
 {
     TCGv_i32 t0;
+    TCGv_i64 t1;
     int flm, l, w;
 
     if (unlikely(!ctx->fpu_enabled)) {
@@ -541,7 +692,9 @@ static void gen_mtfsf(DisasContext *ctx)
     } else {
         t0 = tcg_const_i32(flm << (w * 8));
     }
-    gen_helper_store_fpscr(cpu_env, cpu_fpr[rB(ctx->opcode)], t0);
+    t1 = tcg_temp_new_i64();
+    get_fpr(t1, rB(ctx->opcode));
+    gen_helper_store_fpscr(cpu_env, t1, t0);
     tcg_temp_free_i32(t0);
     if (unlikely(Rc(ctx->opcode) != 0)) {
         tcg_gen_trunc_tl_i32(cpu_crf[1], cpu_fpscr);
@@ -549,6 +702,7 @@ static void gen_mtfsf(DisasContext *ctx)
     }
     /* We can raise a differed exception */
     gen_helper_float_check_status(cpu_env);
+    tcg_temp_free_i64(t1);
 }
 
 /* mtfsfi */
@@ -588,21 +742,26 @@ static void gen_mtfsfi(DisasContext *ctx)
 static void glue(gen_, name)(DisasContext *ctx)                                       \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 t0;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
     gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
     EA = tcg_temp_new();                                                      \
+    t0 = tcg_temp_new_i64();                                                  \
     gen_addr_imm_index(ctx, EA, 0);                                           \
-    gen_qemu_##ldop(ctx, cpu_fpr[rD(ctx->opcode)], EA);                       \
+    gen_qemu_##ldop(ctx, t0, EA);                                             \
+    set_fpr(rD(ctx->opcode), t0);                                             \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(t0);                                                    \
 }
 
 #define GEN_LDUF(name, ldop, opc, type)                                       \
 static void glue(gen_, name##u)(DisasContext *ctx)                                    \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 t0;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
@@ -613,20 +772,25 @@ static void glue(gen_, name##u)(DisasContext *ctx)
     }                                                                         \
     gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
     EA = tcg_temp_new();                                                      \
+    t0 = tcg_temp_new_i64();                                                  \
     gen_addr_imm_index(ctx, EA, 0);                                           \
-    gen_qemu_##ldop(ctx, cpu_fpr[rD(ctx->opcode)], EA);                       \
+    gen_qemu_##ldop(ctx, t0, EA);                                             \
+    set_fpr(rD(ctx->opcode), t0);                                             \
     tcg_gen_mov_tl(cpu_gpr[rA(ctx->opcode)], EA);                             \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(t0);                                                    \
 }
 
 #define GEN_LDUXF(name, ldop, opc, type)                                      \
 static void glue(gen_, name##ux)(DisasContext *ctx)                                   \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 t0;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
+    t0 = tcg_temp_new_i64();                                                  \
     if (unlikely(rA(ctx->opcode) == 0)) {                                     \
         gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);                   \
         return;                                                               \
@@ -634,24 +798,30 @@ static void glue(gen_, name##ux)(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
     EA = tcg_temp_new();                                                      \
     gen_addr_reg_index(ctx, EA);                                              \
-    gen_qemu_##ldop(ctx, cpu_fpr[rD(ctx->opcode)], EA);                       \
+    gen_qemu_##ldop(ctx, t0, EA);                                             \
+    set_fpr(rD(ctx->opcode), t0);                                             \
     tcg_gen_mov_tl(cpu_gpr[rA(ctx->opcode)], EA);                             \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(t0);                                                    \
 }
 
 #define GEN_LDXF(name, ldop, opc2, opc3, type)                                \
 static void glue(gen_, name##x)(DisasContext *ctx)                                    \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 t0;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
     gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
     EA = tcg_temp_new();                                                      \
+    t0 = tcg_temp_new_i64();                                                  \
     gen_addr_reg_index(ctx, EA);                                              \
-    gen_qemu_##ldop(ctx, cpu_fpr[rD(ctx->opcode)], EA);                       \
+    gen_qemu_##ldop(ctx, t0, EA);                                             \
+    set_fpr(rD(ctx->opcode), t0);                                             \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(t0);                                                    \
 }
 
 #define GEN_LDFS(name, ldop, op, type)                                        \
@@ -677,6 +847,7 @@ GEN_LDFS(lfs, ld32fs, 0x10, PPC_FLOAT);
 static void gen_lfdepx(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0;
     CHK_SV;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
@@ -684,16 +855,19 @@ static void gen_lfdepx(DisasContext *ctx)
     }
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
+    t0 = tcg_temp_new_i64();
     gen_addr_reg_index(ctx, EA);
-    tcg_gen_qemu_ld_i64(cpu_fpr[rD(ctx->opcode)], EA, PPC_TLB_EPID_LOAD,
-        DEF_MEMOP(MO_Q));
+    tcg_gen_qemu_ld_i64(t0, EA, PPC_TLB_EPID_LOAD, DEF_MEMOP(MO_Q));
+    set_fpr(rD(ctx->opcode), t0);
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
 }
 
 /* lfdp */
 static void gen_lfdp(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
@@ -701,24 +875,31 @@ static void gen_lfdp(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
     gen_addr_imm_index(ctx, EA, 0);
+    t0 = tcg_temp_new_i64();
     /* We only need to swap high and low halves. gen_qemu_ld64_i64 does
        necessary 64-bit byteswap already. */
     if (unlikely(ctx->le_mode)) {
-        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        gen_qemu_ld64_i64(ctx, t0, EA);
+        set_fpr(rD(ctx->opcode) + 1, t0);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        gen_qemu_ld64_i64(ctx, t0, EA);
+        set_fpr(rD(ctx->opcode), t0);
     } else {
-        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        gen_qemu_ld64_i64(ctx, t0, EA);
+        set_fpr(rD(ctx->opcode), t0);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        gen_qemu_ld64_i64(ctx, t0, EA);
+        set_fpr(rD(ctx->opcode) + 1, t0);
     }
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
 }
 
 /* lfdpx */
 static void gen_lfdpx(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
@@ -726,18 +907,24 @@ static void gen_lfdpx(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
     gen_addr_reg_index(ctx, EA);
+    t0 = tcg_temp_new_i64();
     /* We only need to swap high and low halves. gen_qemu_ld64_i64 does
        necessary 64-bit byteswap already. */
     if (unlikely(ctx->le_mode)) {
-        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        gen_qemu_ld64_i64(ctx, t0, EA);
+        set_fpr(rD(ctx->opcode) + 1, t0);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        gen_qemu_ld64_i64(ctx, t0, EA);
+        set_fpr(rD(ctx->opcode), t0);
     } else {
-        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        gen_qemu_ld64_i64(ctx, t0, EA);
+        set_fpr(rD(ctx->opcode), t0);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        gen_qemu_ld64_i64(ctx, t0, EA);
+        set_fpr(rD(ctx->opcode) + 1, t0);
     }
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
 }
 
 /* lfiwax */
@@ -745,6 +932,7 @@ static void gen_lfiwax(DisasContext *ctx)
 {
     TCGv EA;
     TCGv t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
@@ -752,47 +940,59 @@ static void gen_lfiwax(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
     t0 = tcg_temp_new();
+    t1 = tcg_temp_new_i64();
     gen_addr_reg_index(ctx, EA);
     gen_qemu_ld32s(ctx, t0, EA);
-    tcg_gen_ext_tl_i64(cpu_fpr[rD(ctx->opcode)], t0);
+    tcg_gen_ext_tl_i64(t1, t0);
+    set_fpr(rD(ctx->opcode), t1);
     tcg_temp_free(EA);
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /* lfiwzx */
 static void gen_lfiwzx(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
+    t0 = tcg_temp_new_i64();
     gen_addr_reg_index(ctx, EA);
-    gen_qemu_ld32u_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+    gen_qemu_ld32u_i64(ctx, t0, EA);
+    set_fpr(rD(ctx->opcode), t0);
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
 }
 /***                         Floating-point store                          ***/
 #define GEN_STF(name, stop, opc, type)                                        \
 static void glue(gen_, name)(DisasContext *ctx)                                       \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 t0;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
     gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
     EA = tcg_temp_new();                                                      \
+    t0 = tcg_temp_new_i64();                                                  \
     gen_addr_imm_index(ctx, EA, 0);                                           \
-    gen_qemu_##stop(ctx, cpu_fpr[rS(ctx->opcode)], EA);                       \
+    get_fpr(t0, rS(ctx->opcode));                                             \
+    gen_qemu_##stop(ctx, t0, EA);                                             \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(t0);                                                    \
 }
 
 #define GEN_STUF(name, stop, opc, type)                                       \
 static void glue(gen_, name##u)(DisasContext *ctx)                                    \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 t0;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
@@ -803,16 +1003,20 @@ static void glue(gen_, name##u)(DisasContext *ctx)
     }                                                                         \
     gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
     EA = tcg_temp_new();                                                      \
+    t0 = tcg_temp_new_i64();                                                  \
     gen_addr_imm_index(ctx, EA, 0);                                           \
-    gen_qemu_##stop(ctx, cpu_fpr[rS(ctx->opcode)], EA);                       \
+    get_fpr(t0, rS(ctx->opcode));                                             \
+    gen_qemu_##stop(ctx, t0, EA);                                             \
     tcg_gen_mov_tl(cpu_gpr[rA(ctx->opcode)], EA);                             \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(t0);                                                    \
 }
 
 #define GEN_STUXF(name, stop, opc, type)                                      \
 static void glue(gen_, name##ux)(DisasContext *ctx)                                   \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 t0;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
@@ -823,25 +1027,32 @@ static void glue(gen_, name##ux)(DisasContext *ctx)
     }                                                                         \
     gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
     EA = tcg_temp_new();                                                      \
+    t0 = tcg_temp_new_i64();                                                  \
     gen_addr_reg_index(ctx, EA);                                              \
-    gen_qemu_##stop(ctx, cpu_fpr[rS(ctx->opcode)], EA);                       \
+    get_fpr(t0, rS(ctx->opcode));                                             \
+    gen_qemu_##stop(ctx, t0, EA);                                             \
     tcg_gen_mov_tl(cpu_gpr[rA(ctx->opcode)], EA);                             \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(t0);                                                    \
 }
 
 #define GEN_STXF(name, stop, opc2, opc3, type)                                \
 static void glue(gen_, name##x)(DisasContext *ctx)                                    \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 t0;                                                              \
     if (unlikely(!ctx->fpu_enabled)) {                                        \
         gen_exception(ctx, POWERPC_EXCP_FPU);                                 \
         return;                                                               \
     }                                                                         \
     gen_set_access_type(ctx, ACCESS_FLOAT);                                   \
     EA = tcg_temp_new();                                                      \
+    t0 = tcg_temp_new_i64();                                                  \
     gen_addr_reg_index(ctx, EA);                                              \
-    gen_qemu_##stop(ctx, cpu_fpr[rS(ctx->opcode)], EA);                       \
+    get_fpr(t0, rS(ctx->opcode));                                             \
+    gen_qemu_##stop(ctx, t0, EA);                                             \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(t0);                                                    \
 }
 
 #define GEN_STFS(name, stop, op, type)                                        \
@@ -867,6 +1078,7 @@ GEN_STFS(stfs, st32fs, 0x14, PPC_FLOAT);
 static void gen_stfdepx(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0;
     CHK_SV;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
@@ -874,60 +1086,76 @@ static void gen_stfdepx(DisasContext *ctx)
     }
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
+    t0 = tcg_temp_new_i64();
     gen_addr_reg_index(ctx, EA);
-    tcg_gen_qemu_st_i64(cpu_fpr[rD(ctx->opcode)], EA, PPC_TLB_EPID_STORE,
-                       DEF_MEMOP(MO_Q));
+    get_fpr(t0, rD(ctx->opcode));
+    tcg_gen_qemu_st_i64(t0, EA, PPC_TLB_EPID_STORE, DEF_MEMOP(MO_Q));
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
 }
 
 /* stfdp */
 static void gen_stfdp(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
+    t0 = tcg_temp_new_i64();
     gen_addr_imm_index(ctx, EA, 0);
     /* We only need to swap high and low halves. gen_qemu_st64_i64 does
        necessary 64-bit byteswap already. */
     if (unlikely(ctx->le_mode)) {
-        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        get_fpr(t0, rD(ctx->opcode) + 1);
+        gen_qemu_st64_i64(ctx, t0, EA);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        get_fpr(t0, rD(ctx->opcode));
+        gen_qemu_st64_i64(ctx, t0, EA);
     } else {
-        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        get_fpr(t0, rD(ctx->opcode));
+        gen_qemu_st64_i64(ctx, t0, EA);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        get_fpr(t0, rD(ctx->opcode) + 1);
+        gen_qemu_st64_i64(ctx, t0, EA);
     }
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
 }
 
 /* stfdpx */
 static void gen_stfdpx(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0;
     if (unlikely(!ctx->fpu_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_FPU);
         return;
     }
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
+    t0 = tcg_temp_new_i64();
     gen_addr_reg_index(ctx, EA);
     /* We only need to swap high and low halves. gen_qemu_st64_i64 does
        necessary 64-bit byteswap already. */
     if (unlikely(ctx->le_mode)) {
-        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        get_fpr(t0, rD(ctx->opcode) + 1);
+        gen_qemu_st64_i64(ctx, t0, EA);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        get_fpr(t0, rD(ctx->opcode));
+        gen_qemu_st64_i64(ctx, t0, EA);
     } else {
-        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        get_fpr(t0, rD(ctx->opcode));
+        gen_qemu_st64_i64(ctx, t0, EA);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        get_fpr(t0, rD(ctx->opcode) + 1);
+        gen_qemu_st64_i64(ctx, t0, EA);
     }
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
 }
 
 /* Optional: */
@@ -949,13 +1177,18 @@ static void gen_lfq(DisasContext *ctx)
 {
     int rd = rD(ctx->opcode);
     TCGv t0;
+    TCGv_i64 t1;
     gen_set_access_type(ctx, ACCESS_FLOAT);
     t0 = tcg_temp_new();
+    t1 = tcg_temp_new_i64();
     gen_addr_imm_index(ctx, t0, 0);
-    gen_qemu_ld64_i64(ctx, cpu_fpr[rd], t0);
+    gen_qemu_ld64_i64(ctx, t1, t0);
+    set_fpr(rd, t1);
     gen_addr_add(ctx, t0, t0, 8);
-    gen_qemu_ld64_i64(ctx, cpu_fpr[(rd + 1) % 32], t0);
+    gen_qemu_ld64_i64(ctx, t1, t0);
+    set_fpr((rd + 1) % 32, t1);
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /* lfqu */
@@ -964,17 +1197,22 @@ static void gen_lfqu(DisasContext *ctx)
     int ra = rA(ctx->opcode);
     int rd = rD(ctx->opcode);
     TCGv t0, t1;
+    TCGv_i64 t2;
     gen_set_access_type(ctx, ACCESS_FLOAT);
     t0 = tcg_temp_new();
     t1 = tcg_temp_new();
+    t2 = tcg_temp_new_i64();
     gen_addr_imm_index(ctx, t0, 0);
-    gen_qemu_ld64_i64(ctx, cpu_fpr[rd], t0);
+    gen_qemu_ld64_i64(ctx, t2, t0);
+    set_fpr(rd, t2);
     gen_addr_add(ctx, t1, t0, 8);
-    gen_qemu_ld64_i64(ctx, cpu_fpr[(rd + 1) % 32], t1);
+    gen_qemu_ld64_i64(ctx, t2, t1);
+    set_fpr((rd + 1) % 32, t2);
     if (ra != 0)
         tcg_gen_mov_tl(cpu_gpr[ra], t0);
     tcg_temp_free(t0);
     tcg_temp_free(t1);
+    tcg_temp_free_i64(t2);
 }
 
 /* lfqux */
@@ -984,16 +1222,21 @@ static void gen_lfqux(DisasContext *ctx)
     int rd = rD(ctx->opcode);
     gen_set_access_type(ctx, ACCESS_FLOAT);
     TCGv t0, t1;
+    TCGv_i64 t2;
+    t2 = tcg_temp_new_i64();
     t0 = tcg_temp_new();
     gen_addr_reg_index(ctx, t0);
-    gen_qemu_ld64_i64(ctx, cpu_fpr[rd], t0);
+    gen_qemu_ld64_i64(ctx, t2, t0);
+    set_fpr(rd, t2);
     t1 = tcg_temp_new();
     gen_addr_add(ctx, t1, t0, 8);
-    gen_qemu_ld64_i64(ctx, cpu_fpr[(rd + 1) % 32], t1);
+    gen_qemu_ld64_i64(ctx, t2, t1);
+    set_fpr((rd + 1) % 32, t2);
     tcg_temp_free(t1);
     if (ra != 0)
         tcg_gen_mov_tl(cpu_gpr[ra], t0);
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t2);
 }
 
 /* lfqx */
@@ -1001,13 +1244,18 @@ static void gen_lfqx(DisasContext *ctx)
 {
     int rd = rD(ctx->opcode);
     TCGv t0;
+    TCGv_i64 t1;
     gen_set_access_type(ctx, ACCESS_FLOAT);
     t0 = tcg_temp_new();
+    t1 = tcg_temp_new_i64();
     gen_addr_reg_index(ctx, t0);
-    gen_qemu_ld64_i64(ctx, cpu_fpr[rd], t0);
+    gen_qemu_ld64_i64(ctx, t1, t0);
+    set_fpr(rd, t1);
     gen_addr_add(ctx, t0, t0, 8);
-    gen_qemu_ld64_i64(ctx, cpu_fpr[(rd + 1) % 32], t0);
+    gen_qemu_ld64_i64(ctx, t1, t0);
+    set_fpr((rd + 1) % 32, t1);
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /* stfq */
@@ -1015,13 +1263,18 @@ static void gen_stfq(DisasContext *ctx)
 {
     int rd = rD(ctx->opcode);
     TCGv t0;
+    TCGv_i64 t1;
     gen_set_access_type(ctx, ACCESS_FLOAT);
     t0 = tcg_temp_new();
+    t1 = tcg_temp_new_i64();
     gen_addr_imm_index(ctx, t0, 0);
-    gen_qemu_st64_i64(ctx, cpu_fpr[rd], t0);
+    get_fpr(t1, rd);
+    gen_qemu_st64_i64(ctx, t1, t0);
     gen_addr_add(ctx, t0, t0, 8);
-    gen_qemu_st64_i64(ctx, cpu_fpr[(rd + 1) % 32], t0);
+    get_fpr(t1, (rd + 1) % 32);
+    gen_qemu_st64_i64(ctx, t1, t0);
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
 }
 
 /* stfqu */
@@ -1030,17 +1283,23 @@ static void gen_stfqu(DisasContext *ctx)
     int ra = rA(ctx->opcode);
     int rd = rD(ctx->opcode);
     TCGv t0, t1;
+    TCGv_i64 t2;
     gen_set_access_type(ctx, ACCESS_FLOAT);
+    t2 = tcg_temp_new_i64();
     t0 = tcg_temp_new();
     gen_addr_imm_index(ctx, t0, 0);
-    gen_qemu_st64_i64(ctx, cpu_fpr[rd], t0);
+    get_fpr(t2, rd);
+    gen_qemu_st64_i64(ctx, t2, t0);
     t1 = tcg_temp_new();
     gen_addr_add(ctx, t1, t0, 8);
-    gen_qemu_st64_i64(ctx, cpu_fpr[(rd + 1) % 32], t1);
+    get_fpr(t2, (rd + 1) % 32);
+    gen_qemu_st64_i64(ctx, t2, t1);
     tcg_temp_free(t1);
-    if (ra != 0)
+    if (ra != 0) {
         tcg_gen_mov_tl(cpu_gpr[ra], t0);
+    }
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t2);
 }
 
 /* stfqux */
@@ -1049,17 +1308,23 @@ static void gen_stfqux(DisasContext *ctx)
     int ra = rA(ctx->opcode);
     int rd = rD(ctx->opcode);
     TCGv t0, t1;
+    TCGv_i64 t2;
     gen_set_access_type(ctx, ACCESS_FLOAT);
+    t2 = tcg_temp_new_i64();
     t0 = tcg_temp_new();
     gen_addr_reg_index(ctx, t0);
-    gen_qemu_st64_i64(ctx, cpu_fpr[rd], t0);
+    get_fpr(t2, rd);
+    gen_qemu_st64_i64(ctx, t2, t0);
     t1 = tcg_temp_new();
     gen_addr_add(ctx, t1, t0, 8);
-    gen_qemu_st64_i64(ctx, cpu_fpr[(rd + 1) % 32], t1);
+    get_fpr(t2, (rd + 1) % 32);
+    gen_qemu_st64_i64(ctx, t2, t1);
     tcg_temp_free(t1);
-    if (ra != 0)
+    if (ra != 0) {
         tcg_gen_mov_tl(cpu_gpr[ra], t0);
+    }
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t2);
 }
 
 /* stfqx */
@@ -1067,13 +1332,18 @@ static void gen_stfqx(DisasContext *ctx)
 {
     int rd = rD(ctx->opcode);
     TCGv t0;
+    TCGv_i64 t1;
     gen_set_access_type(ctx, ACCESS_FLOAT);
+    t1 = tcg_temp_new_i64();
     t0 = tcg_temp_new();
     gen_addr_reg_index(ctx, t0);
-    gen_qemu_st64_i64(ctx, cpu_fpr[rd], t0);
+    get_fpr(t1, rd);
+    gen_qemu_st64_i64(ctx, t1, t0);
     gen_addr_add(ctx, t0, t0, 8);
-    gen_qemu_st64_i64(ctx, cpu_fpr[(rd + 1) % 32], t0);
+    get_fpr(t1, (rd + 1) % 32);
+    gen_qemu_st64_i64(ctx, t1, t0);
     tcg_temp_free(t0);
+    tcg_temp_free_i64(t1);
 }
 
 #undef _GEN_FLOAT_ACB
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [RFC PATCH v2 2/9] target/ppc: introduce get_avr64() and set_avr64() helpers for VMX register access
  2018-12-17 12:23 [Qemu-devel] [RFC PATCH v2 0/9] target/ppc: convert VMX instructions to use TCG vector operations Mark Cave-Ayland
  2018-12-17 12:23 ` [Qemu-devel] [RFC PATCH v2 1/9] target/ppc: introduce get_fpr() and set_fpr() helpers for FP register access Mark Cave-Ayland
@ 2018-12-17 12:23 ` Mark Cave-Ayland
  2018-12-17 12:23 ` [Qemu-devel] [RFC PATCH v2 3/9] target/ppc: introduce get_cpu_vsr{l, h}() and set_cpu_vsr{l, h}() helpers for VSR " Mark Cave-Ayland
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Mark Cave-Ayland @ 2018-12-17 12:23 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc, david, richard.henderson, lvivier

These helpers allow us to move AVR register values to/from the specified TCGv_i64
argument.

To prevent VMX helpers accessing the cpu_avr{l,h} arrays directly, add extra TCG
temporaries as required.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/translate.c              |  10 +++
 target/ppc/translate/vmx-impl.inc.c | 130 ++++++++++++++++++++++++++++--------
 2 files changed, 111 insertions(+), 29 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 1d4bf624a3..fa3e8dc114 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -6704,6 +6704,16 @@ static inline void set_fpr(int regno, TCGv_i64 src)
     tcg_gen_mov_i64(cpu_fpr[regno], src);
 }
 
+static inline void get_avr64(TCGv_i64 dst, int regno, bool high)
+{
+    tcg_gen_mov_i64(dst, (high ? cpu_avrh : cpu_avrl)[regno]);
+}
+
+static inline void set_avr64(int regno, TCGv_i64 src, bool high)
+{
+    tcg_gen_mov_i64((high ? cpu_avrh : cpu_avrl)[regno], src);
+}
+
 #include "translate/fp-impl.inc.c"
 
 #include "translate/vmx-impl.inc.c"
diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index 3cb6fc2926..30046c6e31 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -18,52 +18,66 @@ static inline TCGv_ptr gen_avr_ptr(int reg)
 static void glue(gen_, name)(DisasContext *ctx)                                       \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 avr;                                                             \
     if (unlikely(!ctx->altivec_enabled)) {                                    \
         gen_exception(ctx, POWERPC_EXCP_VPU);                                 \
         return;                                                               \
     }                                                                         \
     gen_set_access_type(ctx, ACCESS_INT);                                     \
+    avr = tcg_temp_new_i64();                                                 \
     EA = tcg_temp_new();                                                      \
     gen_addr_reg_index(ctx, EA);                                              \
     tcg_gen_andi_tl(EA, EA, ~0xf);                                            \
     /* We only need to swap high and low halves. gen_qemu_ld64_i64 does       \
        necessary 64-bit byteswap already. */                                  \
     if (ctx->le_mode) {                                                       \
-        gen_qemu_ld64_i64(ctx, cpu_avrl[rD(ctx->opcode)], EA);                \
+        gen_qemu_ld64_i64(ctx, avr, EA);                                      \
+        set_avr64(rD(ctx->opcode), avr, false);                               \
         tcg_gen_addi_tl(EA, EA, 8);                                           \
-        gen_qemu_ld64_i64(ctx, cpu_avrh[rD(ctx->opcode)], EA);                \
+        gen_qemu_ld64_i64(ctx, avr, EA);                                      \
+        set_avr64(rD(ctx->opcode), avr, true);                                \
     } else {                                                                  \
-        gen_qemu_ld64_i64(ctx, cpu_avrh[rD(ctx->opcode)], EA);                \
+        gen_qemu_ld64_i64(ctx, avr, EA);                                      \
+        set_avr64(rD(ctx->opcode), avr, true);                                \
         tcg_gen_addi_tl(EA, EA, 8);                                           \
-        gen_qemu_ld64_i64(ctx, cpu_avrl[rD(ctx->opcode)], EA);                \
+        gen_qemu_ld64_i64(ctx, avr, EA);                                      \
+        set_avr64(rD(ctx->opcode), avr, false);                               \
     }                                                                         \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(avr);                                                   \
 }
 
 #define GEN_VR_STX(name, opc2, opc3)                                          \
 static void gen_st##name(DisasContext *ctx)                                   \
 {                                                                             \
     TCGv EA;                                                                  \
+    TCGv_i64 avr;                                                             \
     if (unlikely(!ctx->altivec_enabled)) {                                    \
         gen_exception(ctx, POWERPC_EXCP_VPU);                                 \
         return;                                                               \
     }                                                                         \
     gen_set_access_type(ctx, ACCESS_INT);                                     \
+    avr = tcg_temp_new_i64();                                                 \
     EA = tcg_temp_new();                                                      \
     gen_addr_reg_index(ctx, EA);                                              \
     tcg_gen_andi_tl(EA, EA, ~0xf);                                            \
     /* We only need to swap high and low halves. gen_qemu_st64_i64 does       \
        necessary 64-bit byteswap already. */                                  \
     if (ctx->le_mode) {                                                       \
-        gen_qemu_st64_i64(ctx, cpu_avrl[rD(ctx->opcode)], EA);                \
+        get_avr64(avr, rD(ctx->opcode), false);                               \
+        gen_qemu_st64_i64(ctx, avr, EA);                                      \
         tcg_gen_addi_tl(EA, EA, 8);                                           \
-        gen_qemu_st64_i64(ctx, cpu_avrh[rD(ctx->opcode)], EA);                \
+        get_avr64(avr, rD(ctx->opcode), true);                                \
+        gen_qemu_st64_i64(ctx, avr, EA);                                      \
     } else {                                                                  \
-        gen_qemu_st64_i64(ctx, cpu_avrh[rD(ctx->opcode)], EA);                \
+        get_avr64(avr, rD(ctx->opcode), true);                                \
+        gen_qemu_st64_i64(ctx, avr, EA);                                      \
         tcg_gen_addi_tl(EA, EA, 8);                                           \
-        gen_qemu_st64_i64(ctx, cpu_avrl[rD(ctx->opcode)], EA);                \
+        get_avr64(avr, rD(ctx->opcode), false);                               \
+        gen_qemu_st64_i64(ctx, avr, EA);                                      \
     }                                                                         \
     tcg_temp_free(EA);                                                        \
+    tcg_temp_free_i64(avr);                                                   \
 }
 
 #define GEN_VR_LVE(name, opc2, opc3, size)                              \
@@ -159,15 +173,20 @@ static void gen_lvsr(DisasContext *ctx)
 static void gen_mfvscr(DisasContext *ctx)
 {
     TCGv_i32 t;
+    TCGv_i64 avr;
     if (unlikely(!ctx->altivec_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VPU);
         return;
     }
-    tcg_gen_movi_i64(cpu_avrh[rD(ctx->opcode)], 0);
+    avr = tcg_temp_new_i64();
+    tcg_gen_movi_i64(avr, 0);
+    set_avr64(rD(ctx->opcode), avr, true);
     t = tcg_temp_new_i32();
     tcg_gen_ld_i32(t, cpu_env, offsetof(CPUPPCState, vscr));
-    tcg_gen_extu_i32_i64(cpu_avrl[rD(ctx->opcode)], t);
+    tcg_gen_extu_i32_i64(avr, t);
+    set_avr64(rD(ctx->opcode), avr, false);
     tcg_temp_free_i32(t);
+    tcg_temp_free_i64(avr);
 }
 
 static void gen_mtvscr(DisasContext *ctx)
@@ -188,6 +207,7 @@ static void glue(gen_, name)(DisasContext *ctx)                         \
     TCGv_i64 t0 = tcg_temp_new_i64();                                   \
     TCGv_i64 t1 = tcg_temp_new_i64();                                   \
     TCGv_i64 t2 = tcg_temp_new_i64();                                   \
+    TCGv_i64 avr = tcg_temp_new_i64();                                  \
     TCGv_i64 ten, z;                                                    \
                                                                         \
     if (unlikely(!ctx->altivec_enabled)) {                              \
@@ -199,26 +219,35 @@ static void glue(gen_, name)(DisasContext *ctx)                         \
     z = tcg_const_i64(0);                                               \
                                                                         \
     if (add_cin) {                                                      \
-        tcg_gen_mulu2_i64(t0, t1, cpu_avrl[rA(ctx->opcode)], ten);      \
-        tcg_gen_andi_i64(t2, cpu_avrl[rB(ctx->opcode)], 0xF);           \
-        tcg_gen_add2_i64(cpu_avrl[rD(ctx->opcode)], t2, t0, t1, t2, z); \
+        get_avr64(avr, rA(ctx->opcode), false);                         \
+        tcg_gen_mulu2_i64(t0, t1, avr, ten);                            \
+        get_avr64(avr, rB(ctx->opcode), false);                         \
+        tcg_gen_andi_i64(t2, avr, 0xF);                                 \
+        tcg_gen_add2_i64(avr, t2, t0, t1, t2, z);                       \
+        set_avr64(rD(ctx->opcode), avr, false);                         \
     } else {                                                            \
-        tcg_gen_mulu2_i64(cpu_avrl[rD(ctx->opcode)], t2,                \
-                          cpu_avrl[rA(ctx->opcode)], ten);              \
+        get_avr64(avr, rA(ctx->opcode), false);                         \
+        tcg_gen_mulu2_i64(avr, t2, avr, ten);                           \
+        set_avr64(rD(ctx->opcode), avr, false);                         \
     }                                                                   \
                                                                         \
     if (ret_carry) {                                                    \
-        tcg_gen_mulu2_i64(t0, t1, cpu_avrh[rA(ctx->opcode)], ten);      \
-        tcg_gen_add2_i64(t0, cpu_avrl[rD(ctx->opcode)], t0, t1, t2, z); \
-        tcg_gen_movi_i64(cpu_avrh[rD(ctx->opcode)], 0);                 \
+        get_avr64(avr, rA(ctx->opcode), true);                          \
+        tcg_gen_mulu2_i64(t0, t1, avr, ten);                            \
+        tcg_gen_add2_i64(t0, avr, t0, t1, t2, z);                       \
+        set_avr64(rD(ctx->opcode), avr, false);                         \
+        set_avr64(rD(ctx->opcode), z, true);                            \
     } else {                                                            \
-        tcg_gen_mul_i64(t0, cpu_avrh[rA(ctx->opcode)], ten);            \
-        tcg_gen_add_i64(cpu_avrh[rD(ctx->opcode)], t0, t2);             \
+        get_avr64(avr, rA(ctx->opcode), true);                          \
+        tcg_gen_mul_i64(t0, avr, ten);                                  \
+        tcg_gen_add_i64(avr, t0, t2);                                   \
+        set_avr64(rD(ctx->opcode), avr, true);                          \
     }                                                                   \
                                                                         \
     tcg_temp_free_i64(t0);                                              \
     tcg_temp_free_i64(t1);                                              \
     tcg_temp_free_i64(t2);                                              \
+    tcg_temp_free_i64(avr);                                             \
     tcg_temp_free_i64(ten);                                             \
     tcg_temp_free_i64(z);                                               \
 }                                                                       \
@@ -232,12 +261,27 @@ GEN_VX_VMUL10(vmul10ecuq, 1, 1);
 #define GEN_VX_LOGICAL(name, tcg_op, opc2, opc3)                        \
 static void glue(gen_, name)(DisasContext *ctx)                                 \
 {                                                                       \
+    TCGv_i64 t0 = tcg_temp_new_i64();                                   \
+    TCGv_i64 t1 = tcg_temp_new_i64();                                   \
+    TCGv_i64 avr = tcg_temp_new_i64();                                  \
+                                                                        \
     if (unlikely(!ctx->altivec_enabled)) {                              \
         gen_exception(ctx, POWERPC_EXCP_VPU);                           \
         return;                                                         \
     }                                                                   \
-    tcg_op(cpu_avrh[rD(ctx->opcode)], cpu_avrh[rA(ctx->opcode)], cpu_avrh[rB(ctx->opcode)]); \
-    tcg_op(cpu_avrl[rD(ctx->opcode)], cpu_avrl[rA(ctx->opcode)], cpu_avrl[rB(ctx->opcode)]); \
+    get_avr64(t0, rA(ctx->opcode), true);                               \
+    get_avr64(t1, rB(ctx->opcode), true);                               \
+    tcg_op(avr, t0, t1);                                                \
+    set_avr64(rD(ctx->opcode), avr, true);                              \
+                                                                        \
+    get_avr64(t0, rA(ctx->opcode), false);                              \
+    get_avr64(t1, rB(ctx->opcode), false);                              \
+    tcg_op(avr, t0, t1);                                                \
+    set_avr64(rD(ctx->opcode), avr, false);                             \
+                                                                        \
+    tcg_temp_free_i64(t0);                                              \
+    tcg_temp_free_i64(t1);                                              \
+    tcg_temp_free_i64(avr);                                             \
 }
 
 GEN_VX_LOGICAL(vand, tcg_gen_and_i64, 2, 16);
@@ -406,6 +450,7 @@ GEN_VXFORM(vmrglw, 6, 6);
 static void gen_vmrgew(DisasContext *ctx)
 {
     TCGv_i64 tmp;
+    TCGv_i64 avr;
     int VT, VA, VB;
     if (unlikely(!ctx->altivec_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VPU);
@@ -415,15 +460,28 @@ static void gen_vmrgew(DisasContext *ctx)
     VA = rA(ctx->opcode);
     VB = rB(ctx->opcode);
     tmp = tcg_temp_new_i64();
-    tcg_gen_shri_i64(tmp, cpu_avrh[VB], 32);
-    tcg_gen_deposit_i64(cpu_avrh[VT], cpu_avrh[VA], tmp, 0, 32);
-    tcg_gen_shri_i64(tmp, cpu_avrl[VB], 32);
-    tcg_gen_deposit_i64(cpu_avrl[VT], cpu_avrl[VA], tmp, 0, 32);
+    avr = tcg_temp_new_i64();
+
+    get_avr64(avr, VB, true);
+    tcg_gen_shri_i64(tmp, avr, 32);
+    get_avr64(avr, VA, true);
+    tcg_gen_deposit_i64(avr, avr, tmp, 0, 32);
+    set_avr64(VT, avr, true);
+
+    get_avr64(avr, VB, false);
+    tcg_gen_shri_i64(tmp, avr, 32);
+    get_avr64(avr, VA, false);
+    tcg_gen_deposit_i64(avr, avr, tmp, 0, 32);
+    set_avr64(VT, avr, false);
+
     tcg_temp_free_i64(tmp);
+    tcg_temp_free_i64(avr);
 }
 
 static void gen_vmrgow(DisasContext *ctx)
 {
+    TCGv_i64 t0, t1;
+    TCGv_i64 avr;
     int VT, VA, VB;
     if (unlikely(!ctx->altivec_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VPU);
@@ -432,9 +490,23 @@ static void gen_vmrgow(DisasContext *ctx)
     VT = rD(ctx->opcode);
     VA = rA(ctx->opcode);
     VB = rB(ctx->opcode);
-
-    tcg_gen_deposit_i64(cpu_avrh[VT], cpu_avrh[VB], cpu_avrh[VA], 32, 32);
-    tcg_gen_deposit_i64(cpu_avrl[VT], cpu_avrl[VB], cpu_avrl[VA], 32, 32);
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    avr = tcg_temp_new_i64();
+
+    get_avr64(t0, VB, true);
+    get_avr64(t1, VA, true);
+    tcg_gen_deposit_i64(avr, t0, t1, 32, 32);
+    set_avr64(VT, avr, true);
+
+    get_avr64(t0, VB, false);
+    get_avr64(t1, VA, false);
+    tcg_gen_deposit_i64(avr, t0, t1, 32, 32);
+    set_avr64(VT, avr, false);
+
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(avr);
 }
 
 GEN_VXFORM(vmuloub, 4, 0);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [RFC PATCH v2 3/9] target/ppc: introduce get_cpu_vsr{l, h}() and set_cpu_vsr{l, h}() helpers for VSR register access
  2018-12-17 12:23 [Qemu-devel] [RFC PATCH v2 0/9] target/ppc: convert VMX instructions to use TCG vector operations Mark Cave-Ayland
  2018-12-17 12:23 ` [Qemu-devel] [RFC PATCH v2 1/9] target/ppc: introduce get_fpr() and set_fpr() helpers for FP register access Mark Cave-Ayland
  2018-12-17 12:23 ` [Qemu-devel] [RFC PATCH v2 2/9] target/ppc: introduce get_avr64() and set_avr64() helpers for VMX " Mark Cave-Ayland
@ 2018-12-17 12:23 ` Mark Cave-Ayland
  2018-12-17 16:43   ` Richard Henderson
  2018-12-17 12:24 ` [Qemu-devel] [RFC PATCH v2 4/9] target/ppc: delay writeback of avr{l, h} during lvx instruction Mark Cave-Ayland
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: Mark Cave-Ayland @ 2018-12-17 12:23 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc, david, richard.henderson, lvivier

These helpers allow us to move VSR register values to/from the specified TCGv_i64
argument.

To prevent VSX helpers accessing the cpu_vsr array directly, add extra TCG
temporaries as required.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
---
 target/ppc/translate/vsx-impl.inc.c | 782 ++++++++++++++++++++++++++----------
 1 file changed, 561 insertions(+), 221 deletions(-)

diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
index 85ed135d44..e9a05d66f7 100644
--- a/target/ppc/translate/vsx-impl.inc.c
+++ b/target/ppc/translate/vsx-impl.inc.c
@@ -1,20 +1,48 @@
 /***                           VSX extension                               ***/
 
-static inline TCGv_i64 cpu_vsrh(int n)
+static inline void get_vsr(TCGv_i64 dst, int n)
+{
+    tcg_gen_mov_i64(dst, cpu_vsr[n]);
+}
+
+static inline void set_vsr(int n, TCGv_i64 src)
+{
+    tcg_gen_mov_i64(cpu_vsr[n], src);
+}
+
+static inline void get_cpu_vsrh(TCGv_i64 dst, int n)
+{
+    if (n < 32) {
+        get_fpr(dst, n);
+    } else {
+        get_avr64(dst, n - 32, true);
+    }
+}
+
+static inline void get_cpu_vsrl(TCGv_i64 dst, int n)
 {
     if (n < 32) {
-        return cpu_fpr[n];
+        get_vsr(dst, n);
     } else {
-        return cpu_avrh[n-32];
+        get_avr64(dst, n - 32, false);
     }
 }
 
-static inline TCGv_i64 cpu_vsrl(int n)
+static inline void set_cpu_vsrh(int n, TCGv_i64 src)
 {
     if (n < 32) {
-        return cpu_vsr[n];
+        set_fpr(n, src);
     } else {
-        return cpu_avrl[n-32];
+        set_avr64(n - 32, src, true);
+    }
+}
+
+static inline void set_cpu_vsrl(int n, TCGv_i64 src)
+{
+    if (n < 32) {
+        set_vsr(n, src);
+    } else {
+        set_avr64(n - 32, src, false);
     }
 }
 
@@ -22,16 +50,20 @@ static inline TCGv_i64 cpu_vsrl(int n)
 static void gen_##name(DisasContext *ctx)                     \
 {                                                             \
     TCGv EA;                                                  \
+    TCGv_i64 t0;                                              \
     if (unlikely(!ctx->vsx_enabled)) {                        \
         gen_exception(ctx, POWERPC_EXCP_VSXU);                \
         return;                                               \
     }                                                         \
+    t0 = tcg_temp_new_i64();                                  \
     gen_set_access_type(ctx, ACCESS_INT);                     \
     EA = tcg_temp_new();                                      \
     gen_addr_reg_index(ctx, EA);                              \
-    gen_qemu_##operation(ctx, cpu_vsrh(xT(ctx->opcode)), EA); \
+    gen_qemu_##operation(ctx, t0, EA);                        \
+    set_cpu_vsrh(xT(ctx->opcode), t0);                        \
     /* NOTE: cpu_vsrl is undefined */                         \
     tcg_temp_free(EA);                                        \
+    tcg_temp_free_i64(t0);                                    \
 }
 
 VSX_LOAD_SCALAR(lxsdx, ld64_i64)
@@ -44,39 +76,54 @@ VSX_LOAD_SCALAR(lxsspx, ld32fs)
 static void gen_lxvd2x(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0;
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
+    t0 = tcg_temp_new_i64();
     gen_set_access_type(ctx, ACCESS_INT);
     EA = tcg_temp_new();
     gen_addr_reg_index(ctx, EA);
-    gen_qemu_ld64_i64(ctx, cpu_vsrh(xT(ctx->opcode)), EA);
+    gen_qemu_ld64_i64(ctx, t0, EA);
+    set_cpu_vsrh(xT(ctx->opcode), t0);
     tcg_gen_addi_tl(EA, EA, 8);
-    gen_qemu_ld64_i64(ctx, cpu_vsrl(xT(ctx->opcode)), EA);
+    gen_qemu_ld64_i64(ctx, t0, EA);
+    set_cpu_vsrl(xT(ctx->opcode), t0);
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
 }
 
 static void gen_lxvdsx(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0;
+    TCGv_i64 t1;
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
     gen_set_access_type(ctx, ACCESS_INT);
     EA = tcg_temp_new();
     gen_addr_reg_index(ctx, EA);
-    gen_qemu_ld64_i64(ctx, cpu_vsrh(xT(ctx->opcode)), EA);
-    tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), cpu_vsrh(xT(ctx->opcode)));
+    gen_qemu_ld64_i64(ctx, t0, EA);
+    set_cpu_vsrh(xT(ctx->opcode), t0);
+    tcg_gen_mov_i64(t1, t0);
+    set_cpu_vsrl(xT(ctx->opcode), t1);
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 static void gen_lxvw4x(DisasContext *ctx)
 {
     TCGv EA;
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+    get_cpu_vsrh(xth, xT(ctx->opcode));
+    get_cpu_vsrh(xtl, xT(ctx->opcode));
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
@@ -104,6 +151,8 @@ static void gen_lxvw4x(DisasContext *ctx)
         tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_BEQ);
     }
     tcg_temp_free(EA);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
 }
 
 static void gen_bswap16x8(TCGv_i64 outh, TCGv_i64 outl,
@@ -151,8 +200,10 @@ static void gen_bswap32x4(TCGv_i64 outh, TCGv_i64 outl,
 static void gen_lxvh8x(DisasContext *ctx)
 {
     TCGv EA;
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+    get_cpu_vsrh(xth, xT(ctx->opcode));
+    get_cpu_vsrh(xtl, xT(ctx->opcode));
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -169,13 +220,17 @@ static void gen_lxvh8x(DisasContext *ctx)
         gen_bswap16x8(xth, xtl, xth, xtl);
     }
     tcg_temp_free(EA);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
 }
 
 static void gen_lxvb16x(DisasContext *ctx)
 {
     TCGv EA;
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+    get_cpu_vsrh(xth, xT(ctx->opcode));
+    get_cpu_vsrh(xtl, xT(ctx->opcode));
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -188,6 +243,8 @@ static void gen_lxvb16x(DisasContext *ctx)
     tcg_gen_addi_tl(EA, EA, 8);
     tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_BEQ);
     tcg_temp_free(EA);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
 }
 
 #define VSX_VECTOR_LOAD_STORE(name, op, indexed)            \
@@ -195,15 +252,16 @@ static void gen_##name(DisasContext *ctx)                   \
 {                                                           \
     int xt;                                                 \
     TCGv EA;                                                \
-    TCGv_i64 xth, xtl;                                      \
+    TCGv_i64 xth = tcg_temp_new_i64();                      \
+    TCGv_i64 xtl = tcg_temp_new_i64();                      \
                                                             \
     if (indexed) {                                          \
         xt = xT(ctx->opcode);                               \
     } else {                                                \
         xt = DQxT(ctx->opcode);                             \
     }                                                       \
-    xth = cpu_vsrh(xt);                                     \
-    xtl = cpu_vsrl(xt);                                     \
+    get_cpu_vsrh(xth, xt);                                  \
+    get_cpu_vsrl(xtl, xt);                                  \
                                                             \
     if (xt < 32) {                                          \
         if (unlikely(!ctx->vsx_enabled)) {                  \
@@ -225,14 +283,20 @@ static void gen_##name(DisasContext *ctx)                   \
     }                                                       \
     if (ctx->le_mode) {                                     \
         tcg_gen_qemu_##op(xtl, EA, ctx->mem_idx, MO_LEQ);   \
+        set_cpu_vsrl(xt, xtl);                              \
         tcg_gen_addi_tl(EA, EA, 8);                         \
         tcg_gen_qemu_##op(xth, EA, ctx->mem_idx, MO_LEQ);   \
+        set_cpu_vsrh(xt, xth);                              \
     } else {                                                \
         tcg_gen_qemu_##op(xth, EA, ctx->mem_idx, MO_BEQ);   \
+        set_cpu_vsrh(xt, xth);                              \
         tcg_gen_addi_tl(EA, EA, 8);                         \
         tcg_gen_qemu_##op(xtl, EA, ctx->mem_idx, MO_BEQ);   \
+        set_cpu_vsrl(xt, xtl);                              \
     }                                                       \
     tcg_temp_free(EA);                                      \
+    tcg_temp_free_i64(xth);                                 \
+    tcg_temp_free_i64(xtl);                                 \
 }
 
 VSX_VECTOR_LOAD_STORE(lxv, ld_i64, 0)
@@ -276,7 +340,8 @@ VSX_VECTOR_LOAD_STORE_LENGTH(stxvll)
 static void gen_##name(DisasContext *ctx)                         \
 {                                                                 \
     TCGv EA;                                                      \
-    TCGv_i64 xth = cpu_vsrh(rD(ctx->opcode) + 32);                \
+    TCGv_i64 xth = tcg_temp_new_i64();                            \
+    get_cpu_vsrh(xth, rD(ctx->opcode) + 32);                      \
                                                                   \
     if (unlikely(!ctx->altivec_enabled)) {                        \
         gen_exception(ctx, POWERPC_EXCP_VPU);                     \
@@ -286,8 +351,10 @@ static void gen_##name(DisasContext *ctx)                         \
     EA = tcg_temp_new();                                          \
     gen_addr_imm_index(ctx, EA, 0x03);                            \
     gen_qemu_##operation(ctx, xth, EA);                           \
+    set_cpu_vsrh(rD(ctx->opcode) + 32, xth);                      \
     /* NOTE: cpu_vsrl is undefined */                             \
     tcg_temp_free(EA);                                            \
+    tcg_temp_free_i64(xth);                                       \
 }
 
 VSX_LOAD_SCALAR_DS(lxsd, ld64_i64)
@@ -297,15 +364,19 @@ VSX_LOAD_SCALAR_DS(lxssp, ld32fs)
 static void gen_##name(DisasContext *ctx)                     \
 {                                                             \
     TCGv EA;                                                  \
+    TCGv_i64 t0;                                              \
     if (unlikely(!ctx->vsx_enabled)) {                        \
         gen_exception(ctx, POWERPC_EXCP_VSXU);                \
         return;                                               \
     }                                                         \
+    t0 = tcg_temp_new_i64();                                  \
     gen_set_access_type(ctx, ACCESS_INT);                     \
     EA = tcg_temp_new();                                      \
     gen_addr_reg_index(ctx, EA);                              \
-    gen_qemu_##operation(ctx, cpu_vsrh(xS(ctx->opcode)), EA); \
+    gen_qemu_##operation(ctx, t0, EA);                        \
+    set_cpu_vsrh(xS(ctx->opcode), t0);                        \
     tcg_temp_free(EA);                                        \
+    tcg_temp_free_i64(t0);                                    \
 }
 
 VSX_STORE_SCALAR(stxsdx, st64_i64)
@@ -318,6 +389,7 @@ VSX_STORE_SCALAR(stxsspx, st32fs)
 static void gen_stxvd2x(DisasContext *ctx)
 {
     TCGv EA;
+    TCGv_i64 t0 = tcg_temp_new_i64();
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
@@ -325,17 +397,23 @@ static void gen_stxvd2x(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_INT);
     EA = tcg_temp_new();
     gen_addr_reg_index(ctx, EA);
-    gen_qemu_st64_i64(ctx, cpu_vsrh(xS(ctx->opcode)), EA);
+    get_cpu_vsrh(t0, xS(ctx->opcode));
+    gen_qemu_st64_i64(ctx, t0, EA);
     tcg_gen_addi_tl(EA, EA, 8);
-    gen_qemu_st64_i64(ctx, cpu_vsrl(xS(ctx->opcode)), EA);
+    get_cpu_vsrl(t0, xS(ctx->opcode));
+    gen_qemu_st64_i64(ctx, t0, EA);
     tcg_temp_free(EA);
+    tcg_temp_free_i64(t0);
 }
 
 static void gen_stxvw4x(DisasContext *ctx)
 {
-    TCGv_i64 xsh = cpu_vsrh(xS(ctx->opcode));
-    TCGv_i64 xsl = cpu_vsrl(xS(ctx->opcode));
     TCGv EA;
+    TCGv_i64 xsh = tcg_temp_new_i64();
+    TCGv_i64 xsl = tcg_temp_new_i64();
+    get_cpu_vsrh(xsh, xS(ctx->opcode));
+    get_cpu_vsrl(xsl, xS(ctx->opcode));
+
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
@@ -362,13 +440,17 @@ static void gen_stxvw4x(DisasContext *ctx)
         tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_BEQ);
     }
     tcg_temp_free(EA);
+    tcg_temp_free_i64(xsh);
+    tcg_temp_free_i64(xsl);
 }
 
 static void gen_stxvh8x(DisasContext *ctx)
 {
-    TCGv_i64 xsh = cpu_vsrh(xS(ctx->opcode));
-    TCGv_i64 xsl = cpu_vsrl(xS(ctx->opcode));
     TCGv EA;
+    TCGv_i64 xsh = tcg_temp_new_i64();
+    TCGv_i64 xsl = tcg_temp_new_i64();
+    get_cpu_vsrh(xsh, xS(ctx->opcode));
+    get_cpu_vsrl(xsl, xS(ctx->opcode));
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -393,13 +475,17 @@ static void gen_stxvh8x(DisasContext *ctx)
         tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_BEQ);
     }
     tcg_temp_free(EA);
+    tcg_temp_free_i64(xsh);
+    tcg_temp_free_i64(xsl);
 }
 
 static void gen_stxvb16x(DisasContext *ctx)
 {
-    TCGv_i64 xsh = cpu_vsrh(xS(ctx->opcode));
-    TCGv_i64 xsl = cpu_vsrl(xS(ctx->opcode));
     TCGv EA;
+    TCGv_i64 xsh = tcg_temp_new_i64();
+    TCGv_i64 xsl = tcg_temp_new_i64();
+    get_cpu_vsrh(xsh, xS(ctx->opcode));
+    get_cpu_vsrl(xsl, xS(ctx->opcode));
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -412,13 +498,16 @@ static void gen_stxvb16x(DisasContext *ctx)
     tcg_gen_addi_tl(EA, EA, 8);
     tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_BEQ);
     tcg_temp_free(EA);
+    tcg_temp_free_i64(xsh);
+    tcg_temp_free_i64(xsl);
 }
 
 #define VSX_STORE_SCALAR_DS(name, operation)                      \
 static void gen_##name(DisasContext *ctx)                         \
 {                                                                 \
     TCGv EA;                                                      \
-    TCGv_i64 xth = cpu_vsrh(rD(ctx->opcode) + 32);                \
+    TCGv_i64 xth = tcg_temp_new_i64();                            \
+    get_cpu_vsrh(xth, rD(ctx->opcode) + 32);                      \
                                                                   \
     if (unlikely(!ctx->altivec_enabled)) {                        \
         gen_exception(ctx, POWERPC_EXCP_VPU);                     \
@@ -430,62 +519,119 @@ static void gen_##name(DisasContext *ctx)                         \
     gen_qemu_##operation(ctx, xth, EA);                           \
     /* NOTE: cpu_vsrl is undefined */                             \
     tcg_temp_free(EA);                                            \
+    tcg_temp_free_i64(xth);                                       \
 }
 
 VSX_LOAD_SCALAR_DS(stxsd, st64_i64)
 VSX_LOAD_SCALAR_DS(stxssp, st32fs)
 
-#define MV_VSRW(name, tcgop1, tcgop2, target, source)           \
-static void gen_##name(DisasContext *ctx)                       \
-{                                                               \
-    if (xS(ctx->opcode) < 32) {                                 \
-        if (unlikely(!ctx->fpu_enabled)) {                      \
-            gen_exception(ctx, POWERPC_EXCP_FPU);               \
-            return;                                             \
-        }                                                       \
-    } else {                                                    \
-        if (unlikely(!ctx->altivec_enabled)) {                  \
-            gen_exception(ctx, POWERPC_EXCP_VPU);               \
-            return;                                             \
-        }                                                       \
-    }                                                           \
-    TCGv_i64 tmp = tcg_temp_new_i64();                          \
-    tcg_gen_##tcgop1(tmp, source);                              \
-    tcg_gen_##tcgop2(target, tmp);                              \
-    tcg_temp_free_i64(tmp);                                     \
+static void gen_mfvsrwz(DisasContext *ctx)
+{
+    if (xS(ctx->opcode) < 32) {
+        if (unlikely(!ctx->fpu_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_FPU);
+            return;
+        }
+    } else {
+        if (unlikely(!ctx->altivec_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_VPU);
+            return;
+        }
+    }
+    TCGv_i64 tmp = tcg_temp_new_i64();
+    TCGv_i64 xsh = tcg_temp_new_i64();
+    get_cpu_vsrh(xsh, xS(ctx->opcode));
+    tcg_gen_ext32u_i64(tmp, xsh);
+    tcg_gen_trunc_i64_tl(cpu_gpr[rA(ctx->opcode)], tmp);
+    tcg_temp_free_i64(tmp);
 }
 
+static void gen_mtvsrwa(DisasContext *ctx)
+{
+    if (xS(ctx->opcode) < 32) {
+        if (unlikely(!ctx->fpu_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_FPU);
+            return;
+        }
+    } else {
+        if (unlikely(!ctx->altivec_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_VPU);
+            return;
+        }
+    }
+    TCGv_i64 tmp = tcg_temp_new_i64();
+    TCGv_i64 xsh = tcg_temp_new_i64();
+    tcg_gen_extu_tl_i64(tmp, cpu_gpr[rA(ctx->opcode)]);
+    tcg_gen_ext32s_i64(xsh, tmp);
+    set_cpu_vsrh(xT(ctx->opcode), xsh);
+    tcg_temp_free_i64(tmp);
+    tcg_temp_free_i64(xsh);
+}
 
-MV_VSRW(mfvsrwz, ext32u_i64, trunc_i64_tl, cpu_gpr[rA(ctx->opcode)], \
-        cpu_vsrh(xS(ctx->opcode)))
-MV_VSRW(mtvsrwa, extu_tl_i64, ext32s_i64, cpu_vsrh(xT(ctx->opcode)), \
-        cpu_gpr[rA(ctx->opcode)])
-MV_VSRW(mtvsrwz, extu_tl_i64, ext32u_i64, cpu_vsrh(xT(ctx->opcode)), \
-        cpu_gpr[rA(ctx->opcode)])
+static void gen_mtvsrwz(DisasContext *ctx)
+{
+    if (xS(ctx->opcode) < 32) {
+        if (unlikely(!ctx->fpu_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_FPU);
+            return;
+        }
+    } else {
+        if (unlikely(!ctx->altivec_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_VPU);
+            return;
+        }
+    }
+    TCGv_i64 tmp = tcg_temp_new_i64();
+    TCGv_i64 xsh = tcg_temp_new_i64();
+    tcg_gen_extu_tl_i64(tmp, cpu_gpr[rA(ctx->opcode)]);
+    tcg_gen_ext32u_i64(xsh, tmp);
+    set_cpu_vsrh(xT(ctx->opcode), xsh);
+    tcg_temp_free_i64(tmp);
+    tcg_temp_free_i64(xsh);
+}
 
 #if defined(TARGET_PPC64)
-#define MV_VSRD(name, target, source)                           \
-static void gen_##name(DisasContext *ctx)                       \
-{                                                               \
-    if (xS(ctx->opcode) < 32) {                                 \
-        if (unlikely(!ctx->fpu_enabled)) {                      \
-            gen_exception(ctx, POWERPC_EXCP_FPU);               \
-            return;                                             \
-        }                                                       \
-    } else {                                                    \
-        if (unlikely(!ctx->altivec_enabled)) {                  \
-            gen_exception(ctx, POWERPC_EXCP_VPU);               \
-            return;                                             \
-        }                                                       \
-    }                                                           \
-    tcg_gen_mov_i64(target, source);                            \
+static void gen_mfvsrd(DisasContext *ctx)
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    if (xS(ctx->opcode) < 32) {
+        if (unlikely(!ctx->fpu_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_FPU);
+            return;
+        }
+    } else {
+        if (unlikely(!ctx->altivec_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_VPU);
+            return;
+        }
+    }
+    get_cpu_vsrh(t0, xS(ctx->opcode));
+    tcg_gen_mov_i64(cpu_gpr[rA(ctx->opcode)], t0);
+    tcg_temp_free_i64(t0);
 }
 
-MV_VSRD(mfvsrd, cpu_gpr[rA(ctx->opcode)], cpu_vsrh(xS(ctx->opcode)))
-MV_VSRD(mtvsrd, cpu_vsrh(xT(ctx->opcode)), cpu_gpr[rA(ctx->opcode)])
+static void gen_mtvsrd(DisasContext *ctx)
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    if (xS(ctx->opcode) < 32) {
+        if (unlikely(!ctx->fpu_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_FPU);
+            return;
+        }
+    } else {
+        if (unlikely(!ctx->altivec_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_VPU);
+            return;
+        }
+    }
+    tcg_gen_mov_i64(t0, cpu_gpr[rA(ctx->opcode)]);
+    set_cpu_vsrh(xT(ctx->opcode), t0);
+    tcg_temp_free_i64(t0);
+}
 
 static void gen_mfvsrld(DisasContext *ctx)
 {
+    TCGv_i64 t0 = tcg_temp_new_i64();
     if (xS(ctx->opcode) < 32) {
         if (unlikely(!ctx->vsx_enabled)) {
             gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -497,12 +643,14 @@ static void gen_mfvsrld(DisasContext *ctx)
             return;
         }
     }
-
-    tcg_gen_mov_i64(cpu_gpr[rA(ctx->opcode)], cpu_vsrl(xS(ctx->opcode)));
+    get_cpu_vsrl(t0, xS(ctx->opcode));
+    tcg_gen_mov_i64(cpu_gpr[rA(ctx->opcode)], t0);
+    tcg_temp_free_i64(t0);
 }
 
 static void gen_mtvsrdd(DisasContext *ctx)
 {
+    TCGv_i64 t0 = tcg_temp_new_i64();
     if (xT(ctx->opcode) < 32) {
         if (unlikely(!ctx->vsx_enabled)) {
             gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -516,16 +664,20 @@ static void gen_mtvsrdd(DisasContext *ctx)
     }
 
     if (!rA(ctx->opcode)) {
-        tcg_gen_movi_i64(cpu_vsrh(xT(ctx->opcode)), 0);
+        tcg_gen_movi_i64(t0, 0);
     } else {
-        tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), cpu_gpr[rA(ctx->opcode)]);
+        tcg_gen_mov_i64(t0, cpu_gpr[rA(ctx->opcode)]);
     }
+    set_cpu_vsrh(xT(ctx->opcode), t0);
 
-    tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), cpu_gpr[rB(ctx->opcode)]);
+    tcg_gen_mov_i64(t0, cpu_gpr[rB(ctx->opcode)]);
+    set_cpu_vsrl(xT(ctx->opcode), t0);
+    tcg_temp_free_i64(t0);
 }
 
 static void gen_mtvsrws(DisasContext *ctx)
 {
+    TCGv_i64 t0 = tcg_temp_new_i64();
     if (xT(ctx->opcode) < 32) {
         if (unlikely(!ctx->vsx_enabled)) {
             gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -538,55 +690,60 @@ static void gen_mtvsrws(DisasContext *ctx)
         }
     }
 
-    tcg_gen_deposit_i64(cpu_vsrl(xT(ctx->opcode)), cpu_gpr[rA(ctx->opcode)],
+    tcg_gen_deposit_i64(t0, cpu_gpr[rA(ctx->opcode)],
                         cpu_gpr[rA(ctx->opcode)], 32, 32);
-    tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), cpu_vsrl(xT(ctx->opcode)));
+    set_cpu_vsrl(xT(ctx->opcode), t0);
+    set_cpu_vsrh(xT(ctx->opcode), t0);
+    tcg_temp_free_i64(t0);
 }
 
 #endif
 
 static void gen_xxpermdi(DisasContext *ctx)
 {
+    TCGv_i64 xh, xl;
+
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
 
+    xh = tcg_temp_new_i64();
+    xl = tcg_temp_new_i64();
+
     if (unlikely((xT(ctx->opcode) == xA(ctx->opcode)) ||
                  (xT(ctx->opcode) == xB(ctx->opcode)))) {
-        TCGv_i64 xh, xl;
-
-        xh = tcg_temp_new_i64();
-        xl = tcg_temp_new_i64();
-
         if ((DM(ctx->opcode) & 2) == 0) {
-            tcg_gen_mov_i64(xh, cpu_vsrh(xA(ctx->opcode)));
+            get_cpu_vsrh(xh, xA(ctx->opcode));
         } else {
-            tcg_gen_mov_i64(xh, cpu_vsrl(xA(ctx->opcode)));
+            get_cpu_vsrl(xh, xA(ctx->opcode));
         }
         if ((DM(ctx->opcode) & 1) == 0) {
-            tcg_gen_mov_i64(xl, cpu_vsrh(xB(ctx->opcode)));
+            get_cpu_vsrh(xl, xB(ctx->opcode));
         } else {
-            tcg_gen_mov_i64(xl, cpu_vsrl(xB(ctx->opcode)));
+            get_cpu_vsrl(xl, xB(ctx->opcode));
         }
 
-        tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), xh);
-        tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), xl);
-
-        tcg_temp_free_i64(xh);
-        tcg_temp_free_i64(xl);
+        set_cpu_vsrh(xT(ctx->opcode), xh);
+        set_cpu_vsrl(xT(ctx->opcode), xl);
     } else {
         if ((DM(ctx->opcode) & 2) == 0) {
-            tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), cpu_vsrh(xA(ctx->opcode)));
+            get_cpu_vsrh(xh, xA(ctx->opcode));
+            set_cpu_vsrh(xT(ctx->opcode), xh);
         } else {
-            tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), cpu_vsrl(xA(ctx->opcode)));
+            get_cpu_vsrl(xh, xA(ctx->opcode));
+            set_cpu_vsrh(xT(ctx->opcode), xh);
         }
         if ((DM(ctx->opcode) & 1) == 0) {
-            tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), cpu_vsrh(xB(ctx->opcode)));
+            get_cpu_vsrh(xl, xB(ctx->opcode));
+            set_cpu_vsrl(xT(ctx->opcode), xl);
         } else {
-            tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), cpu_vsrl(xB(ctx->opcode)));
+            get_cpu_vsrl(xl, xB(ctx->opcode));
+            set_cpu_vsrl(xT(ctx->opcode), xl);
         }
     }
+    tcg_temp_free_i64(xh);
+    tcg_temp_free_i64(xl);
 }
 
 #define OP_ABS 1
@@ -606,7 +763,7 @@ static void glue(gen_, name)(DisasContext * ctx)                  \
         }                                                         \
         xb = tcg_temp_new_i64();                                  \
         sgm = tcg_temp_new_i64();                                 \
-        tcg_gen_mov_i64(xb, cpu_vsrh(xB(ctx->opcode)));           \
+        get_cpu_vsrh(xb, xB(ctx->opcode));                        \
         tcg_gen_movi_i64(sgm, sgn_mask);                          \
         switch (op) {                                             \
             case OP_ABS: {                                        \
@@ -623,7 +780,7 @@ static void glue(gen_, name)(DisasContext * ctx)                  \
             }                                                     \
             case OP_CPSGN: {                                      \
                 TCGv_i64 xa = tcg_temp_new_i64();                 \
-                tcg_gen_mov_i64(xa, cpu_vsrh(xA(ctx->opcode)));   \
+                get_cpu_vsrh(xa, xA(ctx->opcode));                \
                 tcg_gen_and_i64(xa, xa, sgm);                     \
                 tcg_gen_andc_i64(xb, xb, sgm);                    \
                 tcg_gen_or_i64(xb, xb, xa);                       \
@@ -631,7 +788,7 @@ static void glue(gen_, name)(DisasContext * ctx)                  \
                 break;                                            \
             }                                                     \
         }                                                         \
-        tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), xb);           \
+        set_cpu_vsrh(xT(ctx->opcode), xb);                        \
         tcg_temp_free_i64(xb);                                    \
         tcg_temp_free_i64(sgm);                                   \
     }
@@ -647,7 +804,7 @@ static void glue(gen_, name)(DisasContext *ctx)                   \
     int xa;                                                       \
     int xt = rD(ctx->opcode) + 32;                                \
     int xb = rB(ctx->opcode) + 32;                                \
-    TCGv_i64 xah, xbh, xbl, sgm;                                  \
+    TCGv_i64 xah, xbh, xbl, sgm, tmp;                             \
                                                                   \
     if (unlikely(!ctx->vsx_enabled)) {                            \
         gen_exception(ctx, POWERPC_EXCP_VSXU);                    \
@@ -656,8 +813,9 @@ static void glue(gen_, name)(DisasContext *ctx)                   \
     xbh = tcg_temp_new_i64();                                     \
     xbl = tcg_temp_new_i64();                                     \
     sgm = tcg_temp_new_i64();                                     \
-    tcg_gen_mov_i64(xbh, cpu_vsrh(xb));                           \
-    tcg_gen_mov_i64(xbl, cpu_vsrl(xb));                           \
+    tmp = tcg_temp_new_i64();                                     \
+    get_cpu_vsrh(xbh, xb);                                        \
+    get_cpu_vsrl(xbl, xb);                                        \
     tcg_gen_movi_i64(sgm, sgn_mask);                              \
     switch (op) {                                                 \
     case OP_ABS:                                                  \
@@ -672,17 +830,19 @@ static void glue(gen_, name)(DisasContext *ctx)                   \
     case OP_CPSGN:                                                \
         xah = tcg_temp_new_i64();                                 \
         xa = rA(ctx->opcode) + 32;                                \
-        tcg_gen_and_i64(xah, cpu_vsrh(xa), sgm);                  \
+        get_cpu_vsrh(tmp, xa);                                    \
+        tcg_gen_and_i64(xah, tmp, sgm);                           \
         tcg_gen_andc_i64(xbh, xbh, sgm);                          \
         tcg_gen_or_i64(xbh, xbh, xah);                            \
         tcg_temp_free_i64(xah);                                   \
         break;                                                    \
     }                                                             \
-    tcg_gen_mov_i64(cpu_vsrh(xt), xbh);                           \
-    tcg_gen_mov_i64(cpu_vsrl(xt), xbl);                           \
+    set_cpu_vsrh(xt, xbh);                                        \
+    set_cpu_vsrl(xt, xbl);                                        \
     tcg_temp_free_i64(xbl);                                       \
     tcg_temp_free_i64(xbh);                                       \
     tcg_temp_free_i64(sgm);                                       \
+    tcg_temp_free_i64(tmp);                                       \
 }
 
 VSX_SCALAR_MOVE_QP(xsabsqp, OP_ABS, SGN_MASK_DP)
@@ -701,8 +861,8 @@ static void glue(gen_, name)(DisasContext * ctx)                 \
         xbh = tcg_temp_new_i64();                                \
         xbl = tcg_temp_new_i64();                                \
         sgm = tcg_temp_new_i64();                                \
-        tcg_gen_mov_i64(xbh, cpu_vsrh(xB(ctx->opcode)));         \
-        tcg_gen_mov_i64(xbl, cpu_vsrl(xB(ctx->opcode)));         \
+        set_cpu_vsrh(xB(ctx->opcode), xbh);                      \
+        set_cpu_vsrl(xB(ctx->opcode), xbl);                      \
         tcg_gen_movi_i64(sgm, sgn_mask);                         \
         switch (op) {                                            \
             case OP_ABS: {                                       \
@@ -723,8 +883,8 @@ static void glue(gen_, name)(DisasContext * ctx)                 \
             case OP_CPSGN: {                                     \
                 TCGv_i64 xah = tcg_temp_new_i64();               \
                 TCGv_i64 xal = tcg_temp_new_i64();               \
-                tcg_gen_mov_i64(xah, cpu_vsrh(xA(ctx->opcode))); \
-                tcg_gen_mov_i64(xal, cpu_vsrl(xA(ctx->opcode))); \
+                get_cpu_vsrh(xah, xA(ctx->opcode));              \
+                get_cpu_vsrl(xal, xA(ctx->opcode));              \
                 tcg_gen_and_i64(xah, xah, sgm);                  \
                 tcg_gen_and_i64(xal, xal, sgm);                  \
                 tcg_gen_andc_i64(xbh, xbh, sgm);                 \
@@ -736,8 +896,8 @@ static void glue(gen_, name)(DisasContext * ctx)                 \
                 break;                                           \
             }                                                    \
         }                                                        \
-        tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), xbh);         \
-        tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), xbl);         \
+        set_cpu_vsrh(xT(ctx->opcode), xbh);                      \
+        set_cpu_vsrl(xT(ctx->opcode), xbl);                      \
         tcg_temp_free_i64(xbh);                                  \
         tcg_temp_free_i64(xbl);                                  \
         tcg_temp_free_i64(sgm);                                  \
@@ -768,12 +928,17 @@ static void gen_##name(DisasContext * ctx)                                    \
 #define GEN_VSX_HELPER_XT_XB_ENV(name, op1, op2, inval, type) \
 static void gen_##name(DisasContext * ctx)                    \
 {                                                             \
+    TCGv_i64 t0 = tcg_temp_new_i64();                         \
+    TCGv_i64 t1 = tcg_temp_new_i64();                         \
     if (unlikely(!ctx->vsx_enabled)) {                        \
         gen_exception(ctx, POWERPC_EXCP_VSXU);                \
         return;                                               \
     }                                                         \
-    gen_helper_##name(cpu_vsrh(xT(ctx->opcode)), cpu_env,     \
-                      cpu_vsrh(xB(ctx->opcode)));             \
+    get_cpu_vsrh(t0, xB(ctx->opcode));                        \
+    gen_helper_##name(t1, cpu_env, t0);                       \
+    set_cpu_vsrh(xT(ctx->opcode), t1);                        \
+    tcg_temp_free_i64(t0);                                    \
+    tcg_temp_free_i64(t1);                                    \
 }
 
 GEN_VSX_HELPER_2(xsadddp, 0x00, 0x04, 0, PPC2_VSX)
@@ -949,10 +1114,13 @@ GEN_VSX_HELPER_2(xxpermr, 0x08, 0x07, 0, PPC2_ISA300)
 
 static void gen_xxbrd(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
-    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
-    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+
+    TCGv_i64 xbh = tcg_temp_new_i64();
+    TCGv_i64 xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, xB(ctx->opcode));
+    get_cpu_vsrl(xbl, xB(ctx->opcode));
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -960,28 +1128,49 @@ static void gen_xxbrd(DisasContext *ctx)
     }
     tcg_gen_bswap64_i64(xth, xbh);
     tcg_gen_bswap64_i64(xtl, xbl);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
+
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 
 static void gen_xxbrh(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
-    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
-    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+
+    TCGv_i64 xbh = tcg_temp_new_i64();
+    TCGv_i64 xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, xB(ctx->opcode));
+    get_cpu_vsrl(xbl, xB(ctx->opcode));
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
     gen_bswap16x8(xth, xtl, xbh, xbl);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
+
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 
 static void gen_xxbrq(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
-    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
-    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+
+    TCGv_i64 xbh = tcg_temp_new_i64();
+    TCGv_i64 xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, xB(ctx->opcode));
+    get_cpu_vsrl(xbl, xB(ctx->opcode));
+
     TCGv_i64 t0 = tcg_temp_new_i64();
 
     if (unlikely(!ctx->vsx_enabled)) {
@@ -990,35 +1179,65 @@ static void gen_xxbrq(DisasContext *ctx)
     }
     tcg_gen_bswap64_i64(t0, xbl);
     tcg_gen_bswap64_i64(xtl, xbh);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
     tcg_gen_mov_i64(xth, t0);
+    set_cpu_vsrl(xT(ctx->opcode), xth);
+
     tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 
 static void gen_xxbrw(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
-    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
-    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+
+    TCGv_i64 xbh = tcg_temp_new_i64();
+    TCGv_i64 xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, xB(ctx->opcode));
+    get_cpu_vsrl(xbl, xB(ctx->opcode));
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
     gen_bswap32x4(xth, xtl, xbh, xbl);
+    set_cpu_vsrl(xT(ctx->opcode), xth);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
+
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 
 #define VSX_LOGICAL(name, tcg_op)                                    \
 static void glue(gen_, name)(DisasContext * ctx)                     \
     {                                                                \
+        TCGv_i64 t0;                                                 \
+        TCGv_i64 t1;                                                 \
+        TCGv_i64 t2;                                                 \
         if (unlikely(!ctx->vsx_enabled)) {                           \
             gen_exception(ctx, POWERPC_EXCP_VSXU);                   \
             return;                                                  \
         }                                                            \
-        tcg_op(cpu_vsrh(xT(ctx->opcode)), cpu_vsrh(xA(ctx->opcode)), \
-            cpu_vsrh(xB(ctx->opcode)));                              \
-        tcg_op(cpu_vsrl(xT(ctx->opcode)), cpu_vsrl(xA(ctx->opcode)), \
-            cpu_vsrl(xB(ctx->opcode)));                              \
+        t0 = tcg_temp_new_i64();                                     \
+        t1 = tcg_temp_new_i64();                                     \
+        t2 = tcg_temp_new_i64();                                     \
+        get_cpu_vsrh(t0, xA(ctx->opcode));                           \
+        get_cpu_vsrh(t1, xB(ctx->opcode));                           \
+        tcg_op(t2, t0, t1);                                          \
+        set_cpu_vsrh(xT(ctx->opcode), t2);                           \
+        get_cpu_vsrl(t0, xA(ctx->opcode));                           \
+        get_cpu_vsrl(t1, xB(ctx->opcode));                           \
+        tcg_op(t2, t0, t1);                                          \
+        set_cpu_vsrl(xT(ctx->opcode), t2);                           \
+        tcg_temp_free_i64(t0);                                       \
+        tcg_temp_free_i64(t1);                                       \
+        tcg_temp_free_i64(t2);                                       \
     }
 
 VSX_LOGICAL(xxland, tcg_gen_and_i64)
@@ -1033,7 +1252,7 @@ VSX_LOGICAL(xxlorc, tcg_gen_orc_i64)
 #define VSX_XXMRG(name, high)                               \
 static void glue(gen_, name)(DisasContext * ctx)            \
     {                                                       \
-        TCGv_i64 a0, a1, b0, b1;                            \
+        TCGv_i64 a0, a1, b0, b1, tmp;                       \
         if (unlikely(!ctx->vsx_enabled)) {                  \
             gen_exception(ctx, POWERPC_EXCP_VSXU);          \
             return;                                         \
@@ -1042,27 +1261,29 @@ static void glue(gen_, name)(DisasContext * ctx)            \
         a1 = tcg_temp_new_i64();                            \
         b0 = tcg_temp_new_i64();                            \
         b1 = tcg_temp_new_i64();                            \
+        tmp = tcg_temp_new_i64();                           \
         if (high) {                                         \
-            tcg_gen_mov_i64(a0, cpu_vsrh(xA(ctx->opcode))); \
-            tcg_gen_mov_i64(a1, cpu_vsrh(xA(ctx->opcode))); \
-            tcg_gen_mov_i64(b0, cpu_vsrh(xB(ctx->opcode))); \
-            tcg_gen_mov_i64(b1, cpu_vsrh(xB(ctx->opcode))); \
+            get_cpu_vsrh(a0, xA(ctx->opcode));              \
+            get_cpu_vsrh(a1, xA(ctx->opcode));              \
+            get_cpu_vsrh(b0, xB(ctx->opcode));              \
+            get_cpu_vsrh(b1, xB(ctx->opcode));              \
         } else {                                            \
-            tcg_gen_mov_i64(a0, cpu_vsrl(xA(ctx->opcode))); \
-            tcg_gen_mov_i64(a1, cpu_vsrl(xA(ctx->opcode))); \
-            tcg_gen_mov_i64(b0, cpu_vsrl(xB(ctx->opcode))); \
-            tcg_gen_mov_i64(b1, cpu_vsrl(xB(ctx->opcode))); \
+            get_cpu_vsrl(a0, xA(ctx->opcode));              \
+            get_cpu_vsrl(a1, xA(ctx->opcode));              \
+            get_cpu_vsrl(b0, xB(ctx->opcode));              \
+            get_cpu_vsrl(b1, xB(ctx->opcode));              \
         }                                                   \
         tcg_gen_shri_i64(a0, a0, 32);                       \
         tcg_gen_shri_i64(b0, b0, 32);                       \
-        tcg_gen_deposit_i64(cpu_vsrh(xT(ctx->opcode)),      \
-                            b0, a0, 32, 32);                \
-        tcg_gen_deposit_i64(cpu_vsrl(xT(ctx->opcode)),      \
-                            b1, a1, 32, 32);                \
+        tcg_gen_deposit_i64(tmp, b0, a0, 32, 32);           \
+        set_cpu_vsrh(xT(ctx->opcode), tmp);                 \
+        tcg_gen_deposit_i64(tmp, b1, a1, 32, 32);           \
+        set_cpu_vsrl(xT(ctx->opcode), tmp);                 \
         tcg_temp_free_i64(a0);                              \
         tcg_temp_free_i64(a1);                              \
         tcg_temp_free_i64(b0);                              \
         tcg_temp_free_i64(b1);                              \
+        tcg_temp_free_i64(tmp);                             \
     }
 
 VSX_XXMRG(xxmrghw, 1)
@@ -1070,7 +1291,7 @@ VSX_XXMRG(xxmrglw, 0)
 
 static void gen_xxsel(DisasContext * ctx)
 {
-    TCGv_i64 a, b, c;
+    TCGv_i64 a, b, c, tmp;
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
@@ -1078,34 +1299,43 @@ static void gen_xxsel(DisasContext * ctx)
     a = tcg_temp_new_i64();
     b = tcg_temp_new_i64();
     c = tcg_temp_new_i64();
+    tmp = tcg_temp_new_i64();
 
-    tcg_gen_mov_i64(a, cpu_vsrh(xA(ctx->opcode)));
-    tcg_gen_mov_i64(b, cpu_vsrh(xB(ctx->opcode)));
-    tcg_gen_mov_i64(c, cpu_vsrh(xC(ctx->opcode)));
+    get_cpu_vsrh(a, xA(ctx->opcode));
+    get_cpu_vsrh(b, xB(ctx->opcode));
+    get_cpu_vsrh(c, xC(ctx->opcode));
 
     tcg_gen_and_i64(b, b, c);
     tcg_gen_andc_i64(a, a, c);
-    tcg_gen_or_i64(cpu_vsrh(xT(ctx->opcode)), a, b);
+    tcg_gen_or_i64(tmp, a, b);
+    set_cpu_vsrh(xT(ctx->opcode), tmp);
 
-    tcg_gen_mov_i64(a, cpu_vsrl(xA(ctx->opcode)));
-    tcg_gen_mov_i64(b, cpu_vsrl(xB(ctx->opcode)));
-    tcg_gen_mov_i64(c, cpu_vsrl(xC(ctx->opcode)));
+    get_cpu_vsrl(a, xA(ctx->opcode));
+    get_cpu_vsrl(b, xB(ctx->opcode));
+    get_cpu_vsrl(c, xC(ctx->opcode));
 
     tcg_gen_and_i64(b, b, c);
     tcg_gen_andc_i64(a, a, c);
-    tcg_gen_or_i64(cpu_vsrl(xT(ctx->opcode)), a, b);
+    tcg_gen_or_i64(tmp, a, b);
+    set_cpu_vsrl(xT(ctx->opcode), tmp);
 
     tcg_temp_free_i64(a);
     tcg_temp_free_i64(b);
     tcg_temp_free_i64(c);
+    tcg_temp_free_i64(tmp);
 }
 
 static void gen_xxspltw(DisasContext *ctx)
 {
     TCGv_i64 b, b2;
-    TCGv_i64 vsr = (UIM(ctx->opcode) & 2) ?
-                   cpu_vsrl(xB(ctx->opcode)) :
-                   cpu_vsrh(xB(ctx->opcode));
+    TCGv_i64 vsr;
+
+    vsr = tcg_temp_new_i64();
+    if (UIM(ctx->opcode) & 2) {
+        get_cpu_vsrl(vsr, xB(ctx->opcode));
+    } else {
+        get_cpu_vsrh(vsr, xB(ctx->opcode));
+    }
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -1122,9 +1352,11 @@ static void gen_xxspltw(DisasContext *ctx)
     }
 
     tcg_gen_shli_i64(b2, b, 32);
-    tcg_gen_or_i64(cpu_vsrh(xT(ctx->opcode)), b, b2);
-    tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), cpu_vsrh(xT(ctx->opcode)));
+    tcg_gen_or_i64(vsr, b, b2);
+    set_cpu_vsrh(xT(ctx->opcode), vsr);
+    set_cpu_vsrl(xT(ctx->opcode), vsr);
 
+    tcg_temp_free_i64(vsr);
     tcg_temp_free_i64(b);
     tcg_temp_free_i64(b2);
 }
@@ -1134,6 +1366,7 @@ static void gen_xxspltw(DisasContext *ctx)
 static void gen_xxspltib(DisasContext *ctx)
 {
     unsigned char uim8 = IMM8(ctx->opcode);
+    TCGv_i64 vsr = tcg_temp_new_i64();
     if (xS(ctx->opcode) < 32) {
         if (unlikely(!ctx->altivec_enabled)) {
             gen_exception(ctx, POWERPC_EXCP_VPU);
@@ -1145,8 +1378,10 @@ static void gen_xxspltib(DisasContext *ctx)
             return;
         }
     }
-    tcg_gen_movi_i64(cpu_vsrh(xT(ctx->opcode)), pattern(uim8));
-    tcg_gen_movi_i64(cpu_vsrl(xT(ctx->opcode)), pattern(uim8));
+    tcg_gen_movi_i64(vsr, pattern(uim8));
+    set_cpu_vsrh(xT(ctx->opcode), vsr);
+    set_cpu_vsrl(xT(ctx->opcode), vsr);
+    tcg_temp_free_i64(vsr);
 }
 
 static void gen_xxsldwi(DisasContext *ctx)
@@ -1161,40 +1396,40 @@ static void gen_xxsldwi(DisasContext *ctx)
 
     switch (SHW(ctx->opcode)) {
         case 0: {
-            tcg_gen_mov_i64(xth, cpu_vsrh(xA(ctx->opcode)));
-            tcg_gen_mov_i64(xtl, cpu_vsrl(xA(ctx->opcode)));
+            get_cpu_vsrh(xth, xA(ctx->opcode));
+            get_cpu_vsrl(xtl, xA(ctx->opcode));
             break;
         }
         case 1: {
             TCGv_i64 t0 = tcg_temp_new_i64();
-            tcg_gen_mov_i64(xth, cpu_vsrh(xA(ctx->opcode)));
+            get_cpu_vsrh(xth, xA(ctx->opcode));
             tcg_gen_shli_i64(xth, xth, 32);
-            tcg_gen_mov_i64(t0, cpu_vsrl(xA(ctx->opcode)));
+            get_cpu_vsrl(t0, xA(ctx->opcode));
             tcg_gen_shri_i64(t0, t0, 32);
             tcg_gen_or_i64(xth, xth, t0);
-            tcg_gen_mov_i64(xtl, cpu_vsrl(xA(ctx->opcode)));
+            get_cpu_vsrl(xtl, xA(ctx->opcode));
             tcg_gen_shli_i64(xtl, xtl, 32);
-            tcg_gen_mov_i64(t0, cpu_vsrh(xB(ctx->opcode)));
+            get_cpu_vsrh(t0, xB(ctx->opcode));
             tcg_gen_shri_i64(t0, t0, 32);
             tcg_gen_or_i64(xtl, xtl, t0);
             tcg_temp_free_i64(t0);
             break;
         }
         case 2: {
-            tcg_gen_mov_i64(xth, cpu_vsrl(xA(ctx->opcode)));
-            tcg_gen_mov_i64(xtl, cpu_vsrh(xB(ctx->opcode)));
+            get_cpu_vsrl(xth, xA(ctx->opcode));
+            get_cpu_vsrh(xtl, xB(ctx->opcode));
             break;
         }
         case 3: {
             TCGv_i64 t0 = tcg_temp_new_i64();
-            tcg_gen_mov_i64(xth, cpu_vsrl(xA(ctx->opcode)));
+            get_cpu_vsrl(xth, xA(ctx->opcode));
             tcg_gen_shli_i64(xth, xth, 32);
-            tcg_gen_mov_i64(t0, cpu_vsrh(xB(ctx->opcode)));
+            get_cpu_vsrh(t0, xB(ctx->opcode));
             tcg_gen_shri_i64(t0, t0, 32);
             tcg_gen_or_i64(xth, xth, t0);
-            tcg_gen_mov_i64(xtl, cpu_vsrh(xB(ctx->opcode)));
+            get_cpu_vsrh(xtl, xB(ctx->opcode));
             tcg_gen_shli_i64(xtl, xtl, 32);
-            tcg_gen_mov_i64(t0, cpu_vsrl(xB(ctx->opcode)));
+            get_cpu_vsrl(t0, xB(ctx->opcode));
             tcg_gen_shri_i64(t0, t0, 32);
             tcg_gen_or_i64(xtl, xtl, t0);
             tcg_temp_free_i64(t0);
@@ -1202,8 +1437,8 @@ static void gen_xxsldwi(DisasContext *ctx)
         }
     }
 
-    tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), xth);
-    tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), xtl);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
 
     tcg_temp_free_i64(xth);
     tcg_temp_free_i64(xtl);
@@ -1214,6 +1449,7 @@ static void gen_##name(DisasContext *ctx)                       \
 {                                                               \
     TCGv xt, xb;                                                \
     TCGv_i32 t0 = tcg_temp_new_i32();                           \
+    TCGv_i64 t1 = tcg_temp_new_i64();                           \
     uint8_t uimm = UIMM4(ctx->opcode);                          \
                                                                 \
     if (unlikely(!ctx->vsx_enabled)) {                          \
@@ -1226,8 +1462,9 @@ static void gen_##name(DisasContext *ctx)                       \
      * uimm > 12 handle as per hardware in helper               \
      */                                                         \
     if (uimm > 15) {                                            \
-        tcg_gen_movi_i64(cpu_vsrh(xT(ctx->opcode)), 0);         \
-        tcg_gen_movi_i64(cpu_vsrl(xT(ctx->opcode)), 0);         \
+        tcg_gen_movi_i64(t1, 0);                                \
+        set_cpu_vsrh(xT(ctx->opcode), t1);                      \
+        set_cpu_vsrl(xT(ctx->opcode), t1);                      \
         return;                                                 \
     }                                                           \
     tcg_gen_movi_i32(t0, uimm);                                 \
@@ -1235,6 +1472,7 @@ static void gen_##name(DisasContext *ctx)                       \
     tcg_temp_free(xb);                                          \
     tcg_temp_free(xt);                                          \
     tcg_temp_free_i32(t0);                                      \
+    tcg_temp_free_i64(t1);                                      \
 }
 
 VSX_EXTRACT_INSERT(xxextractuw)
@@ -1244,30 +1482,41 @@ VSX_EXTRACT_INSERT(xxinsertw)
 static void gen_xsxexpdp(DisasContext *ctx)
 {
     TCGv rt = cpu_gpr[rD(ctx->opcode)];
+    TCGv_i64 t0 = tcg_temp_new_i64();
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
-    tcg_gen_extract_i64(rt, cpu_vsrh(xB(ctx->opcode)), 52, 11);
+    get_cpu_vsrh(t0, xB(ctx->opcode));
+    tcg_gen_extract_i64(rt, t0, 52, 11);
+    tcg_temp_free_i64(t0);
 }
 
 static void gen_xsxexpqp(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(rD(ctx->opcode) + 32);
-    TCGv_i64 xtl = cpu_vsrl(rD(ctx->opcode) + 32);
-    TCGv_i64 xbh = cpu_vsrh(rB(ctx->opcode) + 32);
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+
+    TCGv_i64 xbh = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, rB(ctx->opcode) + 32);
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
     tcg_gen_extract_i64(xth, xbh, 48, 15);
+    set_cpu_vsrh(rD(ctx->opcode) + 32, xth);
     tcg_gen_movi_i64(xtl, 0);
+    set_cpu_vsrl(rD(ctx->opcode) + 32, xtl);
+
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
 }
 
 static void gen_xsiexpdp(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
+    TCGv_i64 xth;
     TCGv ra = cpu_gpr[rA(ctx->opcode)];
     TCGv rb = cpu_gpr[rB(ctx->opcode)];
     TCGv_i64 t0;
@@ -1277,21 +1526,30 @@ static void gen_xsiexpdp(DisasContext *ctx)
         return;
     }
     t0 = tcg_temp_new_i64();
+    xth = tcg_temp_new_i64();
     tcg_gen_andi_i64(xth, ra, 0x800FFFFFFFFFFFFF);
     tcg_gen_andi_i64(t0, rb, 0x7FF);
     tcg_gen_shli_i64(t0, t0, 52);
     tcg_gen_or_i64(xth, xth, t0);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
     /* dword[1] is undefined */
     tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(xth);
 }
 
 static void gen_xsiexpqp(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(rD(ctx->opcode) + 32);
-    TCGv_i64 xtl = cpu_vsrl(rD(ctx->opcode) + 32);
-    TCGv_i64 xah = cpu_vsrh(rA(ctx->opcode) + 32);
-    TCGv_i64 xal = cpu_vsrl(rA(ctx->opcode) + 32);
-    TCGv_i64 xbh = cpu_vsrh(rB(ctx->opcode) + 32);
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+
+    TCGv_i64 xah = tcg_temp_new_i64();
+    TCGv_i64 xal = tcg_temp_new_i64();
+    get_cpu_vsrh(xah, rA(ctx->opcode) + 32);
+    get_cpu_vsrl(xal, rA(ctx->opcode) + 32);
+
+    TCGv_i64 xbh = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, rB(ctx->opcode) + 32);
+
     TCGv_i64 t0;
 
     if (unlikely(!ctx->vsx_enabled)) {
@@ -1303,14 +1561,22 @@ static void gen_xsiexpqp(DisasContext *ctx)
     tcg_gen_andi_i64(t0, xbh, 0x7FFF);
     tcg_gen_shli_i64(t0, t0, 48);
     tcg_gen_or_i64(xth, xth, t0);
+    set_cpu_vsrh(rD(ctx->opcode) + 32, xth);
     tcg_gen_mov_i64(xtl, xal);
+    set_cpu_vsrl(rD(ctx->opcode) + 32, xtl);
+
     tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xah);
+    tcg_temp_free_i64(xal);
+    tcg_temp_free_i64(xbh);
 }
 
 static void gen_xsxsigdp(DisasContext *ctx)
 {
     TCGv rt = cpu_gpr[rD(ctx->opcode)];
-    TCGv_i64 t0, zr, nan, exp;
+    TCGv_i64 t0, t1, zr, nan, exp;
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -1318,17 +1584,21 @@ static void gen_xsxsigdp(DisasContext *ctx)
     }
     exp = tcg_temp_new_i64();
     t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
     zr = tcg_const_i64(0);
     nan = tcg_const_i64(2047);
 
-    tcg_gen_extract_i64(exp, cpu_vsrh(xB(ctx->opcode)), 52, 11);
+    get_cpu_vsrh(t1, xB(ctx->opcode));
+    tcg_gen_extract_i64(exp, t1, 52, 11);
     tcg_gen_movi_i64(t0, 0x0010000000000000);
     tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, zr, zr, t0);
     tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, nan, zr, t0);
-    tcg_gen_andi_i64(rt, cpu_vsrh(xB(ctx->opcode)), 0x000FFFFFFFFFFFFF);
+    get_cpu_vsrh(t1, xB(ctx->opcode));
+    tcg_gen_andi_i64(rt, t1, 0x000FFFFFFFFFFFFF);
     tcg_gen_or_i64(rt, rt, t0);
 
     tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
     tcg_temp_free_i64(exp);
     tcg_temp_free_i64(zr);
     tcg_temp_free_i64(nan);
@@ -1337,8 +1607,13 @@ static void gen_xsxsigdp(DisasContext *ctx)
 static void gen_xsxsigqp(DisasContext *ctx)
 {
     TCGv_i64 t0, zr, nan, exp;
-    TCGv_i64 xth = cpu_vsrh(rD(ctx->opcode) + 32);
-    TCGv_i64 xtl = cpu_vsrl(rD(ctx->opcode) + 32);
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+
+    TCGv_i64 xbh = tcg_temp_new_i64();
+    TCGv_i64 xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, rB(ctx->opcode) + 32);
+    get_cpu_vsrl(xbl, rB(ctx->opcode) + 32);
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -1349,29 +1624,41 @@ static void gen_xsxsigqp(DisasContext *ctx)
     zr = tcg_const_i64(0);
     nan = tcg_const_i64(32767);
 
-    tcg_gen_extract_i64(exp, cpu_vsrh(rB(ctx->opcode) + 32), 48, 15);
+    tcg_gen_extract_i64(exp, xbh, 48, 15);
     tcg_gen_movi_i64(t0, 0x0001000000000000);
     tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, zr, zr, t0);
     tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, nan, zr, t0);
-    tcg_gen_andi_i64(xth, cpu_vsrh(rB(ctx->opcode) + 32), 0x0000FFFFFFFFFFFF);
+    tcg_gen_andi_i64(xth, xbh, 0x0000FFFFFFFFFFFF);
     tcg_gen_or_i64(xth, xth, t0);
-    tcg_gen_mov_i64(xtl, cpu_vsrl(rB(ctx->opcode) + 32));
+    set_cpu_vsrh(rD(ctx->opcode) + 32, xth);
+    tcg_gen_mov_i64(xtl, xbl);
+    set_cpu_vsrl(rD(ctx->opcode) + 32, xtl);
 
     tcg_temp_free_i64(t0);
     tcg_temp_free_i64(exp);
     tcg_temp_free_i64(zr);
     tcg_temp_free_i64(nan);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 #endif
 
 static void gen_xviexpsp(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
-    TCGv_i64 xah = cpu_vsrh(xA(ctx->opcode));
-    TCGv_i64 xal = cpu_vsrl(xA(ctx->opcode));
-    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
-    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+
+    TCGv_i64 xah = tcg_temp_new_i64();
+    TCGv_i64 xal = tcg_temp_new_i64();
+    TCGv_i64 xbh = tcg_temp_new_i64();
+    TCGv_i64 xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xah, xA(ctx->opcode));
+    get_cpu_vsrl(xal, xA(ctx->opcode));
+    get_cpu_vsrh(xbh, xB(ctx->opcode));
+    get_cpu_vsrl(xbl, xB(ctx->opcode));
+
     TCGv_i64 t0;
 
     if (unlikely(!ctx->vsx_enabled)) {
@@ -1383,21 +1670,36 @@ static void gen_xviexpsp(DisasContext *ctx)
     tcg_gen_andi_i64(t0, xbh, 0xFF000000FF);
     tcg_gen_shli_i64(t0, t0, 23);
     tcg_gen_or_i64(xth, xth, t0);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
     tcg_gen_andi_i64(xtl, xal, 0x807FFFFF807FFFFF);
     tcg_gen_andi_i64(t0, xbl, 0xFF000000FF);
     tcg_gen_shli_i64(t0, t0, 23);
     tcg_gen_or_i64(xtl, xtl, t0);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
+
     tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xah);
+    tcg_temp_free_i64(xal);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 
 static void gen_xviexpdp(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
-    TCGv_i64 xah = cpu_vsrh(xA(ctx->opcode));
-    TCGv_i64 xal = cpu_vsrl(xA(ctx->opcode));
-    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
-    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+
+    TCGv_i64 xah = tcg_temp_new_i64();
+    TCGv_i64 xal = tcg_temp_new_i64();
+    TCGv_i64 xbh = tcg_temp_new_i64();
+    TCGv_i64 xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xah, xA(ctx->opcode));
+    get_cpu_vsrl(xal, xA(ctx->opcode));
+    get_cpu_vsrh(xbh, xB(ctx->opcode));
+    get_cpu_vsrl(xbl, xB(ctx->opcode));
+
     TCGv_i64 t0;
 
     if (unlikely(!ctx->vsx_enabled)) {
@@ -1409,19 +1711,31 @@ static void gen_xviexpdp(DisasContext *ctx)
     tcg_gen_andi_i64(t0, xbh, 0x7FF);
     tcg_gen_shli_i64(t0, t0, 52);
     tcg_gen_or_i64(xth, xth, t0);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
     tcg_gen_andi_i64(xtl, xal, 0x800FFFFFFFFFFFFF);
     tcg_gen_andi_i64(t0, xbl, 0x7FF);
     tcg_gen_shli_i64(t0, t0, 52);
     tcg_gen_or_i64(xtl, xtl, t0);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
+
     tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xah);
+    tcg_temp_free_i64(xal);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 
 static void gen_xvxexpsp(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
-    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
-    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+
+    TCGv_i64 xbh = tcg_temp_new_i64();
+    TCGv_i64 xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, xB(ctx->opcode));
+    get_cpu_vsrl(xbl, xB(ctx->opcode));
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -1429,33 +1743,53 @@ static void gen_xvxexpsp(DisasContext *ctx)
     }
     tcg_gen_shri_i64(xth, xbh, 23);
     tcg_gen_andi_i64(xth, xth, 0xFF000000FF);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
     tcg_gen_shri_i64(xtl, xbl, 23);
     tcg_gen_andi_i64(xtl, xtl, 0xFF000000FF);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
+
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 
 static void gen_xvxexpdp(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
-    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
-    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+
+    TCGv_i64 xbh = tcg_temp_new_i64();
+    TCGv_i64 xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, xB(ctx->opcode));
+    get_cpu_vsrl(xbl, xB(ctx->opcode));
 
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
         return;
     }
     tcg_gen_extract_i64(xth, xbh, 52, 11);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
     tcg_gen_extract_i64(xtl, xbl, 52, 11);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
+
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 
 GEN_VSX_HELPER_2(xvxsigsp, 0x00, 0x04, 0, PPC2_ISA300)
 
 static void gen_xvxsigdp(DisasContext *ctx)
 {
-    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
-    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
-    TCGv_i64 xbh = cpu_vsrh(xB(ctx->opcode));
-    TCGv_i64 xbl = cpu_vsrl(xB(ctx->opcode));
+    TCGv_i64 xth = tcg_temp_new_i64();
+    TCGv_i64 xtl = tcg_temp_new_i64();
+
+    TCGv_i64 xbh = tcg_temp_new_i64();
+    TCGv_i64 xbl = tcg_temp_new_i64();
+    get_cpu_vsrh(xbh, xB(ctx->opcode));
+    get_cpu_vsrl(xbl, xB(ctx->opcode));
 
     TCGv_i64 t0, zr, nan, exp;
 
@@ -1474,6 +1808,7 @@ static void gen_xvxsigdp(DisasContext *ctx)
     tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, nan, zr, t0);
     tcg_gen_andi_i64(xth, xbh, 0x000FFFFFFFFFFFFF);
     tcg_gen_or_i64(xth, xth, t0);
+    set_cpu_vsrh(xT(ctx->opcode), xth);
 
     tcg_gen_extract_i64(exp, xbl, 52, 11);
     tcg_gen_movi_i64(t0, 0x0010000000000000);
@@ -1481,11 +1816,16 @@ static void gen_xvxsigdp(DisasContext *ctx)
     tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, nan, zr, t0);
     tcg_gen_andi_i64(xtl, xbl, 0x000FFFFFFFFFFFFF);
     tcg_gen_or_i64(xtl, xtl, t0);
+    set_cpu_vsrl(xT(ctx->opcode), xtl);
 
     tcg_temp_free_i64(t0);
     tcg_temp_free_i64(exp);
     tcg_temp_free_i64(zr);
     tcg_temp_free_i64(nan);
+    tcg_temp_free_i64(xth);
+    tcg_temp_free_i64(xtl);
+    tcg_temp_free_i64(xbh);
+    tcg_temp_free_i64(xbl);
 }
 
 #undef GEN_XX2FORM
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [RFC PATCH v2 4/9] target/ppc: delay writeback of avr{l, h} during lvx instruction
  2018-12-17 12:23 [Qemu-devel] [RFC PATCH v2 0/9] target/ppc: convert VMX instructions to use TCG vector operations Mark Cave-Ayland
                   ` (2 preceding siblings ...)
  2018-12-17 12:23 ` [Qemu-devel] [RFC PATCH v2 3/9] target/ppc: introduce get_cpu_vsr{l, h}() and set_cpu_vsr{l, h}() helpers for VSR " Mark Cave-Ayland
@ 2018-12-17 12:24 ` Mark Cave-Ayland
  2018-12-17 16:46   ` Richard Henderson
  2018-12-17 12:24 ` [Qemu-devel] [RFC PATCH v2 5/9] target/ppc: switch FPR, VMX and VSX helpers to access data directly from cpu_env Mark Cave-Ayland
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: Mark Cave-Ayland @ 2018-12-17 12:24 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc, david, richard.henderson, lvivier

During review of the previous patch, Richard pointed out an existing bug that
the writeback to the avr{l,h} registers should be delayed until after any
exceptions have been raised.

Perform both 64-bit loads into separate temporaries and then write them into
the avr{l,h} registers together to ensure that this is always the case.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
---
 target/ppc/translate/vmx-impl.inc.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index 30046c6e31..cd7d12265c 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -18,33 +18,35 @@ static inline TCGv_ptr gen_avr_ptr(int reg)
 static void glue(gen_, name)(DisasContext *ctx)                                       \
 {                                                                             \
     TCGv EA;                                                                  \
-    TCGv_i64 avr;                                                             \
+    TCGv_i64 avr1, avr2;                                                      \
     if (unlikely(!ctx->altivec_enabled)) {                                    \
         gen_exception(ctx, POWERPC_EXCP_VPU);                                 \
         return;                                                               \
     }                                                                         \
     gen_set_access_type(ctx, ACCESS_INT);                                     \
-    avr = tcg_temp_new_i64();                                                 \
+    avr1 = tcg_temp_new_i64();                                                \
+    avr2 = tcg_temp_new_i64();                                                \
     EA = tcg_temp_new();                                                      \
     gen_addr_reg_index(ctx, EA);                                              \
     tcg_gen_andi_tl(EA, EA, ~0xf);                                            \
     /* We only need to swap high and low halves. gen_qemu_ld64_i64 does       \
        necessary 64-bit byteswap already. */                                  \
     if (ctx->le_mode) {                                                       \
-        gen_qemu_ld64_i64(ctx, avr, EA);                                      \
-        set_avr64(rD(ctx->opcode), avr, false);                               \
+        gen_qemu_ld64_i64(ctx, avr1, EA);                                     \
         tcg_gen_addi_tl(EA, EA, 8);                                           \
-        gen_qemu_ld64_i64(ctx, avr, EA);                                      \
-        set_avr64(rD(ctx->opcode), avr, true);                                \
+        gen_qemu_ld64_i64(ctx, avr2, EA);                                     \
+        set_avr64(rD(ctx->opcode), avr1, false);                              \
+        set_avr64(rD(ctx->opcode), avr2, true);                               \
     } else {                                                                  \
-        gen_qemu_ld64_i64(ctx, avr, EA);                                      \
-        set_avr64(rD(ctx->opcode), avr, true);                                \
+        gen_qemu_ld64_i64(ctx, avr1, EA);                                     \
         tcg_gen_addi_tl(EA, EA, 8);                                           \
-        gen_qemu_ld64_i64(ctx, avr, EA);                                      \
-        set_avr64(rD(ctx->opcode), avr, false);                               \
+        gen_qemu_ld64_i64(ctx, avr2, EA);                                     \
+        set_avr64(rD(ctx->opcode), avr1, true);                               \
+        set_avr64(rD(ctx->opcode), avr2, false);                              \
     }                                                                         \
     tcg_temp_free(EA);                                                        \
-    tcg_temp_free_i64(avr);                                                   \
+    tcg_temp_free_i64(avr1);                                                  \
+    tcg_temp_free_i64(avr2);                                                  \
 }
 
 #define GEN_VR_STX(name, opc2, opc3)                                          \
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [RFC PATCH v2 5/9] target/ppc: switch FPR, VMX and VSX helpers to access data directly from cpu_env
  2018-12-17 12:23 [Qemu-devel] [RFC PATCH v2 0/9] target/ppc: convert VMX instructions to use TCG vector operations Mark Cave-Ayland
                   ` (3 preceding siblings ...)
  2018-12-17 12:24 ` [Qemu-devel] [RFC PATCH v2 4/9] target/ppc: delay writeback of avr{l, h} during lvx instruction Mark Cave-Ayland
@ 2018-12-17 12:24 ` Mark Cave-Ayland
  2018-12-17 12:24 ` [Qemu-devel] [RFC PATCH v2 6/9] target/ppc: merge ppc_vsr_t and ppc_avr_t union types Mark Cave-Ayland
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Mark Cave-Ayland @ 2018-12-17 12:24 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc, david, richard.henderson, lvivier

Instead of accessing the FPR, VMX and VSX registers through static arrays of
TCGv_i64 globals, remove them and change the helpers to load/store data directly
within cpu_env.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/translate.c              | 59 ++++++++++---------------------------
 target/ppc/translate/vsx-impl.inc.c |  4 +--
 2 files changed, 18 insertions(+), 45 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index fa3e8dc114..5923c688cd 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -55,15 +55,9 @@
 /* global register indexes */
 static char cpu_reg_names[10*3 + 22*4 /* GPR */
     + 10*4 + 22*5 /* SPE GPRh */
-    + 10*4 + 22*5 /* FPR */
-    + 2*(10*6 + 22*7) /* AVRh, AVRl */
-    + 10*5 + 22*6 /* VSR */
     + 8*5 /* CRF */];
 static TCGv cpu_gpr[32];
 static TCGv cpu_gprh[32];
-static TCGv_i64 cpu_fpr[32];
-static TCGv_i64 cpu_avrh[32], cpu_avrl[32];
-static TCGv_i64 cpu_vsr[32];
 static TCGv_i32 cpu_crf[8];
 static TCGv cpu_nip;
 static TCGv cpu_msr;
@@ -108,39 +102,6 @@ void ppc_translate_init(void)
                                          offsetof(CPUPPCState, gprh[i]), p);
         p += (i < 10) ? 4 : 5;
         cpu_reg_names_size -= (i < 10) ? 4 : 5;
-
-        snprintf(p, cpu_reg_names_size, "fp%d", i);
-        cpu_fpr[i] = tcg_global_mem_new_i64(cpu_env,
-                                            offsetof(CPUPPCState, fpr[i]), p);
-        p += (i < 10) ? 4 : 5;
-        cpu_reg_names_size -= (i < 10) ? 4 : 5;
-
-        snprintf(p, cpu_reg_names_size, "avr%dH", i);
-#ifdef HOST_WORDS_BIGENDIAN
-        cpu_avrh[i] = tcg_global_mem_new_i64(cpu_env,
-                                             offsetof(CPUPPCState, avr[i].u64[0]), p);
-#else
-        cpu_avrh[i] = tcg_global_mem_new_i64(cpu_env,
-                                             offsetof(CPUPPCState, avr[i].u64[1]), p);
-#endif
-        p += (i < 10) ? 6 : 7;
-        cpu_reg_names_size -= (i < 10) ? 6 : 7;
-
-        snprintf(p, cpu_reg_names_size, "avr%dL", i);
-#ifdef HOST_WORDS_BIGENDIAN
-        cpu_avrl[i] = tcg_global_mem_new_i64(cpu_env,
-                                             offsetof(CPUPPCState, avr[i].u64[1]), p);
-#else
-        cpu_avrl[i] = tcg_global_mem_new_i64(cpu_env,
-                                             offsetof(CPUPPCState, avr[i].u64[0]), p);
-#endif
-        p += (i < 10) ? 6 : 7;
-        cpu_reg_names_size -= (i < 10) ? 6 : 7;
-        snprintf(p, cpu_reg_names_size, "vsr%d", i);
-        cpu_vsr[i] = tcg_global_mem_new_i64(cpu_env,
-                                            offsetof(CPUPPCState, vsr[i]), p);
-        p += (i < 10) ? 5 : 6;
-        cpu_reg_names_size -= (i < 10) ? 5 : 6;
     }
 
     cpu_nip = tcg_global_mem_new(cpu_env,
@@ -6696,22 +6657,34 @@ GEN_TM_PRIV_NOOP(trechkpt);
 
 static inline void get_fpr(TCGv_i64 dst, int regno)
 {
-    tcg_gen_mov_i64(dst, cpu_fpr[regno]);
+    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState, fpr[regno]));
 }
 
 static inline void set_fpr(int regno, TCGv_i64 src)
 {
-    tcg_gen_mov_i64(cpu_fpr[regno], src);
+    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState, fpr[regno]));
 }
 
 static inline void get_avr64(TCGv_i64 dst, int regno, bool high)
 {
-    tcg_gen_mov_i64(dst, (high ? cpu_avrh : cpu_avrl)[regno]);
+#ifdef HOST_WORDS_BIGENDIAN
+    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState,
+                                          avr[regno].u64[(high ? 0 : 1)]));
+#else
+    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState,
+                                          avr[regno].u64[(high ? 1 : 0)]));
+#endif
 }
 
 static inline void set_avr64(int regno, TCGv_i64 src, bool high)
 {
-    tcg_gen_mov_i64((high ? cpu_avrh : cpu_avrl)[regno], src);
+#ifdef HOST_WORDS_BIGENDIAN
+    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState,
+                                          avr[regno].u64[(high ? 0 : 1)]));
+#else
+    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState,
+                                          avr[regno].u64[(high ? 1 : 0)]));
+#endif
 }
 
 #include "translate/fp-impl.inc.c"
diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
index e9a05d66f7..20e1fd9324 100644
--- a/target/ppc/translate/vsx-impl.inc.c
+++ b/target/ppc/translate/vsx-impl.inc.c
@@ -2,12 +2,12 @@
 
 static inline void get_vsr(TCGv_i64 dst, int n)
 {
-    tcg_gen_mov_i64(dst, cpu_vsr[n]);
+    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState, vsr[n]));
 }
 
 static inline void set_vsr(int n, TCGv_i64 src)
 {
-    tcg_gen_mov_i64(cpu_vsr[n], src);
+    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState, vsr[n]));
 }
 
 static inline void get_cpu_vsrh(TCGv_i64 dst, int n)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [RFC PATCH v2 6/9] target/ppc: merge ppc_vsr_t and ppc_avr_t union types
  2018-12-17 12:23 [Qemu-devel] [RFC PATCH v2 0/9] target/ppc: convert VMX instructions to use TCG vector operations Mark Cave-Ayland
                   ` (4 preceding siblings ...)
  2018-12-17 12:24 ` [Qemu-devel] [RFC PATCH v2 5/9] target/ppc: switch FPR, VMX and VSX helpers to access data directly from cpu_env Mark Cave-Ayland
@ 2018-12-17 12:24 ` Mark Cave-Ayland
  2018-12-17 16:47   ` Richard Henderson
  2018-12-17 12:24 ` [Qemu-devel] [RFC PATCH v2 7/9] target/ppc: move FP and VMX registers into aligned vsr register array Mark Cave-Ayland
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: Mark Cave-Ayland @ 2018-12-17 12:24 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc, david, richard.henderson, lvivier

Since the VSX registers are actually a superset of the VMX registers then they
can be represented by the same type. Merge ppc_avr_t into ppc_vsr_t and change
ppc_avr_t to be a simple typedef alias.

Note that due to a difference in the naming of the float32 member between
ppc_avr_t and ppc_vsr_t, references to the ppc_avr_t f member must be replaced
with f32 instead.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
---
 target/ppc/cpu.h        | 17 ++++++++-------
 target/ppc/int_helper.c | 56 +++++++++++++++++++++++++------------------------
 target/ppc/internal.h   | 11 ----------
 3 files changed, 39 insertions(+), 45 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index ab68abe8a2..5445d4c3c1 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -230,7 +230,6 @@ typedef struct opc_handler_t opc_handler_t;
 /* Types used to describe some PowerPC registers etc. */
 typedef struct DisasContext DisasContext;
 typedef struct ppc_spr_t ppc_spr_t;
-typedef union ppc_avr_t ppc_avr_t;
 typedef union ppc_tlb_t ppc_tlb_t;
 typedef struct ppc_hash_pte64 ppc_hash_pte64_t;
 
@@ -254,22 +253,26 @@ struct ppc_spr_t {
 #endif
 };
 
-/* Altivec registers (128 bits) */
-union ppc_avr_t {
-    float32 f[4];
+/* VSX/Altivec registers (128 bits) */
+typedef union _ppc_vsr_t {
     uint8_t u8[16];
     uint16_t u16[8];
     uint32_t u32[4];
+    uint64_t u64[2];
     int8_t s8[16];
     int16_t s16[8];
     int32_t s32[4];
-    uint64_t u64[2];
     int64_t s64[2];
+    float32 f32[4];
+    float64 f64[2];
+    float128 f128;
 #ifdef CONFIG_INT128
     __uint128_t u128;
 #endif
-    Int128 s128;
-};
+    Int128  s128;
+} ppc_vsr_t;
+
+typedef ppc_vsr_t ppc_avr_t;
 
 #if !defined(CONFIG_USER_ONLY)
 /* Software TLB cache */
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index fcac90a4a9..9d715be25c 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -548,8 +548,8 @@ VARITH_DO(muluwm, *, u32)
     {                                                                   \
         int i;                                                          \
                                                                         \
-        for (i = 0; i < ARRAY_SIZE(r->f); i++) {                        \
-            r->f[i] = func(a->f[i], b->f[i], &env->vec_status);         \
+        for (i = 0; i < ARRAY_SIZE(r->f32); i++) {                      \
+            r->f32[i] = func(a->f32[i], b->f32[i], &env->vec_status);   \
         }                                                               \
     }
 VARITHFP(addfp, float32_add)
@@ -563,9 +563,9 @@ VARITHFP(maxfp, float32_max)
                            ppc_avr_t *b, ppc_avr_t *c)                  \
     {                                                                   \
         int i;                                                          \
-        for (i = 0; i < ARRAY_SIZE(r->f); i++) {                        \
-            r->f[i] = float32_muladd(a->f[i], c->f[i], b->f[i],         \
-                                     type, &env->vec_status);           \
+        for (i = 0; i < ARRAY_SIZE(r->f32); i++) {                      \
+            r->f32[i] = float32_muladd(a->f32[i], c->f32[i], b->f32[i], \
+                                       type, &env->vec_status);         \
         }                                                               \
     }
 VARITHFPFMA(maddfp, 0);
@@ -670,9 +670,9 @@ VABSDU(w, u32)
     {                                                                   \
         int i;                                                          \
                                                                         \
-        for (i = 0; i < ARRAY_SIZE(r->f); i++) {                        \
+        for (i = 0; i < ARRAY_SIZE(r->f32); i++) {                      \
             float32 t = cvt(b->element[i], &env->vec_status);           \
-            r->f[i] = float32_scalbn(t, -uim, &env->vec_status);        \
+            r->f32[i] = float32_scalbn(t, -uim, &env->vec_status);      \
         }                                                               \
     }
 VCF(ux, uint32_to_float32, u32)
@@ -782,9 +782,9 @@ VCMPNE(w, u32, uint32_t, 0)
         uint32_t none = 0;                                              \
         int i;                                                          \
                                                                         \
-        for (i = 0; i < ARRAY_SIZE(r->f); i++) {                        \
+        for (i = 0; i < ARRAY_SIZE(r->f32); i++) {                      \
             uint32_t result;                                            \
-            int rel = float32_compare_quiet(a->f[i], b->f[i],           \
+            int rel = float32_compare_quiet(a->f32[i], b->f32[i],       \
                                             &env->vec_status);          \
             if (rel == float_relation_unordered) {                      \
                 result = 0;                                             \
@@ -816,14 +816,16 @@ static inline void vcmpbfp_internal(CPUPPCState *env, ppc_avr_t *r,
     int i;
     int all_in = 0;
 
-    for (i = 0; i < ARRAY_SIZE(r->f); i++) {
-        int le_rel = float32_compare_quiet(a->f[i], b->f[i], &env->vec_status);
+    for (i = 0; i < ARRAY_SIZE(r->f32); i++) {
+        int le_rel = float32_compare_quiet(a->f32[i], b->f32[i],
+                                           &env->vec_status);
         if (le_rel == float_relation_unordered) {
             r->u32[i] = 0xc0000000;
             all_in = 1;
         } else {
-            float32 bneg = float32_chs(b->f[i]);
-            int ge_rel = float32_compare_quiet(a->f[i], bneg, &env->vec_status);
+            float32 bneg = float32_chs(b->f32[i]);
+            int ge_rel = float32_compare_quiet(a->f32[i], bneg,
+                                               &env->vec_status);
             int le = le_rel != float_relation_greater;
             int ge = ge_rel != float_relation_less;
 
@@ -856,11 +858,11 @@ void helper_vcmpbfp_dot(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
         float_status s = env->vec_status;                               \
                                                                         \
         set_float_rounding_mode(float_round_to_zero, &s);               \
-        for (i = 0; i < ARRAY_SIZE(r->f); i++) {                        \
-            if (float32_is_any_nan(b->f[i])) {                          \
+        for (i = 0; i < ARRAY_SIZE(r->f32); i++) {                      \
+            if (float32_is_any_nan(b->f32[i])) {                        \
                 r->element[i] = 0;                                      \
             } else {                                                    \
-                float64 t = float32_to_float64(b->f[i], &s);            \
+                float64 t = float32_to_float64(b->f32[i], &s);          \
                 int64_t j;                                              \
                                                                         \
                 t = float64_scalbn(t, uim, &s);                         \
@@ -1661,8 +1663,8 @@ void helper_vrefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
 {
     int i;
 
-    for (i = 0; i < ARRAY_SIZE(r->f); i++) {
-        r->f[i] = float32_div(float32_one, b->f[i], &env->vec_status);
+    for (i = 0; i < ARRAY_SIZE(r->f32); i++) {
+        r->f32[i] = float32_div(float32_one, b->f32[i], &env->vec_status);
     }
 }
 
@@ -1674,8 +1676,8 @@ void helper_vrefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
         float_status s = env->vec_status;                       \
                                                                 \
         set_float_rounding_mode(rounding, &s);                  \
-        for (i = 0; i < ARRAY_SIZE(r->f); i++) {                \
-            r->f[i] = float32_round_to_int (b->f[i], &s);       \
+        for (i = 0; i < ARRAY_SIZE(r->f32); i++) {              \
+            r->f32[i] = float32_round_to_int (b->f32[i], &s);   \
         }                                                       \
     }
 VRFI(n, float_round_nearest_even)
@@ -1705,10 +1707,10 @@ void helper_vrsqrtefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
 {
     int i;
 
-    for (i = 0; i < ARRAY_SIZE(r->f); i++) {
-        float32 t = float32_sqrt(b->f[i], &env->vec_status);
+    for (i = 0; i < ARRAY_SIZE(r->f32); i++) {
+        float32 t = float32_sqrt(b->f32[i], &env->vec_status);
 
-        r->f[i] = float32_div(float32_one, t, &env->vec_status);
+        r->f32[i] = float32_div(float32_one, t, &env->vec_status);
     }
 }
 
@@ -1751,8 +1753,8 @@ void helper_vexptefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
 {
     int i;
 
-    for (i = 0; i < ARRAY_SIZE(r->f); i++) {
-        r->f[i] = float32_exp2(b->f[i], &env->vec_status);
+    for (i = 0; i < ARRAY_SIZE(r->f32); i++) {
+        r->f32[i] = float32_exp2(b->f32[i], &env->vec_status);
     }
 }
 
@@ -1760,8 +1762,8 @@ void helper_vlogefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
 {
     int i;
 
-    for (i = 0; i < ARRAY_SIZE(r->f); i++) {
-        r->f[i] = float32_log2(b->f[i], &env->vec_status);
+    for (i = 0; i < ARRAY_SIZE(r->f32); i++) {
+        r->f32[i] = float32_log2(b->f32[i], &env->vec_status);
     }
 }
 
diff --git a/target/ppc/internal.h b/target/ppc/internal.h
index a9bcadff42..b4b1f7b3db 100644
--- a/target/ppc/internal.h
+++ b/target/ppc/internal.h
@@ -204,17 +204,6 @@ EXTRACT_HELPER(IMM8, 11, 8);
 EXTRACT_HELPER(DCMX, 16, 7);
 EXTRACT_HELPER_SPLIT_3(DCMX_XV, 5, 16, 0, 1, 2, 5, 1, 6, 6);
 
-typedef union _ppc_vsr_t {
-    uint8_t u8[16];
-    uint16_t u16[8];
-    uint32_t u32[4];
-    uint64_t u64[2];
-    float32 f32[4];
-    float64 f64[2];
-    float128 f128;
-    Int128  s128;
-} ppc_vsr_t;
-
 #if defined(HOST_WORDS_BIGENDIAN)
 #define VsrB(i) u8[i]
 #define VsrH(i) u16[i]
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [RFC PATCH v2 7/9] target/ppc: move FP and VMX registers into aligned vsr register array
  2018-12-17 12:23 [Qemu-devel] [RFC PATCH v2 0/9] target/ppc: convert VMX instructions to use TCG vector operations Mark Cave-Ayland
                   ` (5 preceding siblings ...)
  2018-12-17 12:24 ` [Qemu-devel] [RFC PATCH v2 6/9] target/ppc: merge ppc_vsr_t and ppc_avr_t union types Mark Cave-Ayland
@ 2018-12-17 12:24 ` Mark Cave-Ayland
  2018-12-17 16:58   ` Richard Henderson
  2018-12-17 12:24 ` [Qemu-devel] [RFC PATCH v2 8/9] target/ppc: convert VMX logical instructions to use vector operations Mark Cave-Ayland
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: Mark Cave-Ayland @ 2018-12-17 12:24 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc, david, richard.henderson, lvivier

The VSX register array is a block of 64 128-bit registers where the first 32
registers consist of the existing 64-bit FP registers extended to 128-bit
using new VSR registers, and the last 32 registers are the VMX 128-bit
registers as show below:

            64-bit               64-bit
    +--------------------+--------------------+
    |        FP0         |                    |  VSR0
    +--------------------+--------------------+
    |        FP1         |                    |  VSR1
    +--------------------+--------------------+
    |        ...         |        ...         |  ...
    +--------------------+--------------------+
    |        FP30        |                    |  VSR30
    +--------------------+--------------------+
    |        FP31        |                    |  VSR31
    +--------------------+--------------------+
    |                  VMX0                   |  VSR32
    +-----------------------------------------+
    |                  VMX1                   |  VSR33
    +-----------------------------------------+
    |                  ...                    |  ...
    +-----------------------------------------+
    |                  VMX30                  |  VSR62
    +-----------------------------------------+
    |                  VMX31                  |  VSR63
    +-----------------------------------------+

In order to allow for future conversion of VSX instructions to use TCG vector
operations, recreate the same layout using an aligned version of the existing
vsr register array.

Since the old fpr and avr register arrays are removed, the existing callers
must also be updated to use the correct offset in the vsr register array. This
also includes switching the relevant VMState fields over to using subarrays
to make sure that migration is preserved.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
---
 linux-user/ppc/signal.c             | 24 ++++++-------
 target/ppc/arch_dump.c              | 12 +++----
 target/ppc/cpu.h                    |  9 ++---
 target/ppc/gdbstub.c                |  8 ++---
 target/ppc/internal.h               | 18 +++-------
 target/ppc/machine.c                | 72 ++++++++++++++++++++++++++++++++++---
 target/ppc/monitor.c                |  4 +--
 target/ppc/translate.c              | 14 ++++----
 target/ppc/translate/dfp-impl.inc.c |  2 +-
 target/ppc/translate/vmx-impl.inc.c |  7 +++-
 target/ppc/translate/vsx-impl.inc.c |  4 +--
 target/ppc/translate_init.inc.c     | 24 ++++++-------
 12 files changed, 126 insertions(+), 72 deletions(-)

diff --git a/linux-user/ppc/signal.c b/linux-user/ppc/signal.c
index 2ae120a2bc..a053dd5b84 100644
--- a/linux-user/ppc/signal.c
+++ b/linux-user/ppc/signal.c
@@ -258,8 +258,8 @@ static void save_user_regs(CPUPPCState *env, struct target_mcontext *frame)
     /* Save Altivec registers if necessary.  */
     if (env->insns_flags & PPC_ALTIVEC) {
         uint32_t *vrsave;
-        for (i = 0; i < ARRAY_SIZE(env->avr); i++) {
-            ppc_avr_t *avr = &env->avr[i];
+        for (i = 0; i < 32; i++) {
+            ppc_avr_t *avr = &env->vsr[32 + i];
             ppc_avr_t *vreg = (ppc_avr_t *)&frame->mc_vregs.altivec[i];
 
             __put_user(avr->u64[PPC_VEC_HI], &vreg->u64[0]);
@@ -281,15 +281,15 @@ static void save_user_regs(CPUPPCState *env, struct target_mcontext *frame)
     /* Save VSX second halves */
     if (env->insns_flags2 & PPC2_VSX) {
         uint64_t *vsregs = (uint64_t *)&frame->mc_vregs.altivec[34];
-        for (i = 0; i < ARRAY_SIZE(env->vsr); i++) {
-            __put_user(env->vsr[i], &vsregs[i]);
+        for (i = 0; i < 32; i++) {
+            __put_user(env->vsr[i].u64[1], &vsregs[i]);
         }
     }
 
     /* Save floating point registers.  */
     if (env->insns_flags & PPC_FLOAT) {
-        for (i = 0; i < ARRAY_SIZE(env->fpr); i++) {
-            __put_user(env->fpr[i], &frame->mc_fregs[i]);
+        for (i = 0; i < 32; i++) {
+            __put_user(env->vsr[i].u64[0], &frame->mc_fregs[i]);
         }
         __put_user((uint64_t) env->fpscr, &frame->mc_fregs[32]);
     }
@@ -373,8 +373,8 @@ static void restore_user_regs(CPUPPCState *env,
 #else
         v_regs = (ppc_avr_t *)frame->mc_vregs.altivec;
 #endif
-        for (i = 0; i < ARRAY_SIZE(env->avr); i++) {
-            ppc_avr_t *avr = &env->avr[i];
+        for (i = 0; i < 32; i++) {
+            ppc_avr_t *avr = &env->vsr[32 + i];
             ppc_avr_t *vreg = &v_regs[i];
 
             __get_user(avr->u64[PPC_VEC_HI], &vreg->u64[0]);
@@ -393,16 +393,16 @@ static void restore_user_regs(CPUPPCState *env,
     /* Restore VSX second halves */
     if (env->insns_flags2 & PPC2_VSX) {
         uint64_t *vsregs = (uint64_t *)&frame->mc_vregs.altivec[34];
-        for (i = 0; i < ARRAY_SIZE(env->vsr); i++) {
-            __get_user(env->vsr[i], &vsregs[i]);
+        for (i = 0; i < 32; i++) {
+            __get_user(env->vsr[i].u64[1], &vsregs[i]);
         }
     }
 
     /* Restore floating point registers.  */
     if (env->insns_flags & PPC_FLOAT) {
         uint64_t fpscr;
-        for (i = 0; i < ARRAY_SIZE(env->fpr); i++) {
-            __get_user(env->fpr[i], &frame->mc_fregs[i]);
+        for (i = 0; i < 32; i++) {
+            __get_user(env->vsr[i].u64[0], &frame->mc_fregs[i]);
         }
         __get_user(fpscr, &frame->mc_fregs[32]);
         env->fpscr = (uint32_t) fpscr;
diff --git a/target/ppc/arch_dump.c b/target/ppc/arch_dump.c
index cc1460e4e3..c272d0d3d4 100644
--- a/target/ppc/arch_dump.c
+++ b/target/ppc/arch_dump.c
@@ -140,7 +140,7 @@ static void ppc_write_elf_fpregset(NoteFuncArg *arg, PowerPCCPU *cpu)
     memset(fpregset, 0, sizeof(*fpregset));
 
     for (i = 0; i < 32; i++) {
-        fpregset->fpr[i] = cpu_to_dump64(s, cpu->env.fpr[i]);
+        fpregset->fpr[i] = cpu_to_dump64(s, cpu->env.vsr[i].u64[0]);
     }
     fpregset->fpscr = cpu_to_dump_reg(s, cpu->env.fpscr);
 }
@@ -166,11 +166,11 @@ static void ppc_write_elf_vmxregset(NoteFuncArg *arg, PowerPCCPU *cpu)
 #endif
 
         if (needs_byteswap) {
-            vmxregset->avr[i].u64[0] = bswap64(cpu->env.avr[i].u64[1]);
-            vmxregset->avr[i].u64[1] = bswap64(cpu->env.avr[i].u64[0]);
+            vmxregset->avr[i].u64[0] = bswap64(cpu->env.vsr[32 + i].u64[1]);
+            vmxregset->avr[i].u64[1] = bswap64(cpu->env.vsr[32 + i].u64[0]);
         } else {
-            vmxregset->avr[i].u64[0] = cpu->env.avr[i].u64[0];
-            vmxregset->avr[i].u64[1] = cpu->env.avr[i].u64[1];
+            vmxregset->avr[i].u64[0] = cpu->env.vsr[32 + i].u64[0];
+            vmxregset->avr[i].u64[1] = cpu->env.vsr[32 + i].u64[1];
         }
     }
     vmxregset->vscr.u32[3] = cpu_to_dump32(s, cpu->env.vscr);
@@ -188,7 +188,7 @@ static void ppc_write_elf_vsxregset(NoteFuncArg *arg, PowerPCCPU *cpu)
     memset(vsxregset, 0, sizeof(*vsxregset));
 
     for (i = 0; i < 32; i++) {
-        vsxregset->vsr[i] = cpu_to_dump64(s, cpu->env.vsr[i]);
+        vsxregset->vsr[i] = cpu_to_dump64(s, cpu->env.vsr[i].u64[1]);
     }
 }
 
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 5445d4c3c1..c8f449081d 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1016,8 +1016,6 @@ struct CPUPPCState {
 
     /* Floating point execution context */
     float_status fp_status;
-    /* floating point registers */
-    float64 fpr[32];
     /* floating point status and control register */
     target_ulong fpscr;
 
@@ -1067,11 +1065,10 @@ struct CPUPPCState {
     /* Special purpose registers */
     target_ulong spr[1024];
     ppc_spr_t spr_cb[1024];
-    /* Altivec registers */
-    ppc_avr_t avr[32];
+    /* Vector status and control register */
     uint32_t vscr;
-    /* VSX registers */
-    uint64_t vsr[32];
+    /* VSX registers (including FP and AVR) */
+    ppc_vsr_t vsr[64] QEMU_ALIGNED(16);
     /* SPE registers */
     uint64_t spe_acc;
     uint32_t spe_fscr;
diff --git a/target/ppc/gdbstub.c b/target/ppc/gdbstub.c
index b6f6693583..8c9dc284c4 100644
--- a/target/ppc/gdbstub.c
+++ b/target/ppc/gdbstub.c
@@ -126,7 +126,7 @@ int ppc_cpu_gdb_read_register(CPUState *cs, uint8_t *mem_buf, int n)
         gdb_get_regl(mem_buf, env->gpr[n]);
     } else if (n < 64) {
         /* fprs */
-        stfq_p(mem_buf, env->fpr[n-32]);
+        stfq_p(mem_buf, env->vsr[n - 32].u64[0]);
     } else {
         switch (n) {
         case 64:
@@ -178,7 +178,7 @@ int ppc_cpu_gdb_read_register_apple(CPUState *cs, uint8_t *mem_buf, int n)
         gdb_get_reg64(mem_buf, env->gpr[n]);
     } else if (n < 64) {
         /* fprs */
-        stfq_p(mem_buf, env->fpr[n-32]);
+        stfq_p(mem_buf, env->vsr[n - 32].u64[0]);
     } else if (n < 96) {
         /* Altivec */
         stq_p(mem_buf, n - 64);
@@ -234,7 +234,7 @@ int ppc_cpu_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n)
         env->gpr[n] = ldtul_p(mem_buf);
     } else if (n < 64) {
         /* fprs */
-        env->fpr[n-32] = ldfq_p(mem_buf);
+        env->vsr[n - 32].u64[0] = ldfq_p(mem_buf);
     } else {
         switch (n) {
         case 64:
@@ -284,7 +284,7 @@ int ppc_cpu_gdb_write_register_apple(CPUState *cs, uint8_t *mem_buf, int n)
         env->gpr[n] = ldq_p(mem_buf);
     } else if (n < 64) {
         /* fprs */
-        env->fpr[n-32] = ldfq_p(mem_buf);
+        env->vsr[n - 32].u64[0] = ldfq_p(mem_buf);
     } else {
         switch (n) {
         case 64 + 32:
diff --git a/target/ppc/internal.h b/target/ppc/internal.h
index b4b1f7b3db..b77d564a65 100644
--- a/target/ppc/internal.h
+++ b/target/ppc/internal.h
@@ -218,24 +218,14 @@ EXTRACT_HELPER_SPLIT_3(DCMX_XV, 5, 16, 0, 1, 2, 5, 1, 6, 6);
 
 static inline void getVSR(int n, ppc_vsr_t *vsr, CPUPPCState *env)
 {
-    if (n < 32) {
-        vsr->VsrD(0) = env->fpr[n];
-        vsr->VsrD(1) = env->vsr[n];
-    } else {
-        vsr->u64[0] = env->avr[n - 32].u64[0];
-        vsr->u64[1] = env->avr[n - 32].u64[1];
-    }
+    vsr->VsrD(0) = env->vsr[n].u64[0];
+    vsr->VsrD(1) = env->vsr[n].u64[1];
 }
 
 static inline void putVSR(int n, ppc_vsr_t *vsr, CPUPPCState *env)
 {
-    if (n < 32) {
-        env->fpr[n] = vsr->VsrD(0);
-        env->vsr[n] = vsr->VsrD(1);
-    } else {
-        env->avr[n - 32].u64[0] = vsr->u64[0];
-        env->avr[n - 32].u64[1] = vsr->u64[1];
-    }
+    env->vsr[n].u64[0] = vsr->VsrD(0);
+    env->vsr[n].u64[1] = vsr->VsrD(1);
 }
 
 void helper_compute_fprf_float16(CPUPPCState *env, float16 arg);
diff --git a/target/ppc/machine.c b/target/ppc/machine.c
index e7b3725273..451cf376b4 100644
--- a/target/ppc/machine.c
+++ b/target/ppc/machine.c
@@ -45,7 +45,7 @@ static int cpu_load_old(QEMUFile *f, void *opaque, int version_id)
             uint64_t l;
         } u;
         u.l = qemu_get_be64(f);
-        env->fpr[i] = u.d;
+        env->vsr[i].u64[0] = u.d;
     }
     qemu_get_be32s(f, &fpscr);
     env->fpscr = fpscr;
@@ -138,11 +138,73 @@ static const VMStateInfo vmstate_info_avr = {
 };
 
 #define VMSTATE_AVR_ARRAY_V(_f, _s, _n, _v)                       \
-    VMSTATE_ARRAY(_f, _s, _n, _v, vmstate_info_avr, ppc_avr_t)
+    VMSTATE_SUB_ARRAY(_f, _s, 32, _n, _v, vmstate_info_avr, ppc_avr_t)
 
 #define VMSTATE_AVR_ARRAY(_f, _s, _n)                             \
     VMSTATE_AVR_ARRAY_V(_f, _s, _n, 0)
 
+static int get_fpr(QEMUFile *f, void *pv, size_t size,
+                   const VMStateField *field)
+{
+    ppc_vsr_t *v = pv;
+
+    v->u64[0] = qemu_get_be64(f);
+
+    return 0;
+}
+
+static int put_fpr(QEMUFile *f, void *pv, size_t size,
+                   const VMStateField *field, QJSON *vmdesc)
+{
+    ppc_vsr_t *v = pv;
+
+    qemu_put_be64(f, v->u64[0]);
+    return 0;
+}
+
+static const VMStateInfo vmstate_info_fpr = {
+    .name = "fpr",
+    .get  = get_fpr,
+    .put  = put_fpr,
+};
+
+#define VMSTATE_FPR_ARRAY_V(_f, _s, _n, _v)                       \
+    VMSTATE_SUB_ARRAY(_f, _s, 0, _n, _v, vmstate_info_fpr, ppc_vsr_t)
+
+#define VMSTATE_FPR_ARRAY(_f, _s, _n)                             \
+    VMSTATE_FPR_ARRAY_V(_f, _s, _n, 0)
+
+static int get_vsr(QEMUFile *f, void *pv, size_t size,
+                   const VMStateField *field)
+{
+    ppc_vsr_t *v = pv;
+
+    v->u64[1] = qemu_get_be64(f);
+
+    return 0;
+}
+
+static int put_vsr(QEMUFile *f, void *pv, size_t size,
+                   const VMStateField *field, QJSON *vmdesc)
+{
+    ppc_vsr_t *v = pv;
+
+    qemu_put_be64(f, v->u64[1]);
+    return 0;
+}
+
+static const VMStateInfo vmstate_info_vsr = {
+    .name = "vsr",
+    .get  = get_vsr,
+    .put  = put_vsr,
+};
+
+#define VMSTATE_VSR_ARRAY_V(_f, _s, _n, _v)                       \
+    VMSTATE_SUB_ARRAY(_f, _s, 0, _n, _v, vmstate_info_vsr, ppc_vsr_t)
+
+#define VMSTATE_VSR_ARRAY(_f, _s, _n)                             \
+    VMSTATE_VSR_ARRAY_V(_f, _s, _n, 0)
+
 static bool cpu_pre_2_8_migration(void *opaque, int version_id)
 {
     PowerPCCPU *cpu = opaque;
@@ -354,7 +416,7 @@ static const VMStateDescription vmstate_fpu = {
     .minimum_version_id = 1,
     .needed = fpu_needed,
     .fields = (VMStateField[]) {
-        VMSTATE_FLOAT64_ARRAY(env.fpr, PowerPCCPU, 32),
+        VMSTATE_FPR_ARRAY(env.vsr, PowerPCCPU, 32),
         VMSTATE_UINTTL(env.fpscr, PowerPCCPU),
         VMSTATE_END_OF_LIST()
     },
@@ -373,7 +435,7 @@ static const VMStateDescription vmstate_altivec = {
     .minimum_version_id = 1,
     .needed = altivec_needed,
     .fields = (VMStateField[]) {
-        VMSTATE_AVR_ARRAY(env.avr, PowerPCCPU, 32),
+        VMSTATE_AVR_ARRAY(env.vsr, PowerPCCPU, 32),
         VMSTATE_UINT32(env.vscr, PowerPCCPU),
         VMSTATE_END_OF_LIST()
     },
@@ -392,7 +454,7 @@ static const VMStateDescription vmstate_vsx = {
     .minimum_version_id = 1,
     .needed = vsx_needed,
     .fields = (VMStateField[]) {
-        VMSTATE_UINT64_ARRAY(env.vsr, PowerPCCPU, 32),
+        VMSTATE_VSR_ARRAY(env.vsr, PowerPCCPU, 32),
         VMSTATE_END_OF_LIST()
     },
 };
diff --git a/target/ppc/monitor.c b/target/ppc/monitor.c
index 14915119fc..1db9396b2e 100644
--- a/target/ppc/monitor.c
+++ b/target/ppc/monitor.c
@@ -123,8 +123,8 @@ int target_get_monitor_def(CPUState *cs, const char *name, uint64_t *pval)
 
     /* Floating point registers */
     if ((qemu_tolower(name[0]) == 'f') &&
-        ppc_cpu_get_reg_num(name + 1, ARRAY_SIZE(env->fpr), &regnum)) {
-        *pval = env->fpr[regnum];
+        ppc_cpu_get_reg_num(name + 1, 32, &regnum)) {
+        *pval = env->vsr[regnum].u64[0];
         return 0;
     }
 
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 5923c688cd..8e89aec14d 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -6657,22 +6657,22 @@ GEN_TM_PRIV_NOOP(trechkpt);
 
 static inline void get_fpr(TCGv_i64 dst, int regno)
 {
-    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState, fpr[regno]));
+    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState, vsr[regno].u64[0]));
 }
 
 static inline void set_fpr(int regno, TCGv_i64 src)
 {
-    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState, fpr[regno]));
+    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState, vsr[regno].u64[0]));
 }
 
 static inline void get_avr64(TCGv_i64 dst, int regno, bool high)
 {
 #ifdef HOST_WORDS_BIGENDIAN
     tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState,
-                                          avr[regno].u64[(high ? 0 : 1)]));
+                                          vsr[32 + regno].u64[(high ? 0 : 1)]));
 #else
     tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState,
-                                          avr[regno].u64[(high ? 1 : 0)]));
+                                          vsr[32 + regno].u64[(high ? 1 : 0)]));
 #endif
 }
 
@@ -6680,10 +6680,10 @@ static inline void set_avr64(int regno, TCGv_i64 src, bool high)
 {
 #ifdef HOST_WORDS_BIGENDIAN
     tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState,
-                                          avr[regno].u64[(high ? 0 : 1)]));
+                                          vsr[32 + regno].u64[(high ? 0 : 1)]));
 #else
     tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState,
-                                          avr[regno].u64[(high ? 1 : 0)]));
+                                          vsr[32 + regno].u64[(high ? 1 : 0)]));
 #endif
 }
 
@@ -7434,7 +7434,7 @@ void ppc_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf,
             if ((i & (RFPL - 1)) == 0) {
                 cpu_fprintf(f, "FPR%02d", i);
             }
-            cpu_fprintf(f, " %016" PRIx64, *((uint64_t *)&env->fpr[i]));
+            cpu_fprintf(f, " %016" PRIx64, *((uint64_t *)&env->vsr[i].u64[0]));
             if ((i & (RFPL - 1)) == (RFPL - 1)) {
                 cpu_fprintf(f, "\n");
             }
diff --git a/target/ppc/translate/dfp-impl.inc.c b/target/ppc/translate/dfp-impl.inc.c
index 634ef73b8a..6c556dc2e1 100644
--- a/target/ppc/translate/dfp-impl.inc.c
+++ b/target/ppc/translate/dfp-impl.inc.c
@@ -3,7 +3,7 @@
 static inline TCGv_ptr gen_fprp_ptr(int reg)
 {
     TCGv_ptr r = tcg_temp_new_ptr();
-    tcg_gen_addi_ptr(r, cpu_env, offsetof(CPUPPCState, fpr[reg]));
+    tcg_gen_addi_ptr(r, cpu_env, offsetof(CPUPPCState, vsr[reg].u64[0]));
     return r;
 }
 
diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index cd7d12265c..f2836fb750 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -10,10 +10,15 @@
 static inline TCGv_ptr gen_avr_ptr(int reg)
 {
     TCGv_ptr r = tcg_temp_new_ptr();
-    tcg_gen_addi_ptr(r, cpu_env, offsetof(CPUPPCState, avr[reg]));
+    tcg_gen_addi_ptr(r, cpu_env, offsetof(CPUPPCState, vsr[32 + reg].u64[0]));
     return r;
 }
 
+static inline long avr64_offset(int reg, bool high)
+{
+    return offsetof(CPUPPCState, vsr[32 + reg].u64[(high ? 0 : 1)]);
+}
+
 #define GEN_VR_LDX(name, opc2, opc3)                                          \
 static void glue(gen_, name)(DisasContext *ctx)                                       \
 {                                                                             \
diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
index 20e1fd9324..1608ad48b1 100644
--- a/target/ppc/translate/vsx-impl.inc.c
+++ b/target/ppc/translate/vsx-impl.inc.c
@@ -2,12 +2,12 @@
 
 static inline void get_vsr(TCGv_i64 dst, int n)
 {
-    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState, vsr[n]));
+    tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState, vsr[n].u64[1]));
 }
 
 static inline void set_vsr(int n, TCGv_i64 src)
 {
-    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState, vsr[n]));
+    tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState, vsr[n].u64[1]));
 }
 
 static inline void get_cpu_vsrh(TCGv_i64 dst, int n)
diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
index 168d0cec28..b83097141c 100644
--- a/target/ppc/translate_init.inc.c
+++ b/target/ppc/translate_init.inc.c
@@ -9486,7 +9486,7 @@ static bool avr_need_swap(CPUPPCState *env)
 static int gdb_get_float_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 {
     if (n < 32) {
-        stfq_p(mem_buf, env->fpr[n]);
+        stfq_p(mem_buf, env->vsr[n].u64[0]);
         ppc_maybe_bswap_register(env, mem_buf, 8);
         return 8;
     }
@@ -9502,7 +9502,7 @@ static int gdb_set_float_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 {
     if (n < 32) {
         ppc_maybe_bswap_register(env, mem_buf, 8);
-        env->fpr[n] = ldfq_p(mem_buf);
+        env->vsr[n].u64[0] = ldfq_p(mem_buf);
         return 8;
     }
     if (n == 32) {
@@ -9517,11 +9517,11 @@ static int gdb_get_avr_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 {
     if (n < 32) {
         if (!avr_need_swap(env)) {
-            stq_p(mem_buf, env->avr[n].u64[0]);
-            stq_p(mem_buf+8, env->avr[n].u64[1]);
+            stq_p(mem_buf, env->vsr[32 + n].u64[0]);
+            stq_p(mem_buf + 8, env->vsr[32 + n].u64[1]);
         } else {
-            stq_p(mem_buf, env->avr[n].u64[1]);
-            stq_p(mem_buf+8, env->avr[n].u64[0]);
+            stq_p(mem_buf, env->vsr[32 + n].u64[1]);
+            stq_p(mem_buf + 8, env->vsr[32 + n].u64[0]);
         }
         ppc_maybe_bswap_register(env, mem_buf, 8);
         ppc_maybe_bswap_register(env, mem_buf + 8, 8);
@@ -9546,11 +9546,11 @@ static int gdb_set_avr_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
         ppc_maybe_bswap_register(env, mem_buf, 8);
         ppc_maybe_bswap_register(env, mem_buf + 8, 8);
         if (!avr_need_swap(env)) {
-            env->avr[n].u64[0] = ldq_p(mem_buf);
-            env->avr[n].u64[1] = ldq_p(mem_buf+8);
+            env->vsr[32 + n].u64[0] = ldq_p(mem_buf);
+            env->vsr[32 + n].u64[1] = ldq_p(mem_buf + 8);
         } else {
-            env->avr[n].u64[1] = ldq_p(mem_buf);
-            env->avr[n].u64[0] = ldq_p(mem_buf+8);
+            env->vsr[32 + n].u64[1] = ldq_p(mem_buf);
+            env->vsr[32 + n].u64[0] = ldq_p(mem_buf + 8);
         }
         return 16;
     }
@@ -9623,7 +9623,7 @@ static int gdb_set_spe_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 static int gdb_get_vsx_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 {
     if (n < 32) {
-        stq_p(mem_buf, env->vsr[n]);
+        stq_p(mem_buf, env->vsr[n].u64[1]);
         ppc_maybe_bswap_register(env, mem_buf, 8);
         return 8;
     }
@@ -9634,7 +9634,7 @@ static int gdb_set_vsx_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
 {
     if (n < 32) {
         ppc_maybe_bswap_register(env, mem_buf, 8);
-        env->vsr[n] = ldq_p(mem_buf);
+        env->vsr[n].u64[1] = ldq_p(mem_buf);
         return 8;
     }
     return 0;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [RFC PATCH v2 8/9] target/ppc: convert VMX logical instructions to use vector operations
  2018-12-17 12:23 [Qemu-devel] [RFC PATCH v2 0/9] target/ppc: convert VMX instructions to use TCG vector operations Mark Cave-Ayland
                   ` (6 preceding siblings ...)
  2018-12-17 12:24 ` [Qemu-devel] [RFC PATCH v2 7/9] target/ppc: move FP and VMX registers into aligned vsr register array Mark Cave-Ayland
@ 2018-12-17 12:24 ` Mark Cave-Ayland
  2018-12-17 12:24 ` [Qemu-devel] [RFC PATCH v2 9/9] target/ppc: convert vaddu[b, h, w, d] and vsubu[b, h, w, d] over " Mark Cave-Ayland
  2018-12-17 17:39 ` [Qemu-devel] [RFC PATCH v2 0/9] target/ppc: convert VMX instructions to use TCG " Richard Henderson
  9 siblings, 0 replies; 19+ messages in thread
From: Mark Cave-Ayland @ 2018-12-17 12:24 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc, david, richard.henderson, lvivier

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/translate.c              |  1 +
 target/ppc/translate/vmx-impl.inc.c | 59 +++++++++++++++++++++----------------
 2 files changed, 35 insertions(+), 25 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 8e89aec14d..1b61bfa093 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -24,6 +24,7 @@
 #include "disas/disas.h"
 #include "exec/exec-all.h"
 #include "tcg-op.h"
+#include "tcg-op-gvec.h"
 #include "qemu/host-utils.h"
 #include "exec/cpu_ldst.h"
 
diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index f2836fb750..ca24676719 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -264,41 +264,50 @@ GEN_VX_VMUL10(vmul10euq, 1, 0);
 GEN_VX_VMUL10(vmul10cuq, 0, 1);
 GEN_VX_VMUL10(vmul10ecuq, 1, 1);
 
-/* Logical operations */
-#define GEN_VX_LOGICAL(name, tcg_op, opc2, opc3)                        \
-static void glue(gen_, name)(DisasContext *ctx)                                 \
+#define GEN_VXFORM_V(name, vece, tcg_op, opc2, opc3)                    \
+static void glue(gen_, name)(DisasContext *ctx)                         \
 {                                                                       \
-    TCGv_i64 t0 = tcg_temp_new_i64();                                   \
-    TCGv_i64 t1 = tcg_temp_new_i64();                                   \
-    TCGv_i64 avr = tcg_temp_new_i64();                                  \
+    if (unlikely(!ctx->altivec_enabled)) {                              \
+        gen_exception(ctx, POWERPC_EXCP_VPU);                           \
+        return;                                                         \
+    }                                                                   \
                                                                         \
+    tcg_op(vece,                                                        \
+           avr64_offset(rD(ctx->opcode), true),                         \
+           avr64_offset(rA(ctx->opcode), true),                         \
+           avr64_offset(rB(ctx->opcode), true),                         \
+           16, 16);                                                     \
+}
+
+#define GEN_VXFORM_VN(name, vece, tcg_op, opc2, opc3)                   \
+static void glue(gen_, name)(DisasContext *ctx)                         \
+{                                                                       \
     if (unlikely(!ctx->altivec_enabled)) {                              \
         gen_exception(ctx, POWERPC_EXCP_VPU);                           \
         return;                                                         \
     }                                                                   \
-    get_avr64(t0, rA(ctx->opcode), true);                               \
-    get_avr64(t1, rB(ctx->opcode), true);                               \
-    tcg_op(avr, t0, t1);                                                \
-    set_avr64(rD(ctx->opcode), avr, true);                              \
                                                                         \
-    get_avr64(t0, rA(ctx->opcode), false);                              \
-    get_avr64(t1, rB(ctx->opcode), false);                              \
-    tcg_op(avr, t0, t1);                                                \
-    set_avr64(rD(ctx->opcode), avr, false);                             \
+    tcg_op(vece,                                                        \
+           avr64_offset(rD(ctx->opcode), true),                         \
+           avr64_offset(rA(ctx->opcode), true),                         \
+           avr64_offset(rB(ctx->opcode), true),                         \
+           16, 16);                                                     \
                                                                         \
-    tcg_temp_free_i64(t0);                                              \
-    tcg_temp_free_i64(t1);                                              \
-    tcg_temp_free_i64(avr);                                             \
+    tcg_gen_gvec_not(vece,                                              \
+                     avr64_offset(rD(ctx->opcode), true),               \
+                     avr64_offset(rD(ctx->opcode), true),               \
+                     16, 16);                                           \
 }
 
-GEN_VX_LOGICAL(vand, tcg_gen_and_i64, 2, 16);
-GEN_VX_LOGICAL(vandc, tcg_gen_andc_i64, 2, 17);
-GEN_VX_LOGICAL(vor, tcg_gen_or_i64, 2, 18);
-GEN_VX_LOGICAL(vxor, tcg_gen_xor_i64, 2, 19);
-GEN_VX_LOGICAL(vnor, tcg_gen_nor_i64, 2, 20);
-GEN_VX_LOGICAL(veqv, tcg_gen_eqv_i64, 2, 26);
-GEN_VX_LOGICAL(vnand, tcg_gen_nand_i64, 2, 22);
-GEN_VX_LOGICAL(vorc, tcg_gen_orc_i64, 2, 21);
+/* Logical operations */
+GEN_VXFORM_V(vand, MO_64, tcg_gen_gvec_and, 2, 16);
+GEN_VXFORM_V(vandc, MO_64, tcg_gen_gvec_andc, 2, 17);
+GEN_VXFORM_V(vor, MO_64, tcg_gen_gvec_or, 2, 18);
+GEN_VXFORM_V(vxor, MO_64, tcg_gen_gvec_xor, 2, 19);
+GEN_VXFORM_VN(vnor, MO_64, tcg_gen_gvec_or, 2, 20);
+GEN_VXFORM_VN(veqv, MO_64, tcg_gen_gvec_xor, 2, 26);
+GEN_VXFORM_VN(vnand, MO_64, tcg_gen_gvec_and, 2, 22);
+GEN_VXFORM_V(vorc, MO_64, tcg_gen_gvec_orc, 2, 21);
 
 #define GEN_VXFORM(name, opc2, opc3)                                    \
 static void glue(gen_, name)(DisasContext *ctx)                                 \
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [RFC PATCH v2 9/9] target/ppc: convert vaddu[b, h, w, d] and vsubu[b, h, w, d] over to use vector operations
  2018-12-17 12:23 [Qemu-devel] [RFC PATCH v2 0/9] target/ppc: convert VMX instructions to use TCG vector operations Mark Cave-Ayland
                   ` (7 preceding siblings ...)
  2018-12-17 12:24 ` [Qemu-devel] [RFC PATCH v2 8/9] target/ppc: convert VMX logical instructions to use vector operations Mark Cave-Ayland
@ 2018-12-17 12:24 ` Mark Cave-Ayland
  2018-12-17 17:39 ` [Qemu-devel] [RFC PATCH v2 0/9] target/ppc: convert VMX instructions to use TCG " Richard Henderson
  9 siblings, 0 replies; 19+ messages in thread
From: Mark Cave-Ayland @ 2018-12-17 12:24 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc, david, richard.henderson, lvivier

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/helper.h                 |  8 --------
 target/ppc/int_helper.c             |  7 -------
 target/ppc/translate/vmx-impl.inc.c | 16 ++++++++--------
 3 files changed, 8 insertions(+), 23 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index c7de04e068..553ff500c8 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -108,14 +108,6 @@ DEF_HELPER_FLAGS_1(ftsqrt, TCG_CALL_NO_RWG_SE, i32, i64)
 #define dh_ctype_avr ppc_avr_t *
 #define dh_is_signed_avr dh_is_signed_ptr
 
-DEF_HELPER_3(vaddubm, void, avr, avr, avr)
-DEF_HELPER_3(vadduhm, void, avr, avr, avr)
-DEF_HELPER_3(vadduwm, void, avr, avr, avr)
-DEF_HELPER_3(vaddudm, void, avr, avr, avr)
-DEF_HELPER_3(vsububm, void, avr, avr, avr)
-DEF_HELPER_3(vsubuhm, void, avr, avr, avr)
-DEF_HELPER_3(vsubuwm, void, avr, avr, avr)
-DEF_HELPER_3(vsubudm, void, avr, avr, avr)
 DEF_HELPER_3(vavgub, void, avr, avr, avr)
 DEF_HELPER_3(vavguh, void, avr, avr, avr)
 DEF_HELPER_3(vavguw, void, avr, avr, avr)
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 9d715be25c..4547453ef1 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -531,13 +531,6 @@ void helper_vprtybq(ppc_avr_t *r, ppc_avr_t *b)
             r->element[i] = a->element[i] op b->element[i];             \
         }                                                               \
     }
-#define VARITH(suffix, element)                 \
-    VARITH_DO(add##suffix, +, element)          \
-    VARITH_DO(sub##suffix, -, element)
-VARITH(ubm, u8)
-VARITH(uhm, u16)
-VARITH(uwm, u32)
-VARITH(udm, u64)
 VARITH_DO(muluwm, *, u32)
 #undef VARITH_DO
 #undef VARITH
diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index ca24676719..9a7fd4e8a5 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -413,18 +413,18 @@ static void glue(gen_, name)(DisasContext *ctx)                         \
     tcg_temp_free_ptr(rb);                                              \
 }
 
-GEN_VXFORM(vaddubm, 0, 0);
+GEN_VXFORM_V(vaddubm, MO_8, tcg_gen_gvec_add, 0, 0);
 GEN_VXFORM_DUAL_EXT(vaddubm, PPC_ALTIVEC, PPC_NONE, 0,       \
                     vmul10cuq, PPC_NONE, PPC2_ISA300, 0x0000F800)
-GEN_VXFORM(vadduhm, 0, 1);
+GEN_VXFORM_V(vadduhm, MO_16, tcg_gen_gvec_add, 0, 1);
 GEN_VXFORM_DUAL(vadduhm, PPC_ALTIVEC, PPC_NONE,  \
                 vmul10ecuq, PPC_NONE, PPC2_ISA300)
-GEN_VXFORM(vadduwm, 0, 2);
-GEN_VXFORM(vaddudm, 0, 3);
-GEN_VXFORM(vsububm, 0, 16);
-GEN_VXFORM(vsubuhm, 0, 17);
-GEN_VXFORM(vsubuwm, 0, 18);
-GEN_VXFORM(vsubudm, 0, 19);
+GEN_VXFORM_V(vadduwm, MO_32, tcg_gen_gvec_add, 0, 2);
+GEN_VXFORM_V(vaddudm, MO_64, tcg_gen_gvec_add, 0, 3);
+GEN_VXFORM_V(vsububm, MO_8, tcg_gen_gvec_sub, 0, 16);
+GEN_VXFORM_V(vsubuhm, MO_16, tcg_gen_gvec_sub, 0, 17);
+GEN_VXFORM_V(vsubuwm, MO_32, tcg_gen_gvec_sub, 0, 18);
+GEN_VXFORM_V(vsubudm, MO_64, tcg_gen_gvec_sub, 0, 19);
 GEN_VXFORM(vmaxub, 1, 0);
 GEN_VXFORM(vmaxuh, 1, 1);
 GEN_VXFORM(vmaxuw, 1, 2);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 1/9] target/ppc: introduce get_fpr() and set_fpr() helpers for FP register access
  2018-12-17 12:23 ` [Qemu-devel] [RFC PATCH v2 1/9] target/ppc: introduce get_fpr() and set_fpr() helpers for FP register access Mark Cave-Ayland
@ 2018-12-17 16:40   ` Richard Henderson
  2018-12-17 17:20     ` Mark Cave-Ayland
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Henderson @ 2018-12-17 16:40 UTC (permalink / raw)
  To: Mark Cave-Ayland, qemu-devel, qemu-ppc, david, lvivier

On 12/17/18 4:23 AM, Mark Cave-Ayland wrote:
> These helpers allow us to move FP register values to/from the specified TCGv_i64
> argument in the VSR helpers to be introduced shortly.
> 
> To prevent FP helpers accessing the cpu_fpr array directly, add extra TCG
> temporaries as required.
> 
> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> ---
>  target/ppc/translate.c             |  10 +
>  target/ppc/translate/fp-impl.inc.c | 490 ++++++++++++++++++++++++++++---------
>  2 files changed, 390 insertions(+), 110 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 3/9] target/ppc: introduce get_cpu_vsr{l, h}() and set_cpu_vsr{l, h}() helpers for VSR register access
  2018-12-17 12:23 ` [Qemu-devel] [RFC PATCH v2 3/9] target/ppc: introduce get_cpu_vsr{l, h}() and set_cpu_vsr{l, h}() helpers for VSR " Mark Cave-Ayland
@ 2018-12-17 16:43   ` Richard Henderson
  0 siblings, 0 replies; 19+ messages in thread
From: Richard Henderson @ 2018-12-17 16:43 UTC (permalink / raw)
  To: Mark Cave-Ayland, qemu-devel, qemu-ppc, david, lvivier

On 12/17/18 4:23 AM, Mark Cave-Ayland wrote:
> These helpers allow us to move VSR register values to/from the specified TCGv_i64
> argument.
> 
> To prevent VSX helpers accessing the cpu_vsr array directly, add extra TCG
> temporaries as required.
> 
> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> ---
>  target/ppc/translate/vsx-impl.inc.c | 782 ++++++++++++++++++++++++++----------
>  1 file changed, 561 insertions(+), 221 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 4/9] target/ppc: delay writeback of avr{l, h} during lvx instruction
  2018-12-17 12:24 ` [Qemu-devel] [RFC PATCH v2 4/9] target/ppc: delay writeback of avr{l, h} during lvx instruction Mark Cave-Ayland
@ 2018-12-17 16:46   ` Richard Henderson
  2018-12-17 17:23     ` Mark Cave-Ayland
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Henderson @ 2018-12-17 16:46 UTC (permalink / raw)
  To: Mark Cave-Ayland, qemu-devel, qemu-ppc, david, lvivier

On 12/17/18 4:24 AM, Mark Cave-Ayland wrote:
> During review of the previous patch, Richard pointed out an existing bug that
> the writeback to the avr{l,h} registers should be delayed until after any
> exceptions have been raised.
> 
> Perform both 64-bit loads into separate temporaries and then write them into
> the avr{l,h} registers together to ensure that this is always the case.
> 
> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> ---
>  target/ppc/translate/vmx-impl.inc.c | 24 +++++++++++++-----------
>  1 file changed, 13 insertions(+), 11 deletions(-)

I feel a bit silly.  There was no bug, since the address is forced to be
aligned on a 16-byte boundary.  The second memory access will be to the same
page and cannot trap.

That said, I think the cleanup looks good.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 6/9] target/ppc: merge ppc_vsr_t and ppc_avr_t union types
  2018-12-17 12:24 ` [Qemu-devel] [RFC PATCH v2 6/9] target/ppc: merge ppc_vsr_t and ppc_avr_t union types Mark Cave-Ayland
@ 2018-12-17 16:47   ` Richard Henderson
  0 siblings, 0 replies; 19+ messages in thread
From: Richard Henderson @ 2018-12-17 16:47 UTC (permalink / raw)
  To: Mark Cave-Ayland, qemu-devel, qemu-ppc, david, lvivier

On 12/17/18 4:24 AM, Mark Cave-Ayland wrote:
> Since the VSX registers are actually a superset of the VMX registers then they
> can be represented by the same type. Merge ppc_avr_t into ppc_vsr_t and change
> ppc_avr_t to be a simple typedef alias.
> 
> Note that due to a difference in the naming of the float32 member between
> ppc_avr_t and ppc_vsr_t, references to the ppc_avr_t f member must be replaced
> with f32 instead.
> 
> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> ---
>  target/ppc/cpu.h        | 17 ++++++++-------
>  target/ppc/int_helper.c | 56 +++++++++++++++++++++++++------------------------
>  target/ppc/internal.h   | 11 ----------
>  3 files changed, 39 insertions(+), 45 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 7/9] target/ppc: move FP and VMX registers into aligned vsr register array
  2018-12-17 12:24 ` [Qemu-devel] [RFC PATCH v2 7/9] target/ppc: move FP and VMX registers into aligned vsr register array Mark Cave-Ayland
@ 2018-12-17 16:58   ` Richard Henderson
  0 siblings, 0 replies; 19+ messages in thread
From: Richard Henderson @ 2018-12-17 16:58 UTC (permalink / raw)
  To: Mark Cave-Ayland, qemu-devel, qemu-ppc, david, lvivier

On 12/17/18 4:24 AM, Mark Cave-Ayland wrote:
> -            cpu_fprintf(f, " %016" PRIx64, *((uint64_t *)&env->fpr[i]));
> +            cpu_fprintf(f, " %016" PRIx64, *((uint64_t *)&env->vsr[i].u64[0]));

There's no longer a need for a cast.  Otherwise,

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 1/9] target/ppc: introduce get_fpr() and set_fpr() helpers for FP register access
  2018-12-17 16:40   ` Richard Henderson
@ 2018-12-17 17:20     ` Mark Cave-Ayland
  0 siblings, 0 replies; 19+ messages in thread
From: Mark Cave-Ayland @ 2018-12-17 17:20 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel, qemu-ppc, david, lvivier

On 17/12/2018 16:40, Richard Henderson wrote:

> On 12/17/18 4:23 AM, Mark Cave-Ayland wrote:
>> These helpers allow us to move FP register values to/from the specified TCGv_i64
>> argument in the VSR helpers to be introduced shortly.
>>
>> To prevent FP helpers accessing the cpu_fpr array directly, add extra TCG
>> temporaries as required.
>>
>> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
>> ---
>>  target/ppc/translate.c             |  10 +
>>  target/ppc/translate/fp-impl.inc.c | 490 ++++++++++++++++++++++++++++---------
>>  2 files changed, 390 insertions(+), 110 deletions(-)
> 
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

Howard has notified me off-list of a regression he noticed in OS X, which I've traced
down to an erroneous extra get_fpr() in the GEN_FLOAT_* macro changes in this patch.

I've pushed the fixed version of this patchset to
https://github.com/mcayland/qemu/tree/ppc-altivec for those who wish to experiment
before I send out a v3 later in the week (I feel reasonably confident now to drop the
RFC for the next iteration).


ATB,

Mark.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 4/9] target/ppc: delay writeback of avr{l, h} during lvx instruction
  2018-12-17 16:46   ` Richard Henderson
@ 2018-12-17 17:23     ` Mark Cave-Ayland
  0 siblings, 0 replies; 19+ messages in thread
From: Mark Cave-Ayland @ 2018-12-17 17:23 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel, qemu-ppc, david, lvivier

On 17/12/2018 16:46, Richard Henderson wrote:

> On 12/17/18 4:24 AM, Mark Cave-Ayland wrote:
>> During review of the previous patch, Richard pointed out an existing bug that
>> the writeback to the avr{l,h} registers should be delayed until after any
>> exceptions have been raised.
>>
>> Perform both 64-bit loads into separate temporaries and then write them into
>> the avr{l,h} registers together to ensure that this is always the case.
>>
>> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
>> ---
>>  target/ppc/translate/vmx-impl.inc.c | 24 +++++++++++++-----------
>>  1 file changed, 13 insertions(+), 11 deletions(-)
> 
> I feel a bit silly.  There was no bug, since the address is forced to be
> aligned on a 16-byte boundary.  The second memory access will be to the same
> page and cannot trap.
> 
> That said, I think the cleanup looks good.
> 
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

Ah okay. I'm inclined to drop this patch for now, since it feels strange introducing
a cleanup to just one of the load functions...


ATB,

Mark.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 0/9] target/ppc: convert VMX instructions to use TCG vector operations
  2018-12-17 12:23 [Qemu-devel] [RFC PATCH v2 0/9] target/ppc: convert VMX instructions to use TCG vector operations Mark Cave-Ayland
                   ` (8 preceding siblings ...)
  2018-12-17 12:24 ` [Qemu-devel] [RFC PATCH v2 9/9] target/ppc: convert vaddu[b, h, w, d] and vsubu[b, h, w, d] over " Mark Cave-Ayland
@ 2018-12-17 17:39 ` Richard Henderson
  2018-12-17 18:49   ` Mark Cave-Ayland
  9 siblings, 1 reply; 19+ messages in thread
From: Richard Henderson @ 2018-12-17 17:39 UTC (permalink / raw)
  To: Mark Cave-Ayland, qemu-devel, qemu-ppc, david, lvivier

On 12/17/18 4:23 AM, Mark Cave-Ayland wrote:
> NOTE: there are a lot of instructions that cannot (yet) be optimised to use TCG vector
> operations, however it struck me that there may be some potential for converting
> saturating add/sub and cmp instructions if there were a mechanism to return a set of
> flags indicating the result of the saturation/comparison.

There are also a lot of instructions that can be converted, but aren't:

* vspltis[bhw] can use tcg_gen_gvec_dup{8,16,32}i.

* vsplt{b,h,w} can use tcg_gen_gvec_dup_mem.

  Note that you'll need something like vec_reg_offset from
  target/arm/translate-a64.h to compute the offset of the
  specific byte/word/long from which we are to splat.

* vmr should be handled by having tcg_gen_gvec_or notice aofs == bofs.
  For ARM, we do special case this during translation.
  But since tcg/tcg-op.c does these things for tcg_gen_or_i64,
  we should probably handle the same set of transformations.

* vnot would need to be handled by actually adding a tcg_gen_gvec_nor
  and then also noticing aofs == bofs.

For saturation, I think the easiest thing to do is represent SAT as a
ppc_avr_t.  We notice saturation by also computing normal arithmetic and
comparing to see if they differ.  E.g.

    tcg_gen_gvec_add(vece, offsetof_avr_tmp,
                     offsetof(ra), offsetof(rb), 16, 16);
    tcg_gen_gvec_ssadd(vece, offsetof(rt),
                       offsetof(ra), offsetof(rb), 16, 16);
    tcg_gen_gvec_cmp(TCG_COND_NE, vece, offsetof_avr_tmp,
                     offsetof_avr_tmp, offsetof(rt), 16, 16);
    tcg_gen_gvec_or(vece, offsetof_avr_sat, offsetof_avr_sat,
                    offsetof_avr_tmp, 16, 16);

You only need to convert the ppc_avr_t to a single bit when reading VSCR.

For comparisons... that's tricky.  I wonder if there's anything better than

    tcg_gen_gvec_cmp(TCG_COND_FOO, vece, offsetof(rt),
                     offsetof(ra), offsetof(rb), 16, 16);
    if (rc) {
        TCGv_i64 hi, lo, t, f;

        tcg_gen_ld_i64(hi, cpu_env, offsetof(rt));
        tcg_gen_ld_i64(lo, cpu_env, offsetof(rt) + 8);

        tcg_gen_and_i64(t, hi, lo);
        tcg_gen_or_i64(f, hi, lo);
        tcg_gen_setcondi_i64(TCG_COND_EQ, t, t, -1);
        tcg_gen_setcondi_i64(TCG_COND_EQ, f, f, 0);

        // truncate to i32, shift, or, and set to cr6.
    }


r~

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 0/9] target/ppc: convert VMX instructions to use TCG vector operations
  2018-12-17 17:39 ` [Qemu-devel] [RFC PATCH v2 0/9] target/ppc: convert VMX instructions to use TCG " Richard Henderson
@ 2018-12-17 18:49   ` Mark Cave-Ayland
  0 siblings, 0 replies; 19+ messages in thread
From: Mark Cave-Ayland @ 2018-12-17 18:49 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel, qemu-ppc, david, lvivier

On 17/12/2018 17:39, Richard Henderson wrote:

> On 12/17/18 4:23 AM, Mark Cave-Ayland wrote:
>> NOTE: there are a lot of instructions that cannot (yet) be optimised to use TCG vector
>> operations, however it struck me that there may be some potential for converting
>> saturating add/sub and cmp instructions if there were a mechanism to return a set of
>> flags indicating the result of the saturation/comparison.
> 
> There are also a lot of instructions that can be converted, but aren't:
> 
> * vspltis[bhw] can use tcg_gen_gvec_dup{8,16,32}i.
> 
> * vsplt{b,h,w} can use tcg_gen_gvec_dup_mem.
> 
>   Note that you'll need something like vec_reg_offset from
>   target/arm/translate-a64.h to compute the offset of the
>   specific byte/word/long from which we are to splat.

Oh okay, thanks for the hints - I remember thinking that I couldn't much with splat,
but I'll go and look again.

> * vmr should be handled by having tcg_gen_gvec_or notice aofs == bofs.
>   For ARM, we do special case this during translation.
>   But since tcg/tcg-op.c does these things for tcg_gen_or_i64,
>   we should probably handle the same set of transformations.
> 
> * vnot would need to be handled by actually adding a tcg_gen_gvec_nor
>   and then also noticing aofs == bofs.

And I'll revisit these ones too.

> For saturation, I think the easiest thing to do is represent SAT as a
> ppc_avr_t.  We notice saturation by also computing normal arithmetic and
> comparing to see if they differ.  E.g.
> 
>     tcg_gen_gvec_add(vece, offsetof_avr_tmp,
>                      offsetof(ra), offsetof(rb), 16, 16);
>     tcg_gen_gvec_ssadd(vece, offsetof(rt),
>                        offsetof(ra), offsetof(rb), 16, 16);
>     tcg_gen_gvec_cmp(TCG_COND_NE, vece, offsetof_avr_tmp,
>                      offsetof_avr_tmp, offsetof(rt), 16, 16);
>     tcg_gen_gvec_or(vece, offsetof_avr_sat, offsetof_avr_sat,
>                     offsetof_avr_tmp, 16, 16);
> 
> You only need to convert the ppc_avr_t to a single bit when reading VSCR.

I actually had a PoC that looked somewhat similar to this, except that I abandoned
the idea as I thought that the penalty of doing the add twice (plus comparison) would
slow down everything by several orders of magnitude for backends that didn't support
vector instructions. What's the best way to handle this?

> For comparisons... that's tricky.  I wonder if there's anything better than
> 
>     tcg_gen_gvec_cmp(TCG_COND_FOO, vece, offsetof(rt),
>                      offsetof(ra), offsetof(rb), 16, 16);
>     if (rc) {
>         TCGv_i64 hi, lo, t, f;
> 
>         tcg_gen_ld_i64(hi, cpu_env, offsetof(rt));
>         tcg_gen_ld_i64(lo, cpu_env, offsetof(rt) + 8);
> 
>         tcg_gen_and_i64(t, hi, lo);
>         tcg_gen_or_i64(f, hi, lo);
>         tcg_gen_setcondi_i64(TCG_COND_EQ, t, t, -1);
>         tcg_gen_setcondi_i64(TCG_COND_EQ, f, f, 0);
> 
>         // truncate to i32, shift, or, and set to cr6.
>     }

Certainly I can look at this approach, but again my concern is that we end up heavily
penalising the backends without vector instruction support :/


ATB,

Mark.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2018-12-17 18:50 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-12-17 12:23 [Qemu-devel] [RFC PATCH v2 0/9] target/ppc: convert VMX instructions to use TCG vector operations Mark Cave-Ayland
2018-12-17 12:23 ` [Qemu-devel] [RFC PATCH v2 1/9] target/ppc: introduce get_fpr() and set_fpr() helpers for FP register access Mark Cave-Ayland
2018-12-17 16:40   ` Richard Henderson
2018-12-17 17:20     ` Mark Cave-Ayland
2018-12-17 12:23 ` [Qemu-devel] [RFC PATCH v2 2/9] target/ppc: introduce get_avr64() and set_avr64() helpers for VMX " Mark Cave-Ayland
2018-12-17 12:23 ` [Qemu-devel] [RFC PATCH v2 3/9] target/ppc: introduce get_cpu_vsr{l, h}() and set_cpu_vsr{l, h}() helpers for VSR " Mark Cave-Ayland
2018-12-17 16:43   ` Richard Henderson
2018-12-17 12:24 ` [Qemu-devel] [RFC PATCH v2 4/9] target/ppc: delay writeback of avr{l, h} during lvx instruction Mark Cave-Ayland
2018-12-17 16:46   ` Richard Henderson
2018-12-17 17:23     ` Mark Cave-Ayland
2018-12-17 12:24 ` [Qemu-devel] [RFC PATCH v2 5/9] target/ppc: switch FPR, VMX and VSX helpers to access data directly from cpu_env Mark Cave-Ayland
2018-12-17 12:24 ` [Qemu-devel] [RFC PATCH v2 6/9] target/ppc: merge ppc_vsr_t and ppc_avr_t union types Mark Cave-Ayland
2018-12-17 16:47   ` Richard Henderson
2018-12-17 12:24 ` [Qemu-devel] [RFC PATCH v2 7/9] target/ppc: move FP and VMX registers into aligned vsr register array Mark Cave-Ayland
2018-12-17 16:58   ` Richard Henderson
2018-12-17 12:24 ` [Qemu-devel] [RFC PATCH v2 8/9] target/ppc: convert VMX logical instructions to use vector operations Mark Cave-Ayland
2018-12-17 12:24 ` [Qemu-devel] [RFC PATCH v2 9/9] target/ppc: convert vaddu[b, h, w, d] and vsubu[b, h, w, d] over " Mark Cave-Ayland
2018-12-17 17:39 ` [Qemu-devel] [RFC PATCH v2 0/9] target/ppc: convert VMX instructions to use TCG " Richard Henderson
2018-12-17 18:49   ` Mark Cave-Ayland

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).