qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] target/ppc: Update vector insns to use 128 bit
@ 2024-06-21 11:46 Chinmay Rath
  2024-06-21 11:46 ` [PATCH 1/3] target/ppc: Move get/set_avr64 functions to vmx-impl.c.inc Chinmay Rath
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Chinmay Rath @ 2024-06-21 11:46 UTC (permalink / raw)
  To: qemu-ppc; +Cc: qemu-devel, npiggin, danielhb413, richard.henderson, harshpb

Updating a bunch of VMX and VSX storage access instructions to use
tcg_gen_qemu_ld/st_i128 instead of using tcg_gen_qemu_ld/st_i64 in
succession; as suggested by Richard, in my decodetree patches.
Plus some minor clean-ups to facilitate the above in case of VMX insns.

Chinmay Rath (3):
  target/ppc: Move get/set_avr64 functions to vmx-impl.c.inc.
  target/ppc: Update VMX storage access insns to use
    tcg_gen_qemu_ld/st_i128.
  target/ppc : Update VSX storage access insns to use tcg_gen_qemu
    _ld/st_i128.

 target/ppc/translate.c              | 10 -----
 target/ppc/translate/vmx-impl.c.inc | 50 +++++++++++----------
 target/ppc/translate/vsx-impl.c.inc | 68 ++++++++++++-----------------
 3 files changed, 57 insertions(+), 71 deletions(-)

-- 
2.39.3



^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/3] target/ppc: Move get/set_avr64 functions to vmx-impl.c.inc.
  2024-06-21 11:46 [PATCH 0/3] target/ppc: Update vector insns to use 128 bit Chinmay Rath
@ 2024-06-21 11:46 ` Chinmay Rath
  2024-06-21 11:46 ` [PATCH 2/3] target/ppc: Update VMX storage access insns to use tcg_gen_qemu_ld/st_i128 Chinmay Rath
  2024-06-21 11:46 ` [PATCH 3/3] target/ppc : Update VSX storage access insns to use tcg_gen_qemu _ld/st_i128 Chinmay Rath
  2 siblings, 0 replies; 6+ messages in thread
From: Chinmay Rath @ 2024-06-21 11:46 UTC (permalink / raw)
  To: qemu-ppc; +Cc: qemu-devel, npiggin, danielhb413, richard.henderson, harshpb

Those functions are used to ld/st data to and from Altivec registers,
in 64 bits chunks, and are only used in vmx-impl.c.inc file,
hence the clean-up movement.

Signed-off-by: Chinmay Rath <rathc@linux.ibm.com>
---
 target/ppc/translate.c              | 10 ----------
 target/ppc/translate/vmx-impl.c.inc | 10 ++++++++++
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index ad512e1922..f7f2c2db9e 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -6200,16 +6200,6 @@ static inline void set_fpr(int regno, TCGv_i64 src)
     tcg_gen_st_i64(tcg_constant_i64(0), tcg_env, vsr64_offset(regno, false));
 }
 
-static inline void get_avr64(TCGv_i64 dst, int regno, bool high)
-{
-    tcg_gen_ld_i64(dst, tcg_env, avr64_offset(regno, high));
-}
-
-static inline void set_avr64(int regno, TCGv_i64 src, bool high)
-{
-    tcg_gen_st_i64(src, tcg_env, avr64_offset(regno, high));
-}
-
 /*
  * Helpers for decodetree used by !function for decoding arguments.
  */
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index 152bcde0e3..a182d2cf81 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -14,6 +14,16 @@ static inline TCGv_ptr gen_avr_ptr(int reg)
     return r;
 }
 
+static inline void get_avr64(TCGv_i64 dst, int regno, bool high)
+{
+    tcg_gen_ld_i64(dst, tcg_env, avr64_offset(regno, high));
+}
+
+static inline void set_avr64(int regno, TCGv_i64 src, bool high)
+{
+    tcg_gen_st_i64(src, tcg_env, avr64_offset(regno, high));
+}
+
 static bool trans_LVX(DisasContext *ctx, arg_X *a)
 {
     TCGv EA;
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/3] target/ppc: Update VMX storage access insns to use tcg_gen_qemu_ld/st_i128.
  2024-06-21 11:46 [PATCH 0/3] target/ppc: Update vector insns to use 128 bit Chinmay Rath
  2024-06-21 11:46 ` [PATCH 1/3] target/ppc: Move get/set_avr64 functions to vmx-impl.c.inc Chinmay Rath
@ 2024-06-21 11:46 ` Chinmay Rath
  2024-06-21 16:34   ` Richard Henderson
  2024-06-21 11:46 ` [PATCH 3/3] target/ppc : Update VSX storage access insns to use tcg_gen_qemu _ld/st_i128 Chinmay Rath
  2 siblings, 1 reply; 6+ messages in thread
From: Chinmay Rath @ 2024-06-21 11:46 UTC (permalink / raw)
  To: qemu-ppc; +Cc: qemu-devel, npiggin, danielhb413, richard.henderson, harshpb

Updated instructions {l, st}vx to use tcg_gen_qemu_ld/st_i128,
instead of using 64 bits loads/stores in succession.
Introduced functions {get, set}_avr_full in vmx-impl.c.inc to
facilitate the above, and potential future usage.

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Chinmay Rath <rathc@linux.ibm.com>
---
 target/ppc/translate/vmx-impl.c.inc | 40 +++++++++++++----------------
 1 file changed, 18 insertions(+), 22 deletions(-)

diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index a182d2cf81..47f6952d69 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -24,25 +24,28 @@ static inline void set_avr64(int regno, TCGv_i64 src, bool high)
     tcg_gen_st_i64(src, tcg_env, avr64_offset(regno, high));
 }
 
+static inline void get_avr_full(TCGv_i128 dst, int regno)
+{
+    tcg_gen_ld_i128(dst, tcg_env, avr_full_offset(regno));
+}
+
+static inline void set_avr_full(int regno, TCGv_i128 src)
+{
+    tcg_gen_st_i128(src, tcg_env, avr_full_offset(regno));
+}
+
 static bool trans_LVX(DisasContext *ctx, arg_X *a)
 {
     TCGv EA;
-    TCGv_i64 avr;
+    TCGv_i128 avr;
     REQUIRE_INSNS_FLAGS(ctx, ALTIVEC);
     REQUIRE_VECTOR(ctx);
     gen_set_access_type(ctx, ACCESS_INT);
-    avr = tcg_temp_new_i64();
+    avr = tcg_temp_new_i128();
     EA = do_ea_calc(ctx, a->ra, cpu_gpr[a->rb]);
     tcg_gen_andi_tl(EA, EA, ~0xf);
-    /*
-     * We only need to swap high and low halves. gen_qemu_ld64_i64
-     * does necessary 64-bit byteswap already.
-     */
-    gen_qemu_ld64_i64(ctx, avr, EA);
-    set_avr64(a->rt, avr, !ctx->le_mode);
-    tcg_gen_addi_tl(EA, EA, 8);
-    gen_qemu_ld64_i64(ctx, avr, EA);
-    set_avr64(a->rt, avr, ctx->le_mode);
+    tcg_gen_qemu_ld_i128(avr, EA, ctx->mem_idx, DEF_MEMOP(MO_128));
+    set_avr_full(a->rt, avr);
     return true;
 }
 
@@ -56,22 +59,15 @@ static bool trans_LVXL(DisasContext *ctx, arg_LVXL *a)
 static bool trans_STVX(DisasContext *ctx, arg_STVX *a)
 {
     TCGv EA;
-    TCGv_i64 avr;
+    TCGv_i128 avr;
     REQUIRE_INSNS_FLAGS(ctx, ALTIVEC);
     REQUIRE_VECTOR(ctx);
     gen_set_access_type(ctx, ACCESS_INT);
-    avr = tcg_temp_new_i64();
+    avr = tcg_temp_new_i128();
     EA = do_ea_calc(ctx, a->ra, cpu_gpr[a->rb]);
     tcg_gen_andi_tl(EA, EA, ~0xf);
-    /*
-     * We only need to swap high and low halves. gen_qemu_st64_i64
-     * does necessary 64-bit byteswap already.
-     */
-    get_avr64(avr, a->rt, !ctx->le_mode);
-    gen_qemu_st64_i64(ctx, avr, EA);
-    tcg_gen_addi_tl(EA, EA, 8);
-    get_avr64(avr, a->rt, ctx->le_mode);
-    gen_qemu_st64_i64(ctx, avr, EA);
+    get_avr_full(avr, a->rt);
+    tcg_gen_qemu_st_i128(avr, EA, ctx->mem_idx, DEF_MEMOP(MO_128));
     return true;
 }
 
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 3/3] target/ppc : Update VSX storage access insns to use tcg_gen_qemu _ld/st_i128.
  2024-06-21 11:46 [PATCH 0/3] target/ppc: Update vector insns to use 128 bit Chinmay Rath
  2024-06-21 11:46 ` [PATCH 1/3] target/ppc: Move get/set_avr64 functions to vmx-impl.c.inc Chinmay Rath
  2024-06-21 11:46 ` [PATCH 2/3] target/ppc: Update VMX storage access insns to use tcg_gen_qemu_ld/st_i128 Chinmay Rath
@ 2024-06-21 11:46 ` Chinmay Rath
  2 siblings, 0 replies; 6+ messages in thread
From: Chinmay Rath @ 2024-06-21 11:46 UTC (permalink / raw)
  To: qemu-ppc; +Cc: qemu-devel, npiggin, danielhb413, richard.henderson, harshpb

Updated many VSX instructions to use tcg_gen_qemu_ld/st_i128, instead of using
tcg_gen_qemu_ld/st_i64 consecutively.
Introduced functions {get,set}_vsr_full to facilitate the above & for future use.

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Chinmay Rath <rathc@linux.ibm.com>
---
 target/ppc/translate/vsx-impl.c.inc | 68 ++++++++++++-----------------
 1 file changed, 29 insertions(+), 39 deletions(-)

diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index 26ebf3fedf..a42fbf7c12 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -10,6 +10,16 @@ static inline void set_cpu_vsr(int n, TCGv_i64 src, bool high)
     tcg_gen_st_i64(src, tcg_env, vsr64_offset(n, high));
 }
 
+static inline void get_vsr_full(TCGv_i128 dst, int reg)
+{
+    tcg_gen_ld_i128(dst, tcg_env, vsr_full_offset(reg));
+}
+
+static inline void set_vsr_full(int reg, TCGv_i128 src)
+{
+    tcg_gen_st_i128(src, tcg_env, vsr_full_offset(reg));
+}
+
 static inline TCGv_ptr gen_vsr_ptr(int reg)
 {
     TCGv_ptr r = tcg_temp_new_ptr();
@@ -196,20 +206,16 @@ static bool trans_LXVH8X(DisasContext *ctx, arg_LXVH8X *a)
 static bool trans_LXVB16X(DisasContext *ctx, arg_LXVB16X *a)
 {
     TCGv EA;
-    TCGv_i64 xth, xtl;
+    TCGv_i128 data;
 
     REQUIRE_VSX(ctx);
     REQUIRE_INSNS_FLAGS2(ctx, ISA300);
 
-    xth = tcg_temp_new_i64();
-    xtl = tcg_temp_new_i64();
+    data = tcg_temp_new_i128();
     gen_set_access_type(ctx, ACCESS_INT);
     EA = do_ea_calc(ctx, a->ra, cpu_gpr[a->rb]);
-    tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_BEUQ);
-    tcg_gen_addi_tl(EA, EA, 8);
-    tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_BEUQ);
-    set_cpu_vsr(a->rt, xth, true);
-    set_cpu_vsr(a->rt, xtl, false);
+    tcg_gen_qemu_ld_i128(data, EA, ctx->mem_idx, MO_BE | MO_128);
+    set_vsr_full(a->rt, data);
     return true;
 }
 
@@ -385,20 +391,16 @@ static bool trans_STXVH8X(DisasContext *ctx, arg_STXVH8X *a)
 static bool trans_STXVB16X(DisasContext *ctx, arg_STXVB16X *a)
 {
     TCGv EA;
-    TCGv_i64 xsh, xsl;
+    TCGv_i128 data;
 
     REQUIRE_VSX(ctx);
     REQUIRE_INSNS_FLAGS2(ctx, ISA300);
 
-    xsh = tcg_temp_new_i64();
-    xsl = tcg_temp_new_i64();
-    get_cpu_vsr(xsh, a->rt, true);
-    get_cpu_vsr(xsl, a->rt, false);
+    data = tcg_temp_new_i128();
     gen_set_access_type(ctx, ACCESS_INT);
     EA = do_ea_calc(ctx, a->ra, cpu_gpr[a->rb]);
-    tcg_gen_qemu_st_i64(xsh, EA, ctx->mem_idx, MO_BEUQ);
-    tcg_gen_addi_tl(EA, EA, 8);
-    tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_BEUQ);
+    get_vsr_full(data, a->rt);
+    tcg_gen_qemu_st_i128(data, EA, ctx->mem_idx, MO_BE | MO_128);
     return true;
 }
 
@@ -2175,13 +2177,13 @@ static bool do_lstxv(DisasContext *ctx, int ra, TCGv displ,
                      int rt, bool store, bool paired)
 {
     TCGv ea;
-    TCGv_i64 xt;
+    TCGv_i128 data;
     MemOp mop;
     int rt1, rt2;
 
-    xt = tcg_temp_new_i64();
+    data = tcg_temp_new_i128();
 
-    mop = DEF_MEMOP(MO_UQ);
+    mop = DEF_MEMOP(MO_128);
 
     gen_set_access_type(ctx, ACCESS_INT);
     ea = do_ea_calc(ctx, ra, displ);
@@ -2195,32 +2197,20 @@ static bool do_lstxv(DisasContext *ctx, int ra, TCGv displ,
     }
 
     if (store) {
-        get_cpu_vsr(xt, rt1, !ctx->le_mode);
-        tcg_gen_qemu_st_i64(xt, ea, ctx->mem_idx, mop);
-        gen_addr_add(ctx, ea, ea, 8);
-        get_cpu_vsr(xt, rt1, ctx->le_mode);
-        tcg_gen_qemu_st_i64(xt, ea, ctx->mem_idx, mop);
+        get_vsr_full(data, rt1);
+        tcg_gen_qemu_st_i128(data, ea, ctx->mem_idx, mop);
         if (paired) {
             gen_addr_add(ctx, ea, ea, 8);
-            get_cpu_vsr(xt, rt2, !ctx->le_mode);
-            tcg_gen_qemu_st_i64(xt, ea, ctx->mem_idx, mop);
-            gen_addr_add(ctx, ea, ea, 8);
-            get_cpu_vsr(xt, rt2, ctx->le_mode);
-            tcg_gen_qemu_st_i64(xt, ea, ctx->mem_idx, mop);
+            get_vsr_full(data, rt2);
+            tcg_gen_qemu_st_i128(data, ea, ctx->mem_idx, mop);
         }
     } else {
-        tcg_gen_qemu_ld_i64(xt, ea, ctx->mem_idx, mop);
-        set_cpu_vsr(rt1, xt, !ctx->le_mode);
-        gen_addr_add(ctx, ea, ea, 8);
-        tcg_gen_qemu_ld_i64(xt, ea, ctx->mem_idx, mop);
-        set_cpu_vsr(rt1, xt, ctx->le_mode);
+        tcg_gen_qemu_ld_i128(data, ea, ctx->mem_idx, mop);
+        set_vsr_full(rt1, data);
         if (paired) {
             gen_addr_add(ctx, ea, ea, 8);
-            tcg_gen_qemu_ld_i64(xt, ea, ctx->mem_idx, mop);
-            set_cpu_vsr(rt2, xt, !ctx->le_mode);
-            gen_addr_add(ctx, ea, ea, 8);
-            tcg_gen_qemu_ld_i64(xt, ea, ctx->mem_idx, mop);
-            set_cpu_vsr(rt2, xt, ctx->le_mode);
+            tcg_gen_qemu_ld_i128(data, ea, ctx->mem_idx, mop);
+            set_vsr_full(rt2, data);
         }
     }
     return true;
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/3] target/ppc: Update VMX storage access insns to use tcg_gen_qemu_ld/st_i128.
  2024-06-21 11:46 ` [PATCH 2/3] target/ppc: Update VMX storage access insns to use tcg_gen_qemu_ld/st_i128 Chinmay Rath
@ 2024-06-21 16:34   ` Richard Henderson
  2024-06-21 16:56     ` Richard Henderson
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Henderson @ 2024-06-21 16:34 UTC (permalink / raw)
  To: Chinmay Rath, qemu-ppc; +Cc: qemu-devel, npiggin, danielhb413, harshpb

On 6/21/24 04:46, Chinmay Rath wrote:
> +    tcg_gen_qemu_ld_i128(avr, EA, ctx->mem_idx, DEF_MEMOP(MO_128));
> +    set_avr_full(a->rt, avr);

This needs to specify atomicity as well.  This is much more important to for 16 byte 
operations than smaller accesses, as this might require stop-the-world semantics depending 
on the host.

According to section 1.4 Storage Atomicity, we need no more than 8-byte atomicity for 
these vector operations, and then the following the alignment bits down.

So: MO_128 | MO_ATOM_IFALIGN_PAIR,


r~


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/3] target/ppc: Update VMX storage access insns to use tcg_gen_qemu_ld/st_i128.
  2024-06-21 16:34   ` Richard Henderson
@ 2024-06-21 16:56     ` Richard Henderson
  0 siblings, 0 replies; 6+ messages in thread
From: Richard Henderson @ 2024-06-21 16:56 UTC (permalink / raw)
  To: Chinmay Rath, qemu-ppc; +Cc: qemu-devel, npiggin, danielhb413, harshpb

On 6/21/24 09:34, Richard Henderson wrote:
> On 6/21/24 04:46, Chinmay Rath wrote:
>> +    tcg_gen_qemu_ld_i128(avr, EA, ctx->mem_idx, DEF_MEMOP(MO_128));
>> +    set_avr_full(a->rt, avr);
> 
> This needs to specify atomicity as well.  This is much more important to for 16 byte 
> operations than smaller accesses, as this might require stop-the-world semantics depending 
> on the host.
> 
> According to section 1.4 Storage Atomicity, we need no more than 8-byte atomicity for 
> these vector operations, and then the following the alignment bits down.
> 
> So: MO_128 | MO_ATOM_IFALIGN_PAIR,

Actually, you need MO_ATOM_SUBALIGN semantics, maxing out at MO_64, which hasn't been 
implemented.  But since none of the rest of target/ppc has been updated to use SUBALIGN, 
using IFALIGN is not a regression.


r~



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-06-21 16:57 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-21 11:46 [PATCH 0/3] target/ppc: Update vector insns to use 128 bit Chinmay Rath
2024-06-21 11:46 ` [PATCH 1/3] target/ppc: Move get/set_avr64 functions to vmx-impl.c.inc Chinmay Rath
2024-06-21 11:46 ` [PATCH 2/3] target/ppc: Update VMX storage access insns to use tcg_gen_qemu_ld/st_i128 Chinmay Rath
2024-06-21 16:34   ` Richard Henderson
2024-06-21 16:56     ` Richard Henderson
2024-06-21 11:46 ` [PATCH 3/3] target/ppc : Update VSX storage access insns to use tcg_gen_qemu _ld/st_i128 Chinmay Rath

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).