qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH v5 0/9] POWER9 TCG enablements - part4
@ 2016-09-28 18:41 Nikunj A Dadhania
  2016-09-28 18:41 ` [Qemu-devel] [PATCH v5 1/9] target-ppc: Implement mfvsrld instruction Nikunj A Dadhania
                   ` (9 more replies)
  0 siblings, 10 replies; 18+ messages in thread
From: Nikunj A Dadhania @ 2016-09-28 18:41 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh

This series adds 7 new instructions for POWER9 (ISA 3.0). It also switches
to the newer QEMU TCG load/store helpers and optimizes stxvw4x and lxvw4x.

GCC was adding an epilogue to every VSX instruction, which changed the
behaviour under test. To test the vector load instructions, mfvsrld/mfvsrd
were used to move the VSR contents into registers. To test the vector
store instructions, mtvsrdd was used. This got rid of the epilogue added
by GCC.

Patches:
    01:  mfvsrld: Move From VSR Lower Doubleword
    02:  mtvsrdd: Move To VSR Double Doubleword
    03:  mtvsrws: Move To VSR Word & Splat
    04:  lxvw4x: improve implementation
    05:  stxvw4x: improve implementation
    06:  lxvh8x: Load VSX Vector Halfword*8
    07:  stxvh8x: Store VSX Vector Halfword*8
    08:  lxvb16x: Load VSX Vector Byte*16
    09:  stxvb16x: Store VSX Vector Byte*16

Changelog:
v4:
* Added gen_bswap16x8 inline for lxvh8x and stxvh8x in tcg
* Dropped helper_bswap16x4
* Use temporaries in stxvh8x and not clobber the register

v3:
* Added 3 new VSR instructions.
* Fixed all the vector load/store instructions for BE/LE.
* Added detailed commit messages to patches.
* Dropped deposit32x2 and implemented it using tcg ops

v2: 
* Fix lxvw4x/stxvw4x translation as LE/BE were both similar,
  one in tcg and the other as a helper
* Rename bswap32x2 to deposit32x2 as it does not need to
  swap the (32-bit) content
* Fix a bug in stxvh8x, as David suggested.

v1: 
* More load/store cleanups in byte reverse routines
* ld64/st64 converted to newer macro and updated call sites
* Cleanup load with reservation and store conditional
* Return invalid random for darn instruction

v0:
* darn - read /dev/random to get the random number
* xxspltib - make it PPC64 only
* Consolidate load/store operations and use macros to generate qemu_st/ld
* Simplify load/store vsx endian manipulation

Nikunj A Dadhania (6):
  target-ppc: improve lxvw4x implementation
  target-ppc: improve stxvw4x implementation
  target-ppc: add lxvh8x instruction
  target-ppc: add stxvh8x instruction
  target-ppc: add lxvb16x instruction
  target-ppc: add stxvb16x instruction

Ravi Bangoria (3):
  target-ppc: Implement mfvsrld instruction
  target-ppc: Implement mtvsrdd instruction
  target-ppc: Implement mtvsrws instruction

 target-ppc/translate/vsx-impl.inc.c | 238 ++++++++++++++++++++++++++++++++----
 target-ppc/translate/vsx-ops.inc.c  |   7 ++
 2 files changed, 221 insertions(+), 24 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH v5 1/9] target-ppc: Implement mfvsrld instruction
  2016-09-28 18:41 [Qemu-devel] [PATCH v5 0/9] POWER9 TCG enablements - part4 Nikunj A Dadhania
@ 2016-09-28 18:41 ` Nikunj A Dadhania
  2016-09-28 18:41 ` [Qemu-devel] [PATCH v5 2/9] target-ppc: Implement mtvsrdd instruction Nikunj A Dadhania
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Nikunj A Dadhania @ 2016-09-28 18:41 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh, Ravi Bangoria

From: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>

mfvsrld: Move From VSR Lower Doubleword
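
For readers of the archive: the facility check in the hunk below can be
modelled in plain C. This is only a sketch. The enum and function names
are hypothetical stand-ins for POWERPC_EXCP_VSXU / POWERPC_EXCP_VPU and
are not part of the patch. It relies on VSRs 32..63 overlaying the
Altivec vector registers, which is why the upper half is guarded by
altivec_enabled rather than vsx_enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result codes standing in for the two exceptions
 * raised by the patch (POWERPC_EXCP_VSXU and POWERPC_EXCP_VPU). */
typedef enum {
    FAC_OK,
    FAC_VSX_UNAVAILABLE,
    FAC_VPU_UNAVAILABLE
} fac_result;

/* Pick the facility that guards VSR number 'vsr' (0..63), mirroring
 * the "xS < 32" test in gen_mfvsrld(). */
static fac_result check_vsr_facility(unsigned vsr, bool vsx_enabled,
                                     bool altivec_enabled)
{
    assert(vsr < 64);
    if (vsr < 32) {
        return vsx_enabled ? FAC_OK : FAC_VSX_UNAVAILABLE;
    }
    return altivec_enabled ? FAC_OK : FAC_VPU_UNAVAILABLE;
}
```

Once the check passes, the instruction itself is a single 64-bit move of
the lower doubleword (cpu_vsrl) into RA.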

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 target-ppc/translate/vsx-impl.inc.c | 17 +++++++++++++++++
 target-ppc/translate/vsx-ops.inc.c  |  1 +
 2 files changed, 18 insertions(+)

diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
index eee6052..b669e8c 100644
--- a/target-ppc/translate/vsx-impl.inc.c
+++ b/target-ppc/translate/vsx-impl.inc.c
@@ -217,6 +217,23 @@ static void gen_##name(DisasContext *ctx)                       \
 MV_VSRD(mfvsrd, cpu_gpr[rA(ctx->opcode)], cpu_vsrh(xS(ctx->opcode)))
 MV_VSRD(mtvsrd, cpu_vsrh(xT(ctx->opcode)), cpu_gpr[rA(ctx->opcode)])
 
+static void gen_mfvsrld(DisasContext *ctx)
+{
+    if (xS(ctx->opcode) < 32) {
+        if (unlikely(!ctx->vsx_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_VSXU);
+            return;
+        }
+    } else {
+        if (unlikely(!ctx->altivec_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_VPU);
+            return;
+        }
+    }
+
+    tcg_gen_mov_i64(cpu_gpr[rA(ctx->opcode)], cpu_vsrl(xS(ctx->opcode)));
+}
+
 #endif
 
 static void gen_xxpermdi(DisasContext *ctx)
diff --git a/target-ppc/translate/vsx-ops.inc.c b/target-ppc/translate/vsx-ops.inc.c
index 414b73b..3b296f8 100644
--- a/target-ppc/translate/vsx-ops.inc.c
+++ b/target-ppc/translate/vsx-ops.inc.c
@@ -22,6 +22,7 @@ GEN_HANDLER_E(mtvsrwz, 0x1F, 0x13, 0x07, 0x0000F800, PPC_NONE, PPC2_VSX207),
 #if defined(TARGET_PPC64)
 GEN_HANDLER_E(mfvsrd, 0x1F, 0x13, 0x01, 0x0000F800, PPC_NONE, PPC2_VSX207),
 GEN_HANDLER_E(mtvsrd, 0x1F, 0x13, 0x05, 0x0000F800, PPC_NONE, PPC2_VSX207),
+GEN_HANDLER_E(mfvsrld, 0X1F, 0x13, 0x09, 0x0000F800, PPC_NONE, PPC2_ISA300),
 #endif
 
 #define GEN_XX1FORM(name, opc2, opc3, fl2)                              \
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH v5 2/9] target-ppc: Implement mtvsrdd instruction
  2016-09-28 18:41 [Qemu-devel] [PATCH v5 0/9] POWER9 TCG enablements - part4 Nikunj A Dadhania
  2016-09-28 18:41 ` [Qemu-devel] [PATCH v5 1/9] target-ppc: Implement mfvsrld instruction Nikunj A Dadhania
@ 2016-09-28 18:41 ` Nikunj A Dadhania
  2016-09-28 18:41 ` [Qemu-devel] [PATCH v5 3/9] target-ppc: Implement mtvsrws instruction Nikunj A Dadhania
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Nikunj A Dadhania @ 2016-09-28 18:41 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh, Ravi Bangoria

From: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>

mtvsrdd: Move To VSR Double Doubleword
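
A minimal plain-C sketch of the data movement implemented in the hunk
below, assuming a VSR is modelled as two 64-bit doublewords. The type and
function names are hypothetical and exist only for illustration. Note the
special case taken from the patch: an RA field of 0 writes zero to the
upper doubleword instead of reading GPR 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 128-bit VSR. */
typedef struct {
    uint64_t hi;   /* upper doubleword, cpu_vsrh() in the patch */
    uint64_t lo;   /* lower doubleword, cpu_vsrl() in the patch */
} vsr_model;

/* Model of mtvsrdd XT,RA,RB: 'ra' and 'rb' are register numbers,
 * 'gpr' is the general-purpose register file. */
static vsr_model model_mtvsrdd(const uint64_t gpr[32],
                               unsigned ra, unsigned rb)
{
    vsr_model xt;

    assert(ra < 32 && rb < 32);
    xt.hi = (ra == 0) ? 0 : gpr[ra];   /* RA == 0 means literal zero */
    xt.lo = gpr[rb];
    return xt;
}
```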

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 target-ppc/translate/vsx-impl.inc.c | 23 +++++++++++++++++++++++
 target-ppc/translate/vsx-ops.inc.c  |  1 +
 2 files changed, 24 insertions(+)

diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
index b669e8c..c4c50dd 100644
--- a/target-ppc/translate/vsx-impl.inc.c
+++ b/target-ppc/translate/vsx-impl.inc.c
@@ -234,6 +234,29 @@ static void gen_mfvsrld(DisasContext *ctx)
     tcg_gen_mov_i64(cpu_gpr[rA(ctx->opcode)], cpu_vsrl(xS(ctx->opcode)));
 }
 
+static void gen_mtvsrdd(DisasContext *ctx)
+{
+    if (xT(ctx->opcode) < 32) {
+        if (unlikely(!ctx->vsx_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_VSXU);
+            return;
+        }
+    } else {
+        if (unlikely(!ctx->altivec_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_VPU);
+            return;
+        }
+    }
+
+    if (!rA(ctx->opcode)) {
+        tcg_gen_movi_i64(cpu_vsrh(xT(ctx->opcode)), 0);
+    } else {
+        tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), cpu_gpr[rA(ctx->opcode)]);
+    }
+
+    tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), cpu_gpr[rB(ctx->opcode)]);
+}
+
 #endif
 
 static void gen_xxpermdi(DisasContext *ctx)
diff --git a/target-ppc/translate/vsx-ops.inc.c b/target-ppc/translate/vsx-ops.inc.c
index 3b296f8..1287973 100644
--- a/target-ppc/translate/vsx-ops.inc.c
+++ b/target-ppc/translate/vsx-ops.inc.c
@@ -23,6 +23,7 @@ GEN_HANDLER_E(mtvsrwz, 0x1F, 0x13, 0x07, 0x0000F800, PPC_NONE, PPC2_VSX207),
 GEN_HANDLER_E(mfvsrd, 0x1F, 0x13, 0x01, 0x0000F800, PPC_NONE, PPC2_VSX207),
 GEN_HANDLER_E(mtvsrd, 0x1F, 0x13, 0x05, 0x0000F800, PPC_NONE, PPC2_VSX207),
 GEN_HANDLER_E(mfvsrld, 0X1F, 0x13, 0x09, 0x0000F800, PPC_NONE, PPC2_ISA300),
+GEN_HANDLER_E(mtvsrdd, 0X1F, 0x13, 0x0D, 0x0, PPC_NONE, PPC2_ISA300),
 #endif
 
 #define GEN_XX1FORM(name, opc2, opc3, fl2)                              \
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH v5 3/9] target-ppc: Implement mtvsrws instruction
  2016-09-28 18:41 [Qemu-devel] [PATCH v5 0/9] POWER9 TCG enablements - part4 Nikunj A Dadhania
  2016-09-28 18:41 ` [Qemu-devel] [PATCH v5 1/9] target-ppc: Implement mfvsrld instruction Nikunj A Dadhania
  2016-09-28 18:41 ` [Qemu-devel] [PATCH v5 2/9] target-ppc: Implement mtvsrdd instruction Nikunj A Dadhania
@ 2016-09-28 18:41 ` Nikunj A Dadhania
  2016-09-28 20:21   ` Richard Henderson
  2016-09-28 18:41 ` [Qemu-devel] [PATCH v5 4/9] target-ppc: improve lxvw4x implementation Nikunj A Dadhania
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 18+ messages in thread
From: Nikunj A Dadhania @ 2016-09-28 18:41 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh, Ravi Bangoria

From: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>

mtvsrws: Move To VSR Word & Splat
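
The splat in the hunk below is built from a 64-bit deposit. The following
is a sketch of that operation in plain C, not QEMU code: deposit64_model()
and model_mtvsrws() are hypothetical names, and the deposit semantics
assumed here are "replace len bits of base at bit pos with the low len
bits of field".

```c
#include <assert.h>
#include <stdint.h>

/* Replace 'len' bits of 'base' starting at bit 'pos' with the low
 * 'len' bits of 'field'. */
static uint64_t deposit64_model(uint64_t base, uint64_t field,
                                unsigned pos, unsigned len)
{
    assert(len > 0 && len <= 64 && pos <= 64 - len);
    uint64_t mask = (len == 64) ? ~0ULL : ((1ULL << len) - 1);
    return (base & ~(mask << pos)) | ((field & mask) << pos);
}

/* Model of mtvsrws: copy the low word of RA into all four words. */
static void model_mtvsrws(uint64_t ra, uint64_t *xth, uint64_t *xtl)
{
    *xtl = deposit64_model(ra, ra, 32, 32);   /* low word in both halves */
    *xth = *xtl;
}
```

With RA = 0x1122334455667788 both doublewords become 0x5566778855667788:
the upper word of RA is discarded and the lower word appears four times.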

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 target-ppc/translate/vsx-impl.inc.c | 23 +++++++++++++++++++++++
 target-ppc/translate/vsx-ops.inc.c  |  1 +
 2 files changed, 24 insertions(+)

diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
index c4c50dd..fa8240f 100644
--- a/target-ppc/translate/vsx-impl.inc.c
+++ b/target-ppc/translate/vsx-impl.inc.c
@@ -257,6 +257,29 @@ static void gen_mtvsrdd(DisasContext *ctx)
     tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), cpu_gpr[rB(ctx->opcode)]);
 }
 
+static void gen_mtvsrws(DisasContext *ctx)
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+
+    if (xT(ctx->opcode) < 32) {
+        if (unlikely(!ctx->vsx_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_VSXU);
+            return;
+        }
+    } else {
+        if (unlikely(!ctx->altivec_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_VPU);
+            return;
+        }
+    }
+
+    tcg_gen_mov_i64(t0, cpu_gpr[rA(ctx->opcode)]);
+    tcg_gen_deposit_i64(cpu_vsrl(xT(ctx->opcode)), t0, t0, 32, 32);
+    tcg_gen_mov_i64(cpu_vsrh(xT(ctx->opcode)), cpu_vsrl(xT(ctx->opcode)));
+
+    tcg_temp_free_i64(t0);
+}
+
 #endif
 
 static void gen_xxpermdi(DisasContext *ctx)
diff --git a/target-ppc/translate/vsx-ops.inc.c b/target-ppc/translate/vsx-ops.inc.c
index 1287973..d5f5b87 100644
--- a/target-ppc/translate/vsx-ops.inc.c
+++ b/target-ppc/translate/vsx-ops.inc.c
@@ -24,6 +24,7 @@ GEN_HANDLER_E(mfvsrd, 0x1F, 0x13, 0x01, 0x0000F800, PPC_NONE, PPC2_VSX207),
 GEN_HANDLER_E(mtvsrd, 0x1F, 0x13, 0x05, 0x0000F800, PPC_NONE, PPC2_VSX207),
 GEN_HANDLER_E(mfvsrld, 0X1F, 0x13, 0x09, 0x0000F800, PPC_NONE, PPC2_ISA300),
 GEN_HANDLER_E(mtvsrdd, 0X1F, 0x13, 0x0D, 0x0, PPC_NONE, PPC2_ISA300),
+GEN_HANDLER_E(mtvsrws, 0x1F, 0x13, 0x0C, 0x0000F800, PPC_NONE, PPC2_ISA300),
 #endif
 
 #define GEN_XX1FORM(name, opc2, opc3, fl2)                              \
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH v5 4/9] target-ppc: improve lxvw4x implementation
  2016-09-28 18:41 [Qemu-devel] [PATCH v5 0/9] POWER9 TCG enablements - part4 Nikunj A Dadhania
                   ` (2 preceding siblings ...)
  2016-09-28 18:41 ` [Qemu-devel] [PATCH v5 3/9] target-ppc: Implement mtvsrws instruction Nikunj A Dadhania
@ 2016-09-28 18:41 ` Nikunj A Dadhania
  2016-09-28 18:41 ` [Qemu-devel] [PATCH v5 5/9] target-ppc: improve stxvw4x implementation Nikunj A Dadhania
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Nikunj A Dadhania @ 2016-09-28 18:41 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh

Load 8 bytes at a time, instead of 4, and manipulate the result.

Big-Endian Storage
+-------------+-------------+-------------+-------------+
| 00 11 22 33 | 44 55 66 77 | 88 99 AA BB | CC DD EE FF |
+-------------+-------------+-------------+-------------+

Little-Endian Storage
+-------------+-------------+-------------+-------------+
| 33 22 11 00 | 77 66 55 44 | BB AA 99 88 | FF EE DD CC |
+-------------+-------------+-------------+-------------+

Vector load results in:
+-------------+-------------+-------------+-------------+
| 00 11 22 33 | 44 55 66 77 | 88 99 AA BB | CC DD EE FF |
+-------------+-------------+-------------+-------------+
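
The little-endian path in the hunk below can be checked against the
diagrams with a small plain-C model. This is a sketch only: load_le64(),
swap_words() and model_lxvw4x_le() are hypothetical helpers, with
load_le64() standing in for the MO_LEQ load and swap_words() for the
shri + deposit pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read 8 bytes as one little-endian 64-bit value. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Exchange the two 32-bit words of a doubleword. */
static uint64_t swap_words(uint64_t t0)
{
    return (t0 << 32) | (t0 >> 32);
}

/* LE-mode lxvw4x: two 8-byte loads, each followed by a word swap. */
static void model_lxvw4x_le(const uint8_t mem[16],
                            uint64_t *xth, uint64_t *xtl)
{
    assert(mem != NULL && xth != NULL && xtl != NULL);
    *xth = swap_words(load_le64(mem));
    *xtl = swap_words(load_le64(mem + 8));
}
```

Feeding it the "Little-Endian Storage" bytes from the diagram yields
xth = 0x0011223344556677 and xtl = 0x8899AABBCCDDEEFF, which is the
"Vector load results in" row.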

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 target-ppc/translate/vsx-impl.inc.c | 32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
index fa8240f..3bc3f6f 100644
--- a/target-ppc/translate/vsx-impl.inc.c
+++ b/target-ppc/translate/vsx-impl.inc.c
@@ -75,7 +75,6 @@ static void gen_lxvdsx(DisasContext *ctx)
 static void gen_lxvw4x(DisasContext *ctx)
 {
     TCGv EA;
-    TCGv_i64 tmp;
     TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
     TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
     if (unlikely(!ctx->vsx_enabled)) {
@@ -84,22 +83,27 @@ static void gen_lxvw4x(DisasContext *ctx)
     }
     gen_set_access_type(ctx, ACCESS_INT);
     EA = tcg_temp_new();
-    tmp = tcg_temp_new_i64();
 
     gen_addr_reg_index(ctx, EA);
-    gen_qemu_ld32u_i64(ctx, tmp, EA);
-    tcg_gen_addi_tl(EA, EA, 4);
-    gen_qemu_ld32u_i64(ctx, xth, EA);
-    tcg_gen_deposit_i64(xth, xth, tmp, 32, 32);
-
-    tcg_gen_addi_tl(EA, EA, 4);
-    gen_qemu_ld32u_i64(ctx, tmp, EA);
-    tcg_gen_addi_tl(EA, EA, 4);
-    gen_qemu_ld32u_i64(ctx, xtl, EA);
-    tcg_gen_deposit_i64(xtl, xtl, tmp, 32, 32);
-
+    if (ctx->le_mode) {
+        TCGv_i64 t0 = tcg_temp_new_i64();
+        TCGv_i64 t1 = tcg_temp_new_i64();
+
+        tcg_gen_qemu_ld_i64(t0, EA, ctx->mem_idx, MO_LEQ);
+        tcg_gen_shri_i64(t1, t0, 32);
+        tcg_gen_deposit_i64(xth, t1, t0, 32, 32);
+        tcg_gen_addi_tl(EA, EA, 8);
+        tcg_gen_qemu_ld_i64(t0, EA, ctx->mem_idx, MO_LEQ);
+        tcg_gen_shri_i64(t1, t0, 32);
+        tcg_gen_deposit_i64(xtl, t1, t0, 32, 32);
+        tcg_temp_free_i64(t0);
+        tcg_temp_free_i64(t1);
+    } else {
+        tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_BEQ);
+        tcg_gen_addi_tl(EA, EA, 8);
+        tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_BEQ);
+    }
     tcg_temp_free(EA);
-    tcg_temp_free_i64(tmp);
 }
 
 #define VSX_STORE_SCALAR(name, operation)                     \
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH v5 5/9] target-ppc: improve stxvw4x implementation
  2016-09-28 18:41 [Qemu-devel] [PATCH v5 0/9] POWER9 TCG enablements - part4 Nikunj A Dadhania
                   ` (3 preceding siblings ...)
  2016-09-28 18:41 ` [Qemu-devel] [PATCH v5 4/9] target-ppc: improve lxvw4x implementation Nikunj A Dadhania
@ 2016-09-28 18:41 ` Nikunj A Dadhania
  2016-09-28 18:41 ` [Qemu-devel] [PATCH v5 6/9] target-ppc: add lxvh8x instruction Nikunj A Dadhania
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Nikunj A Dadhania @ 2016-09-28 18:41 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh

Manipulate the data and store 8 bytes at a time instead of 4 bytes.

Vector:
+-------------+-------------+-------------+-------------+
| 00 11 22 33 | 44 55 66 77 | 88 99 AA BB | CC DD EE FF |
+-------------+-------------+-------------+-------------+

The store results in the following:

Big-Endian Storage
+-------------+-------------+-------------+-------------+
| 00 11 22 33 | 44 55 66 77 | 88 99 AA BB | CC DD EE FF |
+-------------+-------------+-------------+-------------+

Little-Endian Storage
+-------------+-------------+-------------+-------------+
| 33 22 11 00 | 77 66 55 44 | BB AA 99 88 | FF EE DD CC |
+-------------+-------------+-------------+-------------+
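
A plain-C sketch of the little-endian store path in the hunk below,
assuming the same doubleword model of a VSR. store_le64() and
model_stxvw4x_le() are hypothetical names: the first stands in for the
MO_LEQ store, the second for the shri + deposit word swap that precedes
each store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write one 64-bit value as 8 little-endian bytes. */
static void store_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

/* LE-mode stxvw4x: swap the two words of each doubleword, then store
 * 8 bytes at a time. */
static void model_stxvw4x_le(uint8_t mem[16], uint64_t xsh, uint64_t xsl)
{
    assert(mem != NULL);
    store_le64(mem, (xsh << 32) | (xsh >> 32));
    store_le64(mem + 8, (xsl << 32) | (xsl >> 32));
}
```

The source register is only read, never written, so the word swap has to
go through temporaries, as the t0/t1 pair in the patch does.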

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 target-ppc/translate/vsx-impl.inc.c | 33 +++++++++++++++++++--------------
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
index 3bc3f6f..dbe483f 100644
--- a/target-ppc/translate/vsx-impl.inc.c
+++ b/target-ppc/translate/vsx-impl.inc.c
@@ -146,7 +146,8 @@ static void gen_stxvd2x(DisasContext *ctx)
 
 static void gen_stxvw4x(DisasContext *ctx)
 {
-    TCGv_i64 tmp;
+    TCGv_i64 xsh = cpu_vsrh(xS(ctx->opcode));
+    TCGv_i64 xsl = cpu_vsrl(xS(ctx->opcode));
     TCGv EA;
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -155,21 +156,25 @@ static void gen_stxvw4x(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_INT);
     EA = tcg_temp_new();
     gen_addr_reg_index(ctx, EA);
-    tmp = tcg_temp_new_i64();
-
-    tcg_gen_shri_i64(tmp, cpu_vsrh(xS(ctx->opcode)), 32);
-    gen_qemu_st32_i64(ctx, tmp, EA);
-    tcg_gen_addi_tl(EA, EA, 4);
-    gen_qemu_st32_i64(ctx, cpu_vsrh(xS(ctx->opcode)), EA);
-
-    tcg_gen_shri_i64(tmp, cpu_vsrl(xS(ctx->opcode)), 32);
-    tcg_gen_addi_tl(EA, EA, 4);
-    gen_qemu_st32_i64(ctx, tmp, EA);
-    tcg_gen_addi_tl(EA, EA, 4);
-    gen_qemu_st32_i64(ctx, cpu_vsrl(xS(ctx->opcode)), EA);
+    if (ctx->le_mode) {
+        TCGv_i64 t0 = tcg_temp_new_i64();
+        TCGv_i64 t1 = tcg_temp_new_i64();
 
+        tcg_gen_shri_i64(t0, xsh, 32);
+        tcg_gen_deposit_i64(t1, t0, xsh, 32, 32);
+        tcg_gen_qemu_st_i64(t1, EA, ctx->mem_idx, MO_LEQ);
+        tcg_gen_addi_tl(EA, EA, 8);
+        tcg_gen_shri_i64(t0, xsl, 32);
+        tcg_gen_deposit_i64(t1, t0, xsl, 32, 32);
+        tcg_gen_qemu_st_i64(t1, EA, ctx->mem_idx, MO_LEQ);
+        tcg_temp_free_i64(t0);
+        tcg_temp_free_i64(t1);
+    } else {
+        tcg_gen_qemu_st_i64(xsh, EA, ctx->mem_idx, MO_BEQ);
+        tcg_gen_addi_tl(EA, EA, 8);
+        tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_BEQ);
+    }
     tcg_temp_free(EA);
-    tcg_temp_free_i64(tmp);
 }
 
 #define MV_VSRW(name, tcgop1, tcgop2, target, source)           \
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH v5 6/9] target-ppc: add lxvh8x instruction
  2016-09-28 18:41 [Qemu-devel] [PATCH v5 0/9] POWER9 TCG enablements - part4 Nikunj A Dadhania
                   ` (4 preceding siblings ...)
  2016-09-28 18:41 ` [Qemu-devel] [PATCH v5 5/9] target-ppc: improve stxvw4x implementation Nikunj A Dadhania
@ 2016-09-28 18:41 ` Nikunj A Dadhania
  2016-09-28 20:22   ` Richard Henderson
  2016-09-28 18:41 ` [Qemu-devel] [PATCH v5 7/9] target-ppc: add stxvh8x instruction Nikunj A Dadhania
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 18+ messages in thread
From: Nikunj A Dadhania @ 2016-09-28 18:41 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh

lxvh8x:  Load VSX Vector Halfword*8

Big-Endian Storage
+-------+-------+-------+-------+-------+-------+-------+-------+
| 00 01 | 10 11 | 20 21 | 30 31 | 40 41 | 50 51 | 60 61 | 70 71 |
+-------+-------+-------+-------+-------+-------+-------+-------+

Little-Endian Storage
+-------+-------+-------+-------+-------+-------+-------+-------+
| 01 00 | 11 10 | 21 20 | 31 30 | 41 40 | 51 50 | 61 60 | 71 70 |
+-------+-------+-------+-------+-------+-------+-------+-------+

Vector load results in:
+-------+-------+-------+-------+-------+-------+-------+-------+
| 00 01 | 10 11 | 20 21 | 30 31 | 40 41 | 50 51 | 60 61 | 70 71 |
+-------+-------+-------+-------+-------+-------+-------+-------+
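
The mask-and-shift halfword swap used by gen_bswap16x8() in the hunk
below can be tried out in plain C. This is a sketch: load_be64(),
bswap16x4() and model_lxvh8x() are hypothetical helpers that follow the
formula in the patch comment, applied to one doubleword at a time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read 8 bytes as one big-endian 64-bit value (models MO_BEQ). */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Byte-swap each of the four halfwords in a doubleword:
 * ((x & mask) << 8) | ((x >> 8) & mask) */
static uint64_t bswap16x4(uint64_t x)
{
    const uint64_t mask = 0x00FF00FF00FF00FFULL;
    return ((x & mask) << 8) | ((x >> 8) & mask);
}

/* lxvh8x: big-endian doubleword loads, then a halfword swap in LE mode. */
static void model_lxvh8x(const uint8_t mem[16], bool le_mode,
                         uint64_t *xth, uint64_t *xtl)
{
    assert(mem != NULL && xth != NULL && xtl != NULL);
    *xth = load_be64(mem);
    *xtl = load_be64(mem + 8);
    if (le_mode) {
        *xth = bswap16x4(*xth);
        *xtl = bswap16x4(*xtl);
    }
}
```

Both storage layouts in the diagrams produce the same register contents,
0x0001101120213031 and 0x4041505160617071, when loaded in their own mode.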

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target-ppc/translate/vsx-impl.inc.c | 49 +++++++++++++++++++++++++++++++++++++
 target-ppc/translate/vsx-ops.inc.c  |  1 +
 2 files changed, 50 insertions(+)

diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
index dbe483f..25b5ce4 100644
--- a/target-ppc/translate/vsx-impl.inc.c
+++ b/target-ppc/translate/vsx-impl.inc.c
@@ -106,6 +106,55 @@ static void gen_lxvw4x(DisasContext *ctx)
     tcg_temp_free(EA);
 }
 
+static void gen_bswap16x8(TCGv_i64 outh, TCGv_i64 outl,
+                          TCGv_i64 inh, TCGv_i64 inl)
+{
+    TCGv_i64 mask = tcg_const_i64(0x00FF00FF00FF00FF);
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+
+    /* outh = ((inh & mask) << 8) | ((inh >> 8) & mask) */
+    tcg_gen_and_i64(t0, inh, mask);
+    tcg_gen_shli_i64(t0, t0, 8);
+    tcg_gen_shri_i64(t1, inh, 8);
+    tcg_gen_and_i64(t1, t1, mask);
+    tcg_gen_or_i64(outh, t0, t1);
+
+    /* outl = ((inl & mask) << 8) | ((inl >> 8) & mask) */
+    tcg_gen_and_i64(t0, inl, mask);
+    tcg_gen_shli_i64(t0, t0, 8);
+    tcg_gen_shri_i64(t1, inl, 8);
+    tcg_gen_and_i64(t1, t1, mask);
+    tcg_gen_or_i64(outl, t0, t1);
+
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(mask);
+}
+
+static void gen_lxvh8x(DisasContext *ctx)
+{
+    TCGv EA;
+    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
+    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
+
+    if (unlikely(!ctx->vsx_enabled)) {
+        gen_exception(ctx, POWERPC_EXCP_VSXU);
+        return;
+    }
+    gen_set_access_type(ctx, ACCESS_INT);
+
+    EA = tcg_temp_new();
+    gen_addr_reg_index(ctx, EA);
+    tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_BEQ);
+    tcg_gen_addi_tl(EA, EA, 8);
+    tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_BEQ);
+    if (ctx->le_mode) {
+        gen_bswap16x8(xth, xtl, xth, xtl);
+    }
+    tcg_temp_free(EA);
+}
+
 #define VSX_STORE_SCALAR(name, operation)                     \
 static void gen_##name(DisasContext *ctx)                     \
 {                                                             \
diff --git a/target-ppc/translate/vsx-ops.inc.c b/target-ppc/translate/vsx-ops.inc.c
index d5f5b87..c52e6ff 100644
--- a/target-ppc/translate/vsx-ops.inc.c
+++ b/target-ppc/translate/vsx-ops.inc.c
@@ -7,6 +7,7 @@ GEN_HANDLER_E(lxsspx, 0x1F, 0x0C, 0x10, 0, PPC_NONE, PPC2_VSX207),
 GEN_HANDLER_E(lxvd2x, 0x1F, 0x0C, 0x1A, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(lxvdsx, 0x1F, 0x0C, 0x0A, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(lxvw4x, 0x1F, 0x0C, 0x18, 0, PPC_NONE, PPC2_VSX),
+GEN_HANDLER_E(lxvh8x, 0x1F, 0x0C, 0x19, 0, PPC_NONE,  PPC2_ISA300),
 
 GEN_HANDLER_E(stxsdx, 0x1F, 0xC, 0x16, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(stxsibx, 0x1F, 0xD, 0x1C, 0, PPC_NONE, PPC2_ISA300),
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH v5 7/9] target-ppc: add stxvh8x instruction
  2016-09-28 18:41 [Qemu-devel] [PATCH v5 0/9] POWER9 TCG enablements - part4 Nikunj A Dadhania
                   ` (5 preceding siblings ...)
  2016-09-28 18:41 ` [Qemu-devel] [PATCH v5 6/9] target-ppc: add lxvh8x instruction Nikunj A Dadhania
@ 2016-09-28 18:41 ` Nikunj A Dadhania
  2016-09-28 20:23   ` Richard Henderson
  2016-09-28 18:41 ` [Qemu-devel] [PATCH v5 8/9] target-ppc: add lxvb16x instruction Nikunj A Dadhania
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 18+ messages in thread
From: Nikunj A Dadhania @ 2016-09-28 18:41 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh

stxvh8x:  Store VSX Vector Halfword*8

Vector:
+-------+-------+-------+-------+-------+-------+-------+-------+
| 00 01 | 10 11 | 20 21 | 30 31 | 40 41 | 50 51 | 60 61 | 70 71 |
+-------+-------+-------+-------+-------+-------+-------+-------+

The store results in the following:

Big-Endian Storage
+-------+-------+-------+-------+-------+-------+-------+-------+
| 00 01 | 10 11 | 20 21 | 30 31 | 40 41 | 50 51 | 60 61 | 70 71 |
+-------+-------+-------+-------+-------+-------+-------+-------+

Little-Endian Storage
+-------+-------+-------+-------+-------+-------+-------+-------+
| 01 00 | 11 10 | 21 20 | 31 30 | 41 40 | 51 50 | 61 60 | 71 70 |
+-------+-------+-------+-------+-------+-------+-------+-------+

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target-ppc/translate/vsx-impl.inc.c | 31 +++++++++++++++++++++++++++++++
 target-ppc/translate/vsx-ops.inc.c  |  1 +
 2 files changed, 32 insertions(+)

diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
index 25b5ce4..e762c0a 100644
--- a/target-ppc/translate/vsx-impl.inc.c
+++ b/target-ppc/translate/vsx-impl.inc.c
@@ -226,6 +226,37 @@ static void gen_stxvw4x(DisasContext *ctx)
     tcg_temp_free(EA);
 }
 
+static void gen_stxvh8x(DisasContext *ctx)
+{
+    TCGv_i64 xsh = cpu_vsrh(xS(ctx->opcode));
+    TCGv_i64 xsl = cpu_vsrl(xS(ctx->opcode));
+    TCGv EA;
+
+    if (unlikely(!ctx->vsx_enabled)) {
+        gen_exception(ctx, POWERPC_EXCP_VSXU);
+        return;
+    }
+    gen_set_access_type(ctx, ACCESS_INT);
+    EA = tcg_temp_new();
+    gen_addr_reg_index(ctx, EA);
+    if (ctx->le_mode) {
+        TCGv_i64 outh = tcg_temp_new_i64();
+        TCGv_i64 outl = tcg_temp_new_i64();
+
+        gen_bswap16x8(outh, outl, xsh, xsl);
+        tcg_gen_qemu_st_i64(outh, EA, ctx->mem_idx, MO_BEQ);
+        tcg_gen_addi_tl(EA, EA, 8);
+        tcg_gen_qemu_st_i64(outl, EA, ctx->mem_idx, MO_BEQ);
+        tcg_temp_free_i64(outh);
+        tcg_temp_free_i64(outl);
+    } else {
+        tcg_gen_qemu_st_i64(xsh, EA, ctx->mem_idx, MO_BEQ);
+        tcg_gen_addi_tl(EA, EA, 8);
+        tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_BEQ);
+    }
+    tcg_temp_free(EA);
+}
+
 #define MV_VSRW(name, tcgop1, tcgop2, target, source)           \
 static void gen_##name(DisasContext *ctx)                       \
 {                                                               \
diff --git a/target-ppc/translate/vsx-ops.inc.c b/target-ppc/translate/vsx-ops.inc.c
index c52e6ff..17975ec 100644
--- a/target-ppc/translate/vsx-ops.inc.c
+++ b/target-ppc/translate/vsx-ops.inc.c
@@ -16,6 +16,7 @@ GEN_HANDLER_E(stxsiwx, 0x1F, 0xC, 0x04, 0, PPC_NONE, PPC2_VSX207),
 GEN_HANDLER_E(stxsspx, 0x1F, 0xC, 0x14, 0, PPC_NONE, PPC2_VSX207),
 GEN_HANDLER_E(stxvd2x, 0x1F, 0xC, 0x1E, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(stxvw4x, 0x1F, 0xC, 0x1C, 0, PPC_NONE, PPC2_VSX),
+GEN_HANDLER_E(stxvh8x, 0x1F, 0x0C, 0x1D, 0, PPC_NONE,  PPC2_ISA300),
 
 GEN_HANDLER_E(mfvsrwz, 0x1F, 0x13, 0x03, 0x0000F800, PPC_NONE, PPC2_VSX207),
 GEN_HANDLER_E(mtvsrwa, 0x1F, 0x13, 0x06, 0x0000F800, PPC_NONE, PPC2_VSX207),
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH v5 8/9] target-ppc: add lxvb16x instruction
  2016-09-28 18:41 [Qemu-devel] [PATCH v5 0/9] POWER9 TCG enablements - part4 Nikunj A Dadhania
                   ` (6 preceding siblings ...)
  2016-09-28 18:41 ` [Qemu-devel] [PATCH v5 7/9] target-ppc: add stxvh8x instruction Nikunj A Dadhania
@ 2016-09-28 18:41 ` Nikunj A Dadhania
  2016-09-28 18:42 ` [Qemu-devel] [PATCH v5 9/9] target-ppc: add stxvb16x instruction Nikunj A Dadhania
  2016-09-29  1:51 ` [Qemu-devel] [PATCH v5 0/9] POWER9 TCG enablements - part4 David Gibson
  9 siblings, 0 replies; 18+ messages in thread
From: Nikunj A Dadhania @ 2016-09-28 18:41 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh

lxvb16x: Load VSX Vector Byte*16

Little/Big-endian Storage
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|F0|F1|F2|F3|F4|F5|F6|F7|E0|E1|E2|E3|E4|E5|E6|E7|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Vector load results in:
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|F0|F1|F2|F3|F4|F5|F6|F7|E0|E1|E2|E3|E4|E5|E6|E7|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
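
Byte elements have no internal byte order, which is why the hunk below
uses the same big-endian doubleword loads in both modes. A plain-C sketch
of that property follows. load_be64(), model_lxvb16x() and vsr_byte() are
hypothetical names, and byte element 0 is assumed to be the most
significant byte of the upper doubleword.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read 8 bytes as one big-endian 64-bit value (models MO_BEQ). */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* lxvb16x: the same two loads regardless of LE/BE mode. */
static void model_lxvb16x(const uint8_t mem[16],
                          uint64_t *xth, uint64_t *xtl)
{
    assert(mem != NULL && xth != NULL && xtl != NULL);
    *xth = load_be64(mem);
    *xtl = load_be64(mem + 8);
}

/* Extract byte element i (0 = leftmost) from the modelled VSR. */
static uint8_t vsr_byte(uint64_t xth, uint64_t xtl, unsigned i)
{
    assert(i < 16);
    uint64_t dw = (i < 8) ? xth : xtl;
    return (uint8_t)(dw >> (8 * (7 - (i % 8))));
}
```

After the load, byte element i of the register equals mem[i] for every i,
matching the identical storage and register rows in the diagram.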

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 target-ppc/translate/vsx-impl.inc.c | 19 +++++++++++++++++++
 target-ppc/translate/vsx-ops.inc.c  |  1 +
 2 files changed, 20 insertions(+)

diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
index e762c0a..40fba6e 100644
--- a/target-ppc/translate/vsx-impl.inc.c
+++ b/target-ppc/translate/vsx-impl.inc.c
@@ -155,6 +155,25 @@ static void gen_lxvh8x(DisasContext *ctx)
     tcg_temp_free(EA);
 }
 
+static void gen_lxvb16x(DisasContext *ctx)
+{
+    TCGv EA;
+    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
+    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
+
+    if (unlikely(!ctx->vsx_enabled)) {
+        gen_exception(ctx, POWERPC_EXCP_VSXU);
+        return;
+    }
+    gen_set_access_type(ctx, ACCESS_INT);
+    EA = tcg_temp_new();
+    gen_addr_reg_index(ctx, EA);
+    tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_BEQ);
+    tcg_gen_addi_tl(EA, EA, 8);
+    tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_BEQ);
+    tcg_temp_free(EA);
+}
+
 #define VSX_STORE_SCALAR(name, operation)                     \
 static void gen_##name(DisasContext *ctx)                     \
 {                                                             \
diff --git a/target-ppc/translate/vsx-ops.inc.c b/target-ppc/translate/vsx-ops.inc.c
index 17975ec..3274859 100644
--- a/target-ppc/translate/vsx-ops.inc.c
+++ b/target-ppc/translate/vsx-ops.inc.c
@@ -8,6 +8,7 @@ GEN_HANDLER_E(lxvd2x, 0x1F, 0x0C, 0x1A, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(lxvdsx, 0x1F, 0x0C, 0x0A, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(lxvw4x, 0x1F, 0x0C, 0x18, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(lxvh8x, 0x1F, 0x0C, 0x19, 0, PPC_NONE,  PPC2_ISA300),
+GEN_HANDLER_E(lxvb16x, 0x1F, 0x0C, 0x1B, 0, PPC_NONE, PPC2_ISA300),
 
 GEN_HANDLER_E(stxsdx, 0x1F, 0xC, 0x16, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(stxsibx, 0x1F, 0xD, 0x1C, 0, PPC_NONE, PPC2_ISA300),
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH v5 9/9] target-ppc: add stxvb16x instruction
  2016-09-28 18:41 [Qemu-devel] [PATCH v5 0/9] POWER9 TCG enablements - part4 Nikunj A Dadhania
                   ` (7 preceding siblings ...)
  2016-09-28 18:41 ` [Qemu-devel] [PATCH v5 8/9] target-ppc: add lxvb16x instruction Nikunj A Dadhania
@ 2016-09-28 18:42 ` Nikunj A Dadhania
  2016-09-29  1:51 ` [Qemu-devel] [PATCH v5 0/9] POWER9 TCG enablements - part4 David Gibson
  9 siblings, 0 replies; 18+ messages in thread
From: Nikunj A Dadhania @ 2016-09-28 18:42 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh

stxvb16x: Store VSX Vector Byte*16

Vector:
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|F0|F1|F2|F3|F4|F5|F6|F7|E0|E1|E2|E3|E4|E5|E6|E7|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

The store results in the following:

Little/Big-endian Storage
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|F0|F1|F2|F3|F4|F5|F6|F7|E0|E1|E2|E3|E4|E5|E6|E7|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 target-ppc/translate/vsx-impl.inc.c | 19 +++++++++++++++++++
 target-ppc/translate/vsx-ops.inc.c  |  1 +
 2 files changed, 20 insertions(+)

diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
index 40fba6e..01f2157 100644
--- a/target-ppc/translate/vsx-impl.inc.c
+++ b/target-ppc/translate/vsx-impl.inc.c
@@ -276,6 +276,25 @@ static void gen_stxvh8x(DisasContext *ctx)
     tcg_temp_free(EA);
 }
 
+static void gen_stxvb16x(DisasContext *ctx)
+{
+    TCGv_i64 xsh = cpu_vsrh(xS(ctx->opcode));
+    TCGv_i64 xsl = cpu_vsrl(xS(ctx->opcode));
+    TCGv EA;
+
+    if (unlikely(!ctx->vsx_enabled)) {
+        gen_exception(ctx, POWERPC_EXCP_VSXU);
+        return;
+    }
+    gen_set_access_type(ctx, ACCESS_INT);
+    EA = tcg_temp_new();
+    gen_addr_reg_index(ctx, EA);
+    tcg_gen_qemu_st_i64(xsh, EA, ctx->mem_idx, MO_BEQ);
+    tcg_gen_addi_tl(EA, EA, 8);
+    tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_BEQ);
+    tcg_temp_free(EA);
+}
+
 #define MV_VSRW(name, tcgop1, tcgop2, target, source)           \
 static void gen_##name(DisasContext *ctx)                       \
 {                                                               \
diff --git a/target-ppc/translate/vsx-ops.inc.c b/target-ppc/translate/vsx-ops.inc.c
index 3274859..10eb4b9 100644
--- a/target-ppc/translate/vsx-ops.inc.c
+++ b/target-ppc/translate/vsx-ops.inc.c
@@ -18,6 +18,7 @@ GEN_HANDLER_E(stxsspx, 0x1F, 0xC, 0x14, 0, PPC_NONE, PPC2_VSX207),
 GEN_HANDLER_E(stxvd2x, 0x1F, 0xC, 0x1E, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(stxvw4x, 0x1F, 0xC, 0x1C, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(stxvh8x, 0x1F, 0x0C, 0x1D, 0, PPC_NONE,  PPC2_ISA300),
+GEN_HANDLER_E(stxvb16x, 0x1F, 0x0C, 0x1F, 0, PPC_NONE, PPC2_ISA300),
 
 GEN_HANDLER_E(mfvsrwz, 0x1F, 0x13, 0x03, 0x0000F800, PPC_NONE, PPC2_VSX207),
 GEN_HANDLER_E(mtvsrwa, 0x1F, 0x13, 0x06, 0x0000F800, PPC_NONE, PPC2_VSX207),
-- 
2.7.4


* Re: [Qemu-devel] [PATCH v5 3/9] target-ppc: Implement mtvsrws instruction
  2016-09-28 18:41 ` [Qemu-devel] [PATCH v5 3/9] target-ppc: Implement mtvsrws instruction Nikunj A Dadhania
@ 2016-09-28 20:21   ` Richard Henderson
  2016-09-29  1:53     ` David Gibson
  2016-09-29  2:19     ` Nikunj A Dadhania
  0 siblings, 2 replies; 18+ messages in thread
From: Richard Henderson @ 2016-09-28 20:21 UTC (permalink / raw)
  To: Nikunj A Dadhania, qemu-ppc, david; +Cc: qemu-devel, benh, Ravi Bangoria

On 09/28/2016 11:41 AM, Nikunj A Dadhania wrote:
> +    tcg_gen_mov_i64(t0, cpu_gpr[rA(ctx->opcode)]);
> +    tcg_gen_deposit_i64(cpu_vsrl(xT(ctx->opcode)), t0, t0, 32, 32);

Why are you using t0?


r~


* Re: [Qemu-devel] [PATCH v5 6/9] target-ppc: add lxvh8x instruction
  2016-09-28 18:41 ` [Qemu-devel] [PATCH v5 6/9] target-ppc: add lxvh8x instruction Nikunj A Dadhania
@ 2016-09-28 20:22   ` Richard Henderson
  0 siblings, 0 replies; 18+ messages in thread
From: Richard Henderson @ 2016-09-28 20:22 UTC (permalink / raw)
  To: Nikunj A Dadhania, qemu-ppc, david; +Cc: qemu-devel, benh

On 09/28/2016 11:41 AM, Nikunj A Dadhania wrote:
> lxvh8x:  Load VSX Vector Halfword*8
> 
> Big-Endian Storage
> +-------+-------+-------+-------+-------+-------+-------+-------+
> | 00 01 | 10 11 | 20 21 | 30 31 | 40 41 | 50 51 | 60 61 | 70 71 |
> +-------+-------+-------+-------+-------+-------+-------+-------+
> 
> Little-Endian Storage
> +-------+-------+-------+-------+-------+-------+-------+-------+
> | 01 00 | 11 10 | 21 20 | 31 30 | 41 40 | 51 50 | 61 60 | 71 70 |
> +-------+-------+-------+-------+-------+-------+-------+-------+
> 
> Vector load results in:
> +-------+-------+-------+-------+-------+-------+-------+-------+
> | 00 01 | 10 11 | 20 21 | 30 31 | 40 41 | 50 51 | 60 61 | 70 71 |
> +-------+-------+-------+-------+-------+-------+-------+-------+
> 
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> ---
>  target-ppc/translate/vsx-impl.inc.c | 49 +++++++++++++++++++++++++++++++++++++
>  target-ppc/translate/vsx-ops.inc.c  |  1 +
>  2 files changed, 50 insertions(+)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~


* Re: [Qemu-devel] [PATCH v5 7/9] target-ppc: add stxvh8x instruction
  2016-09-28 18:41 ` [Qemu-devel] [PATCH v5 7/9] target-ppc: add stxvh8x instruction Nikunj A Dadhania
@ 2016-09-28 20:23   ` Richard Henderson
  0 siblings, 0 replies; 18+ messages in thread
From: Richard Henderson @ 2016-09-28 20:23 UTC (permalink / raw)
  To: Nikunj A Dadhania, qemu-ppc, david; +Cc: qemu-devel, benh

On 09/28/2016 11:41 AM, Nikunj A Dadhania wrote:
> stxvh8x:  Store VSX Vector Halfword*8
> 
> Vector:
> +-------+-------+-------+-------+-------+-------+-------+-------+
> | 00 01 | 10 11 | 20 21 | 30 31 | 40 41 | 50 51 | 60 61 | 70 71 |
> +-------+-------+-------+-------+-------+-------+-------+-------+
> 
> Store results in following:
> 
> Big-Endian Storage
> +-------+-------+-------+-------+-------+-------+-------+-------+
> | 00 01 | 10 11 | 20 21 | 30 31 | 40 41 | 50 51 | 60 61 | 70 71 |
> +-------+-------+-------+-------+-------+-------+-------+-------+
> 
> Little-Endian Storage
> +-------+-------+-------+-------+-------+-------+-------+-------+
> | 01 00 | 11 10 | 21 20 | 31 30 | 41 40 | 51 50 | 61 60 | 71 70 |
> +-------+-------+-------+-------+-------+-------+-------+-------+
> 
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> ---
>  target-ppc/translate/vsx-impl.inc.c | 31 +++++++++++++++++++++++++++++++
>  target-ppc/translate/vsx-ops.inc.c  |  1 +
>  2 files changed, 32 insertions(+)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~


* Re: [Qemu-devel] [PATCH v5 0/9] POWER9 TCG enablements - part4
  2016-09-28 18:41 [Qemu-devel] [PATCH v5 0/9] POWER9 TCG enablements - part4 Nikunj A Dadhania
                   ` (8 preceding siblings ...)
  2016-09-28 18:42 ` [Qemu-devel] [PATCH v5 9/9] target-ppc: add stxvb16x instruction Nikunj A Dadhania
@ 2016-09-29  1:51 ` David Gibson
  9 siblings, 0 replies; 18+ messages in thread
From: David Gibson @ 2016-09-29  1:51 UTC (permalink / raw)
  To: Nikunj A Dadhania; +Cc: qemu-ppc, rth, qemu-devel, benh


On Thu, Sep 29, 2016 at 12:11:51AM +0530, Nikunj A Dadhania wrote:
> This series contains 7 new instructions for POWER9 ISA3.0
> Use newer qemu load/store tcg helpers and optimize stxvw4x and lxvw4x.
> 
> GCC was adding epilogue for every VSX instructions causing change in 
> behaviour. For testing the load vector instructions used mfvsrld/mfvsrd 
> for loading vsr to register. And for testing store vector, used mtvsrdd 
> instructions. This helped in getting rid of the epilogue added by gcc.
> 
> Patches:
>     01:  mfvsrld: Move From VSR Lower Doubleword
>     02:  mtvsrdd: Move To VSR Double Doubleword
>     03:  mtvsrws: Move To VSR Word & Splat
>     05:  lxvw4x: improve implementation
>     05:  stxv4x: improve implementation
>     06:  lxvh8x: Load VSX Vector Halfword*8
>     07:  stxvh8x: Store VSX Vector Halfword*8
>     08:  lxvb16x: Load VSX Vector Byte*16
>     09:  stxvb16x: Store VSX Vector Byte*16

I've applied everything that rth reviewed to ppc-for-2.8.

I've tweaked the ASCII art diagrams describing the endianness
transformations.  Specifically, I removed the within-element spaces for
each element on the vector (not memory) side.  That's to emphasise the
fact that in-register there's no endianness, just numbers.
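
To illustrate that point, a small sketch (not taken from the series; store_halfwords is a hypothetical name) that writes the same eight 16-bit values to memory in either storage order. The element values never change; only the byte layout in memory does.

```c
#include <stdint.h>
#include <stddef.h>

/* Write eight 16-bit elements to memory in the requested storage order.
 * The elements themselves are plain numbers with no byte order. */
static void store_halfwords(uint8_t mem[16], const uint16_t elem[8],
                            int big_endian)
{
    for (size_t i = 0; i < 8; i++) {
        uint8_t hi = (uint8_t)(elem[i] >> 8);
        uint8_t lo = (uint8_t)(elem[i] & 0xff);

        mem[2 * i]     = big_endian ? hi : lo;
        mem[2 * i + 1] = big_endian ? lo : hi;
    }
}
```

With elements 0x0001, 0x1011, ... this gives the "00 01 | 10 11" layout for big-endian storage and the "01 00 | 11 10" layout for little-endian storage, matching the memory-side diagrams in the lxvh8x/stxvh8x commit messages.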

> 
> Changelog:
> v4:
> * Added gen_bswap16x8 inline for lxvh8x and stxvh8x in tcg
> * Dropped helper_bswap16x4
> * Use temporaries in stxvh8x and not clobber the register
> 
> v3:
> * Added 3 new VSR instructions.
> * Fixed all the vector load/store instructions for BE/LE.
> * Added detailed commit messages to patches.
> * Dropped deposit32x2 and implemented it using tcg ops
> 
> v2: 
> * Fix lxvw4x/stxv4x translation as LE/BE were both similar 
>   one in tcg and other as helper
> * Rename bswap32x2 to deposit32x2 as it does not need to 
>   swap content(32bit)
> * stxvh8x had a bug as David suggested.
> 
> v1: 
> * More load/store cleanups in byte reverse routines
> * ld64/st64 converted to newer macro and updated call sites
> * Cleanup load with reservation and store conditional
> * Return invalid random for darn instruction
> 
> v0:
> * darn - read /dev/random to get the random number
> * xxspltib - make is PPC64 only
> * Consolidate load/store operations and use macros to generate qemu_st/ld
> * Simplify load/store vsx endian manipulation
> 
> Nikunj A Dadhania (6):
>   target-ppc: improve lxvw4x implementation
>   target-ppc: improve stxvw4x implementation
>   target-ppc: add lxvh8x instruction
>   target-ppc: add stxvh8x instruction
>   target-ppc: add lxvb16x instruction
>   target-ppc: add stxvb16x instruction
> 
> Ravi Bangoria (3):
>   target-ppc: Implement mfvsrld instruction
>   target-ppc: Implement mtvsrdd instruction
>   target-ppc: Implement mtvsrws instruction
> 
>  target-ppc/translate/vsx-impl.inc.c | 238 ++++++++++++++++++++++++++++++++----
>  target-ppc/translate/vsx-ops.inc.c  |   7 ++
>  2 files changed, 221 insertions(+), 24 deletions(-)
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [Qemu-devel] [PATCH v5 3/9] target-ppc: Implement mtvsrws instruction
  2016-09-28 20:21   ` Richard Henderson
@ 2016-09-29  1:53     ` David Gibson
  2016-09-29  4:07       ` Richard Henderson
  2016-09-29  2:19     ` Nikunj A Dadhania
  1 sibling, 1 reply; 18+ messages in thread
From: David Gibson @ 2016-09-29  1:53 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Nikunj A Dadhania, qemu-ppc, qemu-devel, benh, Ravi Bangoria


On Wed, Sep 28, 2016 at 01:21:00PM -0700, Richard Henderson wrote:
> On 09/28/2016 11:41 AM, Nikunj A Dadhania wrote:
> > +    tcg_gen_mov_i64(t0, cpu_gpr[rA(ctx->opcode)]);
> > +    tcg_gen_deposit_i64(cpu_vsrl(xT(ctx->opcode)), t0, t0, 32, 32);
> 
> Why are you using t0?

Richard, I don't quite understand your question.  This looks correct
to me.  It's duplicating the low 32 bits of rA into both the low and
high 32 bits of t0, which will then be stored to both the low and high
64-bit elements of the VSR.  That matches the instruction definition,
which puts the low 32 bits of RA into every 32-bit element of the
vector.
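
As a rough model of the deposit step, assuming the usual semantics (the result is the first input with bits [ofs, ofs+len) replaced by the low len bits of the second input), the duplication can be sketched on plain integers. deposit64_model and splat_low_word are hypothetical names for illustration, not QEMU functions.

```c
#include <stdint.h>

/* Return 'base' with bits [ofs, ofs + len) replaced by the low 'len'
 * bits of 'field'.  Assumes 0 < len and ofs + len <= 64.  Both inputs
 * are passed by value, so neither is modified. */
static uint64_t deposit64_model(uint64_t base, uint64_t field,
                                unsigned ofs, unsigned len)
{
    uint64_t mask = (len == 64 ? ~0ULL : ((1ULL << len) - 1)) << ofs;

    return (base & ~mask) | ((field << ofs) & mask);
}

/* Duplicate the low 32 bits of 'ra' into both halves of a doubleword,
 * i.e. deposit(ra, ra, 32, 32). */
static uint64_t splat_low_word(uint64_t ra)
{
    return deposit64_model(ra, ra, 32, 32);
}
```

For example, an input of 0x1122334455667788 yields 0x5566778855667788: the original upper word is discarded and the low word appears in both halves.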

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [Qemu-devel] [PATCH v5 3/9] target-ppc: Implement mtvsrws instruction
  2016-09-28 20:21   ` Richard Henderson
  2016-09-29  1:53     ` David Gibson
@ 2016-09-29  2:19     ` Nikunj A Dadhania
  2016-09-29  4:08       ` Richard Henderson
  1 sibling, 1 reply; 18+ messages in thread
From: Nikunj A Dadhania @ 2016-09-29  2:19 UTC (permalink / raw)
  To: Richard Henderson, qemu-ppc, david; +Cc: qemu-devel, benh, Ravi Bangoria

Richard Henderson <rth@twiddle.net> writes:

> On 09/28/2016 11:41 AM, Nikunj A Dadhania wrote:
>> +    tcg_gen_mov_i64(t0, cpu_gpr[rA(ctx->opcode)]);
>> +    tcg_gen_deposit_i64(cpu_vsrl(xT(ctx->opcode)), t0, t0, 32, 32);
>
> Why are you using t0?

Thought about dropping it, but wasn't sure if deposit_i64 would change it.

Regards,
Nikunj


* Re: [Qemu-devel] [PATCH v5 3/9] target-ppc: Implement mtvsrws instruction
  2016-09-29  1:53     ` David Gibson
@ 2016-09-29  4:07       ` Richard Henderson
  0 siblings, 0 replies; 18+ messages in thread
From: Richard Henderson @ 2016-09-29  4:07 UTC (permalink / raw)
  To: David Gibson; +Cc: Nikunj A Dadhania, qemu-ppc, qemu-devel, benh, Ravi Bangoria

On 09/28/2016 06:53 PM, David Gibson wrote:
> On Wed, Sep 28, 2016 at 01:21:00PM -0700, Richard Henderson wrote:
>> On 09/28/2016 11:41 AM, Nikunj A Dadhania wrote:
>>> +    tcg_gen_mov_i64(t0, cpu_gpr[rA(ctx->opcode)]);
>>> +    tcg_gen_deposit_i64(cpu_vsrl(xT(ctx->opcode)), t0, t0, 32, 32);
>>
>> Why are you using t0?
>
> Richard, I don't quite understand your question.

There's no need for the copy into t0 -- just put rA into those two arguments.


r~


* Re: [Qemu-devel] [PATCH v5 3/9] target-ppc: Implement mtvsrws instruction
  2016-09-29  2:19     ` Nikunj A Dadhania
@ 2016-09-29  4:08       ` Richard Henderson
  0 siblings, 0 replies; 18+ messages in thread
From: Richard Henderson @ 2016-09-29  4:08 UTC (permalink / raw)
  To: Nikunj A Dadhania, qemu-ppc, david; +Cc: qemu-devel, benh, Ravi Bangoria

On 09/28/2016 07:19 PM, Nikunj A Dadhania wrote:
> Richard Henderson <rth@twiddle.net> writes:
>
>> On 09/28/2016 11:41 AM, Nikunj A Dadhania wrote:
>>> +    tcg_gen_mov_i64(t0, cpu_gpr[rA(ctx->opcode)]);
>>> +    tcg_gen_deposit_i64(cpu_vsrl(xT(ctx->opcode)), t0, t0, 32, 32);
>>
>> Why are you using t0?
>
> Thought about dropping it, but wasn't sure if deposit_i64 would change it.

Nope, all of the tcg-op.c functions are safe that way, only modifying the outputs.


r~


end of thread (last message 2016-09-29  4:08 UTC)

Thread overview: 18+ messages
2016-09-28 18:41 [Qemu-devel] [PATCH v5 0/9] POWER9 TCG enablements - part4 Nikunj A Dadhania
2016-09-28 18:41 ` [Qemu-devel] [PATCH v5 1/9] target-ppc: Implement mfvsrld instruction Nikunj A Dadhania
2016-09-28 18:41 ` [Qemu-devel] [PATCH v5 2/9] target-ppc: Implement mtvsrdd instruction Nikunj A Dadhania
2016-09-28 18:41 ` [Qemu-devel] [PATCH v5 3/9] target-ppc: Implement mtvsrws instruction Nikunj A Dadhania
2016-09-28 20:21   ` Richard Henderson
2016-09-29  1:53     ` David Gibson
2016-09-29  4:07       ` Richard Henderson
2016-09-29  2:19     ` Nikunj A Dadhania
2016-09-29  4:08       ` Richard Henderson
2016-09-28 18:41 ` [Qemu-devel] [PATCH v5 4/9] target-ppc: improve lxvw4x implementation Nikunj A Dadhania
2016-09-28 18:41 ` [Qemu-devel] [PATCH v5 5/9] target-ppc: improve stxvw4x implementation Nikunj A Dadhania
2016-09-28 18:41 ` [Qemu-devel] [PATCH v5 6/9] target-ppc: add lxvh8x instruction Nikunj A Dadhania
2016-09-28 20:22   ` Richard Henderson
2016-09-28 18:41 ` [Qemu-devel] [PATCH v5 7/9] target-ppc: add stxvh8x instruction Nikunj A Dadhania
2016-09-28 20:23   ` Richard Henderson
2016-09-28 18:41 ` [Qemu-devel] [PATCH v5 8/9] target-ppc: add lxvb16x instruction Nikunj A Dadhania
2016-09-28 18:42 ` [Qemu-devel] [PATCH v5 9/9] target-ppc: add stxvb16x instruction Nikunj A Dadhania
2016-09-29  1:51 ` [Qemu-devel] [PATCH v5 0/9] POWER9 TCG enablements - part4 David Gibson
