[Qemu-devel] [PATCH v3 0/5] POWER9 TCG enablement

* [Qemu-devel] [PATCH v3 0/5] POWER9 TCG enablement - part3
@ 2016-08-11  7:36 Rajalakshmi Srinivasaraghavan
  2016-08-11  7:36 ` [Qemu-devel] [PATCH v3 1/5] target-ppc: add vector insert instructions Rajalakshmi Srinivasaraghavan
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Rajalakshmi Srinivasaraghavan @ 2016-08-11  7:36 UTC (permalink / raw)
  To: qemu-ppc, david, rth
  Cc: qemu-devel, nikunj, benh, Rajalakshmi Srinivasaraghavan

This series contains 14 new instructions for POWER9 described in ISA3.0.

Patches:
        01: Adds vector insert instructions.
            vinsertb - Vector Insert Byte
            vinserth - Vector Insert Halfword
            vinsertw - Vector Insert Word
            vinsertd - Vector Insert Doubleword
        02: Adds vector extract instructions.
            vextractub - Vector Extract Unsigned Byte
            vextractuh - Vector Extract Unsigned Halfword
            vextractuw - Vector Extract Unsigned Word
            vextractd - Vector Extract Doubleword
        03: Adds vector count trailing zeros instructions.
            vctzb - Vector Count Trailing Zeros Byte
            vctzh - Vector Count Trailing Zeros Halfword
            vctzw - Vector Count Trailing Zeros Word
            vctzd - Vector Count Trailing Zeros Doubleword
        04: Adds vbpermd - Vector Bit Permute Doubleword instruction.
        05: Adds vpermr - Vector Permute Right-indexed instruction.
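
As a rough illustration of what the count-trailing-zeros instructions in
patch 03 compute, here is a standalone per-element model of vctzb. It is a
sketch only, not the helper from this series: vec128, ctz8 and model_vctzb
are hypothetical names, and vec128 is a plain 16-byte stand-in rather than
QEMU's ppc_avr_t.

```c
#include <stdint.h>

/* Stand-in for a 128-bit vector register (assumption, not ppc_avr_t). */
typedef struct {
    uint8_t u8[16];
} vec128;

/* Count trailing zero bits of one byte; an all-zero byte yields 8. */
static uint8_t ctz8(uint8_t v)
{
    uint8_t n = 0;

    if (v == 0) {
        return 8;
    }
    while ((v & 1) == 0) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Each byte lane of b is processed independently and written to r. */
static void model_vctzb(vec128 *r, const vec128 *b)
{
    for (int i = 0; i < 16; i++) {
        r->u8[i] = ctz8(b->u8[i]);
    }
}
```

The halfword, word and doubleword forms would follow the same pattern with
wider lanes and a zero-input result equal to the lane width.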

Changelog:
v0:
* Rename GEN_VXFORM_300_EXT1 to GEN_VXFORM_300_EO.
* Rename GEN_VXFORM_DUAL1 to GEN_VXFORM_DUAL_INV.
* Remove undef GEN_VXFORM_DUAL1.

v1:
* Correct SPLAT and handle src = dest for vinsert and vextract.
* Correct typecast for vctz.
* Computation of index rearranged for vpermr.
* Assignment of perm moved out of inner loop in vbpermd.

v2:
* Check splat in translate code for vinsert and vextract.
* Use memcpy for vinsert and vextract.
* Handle src = dest for vbpermd.

 target-ppc/helper.h             |   14 +++++
 target-ppc/int_helper.c         |  113 +++++++++++++++++++++++++++++++++++++++
 target-ppc/translate.c          |    2 +
 target-ppc/translate/vmx-impl.c |   80 +++++++++++++++++++++++++++
 target-ppc/translate/vmx-ops.c  |   38 ++++++++++---
 5 files changed, 239 insertions(+), 8 deletions(-)

* [Qemu-devel] [PATCH v3 1/5] target-ppc: add vector insert instructions
  2016-08-11  7:36 [Qemu-devel] [PATCH v3 0/5] POWER9 TCG enablement - part3 Rajalakshmi Srinivasaraghavan
@ 2016-08-11  7:36 ` Rajalakshmi Srinivasaraghavan
  2016-08-16  4:18   ` David Gibson
  2016-08-11  7:36 ` [Qemu-devel] [PATCH v3 2/5] target-ppc: add vector extract instructions Rajalakshmi Srinivasaraghavan
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Rajalakshmi Srinivasaraghavan @ 2016-08-11  7:36 UTC (permalink / raw)
  To: qemu-ppc, david, rth
  Cc: qemu-devel, nikunj, benh, Rajalakshmi Srinivasaraghavan

The following vector insert instructions are added from ISA 3.0.

vinsertb - Vector Insert Byte
vinserth - Vector Insert Halfword
vinsertw - Vector Insert Word
vinsertd - Vector Insert Doubleword

Signed-off-by: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
---
 target-ppc/helper.h             |    4 ++++
 target-ppc/int_helper.c         |   27 +++++++++++++++++++++++++++
 target-ppc/translate.c          |    2 ++
 target-ppc/translate/vmx-impl.c |   32 ++++++++++++++++++++++++++++++++
 target-ppc/translate/vmx-ops.c  |   18 +++++++++++++-----
 5 files changed, 78 insertions(+), 5 deletions(-)

diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 93ac9e1..0923779 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -250,6 +250,10 @@ DEF_HELPER_2(vspltisw, void, avr, i32)
 DEF_HELPER_3(vspltb, void, avr, avr, i32)
 DEF_HELPER_3(vsplth, void, avr, avr, i32)
 DEF_HELPER_3(vspltw, void, avr, avr, i32)
+DEF_HELPER_3(vinsertb, void, avr, avr, i32)
+DEF_HELPER_3(vinserth, void, avr, avr, i32)
+DEF_HELPER_3(vinsertw, void, avr, avr, i32)
+DEF_HELPER_3(vinsertd, void, avr, avr, i32)
 DEF_HELPER_2(vupkhpx, void, avr, avr)
 DEF_HELPER_2(vupklpx, void, avr, avr)
 DEF_HELPER_2(vupkhsb, void, avr, avr)
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index 552b2e0..3f8e439 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -1792,6 +1792,33 @@ VSPLT(w, u32)
 #undef VSPLT
 #undef SPLAT_ELEMENT
 #undef _SPLAT_MASKED
+#if defined(HOST_WORDS_BIGENDIAN)
+#define VINSERT(suffix, element, index)                                     \
+    void helper_vinsert##suffix(ppc_avr_t *r, ppc_avr_t *b, uint32_t splat) \
+    {                                                                       \
+        ppc_avr_t result;                                                   \
+        result = *r;                                                        \
+        memcpy(&result.u8[splat], &b->element[index],                       \
+               sizeof(result.element[0]));                                  \
+        *r = result;                                                        \
+    }
+#else
+#define VINSERT(suffix, element, index)                                     \
+    void helper_vinsert##suffix(ppc_avr_t *r, ppc_avr_t *b, uint32_t splat) \
+    {                                                                       \
+        ppc_avr_t result;                                                   \
+        result = *r;                                                        \
+        uint32_t s = (ARRAY_SIZE(r->element) - index) - 1;                  \
+        uint32_t d = (16 - splat) - sizeof(r->element[0]);                  \
+        memcpy(&result.u8[d], &b->element[s], sizeof(result.element[0]));   \
+        *r = result;                                                        \
+    }
+#endif
+VINSERT(b, u8, 7)
+VINSERT(h, u16, 3)
+VINSERT(w, u32, 1)
+VINSERT(d, u64, 0)
+#undef VINSERT
 
 #define VSPLTI(suffix, element, splat_type)                     \
     void helper_vspltis##suffix(ppc_avr_t *r, uint32_t splat)   \
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index fc3d371..dbe952e 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -498,6 +498,8 @@ EXTRACT_HELPER(UIMM, 0, 16);
 EXTRACT_HELPER(SIMM5, 16, 5);
 /* 5 bits signed immediate value */
 EXTRACT_HELPER(UIMM5, 16, 5);
+/* 4 bits unsigned immediate value */
+EXTRACT_HELPER(UIMM4, 16, 4);
 /* Bit count */
 EXTRACT_HELPER(NB, 11, 5);
 /* Shift count */
diff --git a/target-ppc/translate/vmx-impl.c b/target-ppc/translate/vmx-impl.c
index ac78caf..f6a97ac 100644
--- a/target-ppc/translate/vmx-impl.c
+++ b/target-ppc/translate/vmx-impl.c
@@ -623,13 +623,45 @@ static void glue(gen_, name)(DisasContext *ctx)                         \
         tcg_temp_free_ptr(rd);                                          \
     }
 
+#define GEN_VXFORM_UIMM_SPLAT(name, opc2, opc3, splat_max)              \
+static void glue(gen_, name)(DisasContext *ctx)                         \
+    {                                                                   \
+        TCGv_ptr rb, rd;                                                \
+        uint8_t uimm = UIMM4(ctx->opcode);                              \
+        TCGv_i32 t0 = tcg_temp_new_i32();                               \
+        if (unlikely(!ctx->altivec_enabled)) {                          \
+            gen_exception(ctx, POWERPC_EXCP_VPU);                       \
+            return;                                                     \
+        }                                                               \
+        if (uimm > splat_max) {                                         \
+            uimm = 0;                                                   \
+        }                                                               \
+        tcg_gen_movi_i32(t0, uimm);                                     \
+        rb = gen_avr_ptr(rB(ctx->opcode));                              \
+        rd = gen_avr_ptr(rD(ctx->opcode));                              \
+        gen_helper_##name(rd, rb, t0);                                  \
+        tcg_temp_free_i32(t0);                                          \
+        tcg_temp_free_ptr(rb);                                          \
+        tcg_temp_free_ptr(rd);                                          \
+    }
+
 GEN_VXFORM_UIMM(vspltb, 6, 8);
 GEN_VXFORM_UIMM(vsplth, 6, 9);
 GEN_VXFORM_UIMM(vspltw, 6, 10);
+GEN_VXFORM_UIMM_SPLAT(vinsertb, 6, 12, 15);
+GEN_VXFORM_UIMM_SPLAT(vinserth, 6, 13, 14);
+GEN_VXFORM_UIMM_SPLAT(vinsertw, 6, 14, 12);
+GEN_VXFORM_UIMM_SPLAT(vinsertd, 6, 15, 8);
 GEN_VXFORM_UIMM_ENV(vcfux, 5, 12);
 GEN_VXFORM_UIMM_ENV(vcfsx, 5, 13);
 GEN_VXFORM_UIMM_ENV(vctuxs, 5, 14);
 GEN_VXFORM_UIMM_ENV(vctsxs, 5, 15);
+GEN_VXFORM_DUAL(vspltisb, PPC_NONE, PPC2_ALTIVEC_207,
+                      vinsertb, PPC_NONE, PPC2_ISA300);
+GEN_VXFORM_DUAL(vspltish, PPC_NONE, PPC2_ALTIVEC_207,
+                      vinserth, PPC_NONE, PPC2_ISA300);
+GEN_VXFORM_DUAL(vspltisw, PPC_NONE, PPC2_ALTIVEC_207,
+                      vinsertw, PPC_NONE, PPC2_ISA300);
 
 static void gen_vsldoi(DisasContext *ctx)
 {
diff --git a/target-ppc/translate/vmx-ops.c b/target-ppc/translate/vmx-ops.c
index 7449396..ca69e56 100644
--- a/target-ppc/translate/vmx-ops.c
+++ b/target-ppc/translate/vmx-ops.c
@@ -41,6 +41,9 @@ GEN_HANDLER_E(name, 0x04, opc2, opc3, 0x00000000, PPC_NONE, PPC2_ALTIVEC_207)
 #define GEN_VXFORM_300(name, opc2, opc3)                                \
 GEN_HANDLER_E(name, 0x04, opc2, opc3, 0x00000000, PPC_NONE, PPC2_ISA300)
 
+#define GEN_VXFORM_300_EXT(name, opc2, opc3, inval)                     \
+GEN_HANDLER_E(name, 0x04, opc2, opc3, inval, PPC_NONE, PPC2_ISA300)
+
 #define GEN_VXFORM_DUAL(name0, name1, opc2, opc3, type0, type1) \
 GEN_HANDLER_E(name0##_##name1, 0x4, opc2, opc3, 0x00000000, type0, type1)
 
@@ -191,11 +194,16 @@ GEN_VXRFORM(vcmpgefp, 3, 7)
 GEN_VXRFORM_DUAL(vcmpgtfp, vcmpgtud, 3, 11, PPC_ALTIVEC, PPC_NONE)
 GEN_VXRFORM_DUAL(vcmpbfp, vcmpgtsd, 3, 15, PPC_ALTIVEC, PPC_NONE)
 
-#define GEN_VXFORM_SIMM(name, opc2, opc3)                               \
-    GEN_HANDLER(name, 0x04, opc2, opc3, 0x00000000, PPC_ALTIVEC)
-GEN_VXFORM_SIMM(vspltisb, 6, 12),
-GEN_VXFORM_SIMM(vspltish, 6, 13),
-GEN_VXFORM_SIMM(vspltisw, 6, 14),
+#define GEN_VXFORM_DUAL_INV(name0, name1, opc2, opc3, inval0, inval1, type) \
+GEN_OPCODE_DUAL(name0##_##name1, 0x04, opc2, opc3, inval0, inval1, type, \
+                                                               PPC_NONE)
+GEN_VXFORM_DUAL_INV(vspltisb, vinsertb, 6, 12, 0x00000000, 0x100000,
+                                               PPC2_ALTIVEC_207),
+GEN_VXFORM_DUAL_INV(vspltish, vinserth, 6, 13, 0x00000000, 0x100000,
+                                               PPC2_ALTIVEC_207),
+GEN_VXFORM_DUAL_INV(vspltisw, vinsertw, 6, 14, 0x00000000, 0x100000,
+                                               PPC2_ALTIVEC_207),
+GEN_VXFORM_300_EXT(vinsertd, 6, 15, 0x100000),
 
 #define GEN_VXFORM_NOA(name, opc2, opc3)                                \
     GEN_HANDLER(name, 0x04, opc2, opc3, 0x001f0000, PPC_ALTIVEC)
-- 
1.7.1

* Re: [Qemu-devel] [PATCH v3 1/5] target-ppc: add vector insert instructions
  2016-08-11  7:36 ` [Qemu-devel] [PATCH v3 1/5] target-ppc: add vector insert instructions Rajalakshmi Srinivasaraghavan
@ 2016-08-16  4:18   ` David Gibson
  2016-08-19  5:46     ` Rajalakshmi Srinivasaraghavan
  0 siblings, 1 reply; 14+ messages in thread
From: David Gibson @ 2016-08-16  4:18 UTC (permalink / raw)
  To: Rajalakshmi Srinivasaraghavan; +Cc: qemu-ppc, rth, qemu-devel, nikunj, benh

On Thu, Aug 11, 2016 at 01:06:44PM +0530, Rajalakshmi Srinivasaraghavan wrote:
> The following vector insert instructions are added from ISA 3.0.
> 
> vinsertb - Vector Insert Byte
> vinserth - Vector Insert Halfword
> vinsertw - Vector Insert Word
> vinsertd - Vector Insert Doubleword
> 
> Signed-off-by: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
> ---
>  target-ppc/helper.h             |    4 ++++
>  target-ppc/int_helper.c         |   27 +++++++++++++++++++++++++++
>  target-ppc/translate.c          |    2 ++
>  target-ppc/translate/vmx-impl.c |   32 ++++++++++++++++++++++++++++++++
>  target-ppc/translate/vmx-ops.c  |   18 +++++++++++++-----
>  5 files changed, 78 insertions(+), 5 deletions(-)
> 
> diff --git a/target-ppc/helper.h b/target-ppc/helper.h
> index 93ac9e1..0923779 100644
> --- a/target-ppc/helper.h
> +++ b/target-ppc/helper.h
> @@ -250,6 +250,10 @@ DEF_HELPER_2(vspltisw, void, avr, i32)
>  DEF_HELPER_3(vspltb, void, avr, avr, i32)
>  DEF_HELPER_3(vsplth, void, avr, avr, i32)
>  DEF_HELPER_3(vspltw, void, avr, avr, i32)
> +DEF_HELPER_3(vinsertb, void, avr, avr, i32)
> +DEF_HELPER_3(vinserth, void, avr, avr, i32)
> +DEF_HELPER_3(vinsertw, void, avr, avr, i32)
> +DEF_HELPER_3(vinsertd, void, avr, avr, i32)
>  DEF_HELPER_2(vupkhpx, void, avr, avr)
>  DEF_HELPER_2(vupklpx, void, avr, avr)
>  DEF_HELPER_2(vupkhsb, void, avr, avr)
> diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
> index 552b2e0..3f8e439 100644
> --- a/target-ppc/int_helper.c
> +++ b/target-ppc/int_helper.c
> @@ -1792,6 +1792,33 @@ VSPLT(w, u32)
>  #undef VSPLT
>  #undef SPLAT_ELEMENT
>  #undef _SPLAT_MASKED
> +#if defined(HOST_WORDS_BIGENDIAN)
> +#define VINSERT(suffix, element, index)                                     \
> +    void helper_vinsert##suffix(ppc_avr_t *r, ppc_avr_t *b, uint32_t splat) \
> +    {                                                                       \
> +        ppc_avr_t result;                                                   \
> +        result = *r;                                                        \
> +        memcpy(&result.u8[splat], &b->element[index],                       \
> +               sizeof(result.element[0]));                                  \
> +        *r = result;                                                        \

Using a temporary for the result means two extra full vector copies,
which seems unfortunate.  Couldn't you just use memmove() instead of
memcpy() to handle the overlapping cases?
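
A minimal sketch of the in-place variant being suggested, assuming a plain
16-byte stand-in type (vec128) and explicit byte offsets instead of the
macro's element/index arguments. The names insert_inplace, dst_off and
src_off are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a 128-bit vector register (assumption, not ppc_avr_t). */
typedef struct {
    uint8_t u8[16];
} vec128;

/*
 * Copy n bytes from b (starting at src_off) into r (starting at dst_off),
 * writing r directly. memmove() is defined for overlapping regions, so no
 * temporary copy of the whole vector is made here.
 */
static void insert_inplace(vec128 *r, const vec128 *b,
                           unsigned dst_off, unsigned src_off, size_t n)
{
    assert(dst_off + n <= sizeof(r->u8));
    assert(src_off + n <= sizeof(b->u8));
    memmove(&r->u8[dst_off], &b->u8[src_off], n);
}
```

Only the n destination bytes are touched; the rest of r is left as it was.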

> +    }
> +#else
> +#define VINSERT(suffix, element, index)                                     \
> +    void helper_vinsert##suffix(ppc_avr_t *r, ppc_avr_t *b, uint32_t splat) \
> +    {                                                                       \
> +        ppc_avr_t result;                                                   \
> +        result = *r;                                                        \
> +        uint32_t s = (ARRAY_SIZE(r->element) - index) - 1;                  \

The logic with index seems a bit convoluted.  AFAICT the index is
always the least significant element of the most-significant half of
the vector b.  So for BE &result.u8[8] should always be right and for
LE &result.u8[8 - sizeof(r->element)].

> +        uint32_t d = (16 - splat) - sizeof(r->element[0]);                  \
> +        memcpy(&result.u8[d], &b->element[s], sizeof(result.element[0]));   \
> +        *r = result;                                                        \
> +    }
> +#endif
> +VINSERT(b, u8, 7)
> +VINSERT(h, u16, 3)
> +VINSERT(w, u32, 1)
> +VINSERT(d, u64, 0)
> +#undef VINSERT

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH v3 1/5] target-ppc: add vector insert instructions
  2016-08-16  4:18   ` David Gibson
@ 2016-08-19  5:46     ` Rajalakshmi Srinivasaraghavan
  2016-08-23 15:08       ` David Gibson
  0 siblings, 1 reply; 14+ messages in thread
From: Rajalakshmi Srinivasaraghavan @ 2016-08-19  5:46 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, rth, qemu-devel, nikunj, benh



On 08/16/2016 09:48 AM, David Gibson wrote:
> On Thu, Aug 11, 2016 at 01:06:44PM +0530, Rajalakshmi Srinivasaraghavan wrote:
>> The following vector insert instructions are added from ISA 3.0.
>>
>> vinsertb - Vector Insert Byte
>> vinserth - Vector Insert Halfword
>> vinsertw - Vector Insert Word
>> vinsertd - Vector Insert Doubleword
>>
>> Signed-off-by: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
>> ---
>>   target-ppc/helper.h             |    4 ++++
>>   target-ppc/int_helper.c         |   27 +++++++++++++++++++++++++++
>>   target-ppc/translate.c          |    2 ++
>>   target-ppc/translate/vmx-impl.c |   32 ++++++++++++++++++++++++++++++++
>>   target-ppc/translate/vmx-ops.c  |   18 +++++++++++++-----
>>   5 files changed, 78 insertions(+), 5 deletions(-)
>>
>> diff --git a/target-ppc/helper.h b/target-ppc/helper.h
>> index 93ac9e1..0923779 100644
>> --- a/target-ppc/helper.h
>> +++ b/target-ppc/helper.h
>> @@ -250,6 +250,10 @@ DEF_HELPER_2(vspltisw, void, avr, i32)
>>   DEF_HELPER_3(vspltb, void, avr, avr, i32)
>>   DEF_HELPER_3(vsplth, void, avr, avr, i32)
>>   DEF_HELPER_3(vspltw, void, avr, avr, i32)
>> +DEF_HELPER_3(vinsertb, void, avr, avr, i32)
>> +DEF_HELPER_3(vinserth, void, avr, avr, i32)
>> +DEF_HELPER_3(vinsertw, void, avr, avr, i32)
>> +DEF_HELPER_3(vinsertd, void, avr, avr, i32)
>>   DEF_HELPER_2(vupkhpx, void, avr, avr)
>>   DEF_HELPER_2(vupklpx, void, avr, avr)
>>   DEF_HELPER_2(vupkhsb, void, avr, avr)
>> diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
>> index 552b2e0..3f8e439 100644
>> --- a/target-ppc/int_helper.c
>> +++ b/target-ppc/int_helper.c
>> @@ -1792,6 +1792,33 @@ VSPLT(w, u32)
>>   #undef VSPLT
>>   #undef SPLAT_ELEMENT
>>   #undef _SPLAT_MASKED
>> +#if defined(HOST_WORDS_BIGENDIAN)
>> +#define VINSERT(suffix, element, index)                                     \
>> +    void helper_vinsert##suffix(ppc_avr_t *r, ppc_avr_t *b, uint32_t splat) \
>> +    {                                                                       \
>> +        ppc_avr_t result;                                                   \
>> +        result = *r;                                                        \
>> +        memcpy(&result.u8[splat], &b->element[index],                       \
>> +               sizeof(result.element[0]));                                  \
>> +        *r = result;                                                        \
> Using a temporary for the result means two extra full vector copies,
> which seems unfortunate.  Couldn't you just use memmove() instead of
> memcpy() to handle the overlapping cases?
If the registers r and b are the same, using memmove() can overwrite
some of the values, depending on the index (third argument).
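
One way to examine the r == b case is a small standalone comparison of the
two approaches. This is a sketch under assumptions, not code from the patch:
vec128 is a plain 16-byte stand-in, the function names are hypothetical, and
the source offset uses big-endian byte numbering (8 minus the element size).
The C standard specifies that memmove() behaves as if the source bytes were
first copied to a temporary array, so the check counts any case where the
two variants disagree.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for a 128-bit vector register (assumption, not ppc_avr_t). */
typedef struct {
    uint8_t u8[16];
} vec128;

/* Variant used in the patch: build the result in a temporary, then store. */
static void insert_via_temp(vec128 *r, const vec128 *b,
                            unsigned dst_off, unsigned src_off, size_t n)
{
    vec128 result = *r;

    memcpy(&result.u8[dst_off], &b->u8[src_off], n);
    *r = result;
}

/* Variant raised in review: write the destination register directly. */
static void insert_via_memmove(vec128 *r, const vec128 *b,
                               unsigned dst_off, unsigned src_off, size_t n)
{
    memmove(&r->u8[dst_off], &b->u8[src_off], n);
}

/* Count (size, offset) pairs where the variants differ when r == b. */
static int count_alias_mismatches(void)
{
    static const size_t sizes[] = { 1, 2, 4, 8 };
    int mismatches = 0;

    for (size_t k = 0; k < sizeof(sizes) / sizeof(sizes[0]); k++) {
        size_t n = sizes[k];
        unsigned src_off = 8 - (unsigned)n;

        for (unsigned dst_off = 0; dst_off + n <= 16; dst_off++) {
            vec128 x, y;

            for (int i = 0; i < 16; i++) {
                x.u8[i] = y.u8[i] = (uint8_t)(i * 17 + 1);
            }
            insert_via_temp(&x, &x, dst_off, src_off, n);
            insert_via_memmove(&y, &y, dst_off, src_off, n);
            if (memcmp(&x, &y, sizeof(x)) != 0) {
                mismatches++;
            }
        }
    }
    return mismatches;
}
```

count_alias_mismatches() covers every destination offset that fits in the
register for byte, halfword, word and doubleword sizes.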
>> +    }
>> +#else
>> +#define VINSERT(suffix, element, index)                                     \
>> +    void helper_vinsert##suffix(ppc_avr_t *r, ppc_avr_t *b, uint32_t splat) \
>> +    {                                                                       \
>> +        ppc_avr_t result;                                                   \
>> +        result = *r;                                                        \
>> +        uint32_t s = (ARRAY_SIZE(r->element) - index) - 1;                  \
> The logic with index seems a bit convoluted.  AFAICT the index is
> always the least significant element of the most-significant half of
> the vector b.  So for BE &result.u8[8] should always be right and for
> LE &result.u8[8 - sizeof(r->element)].
Ack, with one correction to your comment: for BE it is
u8[8 - sizeof(r->element)] and for LE it is u8[8].
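
To make the offsets concrete, the sketch below compares the simplified rule
(BE: 8 minus the element size, LE: 8) with the byte offsets implied by the
VINSERT macro arguments in the patch. All function names are hypothetical,
and the element size is taken to mean the size of one element, that is
sizeof(r->element[0]).

```c
#include <stddef.h>

/* Simplified rule: source byte offset on a big-endian host. */
static size_t src_offset_be(size_t elem_size)
{
    return 8 - elem_size;
}

/* Simplified rule: source byte offset on a little-endian host. */
static size_t src_offset_le(size_t elem_size)
{
    (void)elem_size; /* the offset does not depend on the element size */
    return 8;
}

/* Byte offset of &b->element[index], as used in the BE branch. */
static size_t macro_offset_be(size_t elem_size, size_t index)
{
    return index * elem_size;
}

/* Byte offset of &b->element[ARRAY_SIZE - index - 1], as in the LE branch. */
static size_t macro_offset_le(size_t elem_size, size_t index)
{
    size_t nelem = 16 / elem_size; /* ARRAY_SIZE(r->element) */

    return (nelem - index - 1) * elem_size;
}
```

With the (size, index) pairs from the patch, (1, 7), (2, 3), (4, 1) and
(8, 0), both forms give the same offsets, so the index argument could be
dropped from the macro.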

>
>> +        uint32_t d = (16 - splat) - sizeof(r->element[0]);                  \
>> +        memcpy(&result.u8[d], &b->element[s], sizeof(result.element[0]));   \
>> +        *r = result;                                                        \
>> +    }
>> +#endif
>> +VINSERT(b, u8, 7)
>> +VINSERT(h, u16, 3)
>> +VINSERT(w, u32, 1)
>> +VINSERT(d, u64, 0)
>> +#undef VINSERT

-- 
Thanks
Rajalakshmi S

* Re: [Qemu-devel] [PATCH v3 1/5] target-ppc: add vector insert instructions
  2016-08-19  5:46     ` Rajalakshmi Srinivasaraghavan
@ 2016-08-23 15:08       ` David Gibson
  0 siblings, 0 replies; 14+ messages in thread
From: David Gibson @ 2016-08-23 15:08 UTC (permalink / raw)
  To: Rajalakshmi Srinivasaraghavan; +Cc: qemu-ppc, rth, qemu-devel, nikunj, benh

On Fri, Aug 19, 2016 at 11:16:03AM +0530, Rajalakshmi Srinivasaraghavan wrote:
> 
> 
> On 08/16/2016 09:48 AM, David Gibson wrote:
> > On Thu, Aug 11, 2016 at 01:06:44PM +0530, Rajalakshmi Srinivasaraghavan wrote:
> > > The following vector insert instructions are added from ISA 3.0.
> > > 
> > > vinsertb - Vector Insert Byte
> > > vinserth - Vector Insert Halfword
> > > vinsertw - Vector Insert Word
> > > vinsertd - Vector Insert Doubleword
> > > 
> > > Signed-off-by: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
> > > ---
> > >   target-ppc/helper.h             |    4 ++++
> > >   target-ppc/int_helper.c         |   27 +++++++++++++++++++++++++++
> > >   target-ppc/translate.c          |    2 ++
> > >   target-ppc/translate/vmx-impl.c |   32 ++++++++++++++++++++++++++++++++
> > >   target-ppc/translate/vmx-ops.c  |   18 +++++++++++++-----
> > >   5 files changed, 78 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/target-ppc/helper.h b/target-ppc/helper.h
> > > index 93ac9e1..0923779 100644
> > > --- a/target-ppc/helper.h
> > > +++ b/target-ppc/helper.h
> > > @@ -250,6 +250,10 @@ DEF_HELPER_2(vspltisw, void, avr, i32)
> > >   DEF_HELPER_3(vspltb, void, avr, avr, i32)
> > >   DEF_HELPER_3(vsplth, void, avr, avr, i32)
> > >   DEF_HELPER_3(vspltw, void, avr, avr, i32)
> > > +DEF_HELPER_3(vinsertb, void, avr, avr, i32)
> > > +DEF_HELPER_3(vinserth, void, avr, avr, i32)
> > > +DEF_HELPER_3(vinsertw, void, avr, avr, i32)
> > > +DEF_HELPER_3(vinsertd, void, avr, avr, i32)
> > >   DEF_HELPER_2(vupkhpx, void, avr, avr)
> > >   DEF_HELPER_2(vupklpx, void, avr, avr)
> > >   DEF_HELPER_2(vupkhsb, void, avr, avr)
> > > diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
> > > index 552b2e0..3f8e439 100644
> > > --- a/target-ppc/int_helper.c
> > > +++ b/target-ppc/int_helper.c
> > > @@ -1792,6 +1792,33 @@ VSPLT(w, u32)
> > >   #undef VSPLT
> > >   #undef SPLAT_ELEMENT
> > >   #undef _SPLAT_MASKED
> > > +#if defined(HOST_WORDS_BIGENDIAN)
> > > +#define VINSERT(suffix, element, index)                                     \
> > > +    void helper_vinsert##suffix(ppc_avr_t *r, ppc_avr_t *b, uint32_t splat) \
> > > +    {                                                                       \
> > > +        ppc_avr_t result;                                                   \
> > > +        result = *r;                                                        \
> > > +        memcpy(&result.u8[splat], &b->element[index],                       \
> > > +               sizeof(result.element[0]));                                  \
> > > +        *r = result;                                                        \
> > Using a temporary for the result means two extra full vector copies,
> > which seems unfortunate.  Couldn't you just use memmove() instead of
> > memcpy() to handle the overlapping cases?
> If the registers r and b are the same, using memmove() can overwrite
> some of the values depending on the index (third arg).

You're only transferring a single element, and memmove() can handle
overlapping source and destination, so I'm not seeing the problem even
if r == b.
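
To make the aliasing argument concrete, here is a minimal sketch, not the actual helper: `vec128`, `insert_bytes`, `insert_bytes_tmp` and `aliased_variants_agree` are hypothetical names standing in for `ppc_avr_t` and the `VINSERT` body. It compares a `memmove()`-based single-element copy against the temporary-vector version from the patch when source and destination are the same register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's ppc_avr_t; only a few views are modeled. */
typedef union {
    uint8_t  u8[16];
    uint32_t u32[4];
    uint64_t u64[2];
} vec128;

/*
 * Copy `size` bytes from byte offset src_off of *b to byte offset dst_off
 * of *r without a temporary vector. memmove() is defined for overlapping
 * regions, so this is well-defined even when r == b.
 */
static void insert_bytes(vec128 *r, const vec128 *b,
                         unsigned dst_off, unsigned src_off, size_t size)
{
    assert(dst_off + size <= sizeof(r->u8));
    assert(src_off + size <= sizeof(b->u8));
    memmove(&r->u8[dst_off], &b->u8[src_off], size);
}

/* Reference variant using a full-vector temporary, as the patch does. */
static void insert_bytes_tmp(vec128 *r, const vec128 *b,
                             unsigned dst_off, unsigned src_off, size_t size)
{
    vec128 result = *r;
    memcpy(&result.u8[dst_off], &b->u8[src_off], size);
    *r = result;
}

/* Returns 1 if both variants agree when r and b alias the same vector. */
static int aliased_variants_agree(unsigned dst_off, unsigned src_off,
                                  size_t size)
{
    vec128 a, c;
    for (unsigned i = 0; i < 16; i++) {
        a.u8[i] = c.u8[i] = (uint8_t)(0xA0 + i);
    }
    insert_bytes(&a, &a, dst_off, src_off, size);
    insert_bytes_tmp(&c, &c, dst_off, src_off, size);
    return memcmp(a.u8, c.u8, sizeof(a.u8)) == 0;
}
```

Calling `aliased_variants_agree(6, 4, 4)` exercises a case where the source and destination byte ranges overlap within the same vector; the two variants should produce identical results because only one element is transferred.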

> > > +    }
> > > +#else
> > > +#define VINSERT(suffix, element, index)                                     \
> > > +    void helper_vinsert##suffix(ppc_avr_t *r, ppc_avr_t *b, uint32_t splat) \
> > > +    {                                                                       \
> > > +        ppc_avr_t result;                                                   \
> > > +        result = *r;                                                        \
> > > +        uint32_t s = (ARRAY_SIZE(r->element) - index) - 1;                  \
> > The logic with index seems a bit convoluted.  AFAICT the index is
> > always the least significant element of the most-significant half of
> > the vector b.  So for BE &result.u8[8] should always be right and for
> > LE &result.u8[8 - sizeof(r->element)].
> Ack. One correction to your comment: for BE it's u8[8 - sizeof(r->element)]
> and for LE it's u8[8].

Ah, yes, sorry.
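
The corrected offsets can be checked against the index arithmetic in the patch. The following is only a sketch under assumptions: the function names are hypothetical, `elem_size` means the size of one element in bytes (1, 2, 4 or 8), and the vector is taken to be 16 bytes wide.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified form for a big-endian host: source starts at u8[8 - elem_size]. */
static size_t src_offset_be(size_t elem_size)
{
    assert(elem_size >= 1 && elem_size <= 8);
    return 8 - elem_size;
}

/* Simplified form for a little-endian host: source always starts at u8[8]. */
static size_t src_offset_le(size_t elem_size)
{
    (void)elem_size;
    return 8;
}

/* Byte offset of element[index] as used by the BE VINSERT variant. */
static size_t patch_offset_be(size_t elem_size, size_t index)
{
    return index * elem_size;
}

/* Byte offset of element[ARRAY_SIZE - index - 1] as used by the LE variant. */
static size_t patch_offset_le(size_t elem_size, size_t index)
{
    size_t nelem = 16 / elem_size;   /* ARRAY_SIZE(r->element) */
    return (nelem - index - 1) * elem_size;
}
```

For the four instantiations in the patch, (element size, index) = (1, 7), (2, 3), (4, 1), (8, 0), the simplified and index-based forms give the same byte offset on each host type, which is why the `index` macro parameter could be dropped.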

> 
> > 
> > > +        uint32_t d = (16 - splat) - sizeof(r->element[0]);                  \
> > > +        memcpy(&result.u8[d], &b->element[s], sizeof(result.element[0]));   \
> > > +        *r = result;                                                        \
> > > +    }
> > > +#endif
> > > +VINSERT(b, u8, 7)
> > > +VINSERT(h, u16, 3)
> > > +VINSERT(w, u32, 1)
> > > +VINSERT(d, u64, 0)
> > > +#undef VINSERT
> > >   #define VSPLTI(suffix, element, splat_type)                     \
> > >       void helper_vspltis##suffix(ppc_avr_t *r, uint32_t splat)   \
> > > diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> > > index fc3d371..dbe952e 100644
> > > --- a/target-ppc/translate.c
> > > +++ b/target-ppc/translate.c
> > > @@ -498,6 +498,8 @@ EXTRACT_HELPER(UIMM, 0, 16);
> > >   EXTRACT_HELPER(SIMM5, 16, 5);
> > >   /* 5 bits signed immediate value */
> > >   EXTRACT_HELPER(UIMM5, 16, 5);
> > > +/* 4 bits unsigned immediate value */
> > > +EXTRACT_HELPER(UIMM4, 16, 4);
> > >   /* Bit count */
> > >   EXTRACT_HELPER(NB, 11, 5);
> > >   /* Shift count */
> > > diff --git a/target-ppc/translate/vmx-impl.c b/target-ppc/translate/vmx-impl.c
> > > index ac78caf..f6a97ac 100644
> > > --- a/target-ppc/translate/vmx-impl.c
> > > +++ b/target-ppc/translate/vmx-impl.c
> > > @@ -623,13 +623,45 @@ static void glue(gen_, name)(DisasContext *ctx)                         \
> > >           tcg_temp_free_ptr(rd);                                          \
> > >       }
> > > +#define GEN_VXFORM_UIMM_SPLAT(name, opc2, opc3, splat_max)              \
> > > +static void glue(gen_, name)(DisasContext *ctx)                         \
> > > +    {                                                                   \
> > > +        TCGv_ptr rb, rd;                                                \
> > > +        uint8_t uimm = UIMM4(ctx->opcode);                              \
> > > +        TCGv_i32 t0 = tcg_temp_new_i32();                               \
> > > +        if (unlikely(!ctx->altivec_enabled)) {                          \
> > > +            gen_exception(ctx, POWERPC_EXCP_VPU);                       \
> > > +            return;                                                     \
> > > +        }                                                               \
> > > +        if (uimm > splat_max) {                                         \
> > > +            uimm = 0;                                                   \
> > > +        }                                                               \
> > > +        tcg_gen_movi_i32(t0, uimm);                                     \
> > > +        rb = gen_avr_ptr(rB(ctx->opcode));                              \
> > > +        rd = gen_avr_ptr(rD(ctx->opcode));                              \
> > > +        gen_helper_##name(rd, rb, t0);                                  \
> > > +        tcg_temp_free_i32(t0);                                          \
> > > +        tcg_temp_free_ptr(rb);                                          \
> > > +        tcg_temp_free_ptr(rd);                                          \
> > > +    }
> > > +
> > >   GEN_VXFORM_UIMM(vspltb, 6, 8);
> > >   GEN_VXFORM_UIMM(vsplth, 6, 9);
> > >   GEN_VXFORM_UIMM(vspltw, 6, 10);
> > > +GEN_VXFORM_UIMM_SPLAT(vinsertb, 6, 12, 15);
> > > +GEN_VXFORM_UIMM_SPLAT(vinserth, 6, 13, 14);
> > > +GEN_VXFORM_UIMM_SPLAT(vinsertw, 6, 14, 12);
> > > +GEN_VXFORM_UIMM_SPLAT(vinsertd, 6, 15, 8);
> > >   GEN_VXFORM_UIMM_ENV(vcfux, 5, 12);
> > >   GEN_VXFORM_UIMM_ENV(vcfsx, 5, 13);
> > >   GEN_VXFORM_UIMM_ENV(vctuxs, 5, 14);
> > >   GEN_VXFORM_UIMM_ENV(vctsxs, 5, 15);
> > > +GEN_VXFORM_DUAL(vspltisb, PPC_NONE, PPC2_ALTIVEC_207,
> > > +                      vinsertb, PPC_NONE, PPC2_ISA300);
> > > +GEN_VXFORM_DUAL(vspltish, PPC_NONE, PPC2_ALTIVEC_207,
> > > +                      vinserth, PPC_NONE, PPC2_ISA300);
> > > +GEN_VXFORM_DUAL(vspltisw, PPC_NONE, PPC2_ALTIVEC_207,
> > > +                      vinsertw, PPC_NONE, PPC2_ISA300);
> > >   static void gen_vsldoi(DisasContext *ctx)
> > >   {
> > > diff --git a/target-ppc/translate/vmx-ops.c b/target-ppc/translate/vmx-ops.c
> > > index 7449396..ca69e56 100644
> > > --- a/target-ppc/translate/vmx-ops.c
> > > +++ b/target-ppc/translate/vmx-ops.c
> > > @@ -41,6 +41,9 @@ GEN_HANDLER_E(name, 0x04, opc2, opc3, 0x00000000, PPC_NONE, PPC2_ALTIVEC_207)
> > >   #define GEN_VXFORM_300(name, opc2, opc3)                                \
> > >   GEN_HANDLER_E(name, 0x04, opc2, opc3, 0x00000000, PPC_NONE, PPC2_ISA300)
> > > +#define GEN_VXFORM_300_EXT(name, opc2, opc3, inval)                     \
> > > +GEN_HANDLER_E(name, 0x04, opc2, opc3, inval, PPC_NONE, PPC2_ISA300)
> > > +
> > >   #define GEN_VXFORM_DUAL(name0, name1, opc2, opc3, type0, type1) \
> > >   GEN_HANDLER_E(name0##_##name1, 0x4, opc2, opc3, 0x00000000, type0, type1)
> > > @@ -191,11 +194,16 @@ GEN_VXRFORM(vcmpgefp, 3, 7)
> > >   GEN_VXRFORM_DUAL(vcmpgtfp, vcmpgtud, 3, 11, PPC_ALTIVEC, PPC_NONE)
> > >   GEN_VXRFORM_DUAL(vcmpbfp, vcmpgtsd, 3, 15, PPC_ALTIVEC, PPC_NONE)
> > > -#define GEN_VXFORM_SIMM(name, opc2, opc3)                               \
> > > -    GEN_HANDLER(name, 0x04, opc2, opc3, 0x00000000, PPC_ALTIVEC)
> > > -GEN_VXFORM_SIMM(vspltisb, 6, 12),
> > > -GEN_VXFORM_SIMM(vspltish, 6, 13),
> > > -GEN_VXFORM_SIMM(vspltisw, 6, 14),
> > > +#define GEN_VXFORM_DUAL_INV(name0, name1, opc2, opc3, inval0, inval1, type) \
> > > +GEN_OPCODE_DUAL(name0##_##name1, 0x04, opc2, opc3, inval0, inval1, type, \
> > > +                                                               PPC_NONE)
> > > +GEN_VXFORM_DUAL_INV(vspltisb, vinsertb, 6, 12, 0x00000000, 0x100000,
> > > +                                               PPC2_ALTIVEC_207),
> > > +GEN_VXFORM_DUAL_INV(vspltish, vinserth, 6, 13, 0x00000000, 0x100000,
> > > +                                               PPC2_ALTIVEC_207),
> > > +GEN_VXFORM_DUAL_INV(vspltisw, vinsertw, 6, 14, 0x00000000, 0x100000,
> > > +                                               PPC2_ALTIVEC_207),
> > > +GEN_VXFORM_300_EXT(vinsertd, 6, 15, 0x100000),
> > >   #define GEN_VXFORM_NOA(name, opc2, opc3)                                \
> > >       GEN_HANDLER(name, 0x04, opc2, opc3, 0x001f0000, PPC_ALTIVEC)
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH v3 2/5] target-ppc: add vector extract instructions
  2016-08-11  7:36 [Qemu-devel] [PATCH v3 0/5] POWER9 TCG enablement - part3 Rajalakshmi Srinivasaraghavan
  2016-08-11  7:36 ` [Qemu-devel] [PATCH v3 1/5] target-ppc: add vector insert instructions Rajalakshmi Srinivasaraghavan
@ 2016-08-11  7:36 ` Rajalakshmi Srinivasaraghavan
  2016-08-16  4:21   ` David Gibson
  2016-08-11  7:36 ` [Qemu-devel] [PATCH v3 3/5] target-ppc: add vector count trailing zeros instructions Rajalakshmi Srinivasaraghavan
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Rajalakshmi Srinivasaraghavan @ 2016-08-11  7:36 UTC (permalink / raw)
  To: qemu-ppc, david, rth
  Cc: qemu-devel, nikunj, benh, Rajalakshmi Srinivasaraghavan

The following vector extract instructions are added from ISA 3.0.

vextractub - Vector Extract Unsigned Byte
vextractuh - Vector Extract Unsigned Halfword
vextractuw - Vector Extract Unsigned Word
vextractd - Vector Extract Unsigned Doubleword

Signed-off-by: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
---
 target-ppc/helper.h             |    4 ++++
 target-ppc/int_helper.c         |   26 ++++++++++++++++++++++++++
 target-ppc/translate/vmx-impl.c |   10 ++++++++++
 target-ppc/translate/vmx-ops.c  |   10 +++++++---
 4 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 0923779..59e7b88 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -250,6 +250,10 @@ DEF_HELPER_2(vspltisw, void, avr, i32)
 DEF_HELPER_3(vspltb, void, avr, avr, i32)
 DEF_HELPER_3(vsplth, void, avr, avr, i32)
 DEF_HELPER_3(vspltw, void, avr, avr, i32)
+DEF_HELPER_3(vextractub, void, avr, avr, i32)
+DEF_HELPER_3(vextractuh, void, avr, avr, i32)
+DEF_HELPER_3(vextractuw, void, avr, avr, i32)
+DEF_HELPER_3(vextractd, void, avr, avr, i32)
 DEF_HELPER_3(vinsertb, void, avr, avr, i32)
 DEF_HELPER_3(vinserth, void, avr, avr, i32)
 DEF_HELPER_3(vinsertw, void, avr, avr, i32)
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index 3f8e439..a917bd5 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -1819,6 +1819,32 @@ VINSERT(h, u16, 3)
 VINSERT(w, u32, 1)
 VINSERT(d, u64, 0)
 #undef VINSERT
+#if defined(HOST_WORDS_BIGENDIAN)
+#define VEXTRACT(suffix, element, index)                                     \
+    void helper_vextract##suffix(ppc_avr_t *r, ppc_avr_t *b, uint32_t splat) \
+    {                                                                        \
+        uint32_t s = sizeof(r->element[0]) * index;                          \
+        ppc_avr_t result = { .u64 = { 0, 0 } };                              \
+        memcpy(&result.element[index], &b->u8[splat],                        \
+               sizeof(result.element[0]));                                   \
+        *r = result;                                                         \
+    }
+#else
+#define VEXTRACT(suffix, element, index)                                     \
+    void helper_vextract##suffix(ppc_avr_t *r, ppc_avr_t *b, uint32_t splat) \
+    {                                                                        \
+        ppc_avr_t result = { .u64 = { 0, 0 } };                              \
+        uint32_t s = (16 - splat) - sizeof(r->element[0]);                   \
+        uint32_t d = (ARRAY_SIZE(r->element) - index) - 1;                   \
+        memcpy(&result.element[d], &b->u8[s], sizeof(result.element[0]));    \
+        *r = result;                                                         \
+    }
+#endif
+VEXTRACT(ub, u8, 7)
+VEXTRACT(uh, u16, 3)
+VEXTRACT(uw, u32, 1)
+VEXTRACT(d, u64, 0)
+#undef VEXTRACT
 
 #define VSPLTI(suffix, element, splat_type)                     \
     void helper_vspltis##suffix(ppc_avr_t *r, uint32_t splat)   \
diff --git a/target-ppc/translate/vmx-impl.c b/target-ppc/translate/vmx-impl.c
index f6a97ac..766a645 100644
--- a/target-ppc/translate/vmx-impl.c
+++ b/target-ppc/translate/vmx-impl.c
@@ -648,6 +648,10 @@ static void glue(gen_, name)(DisasContext *ctx)                         \
 GEN_VXFORM_UIMM(vspltb, 6, 8);
 GEN_VXFORM_UIMM(vsplth, 6, 9);
 GEN_VXFORM_UIMM(vspltw, 6, 10);
+GEN_VXFORM_UIMM_SPLAT(vextractub, 6, 8, 15);
+GEN_VXFORM_UIMM_SPLAT(vextractuh, 6, 9, 14);
+GEN_VXFORM_UIMM_SPLAT(vextractuw, 6, 10, 12);
+GEN_VXFORM_UIMM_SPLAT(vextractd, 6, 11, 8);
 GEN_VXFORM_UIMM_SPLAT(vinsertb, 6, 12, 15);
 GEN_VXFORM_UIMM_SPLAT(vinserth, 6, 13, 14);
 GEN_VXFORM_UIMM_SPLAT(vinsertw, 6, 14, 12);
@@ -656,6 +660,12 @@ GEN_VXFORM_UIMM_ENV(vcfux, 5, 12);
 GEN_VXFORM_UIMM_ENV(vcfsx, 5, 13);
 GEN_VXFORM_UIMM_ENV(vctuxs, 5, 14);
 GEN_VXFORM_UIMM_ENV(vctsxs, 5, 15);
+GEN_VXFORM_DUAL(vspltb, PPC_NONE, PPC2_ALTIVEC_207,
+                      vextractub, PPC_NONE, PPC2_ISA300);
+GEN_VXFORM_DUAL(vsplth, PPC_NONE, PPC2_ALTIVEC_207,
+                      vextractuh, PPC_NONE, PPC2_ISA300);
+GEN_VXFORM_DUAL(vspltw, PPC_NONE, PPC2_ALTIVEC_207,
+                      vextractuw, PPC_NONE, PPC2_ISA300);
 GEN_VXFORM_DUAL(vspltisb, PPC_NONE, PPC2_ALTIVEC_207,
                       vinsertb, PPC_NONE, PPC2_ISA300);
 GEN_VXFORM_DUAL(vspltish, PPC_NONE, PPC2_ALTIVEC_207,
diff --git a/target-ppc/translate/vmx-ops.c b/target-ppc/translate/vmx-ops.c
index ca69e56..aafe70b 100644
--- a/target-ppc/translate/vmx-ops.c
+++ b/target-ppc/translate/vmx-ops.c
@@ -197,6 +197,13 @@ GEN_VXRFORM_DUAL(vcmpbfp, vcmpgtsd, 3, 15, PPC_ALTIVEC, PPC_NONE)
 #define GEN_VXFORM_DUAL_INV(name0, name1, opc2, opc3, inval0, inval1, type) \
 GEN_OPCODE_DUAL(name0##_##name1, 0x04, opc2, opc3, inval0, inval1, type, \
                                                                PPC_NONE)
+GEN_VXFORM_DUAL_INV(vspltb, vextractub, 6, 8, 0x00000000, 0x100000,
+                                               PPC2_ALTIVEC_207),
+GEN_VXFORM_DUAL_INV(vsplth, vextractuh, 6, 9, 0x00000000, 0x100000,
+                                               PPC2_ALTIVEC_207),
+GEN_VXFORM_DUAL_INV(vspltw, vextractuw, 6, 10, 0x00000000, 0x100000,
+                                               PPC2_ALTIVEC_207),
+GEN_VXFORM_300_EXT(vextractd, 6, 11, 0x100000),
 GEN_VXFORM_DUAL_INV(vspltisb, vinsertb, 6, 12, 0x00000000, 0x100000,
                                                PPC2_ALTIVEC_207),
 GEN_VXFORM_DUAL_INV(vspltish, vinserth, 6, 13, 0x00000000, 0x100000,
@@ -226,9 +233,6 @@ GEN_VXFORM_NOA(vrfiz, 5, 9),
 
 #define GEN_VXFORM_UIMM(name, opc2, opc3)                               \
     GEN_HANDLER(name, 0x04, opc2, opc3, 0x00000000, PPC_ALTIVEC)
-GEN_VXFORM_UIMM(vspltb, 6, 8),
-GEN_VXFORM_UIMM(vsplth, 6, 9),
-GEN_VXFORM_UIMM(vspltw, 6, 10),
 GEN_VXFORM_UIMM(vcfux, 5, 12),
 GEN_VXFORM_UIMM(vcfsx, 5, 13),
 GEN_VXFORM_UIMM(vctuxs, 5, 14),
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/5] target-ppc: add vector extract instructions
  2016-08-11  7:36 ` [Qemu-devel] [PATCH v3 2/5] target-ppc: add vector extract instructions Rajalakshmi Srinivasaraghavan
@ 2016-08-16  4:21   ` David Gibson
  0 siblings, 0 replies; 14+ messages in thread
From: David Gibson @ 2016-08-16  4:21 UTC (permalink / raw)
  To: Rajalakshmi Srinivasaraghavan; +Cc: qemu-ppc, rth, qemu-devel, nikunj, benh

On Thu, Aug 11, 2016 at 01:06:45PM +0530, Rajalakshmi Srinivasaraghavan wrote:
> The following vector extract instructions are added from ISA 3.0.
> 
> vextractub - Vector Extract Unsigned Byte
> vextractuh - Vector Extract Unsigned Halfword
> vextractuw - Vector Extract Unsigned Word
> vextractd - Vector Extract Unsigned Doubleword
> 
> Signed-off-by: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
> ---
>  target-ppc/helper.h             |    4 ++++
>  target-ppc/int_helper.c         |   26 ++++++++++++++++++++++++++
>  target-ppc/translate/vmx-impl.c |   10 ++++++++++
>  target-ppc/translate/vmx-ops.c  |   10 +++++++---
>  4 files changed, 47 insertions(+), 3 deletions(-)
> 
> diff --git a/target-ppc/helper.h b/target-ppc/helper.h
> index 0923779..59e7b88 100644
> --- a/target-ppc/helper.h
> +++ b/target-ppc/helper.h
> @@ -250,6 +250,10 @@ DEF_HELPER_2(vspltisw, void, avr, i32)
>  DEF_HELPER_3(vspltb, void, avr, avr, i32)
>  DEF_HELPER_3(vsplth, void, avr, avr, i32)
>  DEF_HELPER_3(vspltw, void, avr, avr, i32)
> +DEF_HELPER_3(vextractub, void, avr, avr, i32)
> +DEF_HELPER_3(vextractuh, void, avr, avr, i32)
> +DEF_HELPER_3(vextractuw, void, avr, avr, i32)
> +DEF_HELPER_3(vextractd, void, avr, avr, i32)
>  DEF_HELPER_3(vinsertb, void, avr, avr, i32)
>  DEF_HELPER_3(vinserth, void, avr, avr, i32)
>  DEF_HELPER_3(vinsertw, void, avr, avr, i32)
> diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
> index 3f8e439..a917bd5 100644
> --- a/target-ppc/int_helper.c
> +++ b/target-ppc/int_helper.c
> @@ -1819,6 +1819,32 @@ VINSERT(h, u16, 3)
>  VINSERT(w, u32, 1)
>  VINSERT(d, u64, 0)
>  #undef VINSERT
> +#if defined(HOST_WORDS_BIGENDIAN)
> +#define VEXTRACT(suffix, element, index)                                     \
> +    void helper_vextract##suffix(ppc_avr_t *r, ppc_avr_t *b, uint32_t splat) \

splat is not a good parameter name here, since the element is being
extracted, rather than splatted.

> +    {                                                                        \
> +        uint32_t s = sizeof(r->element[0]) * index;                          \
> +        ppc_avr_t result = { .u64 = { 0, 0 } };                              \
> +        memcpy(&result.element[index], &b->u8[splat],                        \
> +               sizeof(result.element[0]));                                   \
> +        *r = result;                                                         \
> +    }
> +#else
> +#define VEXTRACT(suffix, element, index)                                     \
> +    void helper_vextract##suffix(ppc_avr_t *r, ppc_avr_t *b, uint32_t splat) \
> +    {                                                                        \
> +        ppc_avr_t result = { .u64 = { 0, 0 } };                              \
> +        uint32_t s = (16 - splat) - sizeof(r->element[0]);                   \
> +        uint32_t d = (ARRAY_SIZE(r->element) - index) - 1;                   \

Same comments on the index value as for vinsert*.

> +        memcpy(&result.element[d], &b->u8[s], sizeof(result.element[0]));    \
> +        *r = result;                                                         \
> +    }
> +#endif
> +VEXTRACT(ub, u8, 7)
> +VEXTRACT(uh, u16, 3)
> +VEXTRACT(uw, u32, 1)
> +VEXTRACT(d, u64, 0)
> +#undef VEXTRACT
>  
>  #define VSPLTI(suffix, element, splat_type)                     \
>      void helper_vspltis##suffix(ppc_avr_t *r, uint32_t splat)   \
> diff --git a/target-ppc/translate/vmx-impl.c b/target-ppc/translate/vmx-impl.c
> index f6a97ac..766a645 100644
> --- a/target-ppc/translate/vmx-impl.c
> +++ b/target-ppc/translate/vmx-impl.c
> @@ -648,6 +648,10 @@ static void glue(gen_, name)(DisasContext *ctx)                         \
>  GEN_VXFORM_UIMM(vspltb, 6, 8);
>  GEN_VXFORM_UIMM(vsplth, 6, 9);
>  GEN_VXFORM_UIMM(vspltw, 6, 10);
> +GEN_VXFORM_UIMM_SPLAT(vextractub, 6, 8, 15);
> +GEN_VXFORM_UIMM_SPLAT(vextractuh, 6, 9, 14);
> +GEN_VXFORM_UIMM_SPLAT(vextractuw, 6, 10, 12);
> +GEN_VXFORM_UIMM_SPLAT(vextractd, 6, 11, 8);
>  GEN_VXFORM_UIMM_SPLAT(vinsertb, 6, 12, 15);
>  GEN_VXFORM_UIMM_SPLAT(vinserth, 6, 13, 14);
>  GEN_VXFORM_UIMM_SPLAT(vinsertw, 6, 14, 12);
> @@ -656,6 +660,12 @@ GEN_VXFORM_UIMM_ENV(vcfux, 5, 12);
>  GEN_VXFORM_UIMM_ENV(vcfsx, 5, 13);
>  GEN_VXFORM_UIMM_ENV(vctuxs, 5, 14);
>  GEN_VXFORM_UIMM_ENV(vctsxs, 5, 15);
> +GEN_VXFORM_DUAL(vspltb, PPC_NONE, PPC2_ALTIVEC_207,
> +                      vextractub, PPC_NONE, PPC2_ISA300);
> +GEN_VXFORM_DUAL(vsplth, PPC_NONE, PPC2_ALTIVEC_207,
> +                      vextractuh, PPC_NONE, PPC2_ISA300);
> +GEN_VXFORM_DUAL(vspltw, PPC_NONE, PPC2_ALTIVEC_207,
> +                      vextractuw, PPC_NONE, PPC2_ISA300);
>  GEN_VXFORM_DUAL(vspltisb, PPC_NONE, PPC2_ALTIVEC_207,
>                        vinsertb, PPC_NONE, PPC2_ISA300);
>  GEN_VXFORM_DUAL(vspltish, PPC_NONE, PPC2_ALTIVEC_207,
> diff --git a/target-ppc/translate/vmx-ops.c b/target-ppc/translate/vmx-ops.c
> index ca69e56..aafe70b 100644
> --- a/target-ppc/translate/vmx-ops.c
> +++ b/target-ppc/translate/vmx-ops.c
> @@ -197,6 +197,13 @@ GEN_VXRFORM_DUAL(vcmpbfp, vcmpgtsd, 3, 15, PPC_ALTIVEC, PPC_NONE)
>  #define GEN_VXFORM_DUAL_INV(name0, name1, opc2, opc3, inval0, inval1, type) \
>  GEN_OPCODE_DUAL(name0##_##name1, 0x04, opc2, opc3, inval0, inval1, type, \
>                                                                 PPC_NONE)
> +GEN_VXFORM_DUAL_INV(vspltb, vextractub, 6, 8, 0x00000000, 0x100000,
> +                                               PPC2_ALTIVEC_207),
> +GEN_VXFORM_DUAL_INV(vsplth, vextractuh, 6, 9, 0x00000000, 0x100000,
> +                                               PPC2_ALTIVEC_207),
> +GEN_VXFORM_DUAL_INV(vspltw, vextractuw, 6, 10, 0x00000000, 0x100000,
> +                                               PPC2_ALTIVEC_207),
> +GEN_VXFORM_300_EXT(vextractd, 6, 11, 0x100000),
>  GEN_VXFORM_DUAL_INV(vspltisb, vinsertb, 6, 12, 0x00000000, 0x100000,
>                                                 PPC2_ALTIVEC_207),
>  GEN_VXFORM_DUAL_INV(vspltish, vinserth, 6, 13, 0x00000000, 0x100000,
> @@ -226,9 +233,6 @@ GEN_VXFORM_NOA(vrfiz, 5, 9),
>  
>  #define GEN_VXFORM_UIMM(name, opc2, opc3)                               \
>      GEN_HANDLER(name, 0x04, opc2, opc3, 0x00000000, PPC_ALTIVEC)
> -GEN_VXFORM_UIMM(vspltb, 6, 8),
> -GEN_VXFORM_UIMM(vsplth, 6, 9),
> -GEN_VXFORM_UIMM(vspltw, 6, 10),
>  GEN_VXFORM_UIMM(vcfux, 5, 12),
>  GEN_VXFORM_UIMM(vcfsx, 5, 13),
>  GEN_VXFORM_UIMM(vctuxs, 5, 14),

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH v3 3/5] target-ppc: add vector count trailing zeros instructions
  2016-08-11  7:36 [Qemu-devel] [PATCH v3 0/5] POWER9 TCG enablement - part3 Rajalakshmi Srinivasaraghavan
  2016-08-11  7:36 ` [Qemu-devel] [PATCH v3 1/5] target-ppc: add vector insert instructions Rajalakshmi Srinivasaraghavan
  2016-08-11  7:36 ` [Qemu-devel] [PATCH v3 2/5] target-ppc: add vector extract instructions Rajalakshmi Srinivasaraghavan
@ 2016-08-11  7:36 ` Rajalakshmi Srinivasaraghavan
  2016-08-16  4:46   ` David Gibson
  2016-08-11  7:36 ` [Qemu-devel] [PATCH v3 4/5] target-ppc: add vector bit permute doubleword instruction Rajalakshmi Srinivasaraghavan
  2016-08-11  7:36 ` [Qemu-devel] [PATCH v3 5/5] target-ppc: add vector permute right indexed instruction Rajalakshmi Srinivasaraghavan
  4 siblings, 1 reply; 14+ messages in thread
From: Rajalakshmi Srinivasaraghavan @ 2016-08-11  7:36 UTC (permalink / raw)
  To: qemu-ppc, david, rth
  Cc: qemu-devel, nikunj, benh, Rajalakshmi Srinivasaraghavan

The following vector count trailing zeros instructions are
added from ISA 3.0.

vctzb - Vector Count Trailing Zeros Byte
vctzh - Vector Count Trailing Zeros Halfword
vctzw - Vector Count Trailing Zeros Word
vctzd - Vector Count Trailing Zeros Doubleword

Signed-off-by: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
---
 target-ppc/helper.h             |    4 ++++
 target-ppc/int_helper.c         |   15 +++++++++++++++
 target-ppc/translate/vmx-impl.c |   19 +++++++++++++++++++
 target-ppc/translate/vmx-ops.c  |    8 ++++++++
 4 files changed, 46 insertions(+), 0 deletions(-)

diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 59e7b88..6e6e7b3 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -327,6 +327,10 @@ DEF_HELPER_2(vclzb, void, avr, avr)
 DEF_HELPER_2(vclzh, void, avr, avr)
 DEF_HELPER_2(vclzw, void, avr, avr)
 DEF_HELPER_2(vclzd, void, avr, avr)
+DEF_HELPER_2(vctzb, void, avr, avr)
+DEF_HELPER_2(vctzh, void, avr, avr)
+DEF_HELPER_2(vctzw, void, avr, avr)
+DEF_HELPER_2(vctzd, void, avr, avr)
 DEF_HELPER_2(vpopcntb, void, avr, avr)
 DEF_HELPER_2(vpopcnth, void, avr, avr)
 DEF_HELPER_2(vpopcntw, void, avr, avr)
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index a917bd5..162f1e9 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -2091,6 +2091,21 @@ VGENERIC_DO(clzd, u64)
 #undef clzw
 #undef clzd
 
+#define ctzb(v) ((v) ? ctz32(v) : 8)
+#define ctzh(v) ((v) ? ctz32(v) : 16)
+#define ctzw(v) ctz32((v))
+#define ctzd(v) ctz64((v))
+
+VGENERIC_DO(ctzb, u8)
+VGENERIC_DO(ctzh, u16)
+VGENERIC_DO(ctzw, u32)
+VGENERIC_DO(ctzd, u64)
+
+#undef ctzb
+#undef ctzh
+#undef ctzw
+#undef ctzd
+
 #define popcntb(v) ctpop8(v)
 #define popcnth(v) ctpop16(v)
 #define popcntw(v) ctpop32(v)
diff --git a/target-ppc/translate/vmx-impl.c b/target-ppc/translate/vmx-impl.c
index 766a645..ebf123f 100644
--- a/target-ppc/translate/vmx-impl.c
+++ b/target-ppc/translate/vmx-impl.c
@@ -553,6 +553,21 @@ static void glue(gen_, name)(DisasContext *ctx)                         \
         tcg_temp_free_ptr(rd);                                          \
     }
 
+#define GEN_VXFORM_NOA_2(name, opc2, opc3, opc4)                        \
+static void glue(gen_, name)(DisasContext *ctx)                         \
+    {                                                                   \
+        TCGv_ptr rb, rd;                                                \
+        if (unlikely(!ctx->altivec_enabled)) {                          \
+            gen_exception(ctx, POWERPC_EXCP_VPU);                       \
+            return;                                                     \
+        }                                                               \
+        rb = gen_avr_ptr(rB(ctx->opcode));                              \
+        rd = gen_avr_ptr(rD(ctx->opcode));                              \
+        gen_helper_##name(rd, rb);                                      \
+        tcg_temp_free_ptr(rb);                                          \
+        tcg_temp_free_ptr(rd);                                          \
+    }
+
 GEN_VXFORM_NOA(vupkhsb, 7, 8);
 GEN_VXFORM_NOA(vupkhsh, 7, 9);
 GEN_VXFORM_NOA(vupkhsw, 7, 25);
@@ -745,6 +760,10 @@ GEN_VXFORM_NOA(vclzb, 1, 28)
 GEN_VXFORM_NOA(vclzh, 1, 29)
 GEN_VXFORM_NOA(vclzw, 1, 30)
 GEN_VXFORM_NOA(vclzd, 1, 31)
+GEN_VXFORM_NOA_2(vctzb, 1, 24, 28)
+GEN_VXFORM_NOA_2(vctzh, 1, 24, 29)
+GEN_VXFORM_NOA_2(vctzw, 1, 24, 30)
+GEN_VXFORM_NOA_2(vctzd, 1, 24, 31)
 GEN_VXFORM_NOA(vpopcntb, 1, 28)
 GEN_VXFORM_NOA(vpopcnth, 1, 29)
 GEN_VXFORM_NOA(vpopcntw, 1, 30)
diff --git a/target-ppc/translate/vmx-ops.c b/target-ppc/translate/vmx-ops.c
index aafe70b..5b2826e 100644
--- a/target-ppc/translate/vmx-ops.c
+++ b/target-ppc/translate/vmx-ops.c
@@ -44,6 +44,10 @@ GEN_HANDLER_E(name, 0x04, opc2, opc3, 0x00000000, PPC_NONE, PPC2_ISA300)
 #define GEN_VXFORM_300_EXT(name, opc2, opc3, inval)                     \
 GEN_HANDLER_E(name, 0x04, opc2, opc3, inval, PPC_NONE, PPC2_ISA300)
 
+#define GEN_VXFORM_300_EO(name, opc2, opc3, opc4)                     \
+GEN_HANDLER_E_2(name, 0x04, opc2, opc3, opc4, 0x00000000, PPC_NONE,     \
+                                                       PPC2_ISA300)
+
 #define GEN_VXFORM_DUAL(name0, name1, opc2, opc3, type0, type1) \
 GEN_HANDLER_E(name0##_##name1, 0x4, opc2, opc3, 0x00000000, type0, type1)
 
@@ -211,6 +215,10 @@ GEN_VXFORM_DUAL_INV(vspltish, vinserth, 6, 13, 0x00000000, 0x100000,
 GEN_VXFORM_DUAL_INV(vspltisw, vinsertw, 6, 14, 0x00000000, 0x100000,
                                                PPC2_ALTIVEC_207),
 GEN_VXFORM_300_EXT(vinsertd, 6, 15, 0x100000),
+GEN_VXFORM_300_EO(vctzb, 0x01, 0x18, 0x1C),
+GEN_VXFORM_300_EO(vctzh, 0x01, 0x18, 0x1D),
+GEN_VXFORM_300_EO(vctzw, 0x01, 0x18, 0x1E),
+GEN_VXFORM_300_EO(vctzd, 0x01, 0x18, 0x1F),
 
 #define GEN_VXFORM_NOA(name, opc2, opc3)                                \
     GEN_HANDLER(name, 0x04, opc2, opc3, 0x001f0000, PPC_ALTIVEC)
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v3 3/5] target-ppc: add vector count trailing zeros instructions
  2016-08-11  7:36 ` [Qemu-devel] [PATCH v3 3/5] target-ppc: add vector count trailing zeros instructions Rajalakshmi Srinivasaraghavan
@ 2016-08-16  4:46   ` David Gibson
  0 siblings, 0 replies; 14+ messages in thread
From: David Gibson @ 2016-08-16  4:46 UTC (permalink / raw)
  To: Rajalakshmi Srinivasaraghavan; +Cc: qemu-ppc, rth, qemu-devel, nikunj, benh

On Thu, Aug 11, 2016 at 01:06:46PM +0530, Rajalakshmi Srinivasaraghavan wrote:
> The following vector count trailing zeros instructions are
> added from ISA 3.0.
> 
> vctzb - Vector Count Trailing Zeros Byte
> vctzh - Vector Count Trailing Zeros Halfword
> vctzw - Vector Count Trailing Zeros Word
> vctzd - Vector Count Trailing Zeros Doubleword
> 
> Signed-off-by: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

However, it needs a rebase.
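
As a side note on why the byte and halfword macros special-case zero: a 32-bit count-trailing-zeros routine reports 32 for a zero input, while the instruction is expected to report the element width (8 or 16). The sketch below illustrates this with a portable loop; `ctz32_sketch`, `ctzb_sketch` and `ctzh_sketch` are hypothetical names and are not QEMU's implementations.

```c
#include <stdint.h>

/* Portable 32-bit count of trailing zero bits; yields 32 for a zero input. */
static int ctz32_sketch(uint32_t v)
{
    int n = 0;
    if (v == 0) {
        return 32;
    }
    while ((v & 1) == 0) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Byte element: a zero input must give the element width, 8, not 32. */
static int ctzb_sketch(uint8_t v)
{
    return v ? ctz32_sketch(v) : 8;
}

/* Halfword element: a zero input must give 16. */
static int ctzh_sketch(uint16_t v)
{
    return v ? ctz32_sketch(v) : 16;
}
```

The word and doubleword cases need no such guard because the full-width routine already returns the element width for zero.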

> ---
>  target-ppc/helper.h             |    4 ++++
>  target-ppc/int_helper.c         |   15 +++++++++++++++
>  target-ppc/translate/vmx-impl.c |   19 +++++++++++++++++++
>  target-ppc/translate/vmx-ops.c  |    8 ++++++++
>  4 files changed, 46 insertions(+), 0 deletions(-)
> 
> diff --git a/target-ppc/helper.h b/target-ppc/helper.h
> index 59e7b88..6e6e7b3 100644
> --- a/target-ppc/helper.h
> +++ b/target-ppc/helper.h
> @@ -327,6 +327,10 @@ DEF_HELPER_2(vclzb, void, avr, avr)
>  DEF_HELPER_2(vclzh, void, avr, avr)
>  DEF_HELPER_2(vclzw, void, avr, avr)
>  DEF_HELPER_2(vclzd, void, avr, avr)
> +DEF_HELPER_2(vctzb, void, avr, avr)
> +DEF_HELPER_2(vctzh, void, avr, avr)
> +DEF_HELPER_2(vctzw, void, avr, avr)
> +DEF_HELPER_2(vctzd, void, avr, avr)
>  DEF_HELPER_2(vpopcntb, void, avr, avr)
>  DEF_HELPER_2(vpopcnth, void, avr, avr)
>  DEF_HELPER_2(vpopcntw, void, avr, avr)
> diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
> index a917bd5..162f1e9 100644
> --- a/target-ppc/int_helper.c
> +++ b/target-ppc/int_helper.c
> @@ -2091,6 +2091,21 @@ VGENERIC_DO(clzd, u64)
>  #undef clzw
>  #undef clzd
>  
> +#define ctzb(v) ((v) ? ctz32(v) : 8)
> +#define ctzh(v) ((v) ? ctz32(v) : 16)
> +#define ctzw(v) ctz32((v))
> +#define ctzd(v) ctz64((v))
> +
> +VGENERIC_DO(ctzb, u8)
> +VGENERIC_DO(ctzh, u16)
> +VGENERIC_DO(ctzw, u32)
> +VGENERIC_DO(ctzd, u64)
> +
> +#undef ctzb
> +#undef ctzh
> +#undef ctzw
> +#undef ctzd
> +
>  #define popcntb(v) ctpop8(v)
>  #define popcnth(v) ctpop16(v)
>  #define popcntw(v) ctpop32(v)
> diff --git a/target-ppc/translate/vmx-impl.c b/target-ppc/translate/vmx-impl.c
> index 766a645..ebf123f 100644
> --- a/target-ppc/translate/vmx-impl.c
> +++ b/target-ppc/translate/vmx-impl.c
> @@ -553,6 +553,21 @@ static void glue(gen_, name)(DisasContext *ctx)                         \
>          tcg_temp_free_ptr(rd);                                          \
>      }
>  
> +#define GEN_VXFORM_NOA_2(name, opc2, opc3, opc4)                        \
> +static void glue(gen_, name)(DisasContext *ctx)                         \
> +    {                                                                   \
> +        TCGv_ptr rb, rd;                                                \
> +        if (unlikely(!ctx->altivec_enabled)) {                          \
> +            gen_exception(ctx, POWERPC_EXCP_VPU);                       \
> +            return;                                                     \
> +        }                                                               \
> +        rb = gen_avr_ptr(rB(ctx->opcode));                              \
> +        rd = gen_avr_ptr(rD(ctx->opcode));                              \
> +        gen_helper_##name(rd, rb);                                      \
> +        tcg_temp_free_ptr(rb);                                          \
> +        tcg_temp_free_ptr(rd);                                          \
> +    }
> +
>  GEN_VXFORM_NOA(vupkhsb, 7, 8);
>  GEN_VXFORM_NOA(vupkhsh, 7, 9);
>  GEN_VXFORM_NOA(vupkhsw, 7, 25);
> @@ -745,6 +760,10 @@ GEN_VXFORM_NOA(vclzb, 1, 28)
>  GEN_VXFORM_NOA(vclzh, 1, 29)
>  GEN_VXFORM_NOA(vclzw, 1, 30)
>  GEN_VXFORM_NOA(vclzd, 1, 31)
> +GEN_VXFORM_NOA_2(vctzb, 1, 24, 28)
> +GEN_VXFORM_NOA_2(vctzh, 1, 24, 29)
> +GEN_VXFORM_NOA_2(vctzw, 1, 24, 30)
> +GEN_VXFORM_NOA_2(vctzd, 1, 24, 31)
>  GEN_VXFORM_NOA(vpopcntb, 1, 28)
>  GEN_VXFORM_NOA(vpopcnth, 1, 29)
>  GEN_VXFORM_NOA(vpopcntw, 1, 30)
> diff --git a/target-ppc/translate/vmx-ops.c b/target-ppc/translate/vmx-ops.c
> index aafe70b..5b2826e 100644
> --- a/target-ppc/translate/vmx-ops.c
> +++ b/target-ppc/translate/vmx-ops.c
> @@ -44,6 +44,10 @@ GEN_HANDLER_E(name, 0x04, opc2, opc3, 0x00000000, PPC_NONE, PPC2_ISA300)
>  #define GEN_VXFORM_300_EXT(name, opc2, opc3, inval)                     \
>  GEN_HANDLER_E(name, 0x04, opc2, opc3, inval, PPC_NONE, PPC2_ISA300)
>  
> +#define GEN_VXFORM_300_EO(name, opc2, opc3, opc4)                     \
> +GEN_HANDLER_E_2(name, 0x04, opc2, opc3, opc4, 0x00000000, PPC_NONE,     \
> +                                                       PPC2_ISA300)
> +
>  #define GEN_VXFORM_DUAL(name0, name1, opc2, opc3, type0, type1) \
>  GEN_HANDLER_E(name0##_##name1, 0x4, opc2, opc3, 0x00000000, type0, type1)
>  
> @@ -211,6 +215,10 @@ GEN_VXFORM_DUAL_INV(vspltish, vinserth, 6, 13, 0x00000000, 0x100000,
>  GEN_VXFORM_DUAL_INV(vspltisw, vinsertw, 6, 14, 0x00000000, 0x100000,
>                                                 PPC2_ALTIVEC_207),
>  GEN_VXFORM_300_EXT(vinsertd, 6, 15, 0x100000),
> +GEN_VXFORM_300_EO(vctzb, 0x01, 0x18, 0x1C),
> +GEN_VXFORM_300_EO(vctzh, 0x01, 0x18, 0x1D),
> +GEN_VXFORM_300_EO(vctzw, 0x01, 0x18, 0x1E),
> +GEN_VXFORM_300_EO(vctzd, 0x01, 0x18, 0x1F),
>  
>  #define GEN_VXFORM_NOA(name, opc2, opc3)                                \
>      GEN_HANDLER(name, 0x04, opc2, opc3, 0x001f0000, PPC_ALTIVEC)
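
For reference, the zero checks in ctzb and ctzh appear to be there
because ctz32() reports 32 for a zero input, which would be wrong for
8-bit and 16-bit elements; ctzw and ctzd can call ctz32()/ctz64()
directly since those already return the element width for zero. A plain
C model of the per-element result, as a sketch only (ctz_elem is a
made-up name for illustration, not a QEMU function):

```c
#include <stdint.h>

/* Hypothetical reference model, not the QEMU helper: trailing-zero count
 * of one element that is `width` bits wide (8, 16, 32 or 64). */
unsigned ctz_elem(uint64_t v, unsigned width)
{
    unsigned n = 0;

    if (v == 0) {
        return width;   /* an all-zero element counts every bit */
    }
    while ((v & 1) == 0) {
        v >>= 1;
        n++;
    }
    return n;
}
```

With this model a zero byte gives 8 and a zero halfword gives 16, which
is what the two conditional macros in the patch return.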

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* [Qemu-devel] [PATCH v3 4/5] target-ppc: add vector bit permute doubleword instruction
  2016-08-11  7:36 [Qemu-devel] [PATCH v3 0/5] POWER9 TCG enablement - part3 Rajalakshmi Srinivasaraghavan
                   ` (2 preceding siblings ...)
  2016-08-11  7:36 ` [Qemu-devel] [PATCH v3 3/5] target-ppc: add vector count trailing zeros instructions Rajalakshmi Srinivasaraghavan
@ 2016-08-11  7:36 ` Rajalakshmi Srinivasaraghavan
  2016-08-16  4:33   ` David Gibson
  2016-08-11  7:36 ` [Qemu-devel] [PATCH v3 5/5] target-ppc: add vector permute right indexed instruction Rajalakshmi Srinivasaraghavan
  4 siblings, 1 reply; 14+ messages in thread
From: Rajalakshmi Srinivasaraghavan @ 2016-08-11  7:36 UTC (permalink / raw)
  To: qemu-ppc, david, rth
  Cc: qemu-devel, nikunj, benh, Rajalakshmi Srinivasaraghavan

Add vbpermd instruction from ISA 3.0.

Signed-off-by: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
---
 target-ppc/helper.h             |    1 +
 target-ppc/int_helper.c         |   22 ++++++++++++++++++++++
 target-ppc/translate/vmx-impl.c |    1 +
 target-ppc/translate/vmx-ops.c  |    1 +
 4 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 6e6e7b3..d1d9418 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -335,6 +335,7 @@ DEF_HELPER_2(vpopcntb, void, avr, avr)
 DEF_HELPER_2(vpopcnth, void, avr, avr)
 DEF_HELPER_2(vpopcntw, void, avr, avr)
 DEF_HELPER_2(vpopcntd, void, avr, avr)
+DEF_HELPER_3(vbpermd, void, avr, avr, avr)
 DEF_HELPER_3(vbpermq, void, avr, avr, avr)
 DEF_HELPER_2(vgbbd, void, avr, avr)
 DEF_HELPER_3(vpmsumb, void, avr, avr, avr)
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index 162f1e9..6bed3b6 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -1134,6 +1134,28 @@ void helper_vperm(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b,
 #define VBPERMQ_DW(index) (((index) & 0x40) == 0)
 #endif
 
+void helper_vbpermd(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
+{
+    int i, j;
+    uint64_t perm = 0;
+    ppc_avr_t result;
+
+    VECTOR_FOR_INORDER_I(i, u64) {
+        perm = 0;
+        for (j = 0; j < 8; j++) {
+            int index = VBPERMQ_INDEX(b, (i * 8) + j);
+            if (index < 64) {
+                uint64_t mask = (1ull << (63 - (index & 0x3F)));
+                if (a->u64[VBPERMQ_DW(index)] & mask) {
+                    perm |= (0x80 >> j);
+                }
+            }
+        }
+        result.u64[i] = perm;
+    }
+    *r = result;
+}
+
 void helper_vbpermq(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 {
     int i;
diff --git a/target-ppc/translate/vmx-impl.c b/target-ppc/translate/vmx-impl.c
index ebf123f..38f8ad7 100644
--- a/target-ppc/translate/vmx-impl.c
+++ b/target-ppc/translate/vmx-impl.c
@@ -776,6 +776,7 @@ GEN_VXFORM_DUAL(vclzw, PPC_NONE, PPC2_ALTIVEC_207, \
                 vpopcntw, PPC_NONE, PPC2_ALTIVEC_207)
 GEN_VXFORM_DUAL(vclzd, PPC_NONE, PPC2_ALTIVEC_207, \
                 vpopcntd, PPC_NONE, PPC2_ALTIVEC_207)
+GEN_VXFORM(vbpermd, 6, 23);
 GEN_VXFORM(vbpermq, 6, 21);
 GEN_VXFORM_NOA(vgbbd, 6, 20);
 GEN_VXFORM(vpmsumb, 4, 16)
diff --git a/target-ppc/translate/vmx-ops.c b/target-ppc/translate/vmx-ops.c
index 5b2826e..32bd533 100644
--- a/target-ppc/translate/vmx-ops.c
+++ b/target-ppc/translate/vmx-ops.c
@@ -261,6 +261,7 @@ GEN_VXFORM_DUAL(vclzh, vpopcnth, 1, 29, PPC_NONE, PPC2_ALTIVEC_207),
 GEN_VXFORM_DUAL(vclzw, vpopcntw, 1, 30, PPC_NONE, PPC2_ALTIVEC_207),
 GEN_VXFORM_DUAL(vclzd, vpopcntd, 1, 31, PPC_NONE, PPC2_ALTIVEC_207),
 
+GEN_VXFORM_300(vbpermd, 6, 23),
 GEN_VXFORM_207(vbpermq, 6, 21),
 GEN_VXFORM_207(vgbbd, 6, 20),
 GEN_VXFORM_207(vpmsumb, 4, 16),
-- 
1.7.1


* Re: [Qemu-devel] [PATCH v3 4/5] target-ppc: add vector bit permute doubleword instruction
  2016-08-11  7:36 ` [Qemu-devel] [PATCH v3 4/5] target-ppc: add vector bit permute doubleword instruction Rajalakshmi Srinivasaraghavan
@ 2016-08-16  4:33   ` David Gibson
  0 siblings, 0 replies; 14+ messages in thread
From: David Gibson @ 2016-08-16  4:33 UTC (permalink / raw)
  To: Rajalakshmi Srinivasaraghavan; +Cc: qemu-ppc, rth, qemu-devel, nikunj, benh

On Thu, Aug 11, 2016 at 01:06:47PM +0530, Rajalakshmi Srinivasaraghavan wrote:
> Add vbpermd instruction from ISA 3.0.
> 
> Signed-off-by: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
> ---
>  target-ppc/helper.h             |    1 +
>  target-ppc/int_helper.c         |   22 ++++++++++++++++++++++
>  target-ppc/translate/vmx-impl.c |    1 +
>  target-ppc/translate/vmx-ops.c  |    1 +
>  4 files changed, 25 insertions(+), 0 deletions(-)
> 
> diff --git a/target-ppc/helper.h b/target-ppc/helper.h
> index 6e6e7b3..d1d9418 100644
> --- a/target-ppc/helper.h
> +++ b/target-ppc/helper.h
> @@ -335,6 +335,7 @@ DEF_HELPER_2(vpopcntb, void, avr, avr)
>  DEF_HELPER_2(vpopcnth, void, avr, avr)
>  DEF_HELPER_2(vpopcntw, void, avr, avr)
>  DEF_HELPER_2(vpopcntd, void, avr, avr)
> +DEF_HELPER_3(vbpermd, void, avr, avr, avr)
>  DEF_HELPER_3(vbpermq, void, avr, avr, avr)
>  DEF_HELPER_2(vgbbd, void, avr, avr)
>  DEF_HELPER_3(vpmsumb, void, avr, avr, avr)
> diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
> index 162f1e9..6bed3b6 100644
> --- a/target-ppc/int_helper.c
> +++ b/target-ppc/int_helper.c
> @@ -1134,6 +1134,28 @@ void helper_vperm(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b,
>  #define VBPERMQ_DW(index) (((index) & 0x40) == 0)
>  #endif
>  
> +void helper_vbpermd(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
> +{
> +    int i, j;
> +    uint64_t perm = 0;
> +    ppc_avr_t result;
> +
> +    VECTOR_FOR_INORDER_I(i, u64) {
> +        perm = 0;

Since you already have a temporary for the whole result vector, you
shouldn't need a temporary for the individual result dwords.

> +        for (j = 0; j < 8; j++) {
> +            int index = VBPERMQ_INDEX(b, (i * 8) + j);
> +            if (index < 64) {
> +                uint64_t mask = (1ull << (63 - (index & 0x3F)));
> +                if (a->u64[VBPERMQ_DW(index)] & mask) {
> +                    perm |= (0x80 >> j);
> +                }

It would be nice to avoid the conditional branch that this innermost
if probably generates; that should be possible, since you can extract
the actual value of the bit you're inserting.
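
Both points could look roughly like the following. This is a standalone
sketch of the inner loop for a single doubleword, not the helper itself:
bpermd_dword and its idx[] parameter are names invented here, and bit 0
is taken to be the most significant bit, matching the mask computation
in the patch.

```c
#include <stdint.h>

/* Hypothetical standalone model of the inner loop: gathers up to eight
 * bits of src, selected by idx[], into the low byte of the result.
 * Bit 0 is the most significant bit of src. */
uint64_t bpermd_dword(uint64_t src, const uint8_t idx[8])
{
    uint64_t result = 0;    /* accumulate directly, no separate perm */
    int j;

    for (j = 0; j < 8; j++) {
        unsigned index = idx[j];
        /* 1 when the index selects a bit of src, 0 otherwise. */
        uint64_t in_range = index < 64;
        /* Selected bit as 0 or 1; masking keeps the shift count legal. */
        uint64_t bit = (src >> (63 - (index & 0x3f))) & 1;

        /* Same position as (0x80 >> j) in the patch. */
        result |= (bit & in_range) << (7 - j);
    }
    return result;
}
```

The loop body has no if left in it, although whether the compiler emits
a branch for the range check is still up to the compiler.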

> +            }
> +        }
> +        result.u64[i] = perm;
> +    }
> +    *r = result;
> +}
> +
>  void helper_vbpermq(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
>  {
>      int i;
> diff --git a/target-ppc/translate/vmx-impl.c b/target-ppc/translate/vmx-impl.c
> index ebf123f..38f8ad7 100644
> --- a/target-ppc/translate/vmx-impl.c
> +++ b/target-ppc/translate/vmx-impl.c
> @@ -776,6 +776,7 @@ GEN_VXFORM_DUAL(vclzw, PPC_NONE, PPC2_ALTIVEC_207, \
>                  vpopcntw, PPC_NONE, PPC2_ALTIVEC_207)
>  GEN_VXFORM_DUAL(vclzd, PPC_NONE, PPC2_ALTIVEC_207, \
>                  vpopcntd, PPC_NONE, PPC2_ALTIVEC_207)
> +GEN_VXFORM(vbpermd, 6, 23);
>  GEN_VXFORM(vbpermq, 6, 21);
>  GEN_VXFORM_NOA(vgbbd, 6, 20);
>  GEN_VXFORM(vpmsumb, 4, 16)
> diff --git a/target-ppc/translate/vmx-ops.c b/target-ppc/translate/vmx-ops.c
> index 5b2826e..32bd533 100644
> --- a/target-ppc/translate/vmx-ops.c
> +++ b/target-ppc/translate/vmx-ops.c
> @@ -261,6 +261,7 @@ GEN_VXFORM_DUAL(vclzh, vpopcnth, 1, 29, PPC_NONE, PPC2_ALTIVEC_207),
>  GEN_VXFORM_DUAL(vclzw, vpopcntw, 1, 30, PPC_NONE, PPC2_ALTIVEC_207),
>  GEN_VXFORM_DUAL(vclzd, vpopcntd, 1, 31, PPC_NONE, PPC2_ALTIVEC_207),
>  
> +GEN_VXFORM_300(vbpermd, 6, 23),
>  GEN_VXFORM_207(vbpermq, 6, 21),
>  GEN_VXFORM_207(vgbbd, 6, 20),
>  GEN_VXFORM_207(vpmsumb, 4, 16),

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* [Qemu-devel] [PATCH v3 5/5] target-ppc: add vector permute right indexed instruction
  2016-08-11  7:36 [Qemu-devel] [PATCH v3 0/5] POWER9 TCG enablement - part3 Rajalakshmi Srinivasaraghavan
                   ` (3 preceding siblings ...)
  2016-08-11  7:36 ` [Qemu-devel] [PATCH v3 4/5] target-ppc: add vector bit permute doubleword instruction Rajalakshmi Srinivasaraghavan
@ 2016-08-11  7:36 ` Rajalakshmi Srinivasaraghavan
  2016-08-16  4:45   ` David Gibson
  4 siblings, 1 reply; 14+ messages in thread
From: Rajalakshmi Srinivasaraghavan @ 2016-08-11  7:36 UTC (permalink / raw)
  To: qemu-ppc, david, rth
  Cc: qemu-devel, nikunj, benh, Rajalakshmi Srinivasaraghavan

Add vpermr instruction from ISA 3.0.

Signed-off-by: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
---
 target-ppc/helper.h             |    1 +
 target-ppc/int_helper.c         |   23 +++++++++++++++++++++++
 target-ppc/translate/vmx-impl.c |   18 ++++++++++++++++++
 target-ppc/translate/vmx-ops.c  |    1 +
 4 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index d1d9418..3c476c9 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -270,6 +270,7 @@ DEF_HELPER_5(vmsumubm, void, env, avr, avr, avr, avr)
 DEF_HELPER_5(vmsummbm, void, env, avr, avr, avr, avr)
 DEF_HELPER_5(vsel, void, env, avr, avr, avr, avr)
 DEF_HELPER_5(vperm, void, env, avr, avr, avr, avr)
+DEF_HELPER_5(vpermr, void, env, avr, avr, avr, avr)
 DEF_HELPER_4(vpkshss, void, env, avr, avr, avr)
 DEF_HELPER_4(vpkshus, void, env, avr, avr, avr)
 DEF_HELPER_4(vpkswss, void, env, avr, avr, avr)
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index 6bed3b6..a35355f 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -1126,6 +1126,29 @@ void helper_vperm(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b,
     *r = result;
 }
 
+void helper_vpermr(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b,
+                  ppc_avr_t *c)
+{
+    ppc_avr_t result;
+    int i;
+
+    VECTOR_FOR_INORDER_I(i, u8) {
+        int s = c->u8[i] & 0x1f;
+#if defined(HOST_WORDS_BIGENDIAN)
+        int index = 15 - (s & 0xf);
+#else
+        int index = s & 0xf;
+#endif
+
+        if (s & 0x10) {
+            result.u8[i] = a->u8[index];
+        } else {
+            result.u8[i] = b->u8[index];
+        }
+    }
+    *r = result;
+}
+
 #if defined(HOST_WORDS_BIGENDIAN)
 #define VBPERMQ_INDEX(avr, i) ((avr)->u8[(i)])
 #define VBPERMQ_DW(index) (((index) & 0x40) != 0)
diff --git a/target-ppc/translate/vmx-impl.c b/target-ppc/translate/vmx-impl.c
index 38f8ad7..feb10de 100644
--- a/target-ppc/translate/vmx-impl.c
+++ b/target-ppc/translate/vmx-impl.c
@@ -750,6 +750,24 @@ static void gen_vmladduhm(DisasContext *ctx)
     tcg_temp_free_ptr(rd);
 }
 
+static void gen_vpermr(DisasContext *ctx)
+{
+    TCGv_ptr ra, rb, rc, rd;
+    if (unlikely(!ctx->altivec_enabled)) {
+        gen_exception(ctx, POWERPC_EXCP_VPU);
+        return;
+    }
+    ra = gen_avr_ptr(rA(ctx->opcode));
+    rb = gen_avr_ptr(rB(ctx->opcode));
+    rc = gen_avr_ptr(rC(ctx->opcode));
+    rd = gen_avr_ptr(rD(ctx->opcode));
+    gen_helper_vpermr(cpu_env, rd, ra, rb, rc);
+    tcg_temp_free_ptr(ra);
+    tcg_temp_free_ptr(rb);
+    tcg_temp_free_ptr(rc);
+    tcg_temp_free_ptr(rd);
+}
+
 GEN_VAFORM_PAIRED(vmsumubm, vmsummbm, 18)
 GEN_VAFORM_PAIRED(vmsumuhm, vmsumuhs, 19)
 GEN_VAFORM_PAIRED(vmsumshm, vmsumshs, 20)
diff --git a/target-ppc/translate/vmx-ops.c b/target-ppc/translate/vmx-ops.c
index 32bd533..ad72db5 100644
--- a/target-ppc/translate/vmx-ops.c
+++ b/target-ppc/translate/vmx-ops.c
@@ -219,6 +219,7 @@ GEN_VXFORM_300_EO(vctzb, 0x01, 0x18, 0x1C),
 GEN_VXFORM_300_EO(vctzh, 0x01, 0x18, 0x1D),
 GEN_VXFORM_300_EO(vctzw, 0x01, 0x18, 0x1E),
 GEN_VXFORM_300_EO(vctzd, 0x01, 0x18, 0x1F),
+GEN_VXFORM_300(vpermr, 0x1D, 0xFF),
 
 #define GEN_VXFORM_NOA(name, opc2, opc3)                                \
     GEN_HANDLER(name, 0x04, opc2, opc3, 0x001f0000, PPC_ALTIVEC)
-- 
1.7.1


* Re: [Qemu-devel] [PATCH v3 5/5] target-ppc: add vector permute right indexed instruction
  2016-08-11  7:36 ` [Qemu-devel] [PATCH v3 5/5] target-ppc: add vector permute right indexed instruction Rajalakshmi Srinivasaraghavan
@ 2016-08-16  4:45   ` David Gibson
  2016-08-16 10:14     ` Rajalakshmi Srinivasaraghavan
  0 siblings, 1 reply; 14+ messages in thread
From: David Gibson @ 2016-08-16  4:45 UTC (permalink / raw)
  To: Rajalakshmi Srinivasaraghavan; +Cc: qemu-ppc, rth, qemu-devel, nikunj, benh

On Thu, Aug 11, 2016 at 01:06:48PM +0530, Rajalakshmi Srinivasaraghavan wrote:
> Add vpermr instruction from ISA 3.0.
> 
> Signed-off-by: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
> ---
>  target-ppc/helper.h             |    1 +
>  target-ppc/int_helper.c         |   23 +++++++++++++++++++++++
>  target-ppc/translate/vmx-impl.c |   18 ++++++++++++++++++
>  target-ppc/translate/vmx-ops.c  |    1 +
>  4 files changed, 43 insertions(+), 0 deletions(-)
> 
> diff --git a/target-ppc/helper.h b/target-ppc/helper.h
> index d1d9418..3c476c9 100644
> --- a/target-ppc/helper.h
> +++ b/target-ppc/helper.h
> @@ -270,6 +270,7 @@ DEF_HELPER_5(vmsumubm, void, env, avr, avr, avr, avr)
>  DEF_HELPER_5(vmsummbm, void, env, avr, avr, avr, avr)
>  DEF_HELPER_5(vsel, void, env, avr, avr, avr, avr)
>  DEF_HELPER_5(vperm, void, env, avr, avr, avr, avr)
> +DEF_HELPER_5(vpermr, void, env, avr, avr, avr, avr)
>  DEF_HELPER_4(vpkshss, void, env, avr, avr, avr)
>  DEF_HELPER_4(vpkshus, void, env, avr, avr, avr)
>  DEF_HELPER_4(vpkswss, void, env, avr, avr, avr)
> diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
> index 6bed3b6..a35355f 100644
> --- a/target-ppc/int_helper.c
> +++ b/target-ppc/int_helper.c
> @@ -1126,6 +1126,29 @@ void helper_vperm(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b,
>      *r = result;
>  }
>  
> +void helper_vpermr(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b,
> +                  ppc_avr_t *c)
> +{
> +    ppc_avr_t result;
> +    int i;
> +
> +    VECTOR_FOR_INORDER_I(i, u8) {
> +        int s = c->u8[i] & 0x1f;
> +#if defined(HOST_WORDS_BIGENDIAN)
> +        int index = 15 - (s & 0xf);
> +#else
> +        int index = s & 0xf;
> +#endif
> +
> +        if (s & 0x10) {
> +            result.u8[i] = a->u8[index];
> +        } else {
> +            result.u8[i] = b->u8[index];
> +        }
> +    }
> +    *r = result;
> +}
> +
>  #if defined(HOST_WORDS_BIGENDIAN)
>  #define VBPERMQ_INDEX(avr, i) ((avr)->u8[(i)])
>  #define VBPERMQ_DW(index) (((index) & 0x40) != 0)
> diff --git a/target-ppc/translate/vmx-impl.c b/target-ppc/translate/vmx-impl.c
> index 38f8ad7..feb10de 100644
> --- a/target-ppc/translate/vmx-impl.c
> +++ b/target-ppc/translate/vmx-impl.c
> @@ -750,6 +750,24 @@ static void gen_vmladduhm(DisasContext *ctx)
>      tcg_temp_free_ptr(rd);
>  }
>  
> +static void gen_vpermr(DisasContext *ctx)
> +{
> +    TCGv_ptr ra, rb, rc, rd;
> +    if (unlikely(!ctx->altivec_enabled)) {
> +        gen_exception(ctx, POWERPC_EXCP_VPU);
> +        return;
> +    }
> +    ra = gen_avr_ptr(rA(ctx->opcode));
> +    rb = gen_avr_ptr(rB(ctx->opcode));
> +    rc = gen_avr_ptr(rC(ctx->opcode));
> +    rd = gen_avr_ptr(rD(ctx->opcode));
> +    gen_helper_vpermr(cpu_env, rd, ra, rb, rc);
> +    tcg_temp_free_ptr(ra);
> +    tcg_temp_free_ptr(rb);
> +    tcg_temp_free_ptr(rc);
> +    tcg_temp_free_ptr(rd);
> +}

Why do you need this gen_vpermr() function while there isn't a
matching gen_vperm()?

> +
>  GEN_VAFORM_PAIRED(vmsumubm, vmsummbm, 18)
>  GEN_VAFORM_PAIRED(vmsumuhm, vmsumuhs, 19)
>  GEN_VAFORM_PAIRED(vmsumshm, vmsumshs, 20)
> diff --git a/target-ppc/translate/vmx-ops.c b/target-ppc/translate/vmx-ops.c
> index 32bd533..ad72db5 100644
> --- a/target-ppc/translate/vmx-ops.c
> +++ b/target-ppc/translate/vmx-ops.c
> @@ -219,6 +219,7 @@ GEN_VXFORM_300_EO(vctzb, 0x01, 0x18, 0x1C),
>  GEN_VXFORM_300_EO(vctzh, 0x01, 0x18, 0x1D),
>  GEN_VXFORM_300_EO(vctzw, 0x01, 0x18, 0x1E),
>  GEN_VXFORM_300_EO(vctzd, 0x01, 0x18, 0x1F),
> +GEN_VXFORM_300(vpermr, 0x1D, 0xFF),
>  
>  #define GEN_VXFORM_NOA(name, opc2, opc3)                                \
>      GEN_HANDLER(name, 0x04, opc2, opc3, 0x001f0000, PPC_ALTIVEC)
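
As a cross-check for the helper, the instruction can be modelled without
any host-endian handling. This is a sketch under one reading of ISA 3.0
(byte i of the result is byte 31 - index of the concatenation a || b,
where index is the low five bits of byte i of c). vpermr_ref is a
hypothetical name, and the arrays are in big-endian element order, so
byte 0 is the leftmost byte.

```c
#include <stdint.h>

/* Hypothetical reference model in big-endian element order, independent
 * of host endianness. */
void vpermr_ref(uint8_t r[16], const uint8_t a[16],
                const uint8_t b[16], const uint8_t c[16])
{
    uint8_t src[32];
    uint8_t result[16];
    int i;

    /* src is a followed by b. */
    for (i = 0; i < 16; i++) {
        src[i] = a[i];
        src[16 + i] = b[i];
    }
    /* Only the low five bits of each control byte are used. */
    for (i = 0; i < 16; i++) {
        result[i] = src[31 - (c[i] & 0x1f)];
    }
    /* Copy out last, mirroring the helper's use of a temporary. */
    for (i = 0; i < 16; i++) {
        r[i] = result[i];
    }
}
```

On a big-endian host this agrees with the quoted helper: a control value
with bit 0x10 set lands in a at 15 - (s & 0xf), otherwise in b at the
same offset.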

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH v3 5/5] target-ppc: add vector permute right indexed instruction
  2016-08-16  4:45   ` David Gibson
@ 2016-08-16 10:14     ` Rajalakshmi Srinivasaraghavan
  0 siblings, 0 replies; 14+ messages in thread
From: Rajalakshmi Srinivasaraghavan @ 2016-08-16 10:14 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, nikunj, rth



On 08/16/2016 10:15 AM, David Gibson wrote:
> Why do you need this gen_vpermr() function while there isn't a
> matching gen_vperm()?
vperm is handled as part of GEN_VAFORM_PAIRED(vsel, vperm, 21).
However, the opcode format of vpermr cannot be combined with
any other VA-form instruction.
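
To illustrate the pairing, here is a small decode sketch. It assumes
that a paired entry covers two VA-form extended opcodes that differ only
in their lowest bit, which is what the slot numbers in the patch suggest
(vsel and vperm share slot 21, vpermr is registered on its own at 0x1D).
The function names are invented for this example and are not QEMU APIs.

```c
#include <stdint.h>

/* Hypothetical helpers, named for this sketch only. */

/* 6-bit extended opcode of a VA-form instruction (low six bits). */
unsigned va_xo(uint32_t insn)
{
    return insn & 0x3f;
}

/* Table slot shared by a pair: the extended opcode without its low bit. */
unsigned va_pair_slot(uint32_t insn)
{
    return va_xo(insn) >> 1;
}

/* Which member of the pair: 0 for the first name, 1 for the second. */
unsigned va_pair_member(uint32_t insn)
{
    return va_xo(insn) & 1;
}
```

Under that assumption, extended opcodes 42 and 43 both map to slot 21,
while an extended opcode of 59 maps to slot 0x1D.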

-- 
Thanks
Rajalakshmi S

Thread overview: 14+ messages
2016-08-11  7:36 [Qemu-devel] [PATCH v3 0/5] POWER9 TCG enablement - part3 Rajalakshmi Srinivasaraghavan
2016-08-11  7:36 ` [Qemu-devel] [PATCH v3 1/5] target-ppc: add vector insert instructions Rajalakshmi Srinivasaraghavan
2016-08-16  4:18   ` David Gibson
2016-08-19  5:46     ` Rajalakshmi Srinivasaraghavan
2016-08-23 15:08       ` David Gibson
2016-08-11  7:36 ` [Qemu-devel] [PATCH v3 2/5] target-ppc: add vector extract instructions Rajalakshmi Srinivasaraghavan
2016-08-16  4:21   ` David Gibson
2016-08-11  7:36 ` [Qemu-devel] [PATCH v3 3/5] target-ppc: add vector count trailing zeros instructions Rajalakshmi Srinivasaraghavan
2016-08-16  4:46   ` David Gibson
2016-08-11  7:36 ` [Qemu-devel] [PATCH v3 4/5] target-ppc: add vector bit permute doubleword instruction Rajalakshmi Srinivasaraghavan
2016-08-16  4:33   ` David Gibson
2016-08-11  7:36 ` [Qemu-devel] [PATCH v3 5/5] target-ppc: add vector permute right indexed instruction Rajalakshmi Srinivasaraghavan
2016-08-16  4:45   ` David Gibson
2016-08-16 10:14     ` Rajalakshmi Srinivasaraghavan
