From: David Gibson <david@gibson.dropbear.id.au>
To: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
Cc: qemu-ppc@nongnu.org, rth@twiddle.net, qemu-devel@nongnu.org,
	nikunj@linux.vnet.ibm.com, benh@kernel.crashing.org
Subject: Re: [Qemu-devel] [PATCH v3 1/5] target-ppc: add vector insert instructions
Date: Tue, 16 Aug 2016 14:18:15 +1000
Message-ID: <20160816041815.GF14530@voom.fritz.box>
In-Reply-To: <1470901008-3284-2-git-send-email-raji@linux.vnet.ibm.com>

On Thu, Aug 11, 2016 at 01:06:44PM +0530, Rajalakshmi Srinivasaraghavan wrote:
> The following vector insert instructions are added from ISA 3.0.
> 
> vinsertb - Vector Insert Byte
> vinserth - Vector Insert Halfword
> vinsertw - Vector Insert Word
> vinsertd - Vector Insert Doubleword
> 
> Signed-off-by: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
> ---
>  target-ppc/helper.h             |    4 ++++
>  target-ppc/int_helper.c         |   27 +++++++++++++++++++++++++++
>  target-ppc/translate.c          |    2 ++
>  target-ppc/translate/vmx-impl.c |   32 ++++++++++++++++++++++++++++++++
>  target-ppc/translate/vmx-ops.c  |   18 +++++++++++++-----
>  5 files changed, 78 insertions(+), 5 deletions(-)
> 
> diff --git a/target-ppc/helper.h b/target-ppc/helper.h
> index 93ac9e1..0923779 100644
> --- a/target-ppc/helper.h
> +++ b/target-ppc/helper.h
> @@ -250,6 +250,10 @@ DEF_HELPER_2(vspltisw, void, avr, i32)
>  DEF_HELPER_3(vspltb, void, avr, avr, i32)
>  DEF_HELPER_3(vsplth, void, avr, avr, i32)
>  DEF_HELPER_3(vspltw, void, avr, avr, i32)
> +DEF_HELPER_3(vinsertb, void, avr, avr, i32)
> +DEF_HELPER_3(vinserth, void, avr, avr, i32)
> +DEF_HELPER_3(vinsertw, void, avr, avr, i32)
> +DEF_HELPER_3(vinsertd, void, avr, avr, i32)
>  DEF_HELPER_2(vupkhpx, void, avr, avr)
>  DEF_HELPER_2(vupklpx, void, avr, avr)
>  DEF_HELPER_2(vupkhsb, void, avr, avr)
> diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
> index 552b2e0..3f8e439 100644
> --- a/target-ppc/int_helper.c
> +++ b/target-ppc/int_helper.c
> @@ -1792,6 +1792,33 @@ VSPLT(w, u32)
>  #undef VSPLT
>  #undef SPLAT_ELEMENT
>  #undef _SPLAT_MASKED
> +#if defined(HOST_WORDS_BIGENDIAN)
> +#define VINSERT(suffix, element, index)                                     \
> +    void helper_vinsert##suffix(ppc_avr_t *r, ppc_avr_t *b, uint32_t splat) \
> +    {                                                                       \
> +        ppc_avr_t result;                                                   \
> +        result = *r;                                                        \
> +        memcpy(&result.u8[splat], &b->element[index],                       \
> +               sizeof(result.element[0]));                                  \
> +        *r = result;                                                        \

Using a temporary for the result means two extra full vector copies,
which seems unfortunate.  Couldn't you just use memmove() instead of
memcpy() to handle the overlapping cases?
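
As a rough sketch of what I mean (not tested against the tree; vec128_t
and insert_bytes() are made-up names standing in for ppc_avr_t and the
helper body, and the offsets are assumed to be computed by the caller),
the copy can go straight into *r:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for ppc_avr_t: 16 bytes viewed at several widths. */
typedef union {
    uint8_t  u8[16];
    uint16_t u16[8];
    uint32_t u32[4];
    uint64_t u64[2];
} vec128_t;

/*
 * Copy `size` bytes from byte offset `src_off` of *b to byte offset
 * `dst_off` of *r, writing directly into *r with no temporary vector.
 * memmove() is defined for overlapping ranges, so r == b is safe.
 */
static void insert_bytes(vec128_t *r, const vec128_t *b,
                         unsigned dst_off, unsigned src_off, size_t size)
{
    /* Both ranges must stay inside the 16-byte vector. */
    assert(dst_off + size <= sizeof(r->u8));
    assert(src_off + size <= sizeof(b->u8));
    memmove(&r->u8[dst_off], &b->u8[src_off], size);
}
```

Calling insert_bytes(&v, &v, 6, 4, 4) on the same vector copies bytes
4..7 over bytes 6..9 correctly even though the ranges overlap, which is
the case the temporary was guarding against.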

> +    }
> +#else
> +#define VINSERT(suffix, element, index)                                     \
> +    void helper_vinsert##suffix(ppc_avr_t *r, ppc_avr_t *b, uint32_t splat) \
> +    {                                                                       \
> +        ppc_avr_t result;                                                   \
> +        result = *r;                                                        \
> +        uint32_t s = (ARRAY_SIZE(r->element) - index) - 1;                  \

The logic with index seems a bit convoluted.  AFAICT the source is
always the least significant element of the most-significant half of
the vector b.  So for LE hosts &b->u8[8] should always be right, and
for BE hosts &b->u8[8 - sizeof(b->element[0])].
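
To check which constant each host byte order reduces to, here is a
small sketch that just evaluates the index arithmetic of the two
VINSERT variants as posted.  src_offset_be() and src_offset_le() are
made-up names, and a 16-byte vector is assumed:

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset read by the big-endian-host variant: &b->element[index]. */
static size_t src_offset_be(size_t elem_size, size_t index)
{
    assert(index < 16 / elem_size);
    return index * elem_size;
}

/*
 * Byte offset read by the little-endian-host variant: &b->element[s],
 * with s = (ARRAY_SIZE(r->element) - index) - 1 as in the patch.
 */
static size_t src_offset_le(size_t elem_size, size_t index)
{
    size_t nelem = 16 / elem_size;  /* ARRAY_SIZE for a 16-byte vector */
    assert(index < nelem);
    return (nelem - index - 1) * elem_size;
}
```

With the VINSERT arguments from the patch (index 7, 3, 1, 0 for element
sizes 1, 2, 4, 8), src_offset_le() gives 8 every time and
src_offset_be() gives 7, 6, 4, 0, that is 8 minus the element size.  So
the index parameter could be dropped from the macro in favour of those
two expressions.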

> +        uint32_t d = (16 - splat) - sizeof(r->element[0]);                  \
> +        memcpy(&result.u8[d], &b->element[s], sizeof(result.element[0]));   \
> +        *r = result;                                                        \
> +    }
> +#endif
> +VINSERT(b, u8, 7)
> +VINSERT(h, u16, 3)
> +VINSERT(w, u32, 1)
> +VINSERT(d, u64, 0)
> +#undef VINSERT
>  
>  #define VSPLTI(suffix, element, splat_type)                     \
>      void helper_vspltis##suffix(ppc_avr_t *r, uint32_t splat)   \
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index fc3d371..dbe952e 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -498,6 +498,8 @@ EXTRACT_HELPER(UIMM, 0, 16);
>  EXTRACT_HELPER(SIMM5, 16, 5);
>  /* 5 bits signed immediate value */
>  EXTRACT_HELPER(UIMM5, 16, 5);
> +/* 4 bits unsigned immediate value */
> +EXTRACT_HELPER(UIMM4, 16, 4);
>  /* Bit count */
>  EXTRACT_HELPER(NB, 11, 5);
>  /* Shift count */
> diff --git a/target-ppc/translate/vmx-impl.c b/target-ppc/translate/vmx-impl.c
> index ac78caf..f6a97ac 100644
> --- a/target-ppc/translate/vmx-impl.c
> +++ b/target-ppc/translate/vmx-impl.c
> @@ -623,13 +623,45 @@ static void glue(gen_, name)(DisasContext *ctx)                         \
>          tcg_temp_free_ptr(rd);                                          \
>      }
>  
> +#define GEN_VXFORM_UIMM_SPLAT(name, opc2, opc3, splat_max)              \
> +static void glue(gen_, name)(DisasContext *ctx)                         \
> +    {                                                                   \
> +        TCGv_ptr rb, rd;                                                \
> +        uint8_t uimm = UIMM4(ctx->opcode);                              \
> +        TCGv_i32 t0 = tcg_temp_new_i32();                               \
> +        if (unlikely(!ctx->altivec_enabled)) {                          \
> +            gen_exception(ctx, POWERPC_EXCP_VPU);                       \
> +            return;                                                     \
> +        }                                                               \
> +        if (uimm > splat_max) {                                         \
> +            uimm = 0;                                                   \
> +        }                                                               \
> +        tcg_gen_movi_i32(t0, uimm);                                     \
> +        rb = gen_avr_ptr(rB(ctx->opcode));                              \
> +        rd = gen_avr_ptr(rD(ctx->opcode));                              \
> +        gen_helper_##name(rd, rb, t0);                                  \
> +        tcg_temp_free_i32(t0);                                          \
> +        tcg_temp_free_ptr(rb);                                          \
> +        tcg_temp_free_ptr(rd);                                          \
> +    }
> +
>  GEN_VXFORM_UIMM(vspltb, 6, 8);
>  GEN_VXFORM_UIMM(vsplth, 6, 9);
>  GEN_VXFORM_UIMM(vspltw, 6, 10);
> +GEN_VXFORM_UIMM_SPLAT(vinsertb, 6, 12, 15);
> +GEN_VXFORM_UIMM_SPLAT(vinserth, 6, 13, 14);
> +GEN_VXFORM_UIMM_SPLAT(vinsertw, 6, 14, 12);
> +GEN_VXFORM_UIMM_SPLAT(vinsertd, 6, 15, 8);
>  GEN_VXFORM_UIMM_ENV(vcfux, 5, 12);
>  GEN_VXFORM_UIMM_ENV(vcfsx, 5, 13);
>  GEN_VXFORM_UIMM_ENV(vctuxs, 5, 14);
>  GEN_VXFORM_UIMM_ENV(vctsxs, 5, 15);
> +GEN_VXFORM_DUAL(vspltisb, PPC_NONE, PPC2_ALTIVEC_207,
> +                      vinsertb, PPC_NONE, PPC2_ISA300);
> +GEN_VXFORM_DUAL(vspltish, PPC_NONE, PPC2_ALTIVEC_207,
> +                      vinserth, PPC_NONE, PPC2_ISA300);
> +GEN_VXFORM_DUAL(vspltisw, PPC_NONE, PPC2_ALTIVEC_207,
> +                      vinsertw, PPC_NONE, PPC2_ISA300);
>  
>  static void gen_vsldoi(DisasContext *ctx)
>  {
> diff --git a/target-ppc/translate/vmx-ops.c b/target-ppc/translate/vmx-ops.c
> index 7449396..ca69e56 100644
> --- a/target-ppc/translate/vmx-ops.c
> +++ b/target-ppc/translate/vmx-ops.c
> @@ -41,6 +41,9 @@ GEN_HANDLER_E(name, 0x04, opc2, opc3, 0x00000000, PPC_NONE, PPC2_ALTIVEC_207)
>  #define GEN_VXFORM_300(name, opc2, opc3)                                \
>  GEN_HANDLER_E(name, 0x04, opc2, opc3, 0x00000000, PPC_NONE, PPC2_ISA300)
>  
> +#define GEN_VXFORM_300_EXT(name, opc2, opc3, inval)                     \
> +GEN_HANDLER_E(name, 0x04, opc2, opc3, inval, PPC_NONE, PPC2_ISA300)
> +
>  #define GEN_VXFORM_DUAL(name0, name1, opc2, opc3, type0, type1) \
>  GEN_HANDLER_E(name0##_##name1, 0x4, opc2, opc3, 0x00000000, type0, type1)
>  
> @@ -191,11 +194,16 @@ GEN_VXRFORM(vcmpgefp, 3, 7)
>  GEN_VXRFORM_DUAL(vcmpgtfp, vcmpgtud, 3, 11, PPC_ALTIVEC, PPC_NONE)
>  GEN_VXRFORM_DUAL(vcmpbfp, vcmpgtsd, 3, 15, PPC_ALTIVEC, PPC_NONE)
>  
> -#define GEN_VXFORM_SIMM(name, opc2, opc3)                               \
> -    GEN_HANDLER(name, 0x04, opc2, opc3, 0x00000000, PPC_ALTIVEC)
> -GEN_VXFORM_SIMM(vspltisb, 6, 12),
> -GEN_VXFORM_SIMM(vspltish, 6, 13),
> -GEN_VXFORM_SIMM(vspltisw, 6, 14),
> +#define GEN_VXFORM_DUAL_INV(name0, name1, opc2, opc3, inval0, inval1, type) \
> +GEN_OPCODE_DUAL(name0##_##name1, 0x04, opc2, opc3, inval0, inval1, type, \
> +                                                               PPC_NONE)
> +GEN_VXFORM_DUAL_INV(vspltisb, vinsertb, 6, 12, 0x00000000, 0x100000,
> +                                               PPC2_ALTIVEC_207),
> +GEN_VXFORM_DUAL_INV(vspltish, vinserth, 6, 13, 0x00000000, 0x100000,
> +                                               PPC2_ALTIVEC_207),
> +GEN_VXFORM_DUAL_INV(vspltisw, vinsertw, 6, 14, 0x00000000, 0x100000,
> +                                               PPC2_ALTIVEC_207),
> +GEN_VXFORM_300_EXT(vinsertd, 6, 15, 0x100000),
>  
>  #define GEN_VXFORM_NOA(name, opc2, opc3)                                \
>      GEN_HANDLER(name, 0x04, opc2, opc3, 0x001f0000, PPC_ALTIVEC)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
