qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Richard Henderson <richard.henderson@linaro.org>
To: "Lucas Mateus Castro(alqotel)" <lucas.araujo@eldorado.org.br>,
	qemu-devel@nongnu.org, qemu-ppc@nongnu.org
Cc: "Daniel Henrique Barboza" <danielhb413@gmail.com>,
	"Cédric Le Goater" <clg@kaod.org>,
	"David Gibson" <david@gibson.dropbear.id.au>,
	"Greg Kurz" <groug@kaod.org>
Subject: Re: [PATCH v2 05/12] target/ppc: Move VPRTYB[WDQ] to decodetree and use gvec
Date: Mon, 10 Oct 2022 12:26:23 -0700	[thread overview]
Message-ID: <d4c812fc-08ba-9475-ec2b-972182cb1906@linaro.org> (raw)
In-Reply-To: <20221010191356.83659-6-lucas.araujo@eldorado.org.br>

On 10/10/22 12:13, Lucas Mateus Castro(alqotel) wrote:
> From: "Lucas Mateus Castro (alqotel)" <lucas.araujo@eldorado.org.br>
> 
> Moved VPRTYBW and VPRTYBD to use gvec and both of them and VPRTYBQ to
> decodetree. VPRTYBW and VPRTYBD now also use .fni4 and .fni8,
> respectively.
> 
> vprtybw:
> rept    loop    master             patch
> 8       12500   0,00991200         0,00626300 (-36.8%)
> 25      4000    0,01040600         0,00550600 (-47.1%)
> 100     1000    0,01084500         0,00601100 (-44.6%)
> 500     200     0,01490600         0,01394100 (-6.5%)
> 2500    40      0,03285100         0,05143000 (+56.6%)
> 8000    12      0,08971500         0,14662500 (+63.4%)
> 
> vprtybd:
> rept    loop    master             patch
> 8       12500   0,00665800         0,00652800 (-2.0%)
> 25      4000    0,00589300         0,00670400 (+13.8%)
> 100     1000    0,00646800         0,00743900 (+15.0%)
> 500     200     0,01065800         0,01586400 (+48.8%)
> 2500    40      0,03497000         0,07180100 (+105.3%)
> 8000    12      0,09242200         0,21566600 (+133.3%)
> 
> vprtybq:
> rept    loop    master             patch
> 8       12500   0,00656200         0,00665800 (+1.5%)
> 25      4000    0,00620500         0,00644900 (+3.9%)
> 100     1000    0,00707500         0,00764900 (+8.1%)
> 500     200     0,01203500         0,01349500 (+12.1%)
> 2500    40      0,03505700         0,04123100 (+17.6%)
> 8000    12      0,09590600         0,11586700 (+20.8%)
> 
> I wasn't expecting such a performance lost in both VPRTYBD and VPRTYBQ,
> I'm not sure if it's worth to move those instructions. Comparing the
> assembly of the helper with the TCGop they are pretty similar, so
> I'm not sure why vprtybd took so much more time.
> 
> Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
> ---
>   target/ppc/helper.h                 |  4 +-
>   target/ppc/insn32.decode            |  4 ++
>   target/ppc/int_helper.c             | 25 +--------
>   target/ppc/translate/vmx-impl.c.inc | 80 +++++++++++++++++++++++++++--
>   target/ppc/translate/vmx-ops.c.inc  |  3 --
>   5 files changed, 83 insertions(+), 33 deletions(-)
> 
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index b2e910b089..a06193bc67 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -193,9 +193,7 @@ DEF_HELPER_FLAGS_3(vslo, TCG_CALL_NO_RWG, void, avr, avr, avr)
>   DEF_HELPER_FLAGS_3(vsro, TCG_CALL_NO_RWG, void, avr, avr, avr)
>   DEF_HELPER_FLAGS_3(vsrv, TCG_CALL_NO_RWG, void, avr, avr, avr)
>   DEF_HELPER_FLAGS_3(vslv, TCG_CALL_NO_RWG, void, avr, avr, avr)
> -DEF_HELPER_FLAGS_2(vprtybw, TCG_CALL_NO_RWG, void, avr, avr)
> -DEF_HELPER_FLAGS_2(vprtybd, TCG_CALL_NO_RWG, void, avr, avr)
> -DEF_HELPER_FLAGS_2(vprtybq, TCG_CALL_NO_RWG, void, avr, avr)
> +DEF_HELPER_FLAGS_3(VPRTYBQ, TCG_CALL_NO_RWG, void, avr, avr, i32)
>   DEF_HELPER_FLAGS_5(vaddsbs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
>   DEF_HELPER_FLAGS_5(vaddshs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
>   DEF_HELPER_FLAGS_5(vaddsws, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
> diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
> index 2658dd3395..aa4968e6b9 100644
> --- a/target/ppc/insn32.decode
> +++ b/target/ppc/insn32.decode
> @@ -529,6 +529,10 @@ VCTZDM          000100 ..... ..... ..... 11111000100    @VX
>   VPDEPD          000100 ..... ..... ..... 10111001101    @VX
>   VPEXTD          000100 ..... ..... ..... 10110001101    @VX
>   
> +VPRTYBD         000100 ..... 01001 ..... 11000000010    @VX_tb
> +VPRTYBQ         000100 ..... 01010 ..... 11000000010    @VX_tb
> +VPRTYBW         000100 ..... 01000 ..... 11000000010    @VX_tb
> +
>   ## Vector Permute and Formatting Instruction
>   
>   VEXTDUBVLX      000100 ..... ..... ..... ..... 011000   @VA
> diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
> index c7fd0d1faa..c6ce4665fa 100644
> --- a/target/ppc/int_helper.c
> +++ b/target/ppc/int_helper.c
> @@ -492,31 +492,8 @@ static inline void set_vscr_sat(CPUPPCState *env)
>       env->vscr_sat.u32[0] = 1;
>   }
>   
> -/* vprtybw */
> -void helper_vprtybw(ppc_avr_t *r, ppc_avr_t *b)
> -{
> -    int i;
> -    for (i = 0; i < ARRAY_SIZE(r->u32); i++) {
> -        uint64_t res = b->u32[i] ^ (b->u32[i] >> 16);
> -        res ^= res >> 8;
> -        r->u32[i] = res & 1;
> -    }
> -}
> -
> -/* vprtybd */
> -void helper_vprtybd(ppc_avr_t *r, ppc_avr_t *b)
> -{
> -    int i;
> -    for (i = 0; i < ARRAY_SIZE(r->u64); i++) {
> -        uint64_t res = b->u64[i] ^ (b->u64[i] >> 32);
> -        res ^= res >> 16;
> -        res ^= res >> 8;
> -        r->u64[i] = res & 1;
> -    }
> -}
> -
>   /* vprtybq */
> -void helper_vprtybq(ppc_avr_t *r, ppc_avr_t *b)
> +void helper_VPRTYBQ(ppc_avr_t *r, ppc_avr_t *b, uint32_t v)
>   {
>       uint64_t res = b->u64[0] ^ b->u64[1];
>       res ^= res >> 32;
> diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
> index b9a9e83ab3..23601942bc 100644
> --- a/target/ppc/translate/vmx-impl.c.inc
> +++ b/target/ppc/translate/vmx-impl.c.inc
> @@ -1659,9 +1659,83 @@ GEN_VXFORM_NOA_ENV(vrfim, 5, 11);
>   GEN_VXFORM_NOA_ENV(vrfin, 5, 8);
>   GEN_VXFORM_NOA_ENV(vrfip, 5, 10);
>   GEN_VXFORM_NOA_ENV(vrfiz, 5, 9);
> -GEN_VXFORM_NOA(vprtybw, 1, 24);
> -GEN_VXFORM_NOA(vprtybd, 1, 24);
> -GEN_VXFORM_NOA(vprtybq, 1, 24);
> +
> +static void gen_vprtyb_vec(unsigned vece, TCGv_vec t, TCGv_vec b)
> +{
> +    int i;
> +    TCGv_vec tmp = tcg_temp_new_vec_matching(b);
> +    /* MO_32 is 2, so 2 iteractions for MO_32 and 3 for MO_64 */
> +    for (i = 0; i < vece; i++) {
> +        tcg_gen_shri_vec(vece, tmp, b, (4 << (vece - i)));
> +        tcg_gen_xor_vec(vece, b, tmp, b);
> +    }
> +    tcg_gen_and_vec(vece, t, b, tcg_constant_vec_matching(t, vece, 1));
> +    tcg_temp_free_vec(tmp);
> +}
> +
> +/* vprtybw */
> +static void gen_vprtyb_i32(TCGv_i32 t, TCGv_i32 b)
> +{
> +    TCGv_i32 tmp = tcg_temp_new_i32();
> +    tcg_gen_shri_i32(tmp, b, 16);
> +    tcg_gen_xor_i32(b, tmp, b);
> +    tcg_gen_shri_i32(tmp, b, 8);
> +    tcg_gen_xor_i32(b, tmp, b);
> +    tcg_gen_and_i32(t, b, tcg_constant_i32(1));
> +    tcg_temp_free_i32(tmp);

tcg_gen_ctpop_i32(t, b);
tcg_gen_andi_i32(t, t, 1);

> +}
> +
> +/* vprtybd */
> +static void gen_vprtyb_i64(TCGv_i64 t, TCGv_i64 b)
> +{
> +    TCGv_i64 tmp = tcg_temp_new_i64();
> +    tcg_gen_shri_i64(tmp, b, 32);
> +    tcg_gen_xor_i64(b, tmp, b);
> +    tcg_gen_shri_i64(tmp, b, 16);
> +    tcg_gen_xor_i64(b, tmp, b);
> +    tcg_gen_shri_i64(tmp, b, 8);
> +    tcg_gen_xor_i64(b, tmp, b);
> +    tcg_gen_and_i64(t, b, tcg_constant_i64(1));
> +    tcg_temp_free_i64(tmp);

Similarly.

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


  reply	other threads:[~2022-10-10 20:11 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-10 19:13 [PATCH v2 00/12] VMX/VSX instructions with gvec Lucas Mateus Castro(alqotel)
2022-10-10 19:13 ` [PATCH v2 01/12] target/ppc: Moved VMLADDUHM to decodetree and use gvec Lucas Mateus Castro(alqotel)
2022-10-10 19:13 ` [PATCH v2 02/12] target/ppc: Move VMH[R]ADDSHS instruction to decodetree Lucas Mateus Castro(alqotel)
2022-10-10 19:13 ` [PATCH v2 03/12] target/ppc: Move V(ADD|SUB)CUW to decodetree and use gvec Lucas Mateus Castro(alqotel)
2022-10-10 19:22   ` Richard Henderson
2022-10-10 19:13 ` [PATCH v2 04/12] target/ppc: Move VNEG[WD] to decodtree " Lucas Mateus Castro(alqotel)
2022-10-10 19:13 ` [PATCH v2 05/12] target/ppc: Move VPRTYB[WDQ] to decodetree " Lucas Mateus Castro(alqotel)
2022-10-10 19:26   ` Richard Henderson [this message]
2022-10-10 19:13 ` [PATCH v2 06/12] target/ppc: Move VAVG[SU][BHW] " Lucas Mateus Castro(alqotel)
2022-10-10 19:13 ` [PATCH v2 07/12] target/ppc: Move VABSDU[BHW] " Lucas Mateus Castro(alqotel)
2022-10-10 19:13 ` [PATCH v2 08/12] target/ppc: Use gvec to decode XV[N]ABS[DS]P/XVNEG[DS]P Lucas Mateus Castro(alqotel)
2022-10-10 19:13 ` [PATCH v2 09/12] target/ppc: Use gvec to decode XVCPSGN[SD]P Lucas Mateus Castro(alqotel)
2022-10-10 19:29   ` Richard Henderson
2022-10-10 19:13 ` [PATCH v2 10/12] target/ppc: Moved XVTSTDC[DS]P to decodetree Lucas Mateus Castro(alqotel)
2022-10-10 19:13 ` [PATCH v2 11/12] target/ppc: Moved XSTSTDC[QDS]P " Lucas Mateus Castro(alqotel)
2022-10-10 19:31   ` Richard Henderson
2022-10-10 19:13 ` [PATCH v2 12/12] target/ppc: Use gvec to decode XVTSTDC[DS]P Lucas Mateus Castro(alqotel)
2022-10-10 19:42   ` Richard Henderson
2022-10-10 19:53     ` Lucas Mateus Martins Araujo e Castro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d4c812fc-08ba-9475-ec2b-972182cb1906@linaro.org \
    --to=richard.henderson@linaro.org \
    --cc=clg@kaod.org \
    --cc=danielhb413@gmail.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=groug@kaod.org \
    --cc=lucas.araujo@eldorado.org.br \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).