Re: [Qemu-devel] [PATCH 7/8] target/ppc: Optimize emulation of vclzh and vclzb instructions

From: Stefan Brankovic <stefan.brankovic@rt-rk.com>
To: Richard Henderson <richard.henderson@linaro.org>, qemu-devel@nongnu.org
Cc: david@gibson.dropbear.id.au
Subject: Re: [Qemu-devel] [PATCH 7/8] target/ppc: Optimize emulation of vclzh and vclzb instructions
Date: Mon, 17 Jun 2019 13:42:28 +0200
Message-ID: <2629bf10-43ac-8633-b51c-d0bb7a4c1a78@rt-rk.com>
In-Reply-To: <93061f61-699f-821d-fda2-4fa287b4506b@linaro.org>


On 6.6.19. 22:38, Richard Henderson wrote:
> On 6/6/19 5:15 AM, Stefan Brankovic wrote:
>> Optimize Altivec instruction vclzh (Vector Count Leading Zeros Halfword).
>> This instruction counts the number of leading zeros of each halfword element
>> in source register and places result in the appropriate halfword element of
>> destination register.
> For halfword, you're generating 32 operations.  A loop over the halfwords,
> similar to the word loop I suggested for the last patch, does not reduce this
> total, since one has to adjust the clz32 result.
>
> For byte, you're generating 64 operations.
>
> These expansions are so big that without host vector support it's probably best
> to leave them out-of-line.
>
> I can imagine a byte clz expansion like
>
> 	t0 = input >> 4;
> 	t1 = input << 4;
> 	cmp = t0 == 0 ? -1 : 0;
> 	input = cmp ? t1 : input;
> 	output = cmp & 4;
>
> 	t0 = input >> 6;
> 	t1 = input << 2;
> 	cmp = t0 == 0 ? -1 : 0;
> 	input = cmp ? t1 : input;
> 	t0 = cmp & 2;
> 	output += t0;
>
> 	t1 = input << 1;
> 	cmp = input >= 0 ? -1 : 0;
> 	output -= cmp;
>
> 	cmp = input == 0 ? -1 : 0;
> 	output -= cmp;
>
> which would expand to 20 x86_64 vector instructions.  A halfword expansion
> would require one more round and thus 25 instructions.
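
For reference, the rounds quoted above can be modelled per lane in scalar C.
This is only a sketch under stated assumptions, not QEMU or TCG code: the
names mask8, mask16, clz8_rounds, clz16_rounds, clz_ref and self_check are
hypothetical, each round's comparison is taken on the shifted value t0 (the
reading that yields correct counts), and the signed "input >= 0" test is
written as a check of the top bit. The halfword version adds the extra round
by looping over shifts of 8, 4 and 2.

```c
#include <assert.h>
#include <stdint.h>

/* All-ones or all-zeros mask, mimicking a per-lane vector compare result. */
static uint8_t  mask8(int cond)  { return cond ? 0xFF : 0; }
static uint16_t mask16(int cond) { return cond ? 0xFFFF : 0; }

/* Byte count-leading-zeros using the quoted rounds (shift by 4, then 2). */
static unsigned clz8_rounds(uint8_t input)
{
    uint8_t t0, t1, cmp, output;

    /* Round 1: upper nibble empty -> count 4 and move the low nibble up. */
    t0 = input >> 4;
    t1 = (uint8_t)(input << 4);
    cmp = mask8(t0 == 0);
    input = (uint8_t)((cmp & t1) | (~cmp & input));   /* select */
    output = cmp & 4;

    /* Round 2: top two bits empty -> count 2 and shift left by 2. */
    t0 = input >> 6;
    t1 = (uint8_t)(input << 2);
    cmp = mask8(t0 == 0);
    input = (uint8_t)((cmp & t1) | (~cmp & input));
    output += cmp & 2;

    /* Top bit clear (signed input >= 0): one more leading zero. */
    cmp = mask8((input & 0x80) == 0);
    output -= cmp;            /* subtracting the all-ones mask adds 1 */

    /* Nothing left at all: the byte was zero, so the count reaches 8. */
    cmp = mask8(input == 0);
    output -= cmp;
    return output;
}

/* Halfword variant: the same scheme with one more round (8, 4, 2). */
static unsigned clz16_rounds(uint16_t input)
{
    uint16_t cmp, output = 0;

    for (unsigned shift = 8; shift >= 2; shift >>= 1) {
        uint16_t t0 = input >> (16 - shift);
        uint16_t t1 = (uint16_t)(input << shift);
        cmp = mask16(t0 == 0);
        input = (uint16_t)((cmp & t1) | (~cmp & input));
        output += cmp & shift;
    }
    cmp = mask16((input & 0x8000) == 0);
    output -= cmp;
    cmp = mask16(input == 0);
    output -= cmp;
    return output;
}

/* Straightforward bit-by-bit reference used only for checking. */
static unsigned clz_ref(unsigned value, unsigned bits)
{
    unsigned n = 0;
    while (n < bits && !((value >> (bits - 1 - n)) & 1)) {
        n++;
    }
    return n;
}

/* Exhaustively compare both sketches against the reference. */
static void self_check(void)
{
    for (unsigned v = 0; v <= 0xFF; v++) {
        assert(clz8_rounds((uint8_t)v) == clz_ref(v, 8));
    }
    for (unsigned v = 0; v <= 0xFFFF; v++) {
        assert(clz16_rounds((uint16_t)v) == clz_ref(v, 16));
    }
}
```

Calling self_check() from a small main() exercises every byte and halfword
value. A real vector expansion would apply the same shifts, compares and
selects to all lanes at once.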

I based this patch on performance results, and my measurements show that
the TCG implementation is still significantly faster than the helper
implementation, despite the somewhat large number of instructions.

I can attach the performance measurement results and the disassembly of
both the helper and TCG implementations, if you would like.

>
> I'll also note that ARM, Power8, and S390 all support this as a native vector
> operation; only x86_64 would require the above expansion.  It probably makes
> sense to add this operation to tcg.

I agree with this, but the operation is not currently implemented in TCG,
so I worked with what is available.

Kind Regards,

Stefan

> r~
