From: Stefan Brankovic <stefan.brankovic@rt-rk.com>
To: Richard Henderson <richard.henderson@linaro.org>, qemu-devel@nongnu.org
Cc: david@gibson.dropbear.id.au
Subject: Re: [Qemu-devel] [PATCH 7/8] target/ppc: Optimize emulation of vclzh and vclzb instructions
Date: Mon, 17 Jun 2019 13:42:28 +0200
Message-ID: <2629bf10-43ac-8633-b51c-d0bb7a4c1a78@rt-rk.com>
In-Reply-To: <93061f61-699f-821d-fda2-4fa287b4506b@linaro.org>
On 6.6.19. 22:38, Richard Henderson wrote:
> On 6/6/19 5:15 AM, Stefan Brankovic wrote:
>> Optimize Altivec instruction vclzh (Vector Count Leading Zeros Halfword).
>> This instruction counts the number of leading zeros of each halfword element
>> in source register and places result in the appropriate halfword element of
>> destination register.
> For halfword, you're generating 32 operations. A loop over the halfwords,
> similar to the word loop I suggested for the last patch, does not reduce this
> total, since one has to adjust the clz32 result.
>
> For byte, you're generating 64 operations.
>
> These expansions are so big that without host vector support it's probably best
> to leave them out-of-line.
>
> I can imagine a byte clz expansion like
>
> t0 = input >> 4;
> t1 = input << 4;
> cmp = t0 == 0 ? -1 : 0;
> input = cmp ? t1 : input;
> output = cmp & 4;
>
> t0 = input >> 6;
> t1 = input << 2;
> cmp = t0 == 0 ? -1 : 0;
> input = cmp ? t1 : input;
> t0 = cmp & 2;
> output += t0;
>
> t1 = input << 1;
> cmp = input >= 0 ? -1 : 0;
> output -= cmp;
>
> cmp = input == 0 ? -1 : 0;
> output -= cmp;
>
> which would expand to 20 x86_64 vector instructions. A halfword expansion
> would require one more round and thus 25 instructions.
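
The expansion quoted above can be modelled one lane at a time in plain C.
The following is only a sketch, not QEMU or TCG code: the function names
are hypothetical, each statement stands in for one vector operation, and
it assumes that the compare in each round tests the shifted value t0.
The halfword version adds one leading round for the upper byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-lane model of the byte expansion. A compare result is
 * an all-ones or all-zeros mask, as a vector compare would produce. */
static uint8_t clz8_branchless(uint8_t in)
{
    uint8_t t0, t1, cmp, out;

    /* Round 1: if the top nibble is empty, shift left by 4 and count 4. */
    t0 = (uint8_t)(in >> 4);
    t1 = (uint8_t)(in << 4);
    cmp = (t0 == 0) ? 0xff : 0;
    in = (uint8_t)((cmp & t1) | (~cmp & in));
    out = cmp & 4;

    /* Round 2: if the top two bits are empty, shift by 2 and count 2. */
    t0 = (uint8_t)(in >> 6);
    t1 = (uint8_t)(in << 2);
    cmp = (t0 == 0) ? 0xff : 0;
    in = (uint8_t)((cmp & t1) | (~cmp & in));
    out = (uint8_t)(out + (cmp & 2));

    /* Top bit clear (the signed "input >= 0" test): count 1.
     * Subtracting the all-ones mask (-1) adds 1 modulo 256. */
    cmp = ((in & 0x80) == 0) ? 0xff : 0;
    out = (uint8_t)(out - cmp);

    /* A lane that is still zero was zero on entry: count the 8th bit. */
    cmp = (in == 0) ? 0xff : 0;
    out = (uint8_t)(out - cmp);
    return out;
}

/* Halfword variant: the same scheme with one more round up front. */
static uint16_t clz16_branchless(uint16_t in)
{
    static const unsigned width[3] = { 8, 4, 2 };
    uint16_t t0, t1, cmp, out = 0;

    for (int i = 0; i < 3; i++) {
        t0 = (uint16_t)(in >> (16 - width[i]));
        t1 = (uint16_t)(in << width[i]);
        cmp = (t0 == 0) ? 0xffff : 0;
        in = (uint16_t)((cmp & t1) | (~cmp & in));
        out = (uint16_t)(out + (cmp & width[i]));
    }
    cmp = ((in & 0x8000) == 0) ? 0xffff : 0;
    out = (uint16_t)(out - cmp);
    cmp = (in == 0) ? 0xffff : 0;
    out = (uint16_t)(out - cmp);
    return out;
}

/* Bit-by-bit reference, used only to check the sketch. */
static unsigned clz_ref(unsigned x, unsigned bits)
{
    unsigned n = 0;
    for (unsigned bit = 1u << (bits - 1); bit != 0 && !(x & bit); bit >>= 1) {
        n++;
    }
    return n;
}

/* Exhaustive comparison against the reference for both widths. */
static void clz_selfcheck(void)
{
    for (unsigned x = 0; x <= 0xffff; x++) {
        if (x <= 0xff) {
            assert(clz8_branchless((uint8_t)x) == clz_ref(x, 8));
        }
        assert(clz16_branchless((uint16_t)x) == clz_ref(x, 16));
    }
}
```

Calling clz_selfcheck() compares both functions against the bit-by-bit
reference for every possible input. The scalar model says nothing about
the performance of the vector form; it only checks the arithmetic.
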
I based this patch on performance results: my measurements show that the
TCG implementation is still significantly faster than the helper
implementation, despite the fairly large number of instructions.
I can attach both the performance measurements and the disassembly of
the helper and TCG implementations, if you would like.
>
> I'll also note that ARM, Power8, and S390 all support this as a native vector
> operation; only x86_64 would require the above expansion. It probably makes
> sense to add this operation to tcg.
I agree, but this operation is not currently implemented in TCG, so I
worked with what is available.
Kind Regards,
Stefan
> r~