From: Richard Henderson <richard.henderson@linaro.org>
To: Daniel Henrique Barboza <dbarboza@ventanamicro.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"open list:RISC-V" <qemu-riscv@nongnu.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>,
Jeff Law <jlaw@ventanamicro.com>,
Alistair Francis <alistair.francis@wdc.com>
Subject: Re: [RFC] risc-v vector (RVV) emulation performance issues
Date: Tue, 25 Jul 2023 11:53:46 -0700 [thread overview]
Message-ID: <9fc36ebe-6ec4-23dd-bbb6-5333905f7d2f@linaro.org> (raw)
In-Reply-To: <0e54c6c1-2903-7942-eff2-2b8c5e21187e@ventanamicro.com>
On 7/24/23 06:40, Daniel Henrique Barboza wrote:
> Hi,
>
> As some of you are already aware the current RVV emulation could be faster.
> We have at least one commit (bc0ec52eb2, "target/riscv/vector_helper.c:
> skip set tail when vta is zero") that tried to address at least part of the
> problem.
>
> Running a simple program like this:
>
> -------
>
> #include <stdlib.h>
>
> #define SZ 10000000
>
> int main ()
> {
> int *a = malloc (SZ * sizeof (int));
> int *b = malloc (SZ * sizeof (int));
> int *c = malloc (SZ * sizeof (int));
>
> for (int i = 0; i < SZ; i++)
> c[i] = a[i] + b[i];
> return c[SZ - 1];
> }
>
> -------
>
> And then compiling it without RVV support, it will run in ~50 ms:
>
> $ time ~/work/qemu/build/qemu-riscv64 -cpu rv64,debug=false,vext_spec=v1.0,v=true,vlen=128
> ./foo-novect.out
>
> real 0m0.043s
> user 0m0.025s
> sys 0m0.018s
>
> Building the same program with RVV support slows it down by 4-5x:
>
> $ time ~/work/qemu/build/qemu-riscv64 -cpu
> rv64,debug=false,vext_spec=v1.0,v=true,vlen=1024 ./foo.out
>
> real 0m0.196s
> user 0m0.177s
> sys 0m0.018s
>
> Using the lowest 'vlen' value allowed (128) slows things down even further,
> to ~0.260s.
>
>
> 'perf record' shows the following profile on the aforementioned binary:
>
> 23.27% qemu-riscv64 qemu-riscv64 [.] do_ld4_mmu
> 21.11% qemu-riscv64 qemu-riscv64 [.] vext_ldst_us
> 14.05% qemu-riscv64 qemu-riscv64 [.] cpu_ldl_le_data_ra
> 11.51% qemu-riscv64 qemu-riscv64 [.] cpu_stl_le_data_ra
> 8.18% qemu-riscv64 qemu-riscv64 [.] cpu_mmu_lookup
> 8.04% qemu-riscv64 qemu-riscv64 [.] do_st4_mmu
> 2.04% qemu-riscv64 qemu-riscv64 [.] ste_w
> 1.15% qemu-riscv64 qemu-riscv64 [.] lde_w
> 1.02% qemu-riscv64 [unknown] [k] 0xffffffffb3001260
> 0.90% qemu-riscv64 qemu-riscv64 [.] cpu_get_tb_cpu_state
> 0.64% qemu-riscv64 qemu-riscv64 [.] tb_lookup
> 0.64% qemu-riscv64 qemu-riscv64 [.] riscv_cpu_mmu_index
> 0.39% qemu-riscv64 qemu-riscv64 [.] object_dynamic_cast_assert
>
>
> The first thing that caught my attention is vext_ldst_us from target/riscv/vector_helper.c:
>
> /* load bytes from guest memory */
> for (i = env->vstart; i < evl; i++, env->vstart++) {
>     k = 0;
>     while (k < nf) {
>         target_ulong addr = base + ((i * nf + k) << log2_esz);
>         ldst_elem(env, adjust_addr(env, addr), i + k * max_elems, vd, ra);
>         k++;
>     }
> }
> env->vstart = 0;
>
> Given that this is a unit-stride load that accesses contiguous elements in memory, it
> seems that this loop could be optimized/removed, since it's loading/storing elements
> one by one. I didn't find any TCG op to do that though. I assume that ARM SVE might
> have something of the sort. Richard, care to comment?
Yes, SVE optimizes this case -- see
https://gitlab.com/qemu-project/qemu/-/blob/master/target/arm/tcg/sve_helper.c?ref_type=heads#L5651
It's not possible to do this generically, due to the predication.  There's quite a lot
of machinery that goes into expanding this so that each helper uses the correct host
load/store insn in the fast case.
r~
Thread overview (3 messages):
2023-07-24 13:40 [RFC] risc-v vector (RVV) emulation performance issues Daniel Henrique Barboza
2023-07-24 15:23 ` Philippe Mathieu-Daudé
2023-07-25 18:53 ` Richard Henderson [this message]