From: Richard Henderson <richard.henderson@linaro.org>
To: Paolo Savini <paolo.savini@embecosm.com>,
qemu-devel@nongnu.org, qemu-riscv@nongnu.org
Cc: Palmer Dabbelt <palmer@dabbelt.com>,
Alistair Francis <alistair.francis@wdc.com>,
Bin Meng <bmeng.cn@gmail.com>, Weiwei Li <liwei1518@gmail.com>,
Daniel Henrique Barboza <dbarboza@ventanamicro.com>,
Liu Zhiwei <zhiwei_liu@linux.alibaba.com>,
Helene Chelin <helene.chelin@embecosm.com>,
Max Chou <max.chou@sifive.com>
Subject: Re: [RFC 1/2] target/riscv: rvv: reduce the overhead for simple RISC-V vector unit-stride loads and stores
Date: Sat, 27 Jul 2024 17:13:17 +1000
Message-ID: <aff5f930-d291-4ff5-8f24-53291059d59a@linaro.org>
In-Reply-To: <20240717153040.11073-2-paolo.savini@embecosm.com>
On 7/18/24 01:30, Paolo Savini wrote:
> From: Helene CHELIN <helene.chelin@embecosm.com>
>
> This patch improves the performance of the emulation of the RVV unit-stride
> loads and stores in the following cases:
>
> - when the data being loaded/stored per iteration amounts to 8 bytes or less.
> - when the vector length is 16 bytes (VLEN=128) and there's no grouping of the
> vector registers (LMUL=1).
>
> The optimization consists of avoiding the overhead of probing the RAM of the
> host machine and doing a loop load/store on the input data grouped in chunks
> of as many bytes as possible (8,4,2,1 bytes).
>
> Co-authored-by: Helene CHELIN <helene.chelin@embecosm.com>
> Co-authored-by: Paolo Savini <paolo.savini@embecosm.com>
>
> Signed-off-by: Helene CHELIN <helene.chelin@embecosm.com>
> ---
> target/riscv/vector_helper.c | 46 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 46 insertions(+)
>
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index 29849a8b66..4b444c6bc5 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -633,6 +633,52 @@ vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
>
>      VSTART_CHECK_EARLY_EXIT(env);
>
> +    /* For data sizes <= 64 bits and for LMUL=1 with VLEN=128 bits we get a
> +     * better performance by doing a simple simulation of the load/store
> +     * without the overhead of prodding the host RAM */
> +    if ((nf == 1) && ((evl << log2_esz) <= 8 ||
> +        ((vext_lmul(desc) == 0) && (simd_maxsz(desc) == 16)))) {
> +
> +        uint32_t evl_b = evl << log2_esz;
> +
> +        for (uint32_t j = env->vstart; j < evl_b;) {
> +            addr = base + j;
> +            if ((evl_b - j) >= 8) {
> +                if (is_load)
> +                    lde_d_tlb(env, adjust_addr(env, addr), j, vd, ra);
> +                else
> +                    ste_d_tlb(env, adjust_addr(env, addr), j, vd, ra);
> +                j += 8;
> +            }
> +            else if ((evl_b - j) >= 4) {
> +                if (is_load)
> +                    lde_w_tlb(env, adjust_addr(env, addr), j, vd, ra);
> +                else
> +                    ste_w_tlb(env, adjust_addr(env, addr), j, vd, ra);
> +                j += 4;
> +            }
> +            else if ((evl_b - j) >= 2) {
> +                if (is_load)
> +                    lde_h_tlb(env, adjust_addr(env, addr), j, vd, ra);
> +                else
> +                    ste_h_tlb(env, adjust_addr(env, addr), j, vd, ra);
> +                j += 2;
> +            }
> +            else {
> +                if (is_load)
> +                    lde_b_tlb(env, adjust_addr(env, addr), j, vd, ra);
> +                else
> +                    ste_b_tlb(env, adjust_addr(env, addr), j, vd, ra);
> +                j += 1;
> +            }
> +        }
In system mode, this performs a TLB lookup for every one of the N accesses in the loop, so it will not be an improvement.
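To put a number on N, here is a minimal standalone sketch of the chunking in the quoted loop. It only counts how many separate accesses the 8/4/2/1-byte split issues for a given transfer; if, as the remark above says, each access does its own TLB lookup in system mode, that count is also the number of lookups. The helper name count_chunk_accesses is made up for this illustration and is not part of QEMU.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not QEMU code: mirrors the control flow of the
 * quoted loop and returns how many accesses it would issue when moving
 * the bytes in [start, len). Assumes start <= len.
 */
static uint32_t count_chunk_accesses(uint32_t start, uint32_t len)
{
    uint32_t accesses = 0;

    assert(start <= len);

    for (uint32_t j = start; j < len;) {
        uint32_t left = len - j;

        /* Take the largest of 8, 4, 2, 1 bytes that still fits. */
        if (left >= 8) {
            j += 8;
        } else if (left >= 4) {
            j += 4;
        } else if (left >= 2) {
            j += 2;
        } else {
            j += 1;
        }
        accesses++;
    }
    return accesses;
}
```

With this sketch, a 16-byte transfer (the VLEN=128, LMUL=1 case) comes out as two accesses, and a 7-byte transfer as three (4 + 2 + 1).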
This will not work on a big-endian host.
r~
Thread overview: 11+ messages
2024-07-17 15:30 [RFC 0/2] Improve the performance of unit-stride RVV ld/st on Paolo Savini
2024-07-17 15:30 ` [RFC 1/2] target/riscv: rvv: reduce the overhead for simple RISC-V vector unit-stride loads and stores Paolo Savini
2024-07-26 12:22 ` Daniel Henrique Barboza
2024-07-27 7:13 ` Richard Henderson [this message]
2024-07-31 12:38 ` Daniel Henrique Barboza
2024-07-17 15:30 ` [RFC 2/2] target/riscv: rvv: improve performance of RISC-V vector loads and stores on large amounts of data Paolo Savini
2024-07-26 12:27 ` Daniel Henrique Barboza
2024-07-27 7:15 ` Richard Henderson
2024-09-10 11:20 ` Paolo Savini
2024-09-10 18:18 ` Richard Henderson
2024-07-26 12:31 ` [RFC 0/2] Improve the performance of unit-stride RVV ld/st on Daniel Henrique Barboza