Re: [RFC v4 2/2] target/riscv: rvv: improve performance of RISC-V vector loads and stores on large amounts of data.


From: Richard Henderson <richard.henderson@linaro.org>
To: Paolo Savini <paolo.savini@embecosm.com>,
	qemu-devel@nongnu.org, qemu-riscv@nongnu.org
Cc: Palmer Dabbelt <palmer@dabbelt.com>,
	Alistair Francis <alistair.francis@wdc.com>,
	Bin Meng <bmeng.cn@gmail.com>, Weiwei Li <liwei1518@gmail.com>,
	Daniel Henrique Barboza <dbarboza@ventanamicro.com>,
	Liu Zhiwei <zhiwei_liu@linux.alibaba.com>,
	Helene Chelin <helene.chelin@embecosm.com>,
	Nathan Egge <negge@google.com>, Max Chou <max.chou@sifive.com>
Subject: Re: [RFC v4 2/2] target/riscv: rvv: improve performance of RISC-V vector loads and stores on large amounts of data.
Date: Wed, 30 Oct 2024 11:40:19 +0000
Message-ID: <7a046c99-c4e7-4395-8dc9-9139e9bfba06@linaro.org>
In-Reply-To: <20241029194348.59574-3-paolo.savini@embecosm.com>

On 10/29/24 19:43, Paolo Savini wrote:
> This patch optimizes the emulation of unit-stride load/store RVV instructions
> when the data being loaded/stored per iteration amounts to 16 bytes or more.
> The optimization consists of calling __builtin_memcpy on chunks of data of 16
> bytes between the memory address of the simulated vector register and the
> destination memory address and vice versa.
> This is done only if we have direct access to the RAM of the host machine,
> if the host is little endian and if it supports atomic 128-bit memory
> operations.
> 
> Signed-off-by: Paolo Savini <paolo.savini@embecosm.com>
> ---
>   target/riscv/vector_helper.c    | 17 ++++++++++++++++-
>   target/riscv/vector_internals.h | 12 ++++++++++++
>   2 files changed, 28 insertions(+), 1 deletion(-)
> 
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index 75c24653f0..e1c100e907 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -488,7 +488,22 @@ vext_group_ldst_host(CPURISCVState *env, void *vd, uint32_t byte_end,
>       }
>   
>       fn = fns[is_load][group_size];
> -    fn(vd, byte_offset, host + byte_offset);
> +
> +    /* __builtin_memcpy uses host 16 bytes vector loads and stores if supported.
> +     * We need to make sure that these instructions have guarantees of atomicity.
> +     * E.g. x86 processors provide strong guarantees of atomicity for 16-byte
> +     * memory operations if the memory operands are 16-byte aligned */
> +    if (!HOST_BIG_ENDIAN && (byte_offset + 16 < byte_end) &&
> +		    ((byte_offset % 16) == 0) && HOST_128_ATOMIC_MEM_OP) {
> +      group_size = MO_128;
> +      if (is_load) {
> +        __builtin_memcpy((uint8_t *)(vd + byte_offset), (uint8_t *)(host + byte_offset), 16);
> +      } else {
> +        __builtin_memcpy((uint8_t *)(host + byte_offset), (uint8_t *)(vd + byte_offset), 16);
> +      }

I said this last time and I'll say it again:

     __builtin_memcpy DOES NOT equal VMOVDQA

Your comment there about 'if supported' does not really apply.

(1) You'd need a compile-time test, not the runtime test that is
HOST_128_ATOMIC_MEM_OP, to ensure that the compiler knows that AVX vector
support is present.

(2) Even then, you're not giving the compiler any reason to use VMOVDQA over VMOVDQU or 
ANY OTHER vector load/store.  So you're not really doing what you say you're doing.
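
To make points (1) and (2) concrete, here is a minimal sketch, not QEMU code.
The names sketch_copy16 and SKETCH_HAVE_VMOVDQA are made up for illustration.
It assumes an x86-64 host and a compiler that defines __AVX__ when AVX code
generation is enabled (e.g. with -mavx). The aligned path is only compiled in
when the compiler itself knows AVX is available, and it asks for an aligned
16-byte load/store through intrinsics instead of leaving the choice to memcpy.
Even so, intrinsics are only a request: the compiler may pick an equivalent
aligned move such as VMOVAPS, and pinning one exact instruction would take
inline asm.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Compile-time test (assumption: x86-64 host, compiler defines __AVX__
 * when it is allowed to emit AVX instructions).
 */
#if defined(__x86_64__) && defined(__AVX__)
#include <immintrin.h>
#define SKETCH_HAVE_VMOVDQA 1
#else
#define SKETCH_HAVE_VMOVDQA 0
#endif

/*
 * Hypothetical helper: copy one 16-byte chunk from src to dst.
 * Returns true if the explicitly aligned vector path was taken,
 * false if it fell back to a plain memcpy with no atomicity claim.
 */
static bool sketch_copy16(void *dst, const void *src)
{
#if SKETCH_HAVE_VMOVDQA
    /* The aligned intrinsics require both addresses to be 16-byte aligned. */
    if ((((uintptr_t)dst | (uintptr_t)src) & 15) == 0) {
        __m128i v = _mm_load_si128((const __m128i *)src);
        _mm_store_si128((__m128i *)dst, v);
        return true;
    }
#endif
    /* No promise here about which instructions the compiler emits. */
    memcpy(dst, src, 16);
    return false;
}
```

A caller that needs 16-byte atomicity would have to treat a false return as
"no guarantee" and fall back to the existing per-element path.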


Frankly, I think this entire patch set is premature.
We need to get Max Chou's patch set landed first.


r~
