From: Richard Henderson <richard.henderson@linaro.org>
To: Max Chou <max.chou@sifive.com>,
qemu-devel@nongnu.org, qemu-riscv@nongnu.org
Cc: Palmer Dabbelt <palmer@dabbelt.com>,
Alistair Francis <alistair.francis@wdc.com>,
Bin Meng <bmeng.cn@gmail.com>, Weiwei Li <liwei1518@gmail.com>,
Daniel Henrique Barboza <dbarboza@ventanamicro.com>,
Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
Subject: Re: [RFC PATCH v5 2/5] target/riscv: rvv: Provide a fast path using direct access to host ram for unmasked unit-stride load/store
Date: Thu, 25 Jul 2024 15:51:26 +1000 [thread overview]
Message-ID: <a02bcaba-0712-44f2-924f-8972a4f98ce7@linaro.org> (raw)
In-Reply-To: <20240717133936.713642-3-max.chou@sifive.com>
On 7/17/24 23:39, Max Chou wrote:
> @@ -199,7 +212,7 @@ static void
> vext_ldst_stride(void *vd, void *v0, target_ulong base,
> target_ulong stride, CPURISCVState *env,
> uint32_t desc, uint32_t vm,
> - vext_ldst_elem_fn *ldst_elem,
> + vext_ldst_elem_fn_tlb * ldst_elem,
Extra space: "type *var"
> uint32_t log2_esz, uintptr_t ra)
> {
> uint32_t i, k;
> @@ -221,7 +234,8 @@ vext_ldst_stride(void *vd, void *v0, target_ulong base,
> continue;
> }
> target_ulong addr = base + stride * i + (k << log2_esz);
> - ldst_elem(env, adjust_addr(env, addr), i + k * max_elems, vd, ra);
> + ldst_elem(env, adjust_addr(env, addr),
> + (i + k * max_elems) << log2_esz, vd, ra);
Is this some sort of bug fix? It doesn't seem related...
If it is a bug fix, it should be a separate patch.
> /*
> * unit-stride: access elements stored contiguously in memory
> */
>
> /* unmasked unit-stride load and store operation */
> +static void
> +vext_page_ldst_us(CPURISCVState *env, void *vd, target_ulong addr,
> + uint32_t elems, uint32_t nf, uint32_t max_elems,
> + uint32_t log2_esz, bool is_load,
> + vext_ldst_elem_fn_tlb *ldst_tlb,
> + vext_ldst_elem_fn_host *ldst_host, uintptr_t ra)
> +{
> + void *host;
> + int i, k, flags;
> + uint32_t esz = 1 << log2_esz;
> + uint32_t size = (elems * nf) << log2_esz;
> + uint32_t evl = env->vstart + elems;
> + int mmu_index = riscv_env_mmu_index(env, false);
> + MMUAccessType access_type = is_load ? MMU_DATA_LOAD : MMU_DATA_STORE;
You may want to pass in mmu_index, so that it is computed once in the caller.
> +
> + /* Check page permission/pmp/watchpoint/etc. */
> + flags = probe_access_flags(env, adjust_addr(env, addr), size, access_type,
> + mmu_index, true, &host, ra);
> +
> + if (host && flags == 0) {
If flags == 0, host will always be non-null.
You only need flags == 0.
> static void
> vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
> - vext_ldst_elem_fn *ldst_elem, uint32_t log2_esz, uint32_t evl,
> - uintptr_t ra)
> + vext_ldst_elem_fn_tlb *ldst_tlb,
> + vext_ldst_elem_fn_host *ldst_host, uint32_t log2_esz,
> + uint32_t evl, uintptr_t ra, bool is_load)
> {
> - uint32_t i, k;
> + uint32_t k;
> + target_ulong page_split, elems, addr;
> uint32_t nf = vext_nf(desc);
> uint32_t max_elems = vext_max_elems(desc, log2_esz);
> uint32_t esz = 1 << log2_esz;
> + uint32_t msize = nf * esz;
>
> VSTART_CHECK_EARLY_EXIT(env);
>
> - /* load bytes from guest memory */
> - for (i = env->vstart; i < evl; env->vstart = ++i) {
> - k = 0;
> - while (k < nf) {
> - target_ulong addr = base + ((i * nf + k) << log2_esz);
> - ldst_elem(env, adjust_addr(env, addr), i + k * max_elems, vd, ra);
> - k++;
> + while (env->vstart < evl) {
VSTART_CHECK_EARLY_EXIT has taken care of this condition for the first page.
We know that one contiguous operation can only consume 1024 bytes, so cannot cross two
pages. Therefore this loop executes exactly once or twice.
I think it would be better to unroll this by hand:
calc page range
if (likely(elems)) {
vext_page_ldst_us(... elems ...);
}
if (unlikely(env->vstart < evl)) {
if (unlikely(page_split % msize)) {
...
}
vext_page_ldst_us(... evl - vstart ...);
}
> + /* Calculate page range */
> + addr = base + ((env->vstart * nf) << log2_esz);
> + page_split = -(addr | TARGET_PAGE_MASK);
> + /* Get number of elements */
> + elems = page_split / msize;
> + if (unlikely(env->vstart + elems >= evl)) {
> + elems = evl - env->vstart;
> + }
> +
> + /* Load/store elements in page */
> + vext_page_ldst_us(env, vd, addr, elems, nf, max_elems, log2_esz,
> + is_load, ldst_tlb, ldst_host, ra);
> +
> + /* Cross page element */
> + if (unlikely((page_split % msize) != 0 && (env->vstart + 1) < evl)) {
> + for (k = 0; k < nf; k++) {
> + addr = base + ((env->vstart * nf + k) << log2_esz);
> + ldst_tlb(env, adjust_addr(env, addr),
> + (k * max_elems + env->vstart) << log2_esz, vd, ra);
> + }
> + env->vstart++;
> }
> }
> - env->vstart = 0;
>
> + env->vstart = 0;
> vext_set_tail_elems_1s(evl, vd, desc, nf, esz, max_elems);
> }
r~
next prev parent reply other threads:[~2024-07-25 5:52 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-17 13:39 [RFC PATCH v5 0/5] Improve the performance of RISC-V vector unit-stride/whole register ld/st instructions Max Chou
2024-07-17 13:39 ` [RFC PATCH v5 1/5] target/riscv: Set vdata.vm field for vector load/store whole register instructions Max Chou
2024-07-17 13:39 ` [RFC PATCH v5 2/5] target/riscv: rvv: Provide a fast path using direct access to host ram for unmasked unit-stride load/store Max Chou
2024-07-25 5:51 ` Richard Henderson [this message]
2024-07-30 14:24 ` Max Chou
2024-07-17 13:39 ` [RFC PATCH v5 3/5] target/riscv: rvv: Provide a fast path using direct access to host ram for unit-stride whole register load/store Max Chou
2024-07-17 13:39 ` [RFC PATCH v5 4/5] target/riscv: rvv: Provide group continuous ld/st flow for unit-stride ld/st instructions Max Chou
2024-07-25 6:04 ` Richard Henderson
2024-07-30 15:16 ` Max Chou
2024-07-17 13:39 ` [RFC PATCH v5 5/5] target/riscv: Inline unit-stride ld/st and corresponding functions for performance Max Chou
2024-07-25 6:05 ` Richard Henderson
2024-07-30 13:29 ` Max Chou
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a02bcaba-0712-44f2-924f-8972a4f98ce7@linaro.org \
--to=richard.henderson@linaro.org \
--cc=alistair.francis@wdc.com \
--cc=bmeng.cn@gmail.com \
--cc=dbarboza@ventanamicro.com \
--cc=liwei1518@gmail.com \
--cc=max.chou@sifive.com \
--cc=palmer@dabbelt.com \
--cc=qemu-devel@nongnu.org \
--cc=qemu-riscv@nongnu.org \
--cc=zhiwei_liu@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).