qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v4 0/5] Improve the performance of RISC-V vector unit-stride/whole register ld/st instructions
@ 2024-06-13 17:51 Max Chou
  2024-06-13 17:51 ` [RFC PATCH v4 1/5] accel/tcg: Avoid unnecessary call overhead from qemu_plugin_vcpu_mem_cb Max Chou
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Max Chou @ 2024-06-13 17:51 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: Richard Henderson, Paolo Bonzini, Palmer Dabbelt,
	Alistair Francis, Bin Meng, Weiwei Li, Daniel Henrique Barboza,
	Liu Zhiwei, Max Chou

Hi,

Sorry for the quick update the version, this version fixes the
cross-page probe checking bug that I forgot to apply to the v3 version.

This RFC patch set tries to fix the issue of
https://gitlab.com/qemu-project/qemu/-/issues/2137.

In this RFC, we added patches that
1. Provide a fast path to direct access the host ram for some vector
   load/store instructions (e.g. unmasked vector unit-stride load/store
   instructions) and perform virtual address resolution once for the
   entire vector at beginning of helper function. (Thanks for Richard
   Henderson's suggestion)
2. Replace the group elements load/store TCG ops by the group element
   load/store flow in helper functions with some assumption (e.g. no
   masking, continuous memory load/store, the endian of host and guest
   architecture are the same). (Thanks for Richard Henderson's
   suggestion)
3. Try inline the vector load/store related functions that corresponding
   most of the execution time.

This version can improve the performance of the test case provided in
https://gitlab.com/qemu-project/qemu/-/issues/2137#note_1757501369
- QEMU user mode (vlen=512): from ~51.8 sec. to ~4.5 sec.
- QEMU system mode (vlen=512): from ~125.6 sec to ~6.6 sec.

This RFC is tested with SPEC CPU2006 with test input.
We will try to measure the performance on SPEC CPU2006 benchmarks.

Series based on riscv-to-apply.next branch (commit d82f37f).

Changes from v3:
- patch 2
    - Modify vext_cont_ldst_pages for corss-page checking
- patch 3
    - Modify vext_ldst_whole for vext_cont_ldst_pages

Previous version:
- v1: https://lore.kernel.org/all/20240215192823.729209-1-max.chou@sifive.com/
- v2: https://lore.kernel.org/all/20240531174504.281461-1-max.chou@sifive.com/
- v3: https://lore.kernel.org/all/20240613141906.1276105-1-max.chou@sifive.com/

Max Chou (5):
  accel/tcg: Avoid unnecessary call overhead from
    qemu_plugin_vcpu_mem_cb
  target/riscv: rvv: Provide a fast path using direct access to host ram
    for unmasked unit-stride load/store
  target/riscv: rvv: Provide a fast path using direct access to host ram
    for unit-stride whole register load/store
  target/riscv: rvv: Provide group continuous ld/st flow for unit-stride
    ld/st instructions
  target/riscv: Inline unit-stride ld/st and corresponding functions for
    performance

 accel/tcg/ldst_common.c.inc             |   8 +-
 target/riscv/insn_trans/trans_rvv.c.inc |   3 +
 target/riscv/vector_helper.c            | 854 +++++++++++++++++++-----
 target/riscv/vector_internals.h         |  48 ++
 4 files changed, 745 insertions(+), 168 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2024-06-25 15:14 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-13 17:51 [RFC PATCH v4 0/5] Improve the performance of RISC-V vector unit-stride/whole register ld/st instructions Max Chou
2024-06-13 17:51 ` [RFC PATCH v4 1/5] accel/tcg: Avoid unnecessary call overhead from qemu_plugin_vcpu_mem_cb Max Chou
2024-06-20  2:48   ` Richard Henderson
2024-06-20  7:50   ` Frank Chang
2024-06-20 14:27   ` Alex Bennée
2024-06-13 17:51 ` [RFC PATCH v4 2/5] target/riscv: rvv: Provide a fast path using direct access to host ram for unmasked unit-stride load/store Max Chou
2024-06-20  4:29   ` Richard Henderson
2024-06-25 15:14     ` Max Chou
2024-06-20  4:41   ` Richard Henderson
2024-06-13 17:51 ` [RFC PATCH v4 3/5] target/riscv: rvv: Provide a fast path using direct access to host ram for unit-stride whole register load/store Max Chou
2024-06-13 17:51 ` [RFC PATCH v4 4/5] target/riscv: rvv: Provide group continuous ld/st flow for unit-stride ld/st instructions Max Chou
2024-06-20  4:38   ` Richard Henderson
2024-06-24  6:50     ` Max Chou
2024-06-13 17:51 ` [RFC PATCH v4 5/5] target/riscv: Inline unit-stride ld/st and corresponding functions for performance Max Chou
2024-06-20  4:44   ` Richard Henderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).