qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v3 0/5] Improve the performance of RISC-V vector unit-stride/whole register ld/st instructions
@ 2024-06-13 14:19 Max Chou
  2024-06-13 14:19 ` [RFC PATCH v3 1/5] accel/tcg: Avoid unnecessary call overhead from qemu_plugin_vcpu_mem_cb Max Chou
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Max Chou @ 2024-06-13 14:19 UTC (permalink / raw)
  To: qemu-devel, qemu-riscv
  Cc: Richard Henderson, Paolo Bonzini, Palmer Dabbelt,
	Alistair Francis, Bin Meng, Weiwei Li, Daniel Henrique Barboza,
	Liu Zhiwei, Max Chou

Hi,

This RFC patch set tries to fix the issue of
https://gitlab.com/qemu-project/qemu/-/issues/2137.

In this new version, we added patches that
1. Provide a fast path to direct access the host ram for some vector
   load/store instructions (e.g. unmasked vector unit-stride load/store
   instructions) and perform virtual address resolution once for the
   entire vector at beginning of helper function. (Thanks for Richard
   Henderson's suggestion)
2. Replace the group elements load/store TCG ops by the group element
   load/store flow in helper functions with some assumption (e.g. no
   masking, continuous memory load/store, the endian of host and guest
   architecture are the same). (Thanks for Richard Henderson's
   suggestion)
3. Try inline the vector load/store related functions that corresponding
   most of the execution time.

This version can improve the performance of the test case provided in
https://gitlab.com/qemu-project/qemu/-/issues/2137#note_1757501369
- QEMU user mode (vlen=512): from ~51.8 sec. to ~4.5 sec.
- QEMU system mode (vlen=512): from ~125.6 sec to ~6.6 sec.

Series based on riscv-to-apply.next branch (commit d82f37f).

Changes from v2:
- Drop v2 patches 1/4/5/6
- patch 2
    - Provide direct access host ram flow for vector unit-stride ld/st
- patch 3
    - Provide direct access host ram flow for vector whole reg ld/st
- patch 4
    - Provide group element load/store flow for vector continuous ld/st
- patch 5
    - Extend v2 patch 3 to more vector ld/st functions

Previous version:
- v1: https://lore.kernel.org/all/20240215192823.729209-1-max.chou@sifive.com/
- v2: https://lore.kernel.org/all/20240531174504.281461-1-max.chou@sifive.com/

Max Chou (5):
  accel/tcg: Avoid unnecessary call overhead from
    qemu_plugin_vcpu_mem_cb
  target/riscv: rvv: Provide a fast path using direct access to host ram
    for unmasked unit-stride load/store
  target/riscv: rvv: Provide a fast path using direct access to host ram
    for unit-stride whole register load/store
  target/riscv: rvv: Provide group continuous ld/st flow for unit-stride
    ld/st instructions
  target/riscv: Inline unit-stride ld/st and corresponding functions for
    performance

 accel/tcg/ldst_common.c.inc             |   8 +-
 target/riscv/insn_trans/trans_rvv.c.inc |   3 +
 target/riscv/vector_helper.c            | 847 +++++++++++++++++++-----
 target/riscv/vector_internals.h         |  48 ++
 4 files changed, 738 insertions(+), 168 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-06-13 15:43 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-13 14:19 [RFC PATCH v3 0/5] Improve the performance of RISC-V vector unit-stride/whole register ld/st instructions Max Chou
2024-06-13 14:19 ` [RFC PATCH v3 1/5] accel/tcg: Avoid unnecessary call overhead from qemu_plugin_vcpu_mem_cb Max Chou
2024-06-13 14:19 ` [RFC PATCH v3 2/5] target/riscv: rvv: Provide a fast path using direct access to host ram for unmasked unit-stride load/store Max Chou
2024-06-13 14:19 ` [RFC PATCH v3 3/5] target/riscv: rvv: Provide a fast path using direct access to host ram for unit-stride whole register load/store Max Chou
2024-06-13 14:19 ` [RFC PATCH v3 4/5] target/riscv: rvv: Provide group continuous ld/st flow for unit-stride ld/st instructions Max Chou
2024-06-13 14:19 ` [RFC PATCH v3 5/5] target/riscv: Inline unit-stride ld/st and corresponding functions for performance Max Chou
2024-06-13 15:42 ` [RFC PATCH v3 0/5] Improve the performance of RISC-V vector unit-stride/whole register ld/st instructions Daniel Henrique Barboza

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).