[PATCH v9 0/2] Generate strided vector loads/stores with tcg nodes

* [PATCH v9 0/2] Generate strided vector loads/stores with tcg nodes
@ 2025-09-16  9:21 Chao Liu
  2025-09-16  9:21 ` [PATCH v9 1/2] target/riscv: Use tcg nodes for strided vector ld/st generation Chao Liu
  2025-09-16  9:21 ` [PATCH v9 2/2] tests/tcg/riscv64: Add test for vlsseg8e32 instruction Chao Liu
  0 siblings, 2 replies; 3+ messages in thread
From: Chao Liu @ 2025-09-16  9:21 UTC
  To: richard.henderson, paolo.savini, npiggin, ebiggers, dbarboza,
	palmer, alistair.francis, liwei1518, zhiwei_liu
  Cc: qemu-riscv, qemu-devel, Chao Liu

Hi all,

Thanks Richard for the review. In patch v9:

- Simplify the implementation of gen_check_vext_elem_mask():
  remove the `mask` argument, compute the mask directly inside the function,
  and eliminate redundant code.

- Limit the bit width to 8 bits when loading the mask from memory.

- Remove the `vreg` argument in gen_ldst_vreg().
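
To illustrate the 8-bit mask load mentioned above, here is a minimal
standalone sketch (not code from the patch). It compares the 64-bit word
form used by the vext_elem_mask() helper with a byte-wide form. The function
names are hypothetical, and the sketch assumes the mask register is laid out
little-endian in memory.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Word form, modelled on vext_elem_mask(): pick a 64-bit word, then a bit.
 * The word is assembled from bytes in little-endian order so the sketch
 * does not depend on host endianness.
 */
static int mask_bit_word64(const uint8_t *v0, unsigned index)
{
    unsigned idx = index / 64;
    unsigned pos = index % 64;
    uint64_t word = 0;

    for (unsigned b = 0; b < 8; b++) {
        word |= (uint64_t)v0[idx * 8 + b] << (8 * b);
    }
    return (int)((word >> pos) & 1);
}

/*
 * Byte form: load only 8 bits, then select the bit with shift and AND.
 * index >> 3 and index & 7 replace the division and remainder.
 */
static int mask_bit_byte(const uint8_t *v0, unsigned index)
{
    uint8_t elem = v0[index >> 3];

    return (elem >> (index & 7)) & 1;
}

/* Return 1 if both forms agree for every bit in [0, nbits). */
static int mask_forms_agree(const uint8_t *v0, unsigned nbits)
{
    for (unsigned i = 0; i < nbits; i++) {
        if (mask_bit_word64(v0, i) != mask_bit_byte(v0, i)) {
            return 0;
        }
    }
    return 1;
}
```

Under the little-endian assumption both forms select the same bit, so a
byte-wide load is sufficient for the test-and-branch.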

History of changes:

Patch v8:
- Use the right TCGv type for each variable; for example, make mask_elem
  type TCGv_i64.
- Use tcg_gen_trunc_i64_ptr() to convert between TCGv types instead of
  C-style casting.
- Use TCG_COND_TSTNE, not TCG_COND_NE, in tcg_gen_brcond_i64() to represent:
  if (vext_elem_mask(v0, i) != 0)
  https://lore.kernel.org/qemu-devel/cover.1757690407.git.chao.liu@zevorn.cn/

Patch v7:
- Standardize the subject line of patch 1 and remove the trailing period.
- Split into sub-functions to improve the patch's code readability and
  facilitate review.
- Use faster TCG ops: tcg_gen_andi_tl() instead of tcg_gen_rem_tl().
- Add a tested-by signature for patch 2, as Eric has already tested it.
  https://lore.kernel.org/qemu-devel/cover.1756975571.git.chao.liu@zevorn.cn/

Patch v6:
- If a strided vector memory access instruction has a non-zero vstart,
  handle it through the vlse/vsse helper functions.
- Adjust the tcg test Makefile.
  https://lore.kernel.org/qemu-devel/cover.1756906528.git.chao.liu@zevorn.cn/

Patch v5:
- Removed the redundant call to mark_vs_dirty(s) within the
  gen_ldst_stride_main_loop() function.
  https://lore.kernel.org/qemu-riscv/cover.1755609029.git.chao.liu@zevorn.cn/

Patch v4:
- Use ctz32() to replace the for-loop.
  https://lore.kernel.org/qemu-devel/cover.1755333616.git.chao.liu@yeah.net/

Patch v3:
- Fix the get_log2() function:
  https://lore.kernel.org/qemu-riscv/cover.1755287531.git.chao.liu@yeah.net/T/#t
- Add test for vlsseg8e32 instruction.
- Rebase on top of the latest master.

Patch v2:
- Split the TCG node emulation of the complex strided load/store operation into
  two separate functions to simplify the implementation:
  https://lore.kernel.org/qemu-riscv/20250312155547.289642-1-paolo.savini@embecosm.com/

Patch v1:
- Paolo submitted the initial version of the patch.
  https://lore.kernel.org/qemu-devel/20250211182056.412867-1-paolo.savini@embecosm.com/
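
Several items in the history above (ctz32() instead of a for-loop, andi
instead of rem) rely on the operands being powers of two. The sketch below
shows that arithmetic in isolation. It is not code from the patch: the
function names are hypothetical, and __builtin_ctz (a GCC/Clang builtin)
stands in for QEMU's ctz32().

```c
#include <assert.h>
#include <stdint.h>

/* Loop form: count how many times a can be halved before reaching 1. */
static uint32_t log2_loop(uint32_t a)
{
    uint32_t n = 0;

    while (a > 1) {
        a >>= 1;
        n++;
    }
    return n;
}

/* Bit-scan form: for a power of two, log2 is the trailing-zero count. */
static uint32_t log2_ctz(uint32_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);   /* must be a power of two */
    return (uint32_t)__builtin_ctz(a);
}

/* k * max_elems expressed as a shift; max_elems must be a power of two. */
static uint32_t scale_by_pow2(uint32_t k, uint32_t max_elems)
{
    return k << log2_ctz(max_elems);
}

/* i % n expressed as a mask; n must be a power of two. */
static uint32_t mod_pow2(uint32_t i, uint32_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    return i & (n - 1);
}
```

For example, with max_elems = 4 the register offset of field k = 3 is
3 << 2 = 12 elements, which matches 3 * 4.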


Thanks,
Chao

Chao Liu (2):
  target/riscv: Use tcg nodes for strided vector ld/st generation
  tests/tcg/riscv64: Add test for vlsseg8e32 instruction

 target/riscv/insn_trans/trans_rvv.c.inc   | 354 ++++++++++++++++++++--
 tests/tcg/riscv64/Makefile.softmmu-target |   7 +-
 tests/tcg/riscv64/test-vlsseg8e32.S       | 107 +++++++
 3 files changed, 450 insertions(+), 18 deletions(-)
 create mode 100644 tests/tcg/riscv64/test-vlsseg8e32.S

-- 
2.51.0




* [PATCH v9 1/2] target/riscv: Use tcg nodes for strided vector ld/st generation
  2025-09-16  9:21 [PATCH v9 0/2] Generate strided vector loads/stores with tcg nodes Chao Liu
@ 2025-09-16  9:21 ` Chao Liu
  2025-09-16  9:21 ` [PATCH v9 2/2] tests/tcg/riscv64: Add test for vlsseg8e32 instruction Chao Liu
  1 sibling, 0 replies; 3+ messages in thread
From: Chao Liu @ 2025-09-16  9:21 UTC
  To: richard.henderson, paolo.savini, npiggin, ebiggers, dbarboza,
	palmer, alistair.francis, liwei1518, zhiwei_liu
  Cc: qemu-riscv, qemu-devel, Chao Liu

This commit improves the performance of QEMU when emulating strided vector
loads and stores by replacing the call to the helper function with
equivalent generated TCG operations.

PS:

An implementation is permitted to raise an illegal-instruction exception
if vstart is non-zero and set to a value that the implementation cannot
produce implicitly, but memory accesses generally still need to deal with
page faults, which can leave vstart non-zero.

So, if a strided vector memory access instruction has a non-zero vstart,
fall back to the vlse/vsse helper functions.

Signed-off-by: Paolo Savini <paolo.savini@embecosm.com>
Signed-off-by: Chao Liu <chao.liu@zevorn.cn>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
---
 target/riscv/insn_trans/trans_rvv.c.inc | 354 ++++++++++++++++++++++--
 1 file changed, 337 insertions(+), 17 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc
index 71f98fb350..81625c9983 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -16,6 +16,7 @@
  */
 #include "tcg/tcg-op-gvec.h"
 #include "tcg/tcg-gvec-desc.h"
+#include "tcg/tcg-temp-internal.h"
 #include "internals.h"
 
 static inline bool is_overlapped(const int8_t astart, int8_t asize,
@@ -863,15 +864,290 @@ static bool st_us_mask_check(DisasContext *s, arg_vsm_v *a, uint8_t eew)
 GEN_VEXT_TRANS(vlm_v, MO_8, vlm_v, ld_us_mask_op, ld_us_mask_check)
 GEN_VEXT_TRANS(vsm_v, MO_8, vsm_v, st_us_mask_op, st_us_mask_check)
 
+/*
+ * MAXSZ returns the maximum vector size can be operated in bytes,
+ * which is used in GVEC IR when vl_eq_vlmax flag is set to true
+ * to accelerate vector operation.
+ */
+static inline uint32_t MAXSZ(DisasContext *s)
+{
+    int max_sz = s->cfg_ptr->vlenb << 3;
+    return max_sz >> (3 - s->lmul);
+}
+
+static inline uint32_t get_log2(uint32_t a)
+{
+    assert(is_power_of_2(a));
+    return ctz32(a);
+}
+
+typedef void gen_tl_ldst(TCGv, TCGv_ptr, tcg_target_long);
+
+static void gen_ldst_vreg(DisasContext *s, TCGv_i64 dest_offs, TCGv_i64 addr,
+                          gen_tl_ldst *ld_fn, gen_tl_ldst *st_fn, bool is_load)
+{
+    MemOp atomicity = (s->sew == 0) ? MO_ATOM_NONE : MO_ATOM_IFALIGN_PAIR;
+    TCGv_ptr dest_ptr = tcg_temp_new_ptr();
+    TCGv_i64 vreg = tcg_temp_new_i64();
+    tcg_gen_trunc_i64_ptr(dest_ptr, dest_offs);
+
+    if (is_load) {
+        tcg_gen_qemu_ld_tl(vreg, addr, s->mem_idx, MO_LE | s->sew | atomicity);
+        st_fn(vreg, dest_ptr, 0);
+    } else {
+        ld_fn(vreg, dest_ptr, 0);
+        tcg_gen_qemu_st_tl(vreg, addr, s->mem_idx, MO_LE | s->sew | atomicity);
+    }
+    tcg_temp_free_ptr(dest_ptr);
+    tcg_temp_free_i64(vreg);
+}
+
+/*
+ * Check whether the i-th bit of the mask is 0 or 1.
+ *
+ * static inline int vext_elem_mask(void *v0, int index)
+ * {
+ *     int idx = index / 64;
+ *     int pos = index % 64;
+ *     return (((uint64_t *)v0)[idx] >> pos) & 1;
+ * }
+ *
+ * And
+ *
+ * if (vext_elem_mask(v0, i) != 0) {
+ *     goto label;
+ * }
+ */
+static void gen_check_vext_elem_mask(DisasContext *s, TCGLabel *label,
+                                     TCGv_i64 mask_offs)
+{
+    TCGv_i64 temp = tcg_temp_new_i64();
+    TCGv_ptr ptr = tcg_temp_new_ptr();
+    TCGv_i64 elem = tcg_temp_new_i64();
+
+    tcg_gen_shri_tl(temp, mask_offs, 3);
+    tcg_gen_trunc_i64_ptr(ptr, temp);
+    tcg_gen_add_ptr(ptr, ptr, tcg_env);
+
+    tcg_gen_ld8u_i64(elem, ptr, 0);
+    tcg_gen_andi_tl(temp, mask_offs, 7);
+    tcg_gen_shr_tl(elem, elem, temp);
+    tcg_gen_brcond_i64(TCG_COND_TSTNE, elem, tcg_constant_i64(1), label);
+
+    tcg_temp_free_i64(temp);
+    tcg_temp_free_ptr(ptr);
+    tcg_temp_free_i64(elem);
+}
+
+static void gen_vext_set_elems_1s(TCGv dest, TCGv_i64 mask_offs, int sew,
+                                  gen_tl_ldst *st_fn, bool is_load)
+{
+    if (is_load) {
+        TCGv_ptr ptr = tcg_temp_new_ptr();
+        tcg_gen_shli_tl(mask_offs, mask_offs, sew);
+        tcg_gen_add_tl(mask_offs, mask_offs, dest);
+        tcg_gen_trunc_i64_ptr(ptr, mask_offs);
+        st_fn(tcg_constant_tl(-1), ptr, 0);
+        tcg_temp_free_ptr(ptr);
+    }
+}
+
+/*
+ * Simulate the strided load/store main loop:
+ *
+ * for (i = env->vstart; i < env->vl; env->vstart = ++i) {
+ *     k = 0;
+ *     while (k < nf) {
+ *         if (!vm && !vext_elem_mask(v0, i)) {
+ *             vext_set_elems_1s(vd, vma, (i + k * max_elems) * esz,
+ *                               (i + k * max_elems + 1) * esz);
+ *             k++;
+ *             continue;
+ *         }
+ *         target_ulong addr = base + stride * i + (k << log2_esz);
+ *         ldst(env, adjust_addr(env, addr), i + k * max_elems, vd, ra);
+ *         k++;
+ *     }
+ * }
+ */
+static void gen_ldst_stride_main_loop(DisasContext *s, TCGv dest, uint32_t rs1,
+                                      uint32_t rs2, uint32_t vm, uint32_t nf,
+                                      gen_tl_ldst *ld_fn, gen_tl_ldst *st_fn,
+                                      bool is_load)
+{
+    TCGv_i64 addr = tcg_temp_new_i64();
+    TCGv base = get_gpr(s, rs1, EXT_NONE);
+    TCGv stride = get_gpr(s, rs2, EXT_NONE);
+
+    TCGv i = tcg_temp_new();
+    TCGv i_esz = tcg_temp_new();
+    TCGv k = tcg_temp_new();
+    TCGv k_esz = tcg_temp_new();
+    TCGv k_max = tcg_temp_new();
+    TCGv_i64 mask_offs = tcg_temp_new_i64();
+    TCGv_i64 dest_offs = tcg_temp_new_i64();
+    TCGv_i64 stride_offs = tcg_temp_new_i64();
+
+    uint32_t max_elems = MAXSZ(s) >> s->sew;
+
+    TCGLabel *start = gen_new_label();
+    TCGLabel *end = gen_new_label();
+    TCGLabel *start_k = gen_new_label();
+    TCGLabel *inc_k = gen_new_label();
+    TCGLabel *end_k = gen_new_label();
+
+    /* Start of outer loop. */
+    tcg_gen_mov_tl(i, cpu_vstart);
+    gen_set_label(start);
+    tcg_gen_brcond_tl(TCG_COND_GE, i, cpu_vl, end);
+    tcg_gen_shli_tl(i_esz, i, s->sew);
+
+    /* Start of inner loop. */
+    tcg_gen_movi_tl(k, 0);
+    gen_set_label(start_k);
+    tcg_gen_brcond_tl(TCG_COND_GE, k, tcg_constant_tl(nf), end_k);
+
+    /*
+     * If we are in mask agnostic regime and the operation is not unmasked we
+     * set the inactive elements to 1.
+     */
+    if (!vm && s->vma) {
+        TCGLabel *active_element = gen_new_label();
+        /* (i + k * max_elems) * esz */
+        tcg_gen_shli_tl(mask_offs, k, get_log2(max_elems << s->sew));
+        tcg_gen_add_tl(mask_offs, mask_offs, i_esz);
+
+        /*
+         * Check whether the i-th bit of the mask is 0 or 1.
+         * If it is 0, set masked-off elements;
+         * otherwise, directly load/store the vector register.
+         */
+        gen_check_vext_elem_mask(s, active_element, mask_offs);
+
+        /*
+         * Set masked-off elements in the destination vector register to 1s.
+         * Store instructions simply skip this bit as memory ops access memory
+         * only for active elements.
+         */
+        gen_vext_set_elems_1s(dest, mask_offs, s->sew, st_fn, is_load);
+
+        tcg_gen_br(inc_k);
+        gen_set_label(active_element);
+    }
+
+    /*
+     * The element is active, calculate the address with stride:
+     * target_ulong addr = base + stride * i + (k << log2_esz);
+     */
+    tcg_gen_mul_tl(stride_offs, stride, i);
+    tcg_gen_shli_tl(k_esz, k, s->sew);
+    tcg_gen_add_tl(stride_offs, stride_offs, k_esz);
+    tcg_gen_add_tl(addr, base, stride_offs);
+
+    /* Calculate the offset in the dst/src vector register. */
+    tcg_gen_shli_tl(k_max, k, get_log2(max_elems));
+    tcg_gen_add_tl(dest_offs, i, k_max);
+    tcg_gen_shli_tl(dest_offs, dest_offs, s->sew);
+    tcg_gen_add_tl(dest_offs, dest_offs, dest);
+
+    /* Load/Store vector register. */
+    gen_ldst_vreg(s, dest_offs, addr, ld_fn, st_fn, is_load);
+
+    /*
+     * We don't execute the load/store above if the element was inactive.
+     * We jump instead directly to incrementing k and continuing the loop.
+     */
+    if (!vm && s->vma) {
+        gen_set_label(inc_k);
+    }
+    tcg_gen_addi_tl(k, k, 1);
+    tcg_gen_br(start_k);
+
+    /* End of the inner loop. */
+    gen_set_label(end_k);
+
+    tcg_gen_addi_tl(i, i, 1);
+    tcg_gen_mov_tl(cpu_vstart, i);
+    tcg_gen_br(start);
+
+    /* End of the outer loop. */
+    gen_set_label(end);
+
+    return;
+}
+
+/*
+ * Set the tail bytes of the strided loads/stores to 1:
+ *
+ * for (k = 0; k < nf; ++k) {
+ *     cnt = (k * max_elems + vl) * esz;
+ *     tot = (k * max_elems + max_elems) * esz;
+ *     for (i = cnt; i < tot; i += esz) {
+ *         store_1s(-1, vd[vl+i]);
+ *     }
+ * }
+ */
+static void gen_ldst_stride_tail_loop(DisasContext *s, TCGv dest, uint32_t nf,
+                                      gen_tl_ldst *st_fn)
+{
+    TCGv i = tcg_temp_new();
+    TCGv k = tcg_temp_new();
+    TCGv tail_cnt = tcg_temp_new();
+    TCGv tail_tot = tcg_temp_new();
+    TCGv tail_addr = tcg_temp_new();
+
+    TCGLabel *start = gen_new_label();
+    TCGLabel *end = gen_new_label();
+    TCGLabel *start_i = gen_new_label();
+    TCGLabel *end_i = gen_new_label();
+
+    uint32_t max_elems_b = MAXSZ(s);
+    uint32_t esz = 1 << s->sew;
+
+    /* Start of the outer loop. */
+    tcg_gen_movi_tl(k, 0);
+    tcg_gen_shli_tl(tail_cnt, cpu_vl, s->sew);
+    tcg_gen_movi_tl(tail_tot, max_elems_b);
+    tcg_gen_add_tl(tail_addr, dest, tail_cnt);
+    gen_set_label(start);
+    tcg_gen_brcond_tl(TCG_COND_GE, k, tcg_constant_tl(nf), end);
+
+    /* Start of the inner loop. */
+    tcg_gen_mov_tl(i, tail_cnt);
+    gen_set_label(start_i);
+    tcg_gen_brcond_tl(TCG_COND_GE, i, tail_tot, end_i);
+
+    /* store_1s(-1, vd[vl+i]); */
+    st_fn(tcg_constant_tl(-1), (TCGv_ptr)tail_addr, 0);
+    tcg_gen_addi_tl(tail_addr, tail_addr, esz);
+    tcg_gen_addi_tl(i, i, esz);
+    tcg_gen_br(start_i);
+
+    /* End of the inner loop. */
+    gen_set_label(end_i);
+
+    /* Update the counts */
+    tcg_gen_addi_tl(tail_cnt, tail_cnt, max_elems_b);
+    tcg_gen_addi_tl(tail_tot, tail_cnt, max_elems_b);
+    tcg_gen_addi_tl(k, k, 1);
+    tcg_gen_br(start);
+
+    /* End of the outer loop. */
+    gen_set_label(end);
+
+    return;
+}
+
 /*
  *** stride load and store
  */
 typedef void gen_helper_ldst_stride(TCGv_ptr, TCGv_ptr, TCGv,
                                     TCGv, TCGv_env, TCGv_i32);
 
-static bool ldst_stride_trans(uint32_t vd, uint32_t rs1, uint32_t rs2,
-                              uint32_t data, gen_helper_ldst_stride *fn,
-                              DisasContext *s)
+static
+bool gen_call_helper_ldst_stride(uint32_t vd, uint32_t rs1, uint32_t rs2,
+                                 uint32_t data, gen_helper_ldst_stride *fn,
+                                 DisasContext *s)
 {
     TCGv_ptr dest, mask;
     TCGv base, stride;
@@ -895,11 +1171,66 @@ static bool ldst_stride_trans(uint32_t vd, uint32_t rs1, uint32_t rs2,
     return true;
 }
 
+static bool ldst_stride_trans(uint32_t vd, uint32_t rs1, uint32_t rs2,
+                              uint32_t data, gen_helper_ldst_stride *fn,
+                              DisasContext *s, bool is_load)
+{
+    if (!s->vstart_eq_zero) {
+        /* vstart != 0 helper slowpath */
+        return gen_call_helper_ldst_stride(vd, rs1, rs2, data, fn, s);
+    }
+
+    TCGv dest = tcg_temp_new();
+
+    uint32_t nf = FIELD_EX32(data, VDATA, NF);
+    uint32_t vm = FIELD_EX32(data, VDATA, VM);
+
+    /* Destination register and mask register */
+    tcg_gen_addi_tl(dest, (TCGv)tcg_env, vreg_ofs(s, vd));
+
+    /*
+     * Select the appropriate load/store to retrieve data from the vector
+     * register given a specific sew.
+     */
+    static gen_tl_ldst * const ld_fns[4] = {
+        tcg_gen_ld8u_tl, tcg_gen_ld16u_tl,
+        tcg_gen_ld32u_tl, tcg_gen_ld_tl
+    };
+
+    static gen_tl_ldst * const st_fns[4] = {
+        tcg_gen_st8_tl, tcg_gen_st16_tl,
+        tcg_gen_st32_tl, tcg_gen_st_tl
+    };
+
+    gen_tl_ldst *ld_fn = ld_fns[s->sew];
+    gen_tl_ldst *st_fn = st_fns[s->sew];
+
+    if (ld_fn == NULL || st_fn == NULL) {
+        return false;
+    }
+
+    mark_vs_dirty(s);
+
+    gen_ldst_stride_main_loop(s, dest, rs1, rs2, vm, nf, ld_fn, st_fn, is_load);
+
+    tcg_gen_movi_tl(cpu_vstart, 0);
+
+    /*
+     * Set the tail bytes to 1 if tail agnostic:
+     */
+    if (s->vta != 0 && is_load) {
+        gen_ldst_stride_tail_loop(s, dest, nf, st_fn);
+    }
+
+    finalize_rvv_inst(s);
+    return true;
+}
+
 static bool ld_stride_op(DisasContext *s, arg_rnfvm *a, uint8_t eew)
 {
     uint32_t data = 0;
     gen_helper_ldst_stride *fn;
-    static gen_helper_ldst_stride * const fns[4] = {
+    static gen_helper_ldst_stride *const fns[4] = {
         gen_helper_vlse8_v, gen_helper_vlse16_v,
         gen_helper_vlse32_v, gen_helper_vlse64_v
     };
@@ -915,7 +1246,7 @@ static bool ld_stride_op(DisasContext *s, arg_rnfvm *a, uint8_t eew)
     data = FIELD_DP32(data, VDATA, NF, a->nf);
     data = FIELD_DP32(data, VDATA, VTA, s->vta);
     data = FIELD_DP32(data, VDATA, VMA, s->vma);
-    return ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s);
+    return ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s, true);
 }
 
 static bool ld_stride_check(DisasContext *s, arg_rnfvm* a, uint8_t eew)
@@ -949,7 +1280,7 @@ static bool st_stride_op(DisasContext *s, arg_rnfvm *a, uint8_t eew)
         return false;
     }
 
-    return ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s);
+    return ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s, false);
 }
 
 static bool st_stride_check(DisasContext *s, arg_rnfvm* a, uint8_t eew)
@@ -1300,17 +1631,6 @@ GEN_LDST_WHOLE_TRANS(vs8r_v, int8_t, 8, false)
  *** Vector Integer Arithmetic Instructions
  */
 
-/*
- * MAXSZ returns the maximum vector size can be operated in bytes,
- * which is used in GVEC IR when vl_eq_vlmax flag is set to true
- * to accelerate vector operation.
- */
-static inline uint32_t MAXSZ(DisasContext *s)
-{
-    int max_sz = s->cfg_ptr->vlenb * 8;
-    return max_sz >> (3 - s->lmul);
-}
-
 static bool opivv_check(DisasContext *s, arg_rmrr *a)
 {
     return require_rvv(s) &&
-- 
2.51.0




* [PATCH v9 2/2] tests/tcg/riscv64: Add test for vlsseg8e32 instruction
  2025-09-16  9:21 [PATCH v9 0/2] Generate strided vector loads/stores with tcg nodes Chao Liu
  2025-09-16  9:21 ` [PATCH v9 1/2] target/riscv: Use tcg nodes for strided vector ld/st generation Chao Liu
@ 2025-09-16  9:21 ` Chao Liu
  1 sibling, 0 replies; 3+ messages in thread
From: Chao Liu @ 2025-09-16  9:21 UTC
  To: richard.henderson, paolo.savini, npiggin, ebiggers, dbarboza,
	palmer, alistair.francis, liwei1518, zhiwei_liu
  Cc: qemu-riscv, qemu-devel, Chao Liu, Chao Liu

From: Chao Liu <chao.liu@yeah.net>

This test case copies 64 bytes from the address in a0 to the address in a1
using vlsseg8e32.v and vssseg8e32.v.
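
For readers who want to see why two load/store pairs cover 64 bytes, here is
a rough C model of the copy. It is a sketch, not code from the patch: the
model_* names and the seg_regs type are hypothetical, and MAX_ELEMS = 4
assumes VLEN = 128 with LMUL = 1. The address formula follows the helper
loop quoted in patch 1: base + stride * i + (k << log2_esz).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { NF = 8, ESZ = 4, MAX_ELEMS = 4 };   /* assumed: VLEN = 128, LMUL = 1 */

/* One group of NF consecutive vector registers, e.g. v0..v7. */
typedef struct {
    uint8_t bytes[NF][MAX_ELEMS * ESZ];
} seg_regs;

/* Strided segment load: field k of segment i goes to register k, element i. */
static void model_vlsseg8e32(seg_regs *vd, const uint8_t *base,
                             ptrdiff_t stride, size_t vl)
{
    assert(vl <= MAX_ELEMS);
    for (size_t i = 0; i < vl; i++) {
        for (size_t k = 0; k < NF; k++) {
            const uint8_t *addr = base + stride * (ptrdiff_t)i + k * ESZ;
            memcpy(&vd->bytes[k][i * ESZ], addr, ESZ);
        }
    }
}

/* Strided segment store: the inverse of the load above. */
static void model_vssseg8e32(const seg_regs *vs, uint8_t *base,
                             ptrdiff_t stride, size_t vl)
{
    assert(vl <= MAX_ELEMS);
    for (size_t i = 0; i < vl; i++) {
        for (size_t k = 0; k < NF; k++) {
            uint8_t *addr = base + stride * (ptrdiff_t)i + k * ESZ;
            memcpy(addr, &vs->bytes[k][i * ESZ], ESZ);
        }
    }
}

/*
 * Mirror of the test flow: vl = 1, stride = 64, two loads and two stores.
 * Returns 0 when the buffers match and 2 otherwise, like the test's exit
 * codes.
 */
static int model_copy64(const uint8_t *src, uint8_t *dst)
{
    seg_regs lo, hi;                        /* stand-ins for v0-v7, v8-v15 */

    model_vlsseg8e32(&lo, src, 64, 1);
    model_vlsseg8e32(&hi, src + 32, 64, 1);
    model_vssseg8e32(&lo, dst, 64, 1);
    model_vssseg8e32(&hi, dst + 32, 64, 1);
    return memcmp(src, dst, 64) == 0 ? 0 : 2;
}
```

With vl = 1 each instruction moves a single segment of 8 fields * 4 bytes =
32 contiguous bytes, so the stride value only matters for vl > 1.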

Signed-off-by: Chao Liu <chao.liu@zevorn.cn>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Eric Biggers <ebiggers@kernel.org>
---
 tests/tcg/riscv64/Makefile.softmmu-target |   7 +-
 tests/tcg/riscv64/test-vlsseg8e32.S       | 107 ++++++++++++++++++++++
 2 files changed, 113 insertions(+), 1 deletion(-)
 create mode 100644 tests/tcg/riscv64/test-vlsseg8e32.S

diff --git a/tests/tcg/riscv64/Makefile.softmmu-target b/tests/tcg/riscv64/Makefile.softmmu-target
index 3ca595335d..e52c005f87 100644
--- a/tests/tcg/riscv64/Makefile.softmmu-target
+++ b/tests/tcg/riscv64/Makefile.softmmu-target
@@ -7,7 +7,7 @@ VPATH += $(TEST_SRC)
 
 LINK_SCRIPT = $(TEST_SRC)/semihost.ld
 LDFLAGS = -T $(LINK_SCRIPT)
-CFLAGS += -g -Og
+CFLAGS += -march=rv64gcv -mabi=lp64d -g -Og
 
 %.o: %.S
 	$(CC) $(CFLAGS) $< -Wa,--noexecstack -c -o $@
@@ -24,5 +24,10 @@ EXTRA_RUNS += run-test-mepc-masking
 run-test-mepc-masking: test-mepc-masking
 	$(call run-test, $<, $(QEMU) $(QEMU_OPTS)$<)
 
+EXTRA_RUNS += run-vlsseg8e32
+run-vlsseg8e32: QEMU_OPTS := -cpu rv64,v=true $(QEMU_OPTS)
+run-vlsseg8e32: test-vlsseg8e32
+	$(call run-test, $<, $(QEMU) $(QEMU_OPTS)$<)
+
 # We don't currently support the multiarch system tests
 undefine MULTIARCH_TESTS
diff --git a/tests/tcg/riscv64/test-vlsseg8e32.S b/tests/tcg/riscv64/test-vlsseg8e32.S
new file mode 100644
index 0000000000..bbc79d5e8d
--- /dev/null
+++ b/tests/tcg/riscv64/test-vlsseg8e32.S
@@ -0,0 +1,107 @@
+#
+# QEMU RISC-V Vector Strided Load Instruction testcase
+#
+# Copyright (c) 2025 Chao Liu chao.liu@yeah.net
+#
+# SPDX-License-Identifier: GPL-2.0-or-later
+#
+	.option	norvc
+
+	.section .data
+	.align 4
+source_data:
+	.asciz "Test the vssseg8e32 insn by copy 64b and verifying correctness."
+	.equ source_len, 64
+
+	.text
+	.global _start
+_start:
+	lla	t0, trap
+	csrw	mtvec, t0
+
+enable_rvv:
+
+	li	x15, 0x800000000024112d
+	csrw	0x301, x15
+	li	x1, 0x2200
+	csrr	x2, mstatus
+	or	x2, x2, x1
+	csrw	mstatus, x2
+
+rvv_test_func:
+	la	a0, source_data
+	li	a1, 0x80020000
+	vsetivli	zero, 1, e32, m1, ta, ma
+	li	t0, 64
+
+	vlsseg8e32.v	v0, (a0), t0
+	addi	a0, a0, 32
+	vlsseg8e32.v	v8, (a0), t0
+
+	vssseg8e32.v	v0, (a1), t0
+	addi	a1, a1, 32
+	vssseg8e32.v	v8, (a1), t0
+
+compare_start:
+	la	a0, source_data
+	li	a1, 0x80020000
+	li	t0, 0
+	li	t1, source_len
+
+compare_loop:
+	# when t0 >= len, compare end
+	bge	 t0, t1, compare_done
+
+	lb	t2, 0(a0)
+	lb	t3, 0(a1)
+	bne	t2, t3, compare_fail
+
+	addi	a0, a0, 1
+	addi	a1, a1, 1
+	addi	t0, t0, 1
+	j	compare_loop
+
+compare_done:
+	# compare ok, return 0
+	li	a0, 0
+	j	_exit
+
+compare_fail:
+	# compare failed, return 2
+	li	a0, 2
+	j	_exit
+
+trap:
+	# When an instruction traps, compare it to the insn in memory.
+	csrr	t0, mepc
+	csrr	t1, mtval
+	lwu	t2, 0(t0)
+	bne	t1, t2, fail
+
+	# Skip the insn and continue.
+	addi	t0, t0, 4
+	csrw	mepc, t0
+	mret
+
+fail:
+	li	a0, 1
+
+# Exit code in a0
+_exit:
+	lla	a1, semiargs
+	li	t0, 0x20026	# ADP_Stopped_ApplicationExit
+	sd	t0, 0(a1)
+	sd	a0, 8(a1)
+	li	a0, 0x20	# TARGET_SYS_EXIT_EXTENDED
+
+	# Semihosting call sequence
+	.balign	16
+	slli	zero, zero, 0x1f
+	ebreak
+	srai	zero, zero, 0x7
+	j	.
+
+	.data
+	.balign	16
+semiargs:
+	.space	16
-- 
2.51.0


