* [PATCH v1 0/2] target/riscv: fix vector register address calculation in strided LD/ST
@ 2025-08-15 19:37 Chao Liu
2025-08-15 19:37 ` [PATCH v1 1/2] " Chao Liu
2025-08-15 19:37 ` [PATCH v1 2/2] tests/tcg/riscv64: Add test for vlsseg8e32 instruction Chao Liu
0 siblings, 2 replies; 3+ messages in thread
From: Chao Liu @ 2025-08-15 19:37 UTC (permalink / raw)
To: paolo.savini, dbarboza, ebiggers, palmer, alistair.francis,
liwei1518, zhiwei_liu
Cc: qemu-riscv, qemu-devel, Chao Liu
Hi Paolo, Eric, Daniel,
I have attempted to fix this issue. Thanks to Eric for providing the test case.
This patch fixes a critical bug in the RISC-V vector instruction translation
that caused incorrect data handling in strided load operations
(e.g., vlsseg8e32).
#### Problem Description
The `get_log2` function in `trans_rvv.c.inc` returned a value 1 higher than the
actual log2 value. For example, get_log2(4) incorrectly returned 3 instead of 2.
This led to erroneous vector register offset calculations, resulting in data
overlap where bytes 32-47 were incorrectly copied to positions 16-31 in ChaCha20
encryption code.
rvv_test_func:
    vsetivli zero, 1, e32, m1, ta, ma
    li t0, 64
    vlsseg8e32.v v0, (a0), t0
    addi a0, a0, 32
    vlsseg8e32.v v8, (a0), t0
    vssseg8e32.v v0, (a1), t0
    addi a1, a1, 32
    vssseg8e32.v v8, (a1), t0
    ret
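For reference, here is a rough C model of what the sequence above is expected to do architecturally. It is only a sketch and not QEMU code: the toy register file, the assumed VLEN of 128 bits, and the names `model_vlsseg_e32`, `model_vssseg_e32` and `model_rvv_test_func` are all made up for illustration. It assumes LMUL = 1, no masking, and that field k of segment i is placed in element i of register vd + k.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: VLEN = 128 bits, i.e. 16 bytes per register. */
enum { VLENB = 16, NUM_VREGS = 32, EEW_BYTES = 4 };

/* Toy vector register file, not QEMU's CPU state. */
static uint8_t vreg[NUM_VREGS][VLENB];

/* Model of vlsseg<nf>e32.v vd, (base), stride with LMUL = 1, unmasked. */
static void model_vlsseg_e32(unsigned vd, const uint8_t *base,
                             size_t stride, unsigned nf, unsigned vl)
{
    assert(vd + nf <= NUM_VREGS && vl * EEW_BYTES <= VLENB);
    for (unsigned i = 0; i < vl; i++) {          /* segment index */
        for (unsigned k = 0; k < nf; k++) {      /* field within segment */
            memcpy(&vreg[vd + k][i * EEW_BYTES],
                   base + i * stride + k * EEW_BYTES, EEW_BYTES);
        }
    }
}

/* Model of vssseg<nf>e32.v vs, (base), stride with LMUL = 1, unmasked. */
static void model_vssseg_e32(unsigned vs, uint8_t *base,
                             size_t stride, unsigned nf, unsigned vl)
{
    assert(vs + nf <= NUM_VREGS && vl * EEW_BYTES <= VLENB);
    for (unsigned i = 0; i < vl; i++) {
        for (unsigned k = 0; k < nf; k++) {
            memcpy(base + i * stride + k * EEW_BYTES,
                   &vreg[vs + k][i * EEW_BYTES], EEW_BYTES);
        }
    }
}

/* Mirrors rvv_test_func: vl = 1, stride = 64, two 8-field segments. */
static void model_rvv_test_func(const uint8_t *src, uint8_t *dst)
{
    model_vlsseg_e32(0, src, 64, 8, 1);       /* v0..v7  <- bytes 0..31  */
    model_vlsseg_e32(8, src + 32, 64, 8, 1);  /* v8..v15 <- bytes 32..63 */
    model_vssseg_e32(0, dst, 64, 8, 1);
    model_vssseg_e32(8, dst + 32, 64, 8, 1);
}
```

With vl = 1 the stride never comes into play, so each load/store pair moves exactly 32 contiguous bytes and the whole sequence should behave like a 64-byte copy. Any difference between source and destination points at the register offset calculation.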
#### Root Cause Analysis
The original implementation counted the number of right shifts until zero,
including the final shift that reduced the value to zero:
static inline uint32_t get_log2(uint32_t a)
{
    uint32_t i = 0;
    for (; a > 0;) {
        a >>= 1;
        i++;
    }
    return i; // Returns 3 for a=4 (0b100 → 0b10 → 0b1 → 0b0)
}
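To make the off-by-one easy to check outside of QEMU, the pre-fix loop can be lifted into a standalone file. The sketch below renames it to `get_log2_old` and adds two hypothetical helpers, `scale_by_shift` and `scale_error_old`, which are not in the tree. They only illustrate what happens if a caller turns the result into a shift amount, which is an assumption about the calling code rather than a copy of it.

```c
#include <assert.h>
#include <stdint.h>

/* The pre-fix loop from the listing above, renamed for this sketch. */
static inline uint32_t get_log2_old(uint32_t a)
{
    uint32_t i = 0;
    for (; a > 0;) {
        a >>= 1;
        i++;
    }
    return i;   /* bit width of a, i.e. floor(log2(a)) + 1 for a > 0 */
}

/* Hypothetical consumer: scale an index by a power-of-two count. */
static inline uint32_t scale_by_shift(uint32_t index, uint32_t log2_count)
{
    return index << log2_count;
}

/*
 * Difference between the shifted result and the intended index * count
 * when the shift amount comes from get_log2_old().
 */
static uint32_t scale_error_old(uint32_t index, uint32_t count)
{
    assert(count != 0 && (count & (count - 1)) == 0); /* powers of two only */
    return scale_by_shift(index, get_log2_old(count)) - index * count;
}
```

For any power-of-two count the old function returns one more than the true log2, so a shift based on it doubles the intended scale factor. Under that assumption, offsets derived from it land on the wrong elements.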
#### Fix Implementation
The corrected function stops shifting when only the highest bit remains and
handles the special case of a=0:
static inline uint32_t get_log2(uint32_t a)
{
    uint32_t i = 0;
    if (a == 0) {
        return i; // Handle edge case
    }
    for (; a > 1; a >>= 1) {
        i++;
    }
    return i; // Now returns 2 for a=4
}
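As a cross-check, and not as part of this series, the corrected loop can be compared against a loop-free version built on the GCC/Clang builtin `__builtin_clz`. The names `get_log2_fixed`, `get_log2_clz` and `check_equivalence` exist only in this sketch, and it assumes `unsigned int` is 32 bits wide, as it is on the hosts QEMU targets.

```c
#include <assert.h>
#include <stdint.h>

/* The corrected loop from the listing above, renamed for this sketch. */
static inline uint32_t get_log2_fixed(uint32_t a)
{
    uint32_t i = 0;
    if (a == 0) {
        return i;
    }
    for (; a > 1; a >>= 1) {
        i++;
    }
    return i;   /* floor(log2(a)) for a > 0 */
}

/*
 * Loop-free alternative. __builtin_clz(0) is undefined, so zero is
 * handled separately and mapped to 0 like the loop version.
 */
static inline uint32_t get_log2_clz(uint32_t a)
{
    return a == 0 ? 0 : 31u - (uint32_t)__builtin_clz(a);
}

/* Asserts that both versions agree for every value in [0, limit). */
static void check_equivalence(uint32_t limit)
{
    for (uint32_t a = 0; a < limit; a++) {
        assert(get_log2_fixed(a) == get_log2_clz(a));
    }
}
```

Both versions return floor(log2(a)) for non-zero inputs, so they also agree on values that are not powers of two. In-tree, the `clz32()` helper from `include/qemu/host-utils.h` could probably serve the same purpose; the sketch uses the compiler builtin only so that it builds on its own.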
#### Testing
This fix has been verified with:
1. The provided ChaCha20 vector optimization test case
2. RVV strided load instruction tests in `test-vlsseg8e32.S`
All tests now pass with correct data handling and no memory overlap.
Test using the following command:
./configure --target-list=riscv64-softmmu \
--cross-prefix-riscv64=riscv64-unknown-elf-
ninja -j$(nproc) -C build && make check-tcg
Expected result:
BUILD riscv64-softmmu guest-tests
RUN riscv64-softmmu guest-tests
TEST issue1060 on riscv64
TEST test-vlsseg8e32 on riscv64
Best regards,
Chao
Chao Liu (2):
target/riscv: fix vector register address calculation in strided LD/ST
tests/tcg/riscv64: Add test for vlsseg8e32 instruction
target/riscv/insn_trans/trans_rvv.c.inc | 5 +-
tests/tcg/riscv64/Makefile.softmmu-target | 8 +-
tests/tcg/riscv64/test-vlsseg8e32.S | 108 ++++++++++++++++++++++
3 files changed, 118 insertions(+), 3 deletions(-)
create mode 100644 tests/tcg/riscv64/test-vlsseg8e32.S
--
2.50.1
* [PATCH v1 1/2] target/riscv: fix vector register address calculation in strided LD/ST
2025-08-15 19:37 [PATCH v1 0/2] target/riscv: fix vector register address calculation in strided LD/ST Chao Liu
@ 2025-08-15 19:37 ` Chao Liu
2025-08-15 19:37 ` [PATCH v1 2/2] tests/tcg/riscv64: Add test for vlsseg8e32 instruction Chao Liu
1 sibling, 0 replies; 3+ messages in thread
From: Chao Liu @ 2025-08-15 19:37 UTC (permalink / raw)
To: paolo.savini, dbarboza, ebiggers, palmer, alistair.francis,
liwei1518, zhiwei_liu
Cc: qemu-riscv, qemu-devel, Chao Liu
This patch fixes a critical bug in the RISC-V vector instruction translation that
caused incorrect data handling in strided load operations (e.g., vlsseg8e32).
Problem Description:
The `get_log2` function in `trans_rvv.c.inc` returned a value 1 higher than the
actual log2 value. For example, get_log2(4) incorrectly returned 3 instead of 2.
This led to erroneous vector register offset calculations, resulting in data
overlap where bytes 32-47 were incorrectly copied to positions 16-31 in ChaCha20
encryption code.
rvv_test_func:
    vsetivli zero, 1, e32, m1, ta, ma
    li t0, 64
    vlsseg8e32.v v0, (a0), t0
    addi a0, a0, 32
    vlsseg8e32.v v8, (a0), t0
    vssseg8e32.v v0, (a1), t0
    addi a1, a1, 32
    vssseg8e32.v v8, (a1), t0
    ret
Analysis:
The original implementation counted the number of right shifts until
zero, including the final shift that reduced the value to zero:
static inline uint32_t get_log2(uint32_t a)
{
    uint32_t i = 0;
    for (; a > 0;) {
        a >>= 1;
        i++;
    }
    return i; // Returns 3 for a=4 (0b100 → 0b10 → 0b1 → 0b0)
}
Fix:
The corrected function stops shifting when only the highest bit remains
and handles the special case of a=0:
static inline uint32_t get_log2(uint32_t a)
{
    uint32_t i = 0;
    if (a == 0) {
        return i; // Handle edge case
    }
    for (; a > 1; a >>= 1) {
        i++;
    }
    return i; // Now returns 2 for a=4
}
Fixes: 28c12c1f2f ("Generate strided vector loads/stores with tcg nodes.")
Signed-off-by: Chao Liu <chao.liu@yeah.net>
---
target/riscv/insn_trans/trans_rvv.c.inc | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc
index 2b6077ac06..f50b62b1d8 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -877,7 +877,10 @@ static inline uint32_t MAXSZ(DisasContext *s)
static inline uint32_t get_log2(uint32_t a)
{
uint32_t i = 0;
- for (; a > 0;) {
+ if (a == 0) {
+ return i;
+ }
+ for (; a > 1;) {
a >>= 1;
i++;
}
--
2.50.1
* [PATCH v1 2/2] tests/tcg/riscv64: Add test for vlsseg8e32 instruction
2025-08-15 19:37 [PATCH v1 0/2] target/riscv: fix vector register address calculation in strided LD/ST Chao Liu
2025-08-15 19:37 ` [PATCH v1 1/2] " Chao Liu
@ 2025-08-15 19:37 ` Chao Liu
1 sibling, 0 replies; 3+ messages in thread
From: Chao Liu @ 2025-08-15 19:37 UTC (permalink / raw)
To: paolo.savini, dbarboza, ebiggers, palmer, alistair.francis,
liwei1518, zhiwei_liu
Cc: qemu-riscv, qemu-devel, Chao Liu
This test case copies 64 bytes from a0 to a1 using vlsseg8e32 and vssseg8e32,
then verifies the result byte by byte.
Signed-off-by: Chao Liu <chao.liu@yeah.net>
---
tests/tcg/riscv64/Makefile.softmmu-target | 8 +-
tests/tcg/riscv64/test-vlsseg8e32.S | 108 ++++++++++++++++++++++
2 files changed, 114 insertions(+), 2 deletions(-)
create mode 100644 tests/tcg/riscv64/test-vlsseg8e32.S
diff --git a/tests/tcg/riscv64/Makefile.softmmu-target b/tests/tcg/riscv64/Makefile.softmmu-target
index 7c1d44d3f4..c3c5b66713 100644
--- a/tests/tcg/riscv64/Makefile.softmmu-target
+++ b/tests/tcg/riscv64/Makefile.softmmu-target
@@ -7,18 +7,22 @@ VPATH += $(TEST_SRC)
LINK_SCRIPT = $(TEST_SRC)/semihost.ld
LDFLAGS = -T $(LINK_SCRIPT)
-CFLAGS += -g -Og
+CFLAGS += -march=rv64gcv -mabi=lp64d -g -Og
%.o: %.S
$(CC) $(CFLAGS) $< -Wa,--noexecstack -c -o $@
%: %.o $(LINK_SCRIPT)
$(LD) $(LDFLAGS) $< -o $@
-QEMU_OPTS += -M virt -display none -semihosting -device loader,file=
+QEMU_OPTS += -M virt -cpu rv64,v=true -display none -semihosting -device loader,file=
EXTRA_RUNS += run-issue1060
run-issue1060: issue1060
$(call run-test, $<, $(QEMU) $(QEMU_OPTS)$<)
+EXTRA_RUNS += run-vlsseg8e32
+run-vlsseg8e32: test-vlsseg8e32
+ $(call run-test, $<, $(QEMU) $(QEMU_OPTS)$<)
+
# We don't currently support the multiarch system tests
undefine MULTIARCH_TESTS
diff --git a/tests/tcg/riscv64/test-vlsseg8e32.S b/tests/tcg/riscv64/test-vlsseg8e32.S
new file mode 100644
index 0000000000..2861ff3702
--- /dev/null
+++ b/tests/tcg/riscv64/test-vlsseg8e32.S
@@ -0,0 +1,108 @@
+#
+# QEMU RISC-V Vector Strided Load Instruction testcase
+#
+# Copyright (c) 2025 Chao Liu chao.liu@yeah.net
+#
+# SPDX-License-Identifier: GPL-2.0-or-later
+#
+
+ .option norvc
+
+ .section .data
+ .align 4
+source_data:
+ .asciz "Test the vlsseg8e32 insn by copy 64b and verifying correctness."
+ .equ source_len, 64
+
+ .text
+ .global _start
+_start:
+ lla t0, trap
+ csrw mtvec, t0
+
+enable_rvv:
+
+ li x15, 0x800000000024112d
+ csrw 0x301, x15
+ li x1, 0x2200
+ csrr x2, mstatus
+ or x2, x2, x1
+ csrw mstatus, x2
+
+rvv_test_func:
+ la a0, source_data
+ li a1, 0x80020000
+ vsetivli zero, 1, e32, m1, ta, ma
+ li t0, 64
+
+ vlsseg8e32.v v0, (a0), t0
+ addi a0, a0, 32
+ vlsseg8e32.v v8, (a0), t0
+
+ vssseg8e32.v v0, (a1), t0
+ addi a1, a1, 32
+ vssseg8e32.v v8, (a1), t0
+
+compare_start:
+ la a0, source_data
+ li a1, 0x80020000
+ li t0, 0
+ li t1, source_len
+
+compare_loop:
+ # when t0 >= len, compare end
+ bge t0, t1, compare_done
+
+ lb t2, 0(a0)
+ lb t3, 0(a1)
+ bne t2, t3, compare_fail
+
+ addi a0, a0, 1
+ addi a1, a1, 1
+ addi t0, t0, 1
+ j compare_loop
+
+compare_done:
+ # compare ok, return 0
+ li a0, 0
+ j _exit
+
+compare_fail:
+ # compare failed, return 2
+ li a0, 2
+ j _exit
+
+trap:
+ # When an instruction traps, compare it to the insn in memory.
+ csrr t0, mepc
+ csrr t1, mtval
+ lwu t2, 0(t0)
+ bne t1, t2, fail
+
+ # Skip the insn and continue.
+ addi t0, t0, 4
+ csrw mepc, t0
+ mret
+
+fail:
+ li a0, 1
+
+# Exit code in a0
+_exit:
+ lla a1, semiargs
+ li t0, 0x20026 # ADP_Stopped_ApplicationExit
+ sd t0, 0(a1)
+ sd a0, 8(a1)
+ li a0, 0x20 # TARGET_SYS_EXIT_EXTENDED
+
+ # Semihosting call sequence
+ .balign 16
+ slli zero, zero, 0x1f
+ ebreak
+ srai zero, zero, 0x7
+ j .
+
+ .data
+ .balign 16
+semiargs:
+ .space 16
--
2.50.1