* [PATCH for 9.0 v15 01/10] target/riscv/vector_helper.c: set vstart = 0 in GEN_VEXT_VSLIDEUP_VX()
2024-03-14 17:56 [PATCH for 9.0 v15 00/10] target/riscv: vector fixes Daniel Henrique Barboza
@ 2024-03-14 17:56 ` Daniel Henrique Barboza
2024-03-19 0:52 ` LIU Zhiwei
2024-03-14 17:56 ` [PATCH for 9.0 v15 02/10] trans_rvv.c.inc: set vstart = 0 in int scalar move insns Daniel Henrique Barboza
` (9 subsequent siblings)
10 siblings, 1 reply; 24+ messages in thread
From: Daniel Henrique Barboza @ 2024-03-14 17:56 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
palmer, max.chou, richard.henderson, Daniel Henrique Barboza
The helper isn't setting env->vstart = 0 after its execution, as it is
expected from every vector instruction that completes successfully.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
---
target/riscv/vector_helper.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index fe56c007d5..ca79571ae2 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -4781,6 +4781,7 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \
} \
*((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i - offset)); \
} \
+ env->vstart = 0; \
/* set tail elements to 1s */ \
vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz); \
}
--
2.44.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH for 9.0 v15 01/10] target/riscv/vector_helper.c: set vstart = 0 in GEN_VEXT_VSLIDEUP_VX()
2024-03-14 17:56 ` [PATCH for 9.0 v15 01/10] target/riscv/vector_helper.c: set vstart = 0 in GEN_VEXT_VSLIDEUP_VX() Daniel Henrique Barboza
@ 2024-03-19 0:52 ` LIU Zhiwei
0 siblings, 0 replies; 24+ messages in thread
From: LIU Zhiwei @ 2024-03-19 0:52 UTC (permalink / raw)
To: Daniel Henrique Barboza, qemu-devel
Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, palmer, max.chou,
richard.henderson
On 2024/3/15 1:56, Daniel Henrique Barboza wrote:
> The helper isn't setting env->vstart = 0 after its execution, as it is
> expected from every vector instruction that completes successfully.
>
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>
Zhiwei
> ---
> target/riscv/vector_helper.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index fe56c007d5..ca79571ae2 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -4781,6 +4781,7 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \
> } \
> *((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i - offset)); \
> } \
> + env->vstart = 0; \
> /* set tail elements to 1s */ \
> vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz); \
> }
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH for 9.0 v15 02/10] trans_rvv.c.inc: set vstart = 0 in int scalar move insns
2024-03-14 17:56 [PATCH for 9.0 v15 00/10] target/riscv: vector fixes Daniel Henrique Barboza
2024-03-14 17:56 ` [PATCH for 9.0 v15 01/10] target/riscv/vector_helper.c: set vstart = 0 in GEN_VEXT_VSLIDEUP_VX() Daniel Henrique Barboza
@ 2024-03-14 17:56 ` Daniel Henrique Barboza
2024-03-18 8:30 ` Alistair Francis
2024-03-19 0:56 ` LIU Zhiwei
2024-03-14 17:56 ` [PATCH for 9.0 v15 03/10] target/riscv/vector_helper.c: fix 'vmvr_v' memcpy endianess Daniel Henrique Barboza
` (8 subsequent siblings)
10 siblings, 2 replies; 24+ messages in thread
From: Daniel Henrique Barboza @ 2024-03-14 17:56 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
palmer, max.chou, richard.henderson, Daniel Henrique Barboza
trans_vmv_x_s, trans_vmv_s_x, trans_vfmv_f_s and trans_vfmv_s_f aren't
setting vstart = 0 after execution. This is usually done by a helper in
vector_helper.c but these functions don't use helpers.
We'll set vstart after any potential 'over' brconds, and that will also
mandate a mark_vs_dirty() too.
Fixes: dedc53cbc9 ("target/riscv: rvv-1.0: integer scalar move instructions")
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
target/riscv/insn_trans/trans_rvv.c.inc | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc
index e42728990e..8c16a9f5b3 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -3373,6 +3373,8 @@ static bool trans_vmv_x_s(DisasContext *s, arg_vmv_x_s *a)
vec_element_loadi(s, t1, a->rs2, 0, true);
tcg_gen_trunc_i64_tl(dest, t1);
gen_set_gpr(s, a->rd, dest);
+ tcg_gen_movi_tl(cpu_vstart, 0);
+ mark_vs_dirty(s);
return true;
}
return false;
@@ -3399,8 +3401,9 @@ static bool trans_vmv_s_x(DisasContext *s, arg_vmv_s_x *a)
s1 = get_gpr(s, a->rs1, EXT_NONE);
tcg_gen_ext_tl_i64(t1, s1);
vec_element_storei(s, a->rd, 0, t1);
- mark_vs_dirty(s);
gen_set_label(over);
+ tcg_gen_movi_tl(cpu_vstart, 0);
+ mark_vs_dirty(s);
return true;
}
return false;
@@ -3427,6 +3430,8 @@ static bool trans_vfmv_f_s(DisasContext *s, arg_vfmv_f_s *a)
}
mark_fs_dirty(s);
+ tcg_gen_movi_tl(cpu_vstart, 0);
+ mark_vs_dirty(s);
return true;
}
return false;
@@ -3452,8 +3457,9 @@ static bool trans_vfmv_s_f(DisasContext *s, arg_vfmv_s_f *a)
do_nanbox(s, t1, cpu_fpr[a->rs1]);
vec_element_storei(s, a->rd, 0, t1);
- mark_vs_dirty(s);
gen_set_label(over);
+ tcg_gen_movi_tl(cpu_vstart, 0);
+ mark_vs_dirty(s);
return true;
}
return false;
--
2.44.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH for 9.0 v15 02/10] trans_rvv.c.inc: set vstart = 0 in int scalar move insns
2024-03-14 17:56 ` [PATCH for 9.0 v15 02/10] trans_rvv.c.inc: set vstart = 0 in int scalar move insns Daniel Henrique Barboza
@ 2024-03-18 8:30 ` Alistair Francis
2024-03-19 0:56 ` LIU Zhiwei
1 sibling, 0 replies; 24+ messages in thread
From: Alistair Francis @ 2024-03-18 8:30 UTC (permalink / raw)
To: Daniel Henrique Barboza
Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
zhiwei_liu, palmer, max.chou, richard.henderson
On Fri, Mar 15, 2024 at 3:59 AM Daniel Henrique Barboza
<dbarboza@ventanamicro.com> wrote:
>
> trans_vmv_x_s, trans_vmv_s_x, trans_vfmv_f_s and trans_vfmv_s_f aren't
> setting vstart = 0 after execution. This is usually done by a helper in
> vector_helper.c but these functions don't use helpers.
>
> We'll set vstart after any potential 'over' brconds, and that will also
> mandate a mark_vs_dirty() too.
>
> Fixes: dedc53cbc9 ("target/riscv: rvv-1.0: integer scalar move instructions")
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Alistair
> ---
> target/riscv/insn_trans/trans_rvv.c.inc | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc
> index e42728990e..8c16a9f5b3 100644
> --- a/target/riscv/insn_trans/trans_rvv.c.inc
> +++ b/target/riscv/insn_trans/trans_rvv.c.inc
> @@ -3373,6 +3373,8 @@ static bool trans_vmv_x_s(DisasContext *s, arg_vmv_x_s *a)
> vec_element_loadi(s, t1, a->rs2, 0, true);
> tcg_gen_trunc_i64_tl(dest, t1);
> gen_set_gpr(s, a->rd, dest);
> + tcg_gen_movi_tl(cpu_vstart, 0);
> + mark_vs_dirty(s);
> return true;
> }
> return false;
> @@ -3399,8 +3401,9 @@ static bool trans_vmv_s_x(DisasContext *s, arg_vmv_s_x *a)
> s1 = get_gpr(s, a->rs1, EXT_NONE);
> tcg_gen_ext_tl_i64(t1, s1);
> vec_element_storei(s, a->rd, 0, t1);
> - mark_vs_dirty(s);
> gen_set_label(over);
> + tcg_gen_movi_tl(cpu_vstart, 0);
> + mark_vs_dirty(s);
> return true;
> }
> return false;
> @@ -3427,6 +3430,8 @@ static bool trans_vfmv_f_s(DisasContext *s, arg_vfmv_f_s *a)
> }
>
> mark_fs_dirty(s);
> + tcg_gen_movi_tl(cpu_vstart, 0);
> + mark_vs_dirty(s);
> return true;
> }
> return false;
> @@ -3452,8 +3457,9 @@ static bool trans_vfmv_s_f(DisasContext *s, arg_vfmv_s_f *a)
> do_nanbox(s, t1, cpu_fpr[a->rs1]);
>
> vec_element_storei(s, a->rd, 0, t1);
> - mark_vs_dirty(s);
> gen_set_label(over);
> + tcg_gen_movi_tl(cpu_vstart, 0);
> + mark_vs_dirty(s);
> return true;
> }
> return false;
> --
> 2.44.0
>
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH for 9.0 v15 02/10] trans_rvv.c.inc: set vstart = 0 in int scalar move insns
2024-03-14 17:56 ` [PATCH for 9.0 v15 02/10] trans_rvv.c.inc: set vstart = 0 in int scalar move insns Daniel Henrique Barboza
2024-03-18 8:30 ` Alistair Francis
@ 2024-03-19 0:56 ` LIU Zhiwei
1 sibling, 0 replies; 24+ messages in thread
From: LIU Zhiwei @ 2024-03-19 0:56 UTC (permalink / raw)
To: Daniel Henrique Barboza, qemu-devel
Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, palmer, max.chou,
richard.henderson
On 2024/3/15 1:56, Daniel Henrique Barboza wrote:
> trans_vmv_x_s, trans_vmv_s_x, trans_vfmv_f_s and trans_vfmv_s_f aren't
> setting vstart = 0 after execution. This is usually done by a helper in
> vector_helper.c but these functions don't use helpers.
>
> We'll set vstart after any potential 'over' brconds, and that will also
> mandate a mark_vs_dirty() too.
>
> Fixes: dedc53cbc9 ("target/riscv: rvv-1.0: integer scalar move instructions")
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewd-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>
Zhiwei
> ---
> target/riscv/insn_trans/trans_rvv.c.inc | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc
> index e42728990e..8c16a9f5b3 100644
> --- a/target/riscv/insn_trans/trans_rvv.c.inc
> +++ b/target/riscv/insn_trans/trans_rvv.c.inc
> @@ -3373,6 +3373,8 @@ static bool trans_vmv_x_s(DisasContext *s, arg_vmv_x_s *a)
> vec_element_loadi(s, t1, a->rs2, 0, true);
> tcg_gen_trunc_i64_tl(dest, t1);
> gen_set_gpr(s, a->rd, dest);
> + tcg_gen_movi_tl(cpu_vstart, 0);
> + mark_vs_dirty(s);
> return true;
> }
> return false;
> @@ -3399,8 +3401,9 @@ static bool trans_vmv_s_x(DisasContext *s, arg_vmv_s_x *a)
> s1 = get_gpr(s, a->rs1, EXT_NONE);
> tcg_gen_ext_tl_i64(t1, s1);
> vec_element_storei(s, a->rd, 0, t1);
> - mark_vs_dirty(s);
> gen_set_label(over);
> + tcg_gen_movi_tl(cpu_vstart, 0);
> + mark_vs_dirty(s);
> return true;
> }
> return false;
> @@ -3427,6 +3430,8 @@ static bool trans_vfmv_f_s(DisasContext *s, arg_vfmv_f_s *a)
> }
>
> mark_fs_dirty(s);
> + tcg_gen_movi_tl(cpu_vstart, 0);
> + mark_vs_dirty(s);
> return true;
> }
> return false;
> @@ -3452,8 +3457,9 @@ static bool trans_vfmv_s_f(DisasContext *s, arg_vfmv_s_f *a)
> do_nanbox(s, t1, cpu_fpr[a->rs1]);
>
> vec_element_storei(s, a->rd, 0, t1);
> - mark_vs_dirty(s);
> gen_set_label(over);
> + tcg_gen_movi_tl(cpu_vstart, 0);
> + mark_vs_dirty(s);
> return true;
> }
> return false;
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH for 9.0 v15 03/10] target/riscv/vector_helper.c: fix 'vmvr_v' memcpy endianess
2024-03-14 17:56 [PATCH for 9.0 v15 00/10] target/riscv: vector fixes Daniel Henrique Barboza
2024-03-14 17:56 ` [PATCH for 9.0 v15 01/10] target/riscv/vector_helper.c: set vstart = 0 in GEN_VEXT_VSLIDEUP_VX() Daniel Henrique Barboza
2024-03-14 17:56 ` [PATCH for 9.0 v15 02/10] trans_rvv.c.inc: set vstart = 0 in int scalar move insns Daniel Henrique Barboza
@ 2024-03-14 17:56 ` Daniel Henrique Barboza
2024-03-15 7:41 ` Richard Henderson
` (2 more replies)
2024-03-14 17:56 ` [PATCH for 9.0 v15 04/10] target/riscv: always clear vstart in whole vec move insns Daniel Henrique Barboza
` (7 subsequent siblings)
10 siblings, 3 replies; 24+ messages in thread
From: Daniel Henrique Barboza @ 2024-03-14 17:56 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
palmer, max.chou, richard.henderson, Daniel Henrique Barboza
vmvr_v isn't handling the case where the host might be big endian and
the bytes to be copied aren't sequential.
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Fixes: f714361ed7 ("target/riscv: rvv-1.0: implement vstart CSR")
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
---
target/riscv/vector_helper.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index ca79571ae2..34ac4aa808 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -5075,9 +5075,17 @@ void HELPER(vmvr_v)(void *vd, void *vs2, CPURISCVState *env, uint32_t desc)
uint32_t startb = env->vstart * sewb;
uint32_t i = startb;
+ if (HOST_BIG_ENDIAN && i % 8 != 0) {
+ uint32_t j = ROUND_UP(i, 8);
+ memcpy((uint8_t *)vd + H1(j - 1),
+ (uint8_t *)vs2 + H1(j - 1),
+ j - i);
+ i = j;
+ }
+
memcpy((uint8_t *)vd + H1(i),
(uint8_t *)vs2 + H1(i),
- maxsz - startb);
+ maxsz - i);
env->vstart = 0;
}
--
2.44.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH for 9.0 v15 03/10] target/riscv/vector_helper.c: fix 'vmvr_v' memcpy endianess
2024-03-14 17:56 ` [PATCH for 9.0 v15 03/10] target/riscv/vector_helper.c: fix 'vmvr_v' memcpy endianess Daniel Henrique Barboza
@ 2024-03-15 7:41 ` Richard Henderson
2024-03-18 8:43 ` Alistair Francis
2024-03-19 7:52 ` LIU Zhiwei
2 siblings, 0 replies; 24+ messages in thread
From: Richard Henderson @ 2024-03-15 7:41 UTC (permalink / raw)
To: Daniel Henrique Barboza, qemu-devel
Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
palmer, max.chou
On 3/14/24 07:56, Daniel Henrique Barboza wrote:
> vmvr_v isn't handling the case where the host might be big endian and
> the bytes to be copied aren't sequential.
>
> Suggested-by: Richard Henderson<richard.henderson@linaro.org>
> Fixes: f714361ed7 ("target/riscv: rvv-1.0: implement vstart CSR")
> Signed-off-by: Daniel Henrique Barboza<dbarboza@ventanamicro.com>
> ---
> target/riscv/vector_helper.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH for 9.0 v15 03/10] target/riscv/vector_helper.c: fix 'vmvr_v' memcpy endianess
2024-03-14 17:56 ` [PATCH for 9.0 v15 03/10] target/riscv/vector_helper.c: fix 'vmvr_v' memcpy endianess Daniel Henrique Barboza
2024-03-15 7:41 ` Richard Henderson
@ 2024-03-18 8:43 ` Alistair Francis
2024-03-19 7:52 ` LIU Zhiwei
2 siblings, 0 replies; 24+ messages in thread
From: Alistair Francis @ 2024-03-18 8:43 UTC (permalink / raw)
To: Daniel Henrique Barboza
Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
zhiwei_liu, palmer, max.chou, richard.henderson
On Fri, Mar 15, 2024 at 3:58 AM Daniel Henrique Barboza
<dbarboza@ventanamicro.com> wrote:
>
> vmvr_v isn't handling the case where the host might be big endian and
> the bytes to be copied aren't sequential.
>
> Suggested-by: Richard Henderson <richard.henderson@linaro.org>
> Fixes: f714361ed7 ("target/riscv: rvv-1.0: implement vstart CSR")
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Alistair
> ---
> target/riscv/vector_helper.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index ca79571ae2..34ac4aa808 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -5075,9 +5075,17 @@ void HELPER(vmvr_v)(void *vd, void *vs2, CPURISCVState *env, uint32_t desc)
> uint32_t startb = env->vstart * sewb;
> uint32_t i = startb;
>
> + if (HOST_BIG_ENDIAN && i % 8 != 0) {
> + uint32_t j = ROUND_UP(i, 8);
> + memcpy((uint8_t *)vd + H1(j - 1),
> + (uint8_t *)vs2 + H1(j - 1),
> + j - i);
> + i = j;
> + }
> +
> memcpy((uint8_t *)vd + H1(i),
> (uint8_t *)vs2 + H1(i),
> - maxsz - startb);
> + maxsz - i);
>
> env->vstart = 0;
> }
> --
> 2.44.0
>
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH for 9.0 v15 03/10] target/riscv/vector_helper.c: fix 'vmvr_v' memcpy endianess
2024-03-14 17:56 ` [PATCH for 9.0 v15 03/10] target/riscv/vector_helper.c: fix 'vmvr_v' memcpy endianess Daniel Henrique Barboza
2024-03-15 7:41 ` Richard Henderson
2024-03-18 8:43 ` Alistair Francis
@ 2024-03-19 7:52 ` LIU Zhiwei
2 siblings, 0 replies; 24+ messages in thread
From: LIU Zhiwei @ 2024-03-19 7:52 UTC (permalink / raw)
To: Daniel Henrique Barboza, qemu-devel
Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, palmer, max.chou,
richard.henderson
On 2024/3/15 1:56, Daniel Henrique Barboza wrote:
> vmvr_v isn't handling the case where the host might be big endian and
> the bytes to be copied aren't sequential.
>
> Suggested-by: Richard Henderson <richard.henderson@linaro.org>
> Fixes: f714361ed7 ("target/riscv: rvv-1.0: implement vstart CSR")
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> ---
> target/riscv/vector_helper.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index ca79571ae2..34ac4aa808 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -5075,9 +5075,17 @@ void HELPER(vmvr_v)(void *vd, void *vs2, CPURISCVState *env, uint32_t desc)
> uint32_t startb = env->vstart * sewb;
> uint32_t i = startb;
>
> + if (HOST_BIG_ENDIAN && i % 8 != 0) {
> + uint32_t j = ROUND_UP(i, 8);
> + memcpy((uint8_t *)vd + H1(j - 1),
> + (uint8_t *)vs2 + H1(j - 1),
> + j - i);
> + i = j;
> + }
> +
> memcpy((uint8_t *)vd + H1(i),
> (uint8_t *)vs2 + H1(i),
> - maxsz - startb);
> + maxsz - i);
Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>
Zhiwei
>
> env->vstart = 0;
> }
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH for 9.0 v15 04/10] target/riscv: always clear vstart in whole vec move insns
2024-03-14 17:56 [PATCH for 9.0 v15 00/10] target/riscv: vector fixes Daniel Henrique Barboza
` (2 preceding siblings ...)
2024-03-14 17:56 ` [PATCH for 9.0 v15 03/10] target/riscv/vector_helper.c: fix 'vmvr_v' memcpy endianess Daniel Henrique Barboza
@ 2024-03-14 17:56 ` Daniel Henrique Barboza
2024-03-15 7:41 ` Richard Henderson
` (2 more replies)
2024-03-14 17:56 ` [PATCH for 9.0 v15 05/10] target/riscv: always clear vstart for ldst_whole insns Daniel Henrique Barboza
` (6 subsequent siblings)
10 siblings, 3 replies; 24+ messages in thread
From: Daniel Henrique Barboza @ 2024-03-14 17:56 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
palmer, max.chou, richard.henderson, Daniel Henrique Barboza
These insns have 2 paths: we'll either have vstart already cleared if
vstart_eq_zero or we'll do a brcond to check if vstart >= maxsz to call
the 'vmvr_v' helper. The helper will clear vstart if it executes until
the end, or if vstart >= vl.
For starters, the check itself is wrong: we're checking vstart >= maxsz,
when in fact we should use vstart in bytes, or 'startb' like 'vmvr_v' is
calling, to do the comparison. But even after fixing the comparison we'll
still need to clear vstart in the end, which isn't happening too.
We want to make the helpers responsible to manage vstart, including
these corner cases, precisely to avoid these situations:
- remove the wrong vstart >= maxsz cond from the translation;
- add a 'startb >= maxsz' cond in 'vmvr_v', and clear vstart if that
happens.
This way we're now sure that vstart is being cleared in the end of the
execution, regardless of the path taken.
Fixes: f714361ed7 ("target/riscv: rvv-1.0: implement vstart CSR")
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
---
target/riscv/insn_trans/trans_rvv.c.inc | 3 ---
target/riscv/vector_helper.c | 5 +++++
2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc
index 8c16a9f5b3..52c26a7834 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -3664,12 +3664,9 @@ static bool trans_##NAME(DisasContext *s, arg_##NAME * a) \
vreg_ofs(s, a->rs2), maxsz, maxsz); \
mark_vs_dirty(s); \
} else { \
- TCGLabel *over = gen_new_label(); \
- tcg_gen_brcondi_tl(TCG_COND_GEU, cpu_vstart, maxsz, over); \
tcg_gen_gvec_2_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, a->rs2), \
tcg_env, maxsz, maxsz, 0, gen_helper_vmvr_v); \
mark_vs_dirty(s); \
- gen_set_label(over); \
} \
return true; \
} \
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 34ac4aa808..bcc553c0e2 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -5075,6 +5075,11 @@ void HELPER(vmvr_v)(void *vd, void *vs2, CPURISCVState *env, uint32_t desc)
uint32_t startb = env->vstart * sewb;
uint32_t i = startb;
+ if (startb >= maxsz) {
+ env->vstart = 0;
+ return;
+ }
+
if (HOST_BIG_ENDIAN && i % 8 != 0) {
uint32_t j = ROUND_UP(i, 8);
memcpy((uint8_t *)vd + H1(j - 1),
--
2.44.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH for 9.0 v15 04/10] target/riscv: always clear vstart in whole vec move insns
2024-03-14 17:56 ` [PATCH for 9.0 v15 04/10] target/riscv: always clear vstart in whole vec move insns Daniel Henrique Barboza
@ 2024-03-15 7:41 ` Richard Henderson
2024-03-18 8:55 ` Alistair Francis
2024-03-19 8:13 ` LIU Zhiwei
2 siblings, 0 replies; 24+ messages in thread
From: Richard Henderson @ 2024-03-15 7:41 UTC (permalink / raw)
To: Daniel Henrique Barboza, qemu-devel
Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
palmer, max.chou
On 3/14/24 07:56, Daniel Henrique Barboza wrote:
> These insns have 2 paths: we'll either have vstart already cleared if
> vstart_eq_zero or we'll do a brcond to check if vstart >= maxsz to call
> the 'vmvr_v' helper. The helper will clear vstart if it executes until
> the end, or if vstart >= vl.
>
> For starters, the check itself is wrong: we're checking vstart >= maxsz,
> when in fact we should use vstart in bytes, or 'startb' like 'vmvr_v' is
> calling, to do the comparison. But even after fixing the comparison we'll
> still need to clear vstart in the end, which isn't happening too.
>
> We want to make the helpers responsible to manage vstart, including
> these corner cases, precisely to avoid these situations:
>
> - remove the wrong vstart >= maxsz cond from the translation;
> - add a 'startb >= maxsz' cond in 'vmvr_v', and clear vstart if that
> happens.
>
> This way we're now sure that vstart is being cleared in the end of the
> execution, regardless of the path taken.
>
> Fixes: f714361ed7 ("target/riscv: rvv-1.0: implement vstart CSR")
> Signed-off-by: Daniel Henrique Barboza<dbarboza@ventanamicro.com>
> ---
> target/riscv/insn_trans/trans_rvv.c.inc | 3 ---
> target/riscv/vector_helper.c | 5 +++++
> 2 files changed, 5 insertions(+), 3 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH for 9.0 v15 04/10] target/riscv: always clear vstart in whole vec move insns
2024-03-14 17:56 ` [PATCH for 9.0 v15 04/10] target/riscv: always clear vstart in whole vec move insns Daniel Henrique Barboza
2024-03-15 7:41 ` Richard Henderson
@ 2024-03-18 8:55 ` Alistair Francis
2024-03-19 8:13 ` LIU Zhiwei
2 siblings, 0 replies; 24+ messages in thread
From: Alistair Francis @ 2024-03-18 8:55 UTC (permalink / raw)
To: Daniel Henrique Barboza
Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
zhiwei_liu, palmer, max.chou, richard.henderson
On Fri, Mar 15, 2024 at 3:58 AM Daniel Henrique Barboza
<dbarboza@ventanamicro.com> wrote:
>
> These insns have 2 paths: we'll either have vstart already cleared if
> vstart_eq_zero or we'll do a brcond to check if vstart >= maxsz to call
> the 'vmvr_v' helper. The helper will clear vstart if it executes until
> the end, or if vstart >= vl.
>
> For starters, the check itself is wrong: we're checking vstart >= maxsz,
> when in fact we should use vstart in bytes, or 'startb' like 'vmvr_v' is
> calling, to do the comparison. But even after fixing the comparison we'll
> still need to clear vstart in the end, which isn't happening too.
>
> We want to make the helpers responsible to manage vstart, including
> these corner cases, precisely to avoid these situations:
>
> - remove the wrong vstart >= maxsz cond from the translation;
> - add a 'startb >= maxsz' cond in 'vmvr_v', and clear vstart if that
> happens.
>
> This way we're now sure that vstart is being cleared in the end of the
> execution, regardless of the path taken.
>
> Fixes: f714361ed7 ("target/riscv: rvv-1.0: implement vstart CSR")
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Alistair
> ---
> target/riscv/insn_trans/trans_rvv.c.inc | 3 ---
> target/riscv/vector_helper.c | 5 +++++
> 2 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc
> index 8c16a9f5b3..52c26a7834 100644
> --- a/target/riscv/insn_trans/trans_rvv.c.inc
> +++ b/target/riscv/insn_trans/trans_rvv.c.inc
> @@ -3664,12 +3664,9 @@ static bool trans_##NAME(DisasContext *s, arg_##NAME * a) \
> vreg_ofs(s, a->rs2), maxsz, maxsz); \
> mark_vs_dirty(s); \
> } else { \
> - TCGLabel *over = gen_new_label(); \
> - tcg_gen_brcondi_tl(TCG_COND_GEU, cpu_vstart, maxsz, over); \
> tcg_gen_gvec_2_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, a->rs2), \
> tcg_env, maxsz, maxsz, 0, gen_helper_vmvr_v); \
> mark_vs_dirty(s); \
> - gen_set_label(over); \
> } \
> return true; \
> } \
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index 34ac4aa808..bcc553c0e2 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -5075,6 +5075,11 @@ void HELPER(vmvr_v)(void *vd, void *vs2, CPURISCVState *env, uint32_t desc)
> uint32_t startb = env->vstart * sewb;
> uint32_t i = startb;
>
> + if (startb >= maxsz) {
> + env->vstart = 0;
> + return;
> + }
> +
> if (HOST_BIG_ENDIAN && i % 8 != 0) {
> uint32_t j = ROUND_UP(i, 8);
> memcpy((uint8_t *)vd + H1(j - 1),
> --
> 2.44.0
>
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH for 9.0 v15 04/10] target/riscv: always clear vstart in whole vec move insns
2024-03-14 17:56 ` [PATCH for 9.0 v15 04/10] target/riscv: always clear vstart in whole vec move insns Daniel Henrique Barboza
2024-03-15 7:41 ` Richard Henderson
2024-03-18 8:55 ` Alistair Francis
@ 2024-03-19 8:13 ` LIU Zhiwei
2 siblings, 0 replies; 24+ messages in thread
From: LIU Zhiwei @ 2024-03-19 8:13 UTC (permalink / raw)
To: Daniel Henrique Barboza, qemu-devel
Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, palmer, max.chou,
richard.henderson
On 2024/3/15 1:56, Daniel Henrique Barboza wrote:
> These insns have 2 paths: we'll either have vstart already cleared if
> vstart_eq_zero or we'll do a brcond to check if vstart >= maxsz to call
> the 'vmvr_v' helper. The helper will clear vstart if it executes until
> the end, or if vstart >= vl.
>
> For starters, the check itself is wrong: we're checking vstart >= maxsz,
> when in fact we should use vstart in bytes, or 'startb' like 'vmvr_v' is
> calling, to do the comparison. But even after fixing the comparison we'll
> still need to clear vstart in the end, which isn't happening too.
>
> We want to make the helpers responsible to manage vstart, including
> these corner cases, precisely to avoid these situations:
>
> - remove the wrong vstart >= maxsz cond from the translation;
> - add a 'startb >= maxsz' cond in 'vmvr_v', and clear vstart if that
> happens.
>
> This way we're now sure that vstart is being cleared in the end of the
> execution, regardless of the path taken.
>
> Fixes: f714361ed7 ("target/riscv: rvv-1.0: implement vstart CSR")
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>
Zhiwei
> ---
> target/riscv/insn_trans/trans_rvv.c.inc | 3 ---
> target/riscv/vector_helper.c | 5 +++++
> 2 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc
> index 8c16a9f5b3..52c26a7834 100644
> --- a/target/riscv/insn_trans/trans_rvv.c.inc
> +++ b/target/riscv/insn_trans/trans_rvv.c.inc
> @@ -3664,12 +3664,9 @@ static bool trans_##NAME(DisasContext *s, arg_##NAME * a) \
> vreg_ofs(s, a->rs2), maxsz, maxsz); \
> mark_vs_dirty(s); \
> } else { \
> - TCGLabel *over = gen_new_label(); \
> - tcg_gen_brcondi_tl(TCG_COND_GEU, cpu_vstart, maxsz, over); \
> tcg_gen_gvec_2_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, a->rs2), \
> tcg_env, maxsz, maxsz, 0, gen_helper_vmvr_v); \
> mark_vs_dirty(s); \
> - gen_set_label(over); \
> } \
> return true; \
> } \
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index 34ac4aa808..bcc553c0e2 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -5075,6 +5075,11 @@ void HELPER(vmvr_v)(void *vd, void *vs2, CPURISCVState *env, uint32_t desc)
> uint32_t startb = env->vstart * sewb;
> uint32_t i = startb;
>
> + if (startb >= maxsz) {
> + env->vstart = 0;
> + return;
> + }
> +
> if (HOST_BIG_ENDIAN && i % 8 != 0) {
> uint32_t j = ROUND_UP(i, 8);
> memcpy((uint8_t *)vd + H1(j - 1),
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH for 9.0 v15 05/10] target/riscv: always clear vstart for ldst_whole insns
2024-03-14 17:56 [PATCH for 9.0 v15 00/10] target/riscv: vector fixes Daniel Henrique Barboza
` (3 preceding siblings ...)
2024-03-14 17:56 ` [PATCH for 9.0 v15 04/10] target/riscv: always clear vstart in whole vec move insns Daniel Henrique Barboza
@ 2024-03-14 17:56 ` Daniel Henrique Barboza
2024-03-15 11:25 ` Max Chou
2024-03-18 9:10 ` Alistair Francis
2024-03-14 17:57 ` [PATCH for 9.0 v15 06/10] target/riscv/vector_helpers: do early exit when vstart >= vl Daniel Henrique Barboza
` (5 subsequent siblings)
10 siblings, 2 replies; 24+ messages in thread
From: Daniel Henrique Barboza @ 2024-03-14 17:56 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
palmer, max.chou, richard.henderson, Daniel Henrique Barboza
Commit 8ff8ac6329 added a conditional to guard the vext_ldst_whole()
helper if vstart >= evl. But by skipping the helper we're also not
setting vstart = 0 at the end of the insns, which is incorrect.
We'll move the conditional to vext_ldst_whole(), following in line with
the removal of all brconds vstart >= vl that the next patch will do. The
idea is to make the helpers responsible for their own vstart management.
Fix ldst_whole isns by:
- remove the brcond that skips the helper if vstart is >= evl;
- vext_ldst_whole() now does an early exit with the same check, where
evl = (vlenb * nf) >> log2_esz, but the early exit will also clear
vstart.
The 'width' param is now unneeded in ldst_whole_trans() and is also
removed. It was used for the evl calculation for the brcond and has no
other use now. The 'width' is reflected in vext_ldst_whole() via
log2_esz, which is encoded by GEN_VEXT_LD_WHOLE() as
"ctzl(sizeof(ETYPE))".
Suggested-by: Max Chou <max.chou@sifive.com>
Fixes: 8ff8ac6329 ("target/riscv: rvv: Add missing early exit condition for whole register load/store")
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
---
target/riscv/insn_trans/trans_rvv.c.inc | 52 +++++++++++--------------
target/riscv/vector_helper.c | 5 +++
2 files changed, 28 insertions(+), 29 deletions(-)
diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc
index 52c26a7834..1366445e1f 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -1097,13 +1097,9 @@ GEN_VEXT_TRANS(vle64ff_v, MO_64, r2nfvm, ldff_op, ld_us_check)
typedef void gen_helper_ldst_whole(TCGv_ptr, TCGv, TCGv_env, TCGv_i32);
static bool ldst_whole_trans(uint32_t vd, uint32_t rs1, uint32_t nf,
- uint32_t width, gen_helper_ldst_whole *fn,
+ gen_helper_ldst_whole *fn,
DisasContext *s)
{
- uint32_t evl = s->cfg_ptr->vlenb * nf / width;
- TCGLabel *over = gen_new_label();
- tcg_gen_brcondi_tl(TCG_COND_GEU, cpu_vstart, evl, over);
-
TCGv_ptr dest;
TCGv base;
TCGv_i32 desc;
@@ -1120,8 +1116,6 @@ static bool ldst_whole_trans(uint32_t vd, uint32_t rs1, uint32_t nf,
fn(dest, base, tcg_env, desc);
- gen_set_label(over);
-
return true;
}
@@ -1129,42 +1123,42 @@ static bool ldst_whole_trans(uint32_t vd, uint32_t rs1, uint32_t nf,
* load and store whole register instructions ignore vtype and vl setting.
* Thus, we don't need to check vill bit. (Section 7.9)
*/
-#define GEN_LDST_WHOLE_TRANS(NAME, ARG_NF, WIDTH) \
+#define GEN_LDST_WHOLE_TRANS(NAME, ARG_NF) \
static bool trans_##NAME(DisasContext *s, arg_##NAME * a) \
{ \
if (require_rvv(s) && \
QEMU_IS_ALIGNED(a->rd, ARG_NF)) { \
- return ldst_whole_trans(a->rd, a->rs1, ARG_NF, WIDTH, \
+ return ldst_whole_trans(a->rd, a->rs1, ARG_NF, \
gen_helper_##NAME, s); \
} \
return false; \
}
-GEN_LDST_WHOLE_TRANS(vl1re8_v, 1, 1)
-GEN_LDST_WHOLE_TRANS(vl1re16_v, 1, 2)
-GEN_LDST_WHOLE_TRANS(vl1re32_v, 1, 4)
-GEN_LDST_WHOLE_TRANS(vl1re64_v, 1, 8)
-GEN_LDST_WHOLE_TRANS(vl2re8_v, 2, 1)
-GEN_LDST_WHOLE_TRANS(vl2re16_v, 2, 2)
-GEN_LDST_WHOLE_TRANS(vl2re32_v, 2, 4)
-GEN_LDST_WHOLE_TRANS(vl2re64_v, 2, 8)
-GEN_LDST_WHOLE_TRANS(vl4re8_v, 4, 1)
-GEN_LDST_WHOLE_TRANS(vl4re16_v, 4, 2)
-GEN_LDST_WHOLE_TRANS(vl4re32_v, 4, 4)
-GEN_LDST_WHOLE_TRANS(vl4re64_v, 4, 8)
-GEN_LDST_WHOLE_TRANS(vl8re8_v, 8, 1)
-GEN_LDST_WHOLE_TRANS(vl8re16_v, 8, 2)
-GEN_LDST_WHOLE_TRANS(vl8re32_v, 8, 4)
-GEN_LDST_WHOLE_TRANS(vl8re64_v, 8, 8)
+GEN_LDST_WHOLE_TRANS(vl1re8_v, 1)
+GEN_LDST_WHOLE_TRANS(vl1re16_v, 1)
+GEN_LDST_WHOLE_TRANS(vl1re32_v, 1)
+GEN_LDST_WHOLE_TRANS(vl1re64_v, 1)
+GEN_LDST_WHOLE_TRANS(vl2re8_v, 2)
+GEN_LDST_WHOLE_TRANS(vl2re16_v, 2)
+GEN_LDST_WHOLE_TRANS(vl2re32_v, 2)
+GEN_LDST_WHOLE_TRANS(vl2re64_v, 2)
+GEN_LDST_WHOLE_TRANS(vl4re8_v, 4)
+GEN_LDST_WHOLE_TRANS(vl4re16_v, 4)
+GEN_LDST_WHOLE_TRANS(vl4re32_v, 4)
+GEN_LDST_WHOLE_TRANS(vl4re64_v, 4)
+GEN_LDST_WHOLE_TRANS(vl8re8_v, 8)
+GEN_LDST_WHOLE_TRANS(vl8re16_v, 8)
+GEN_LDST_WHOLE_TRANS(vl8re32_v, 8)
+GEN_LDST_WHOLE_TRANS(vl8re64_v, 8)
/*
* The vector whole register store instructions are encoded similar to
* unmasked unit-stride store of elements with EEW=8.
*/
-GEN_LDST_WHOLE_TRANS(vs1r_v, 1, 1)
-GEN_LDST_WHOLE_TRANS(vs2r_v, 2, 1)
-GEN_LDST_WHOLE_TRANS(vs4r_v, 4, 1)
-GEN_LDST_WHOLE_TRANS(vs8r_v, 8, 1)
+GEN_LDST_WHOLE_TRANS(vs1r_v, 1)
+GEN_LDST_WHOLE_TRANS(vs2r_v, 2)
+GEN_LDST_WHOLE_TRANS(vs4r_v, 4)
+GEN_LDST_WHOLE_TRANS(vs8r_v, 8)
/*
*** Vector Integer Arithmetic Instructions
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index bcc553c0e2..1f4c276b21 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -572,6 +572,11 @@ vext_ldst_whole(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
uint32_t vlenb = riscv_cpu_cfg(env)->vlenb;
uint32_t max_elems = vlenb >> log2_esz;
+ if (env->vstart >= ((vlenb * nf) >> log2_esz)) {
+ env->vstart = 0;
+ return;
+ }
+
k = env->vstart / max_elems;
off = env->vstart % max_elems;
--
2.44.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH for 9.0 v15 05/10] target/riscv: always clear vstart for ldst_whole insns
2024-03-14 17:56 ` [PATCH for 9.0 v15 05/10] target/riscv: always clear vstart for ldst_whole insns Daniel Henrique Barboza
@ 2024-03-15 11:25 ` Max Chou
2024-03-18 9:10 ` Alistair Francis
1 sibling, 0 replies; 24+ messages in thread
From: Max Chou @ 2024-03-15 11:25 UTC (permalink / raw)
To: Daniel Henrique Barboza, qemu-devel
Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
palmer, richard.henderson
Reviewed-by: Max Chou <max.chou@sifive.com>
On 2024/3/15 1:56 AM, Daniel Henrique Barboza wrote:
> Commit 8ff8ac6329 added a conditional to guard the vext_ldst_whole()
> helper if vstart >= evl. But by skipping the helper we're also not
> setting vstart = 0 at the end of the insns, which is incorrect.
>
> We'll move the conditional to vext_ldst_whole(), following in line with
> the removal of all brconds vstart >= vl that the next patch will do. The
> idea is to make the helpers responsible for their own vstart management.
>
> Fix ldst_whole isns by:
>
> - remove the brcond that skips the helper if vstart is >= evl;
>
> - vext_ldst_whole() now does an early exit with the same check, where
> evl = (vlenb * nf) >> log2_esz, but the early exit will also clear
> vstart.
>
> The 'width' param is now unneeded in ldst_whole_trans() and is also
> removed. It was used for the evl calculation for the brcond and has no
> other use now. The 'width' is reflected in vext_ldst_whole() via
> log2_esz, which is encoded by GEN_VEXT_LD_WHOLE() as
> "ctzl(sizeof(ETYPE))".
>
> Suggested-by: Max Chou <max.chou@sifive.com>
> Fixes: 8ff8ac6329 ("target/riscv: rvv: Add missing early exit condition for whole register load/store")
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> ---
> target/riscv/insn_trans/trans_rvv.c.inc | 52 +++++++++++--------------
> target/riscv/vector_helper.c | 5 +++
> 2 files changed, 28 insertions(+), 29 deletions(-)
>
> diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc
> index 52c26a7834..1366445e1f 100644
> --- a/target/riscv/insn_trans/trans_rvv.c.inc
> +++ b/target/riscv/insn_trans/trans_rvv.c.inc
> @@ -1097,13 +1097,9 @@ GEN_VEXT_TRANS(vle64ff_v, MO_64, r2nfvm, ldff_op, ld_us_check)
> typedef void gen_helper_ldst_whole(TCGv_ptr, TCGv, TCGv_env, TCGv_i32);
>
> static bool ldst_whole_trans(uint32_t vd, uint32_t rs1, uint32_t nf,
> - uint32_t width, gen_helper_ldst_whole *fn,
> + gen_helper_ldst_whole *fn,
> DisasContext *s)
> {
> - uint32_t evl = s->cfg_ptr->vlenb * nf / width;
> - TCGLabel *over = gen_new_label();
> - tcg_gen_brcondi_tl(TCG_COND_GEU, cpu_vstart, evl, over);
> -
> TCGv_ptr dest;
> TCGv base;
> TCGv_i32 desc;
> @@ -1120,8 +1116,6 @@ static bool ldst_whole_trans(uint32_t vd, uint32_t rs1, uint32_t nf,
>
> fn(dest, base, tcg_env, desc);
>
> - gen_set_label(over);
> -
> return true;
> }
>
> @@ -1129,42 +1123,42 @@ static bool ldst_whole_trans(uint32_t vd, uint32_t rs1, uint32_t nf,
> * load and store whole register instructions ignore vtype and vl setting.
> * Thus, we don't need to check vill bit. (Section 7.9)
> */
> -#define GEN_LDST_WHOLE_TRANS(NAME, ARG_NF, WIDTH) \
> +#define GEN_LDST_WHOLE_TRANS(NAME, ARG_NF) \
> static bool trans_##NAME(DisasContext *s, arg_##NAME * a) \
> { \
> if (require_rvv(s) && \
> QEMU_IS_ALIGNED(a->rd, ARG_NF)) { \
> - return ldst_whole_trans(a->rd, a->rs1, ARG_NF, WIDTH, \
> + return ldst_whole_trans(a->rd, a->rs1, ARG_NF, \
> gen_helper_##NAME, s); \
> } \
> return false; \
> }
>
> -GEN_LDST_WHOLE_TRANS(vl1re8_v, 1, 1)
> -GEN_LDST_WHOLE_TRANS(vl1re16_v, 1, 2)
> -GEN_LDST_WHOLE_TRANS(vl1re32_v, 1, 4)
> -GEN_LDST_WHOLE_TRANS(vl1re64_v, 1, 8)
> -GEN_LDST_WHOLE_TRANS(vl2re8_v, 2, 1)
> -GEN_LDST_WHOLE_TRANS(vl2re16_v, 2, 2)
> -GEN_LDST_WHOLE_TRANS(vl2re32_v, 2, 4)
> -GEN_LDST_WHOLE_TRANS(vl2re64_v, 2, 8)
> -GEN_LDST_WHOLE_TRANS(vl4re8_v, 4, 1)
> -GEN_LDST_WHOLE_TRANS(vl4re16_v, 4, 2)
> -GEN_LDST_WHOLE_TRANS(vl4re32_v, 4, 4)
> -GEN_LDST_WHOLE_TRANS(vl4re64_v, 4, 8)
> -GEN_LDST_WHOLE_TRANS(vl8re8_v, 8, 1)
> -GEN_LDST_WHOLE_TRANS(vl8re16_v, 8, 2)
> -GEN_LDST_WHOLE_TRANS(vl8re32_v, 8, 4)
> -GEN_LDST_WHOLE_TRANS(vl8re64_v, 8, 8)
> +GEN_LDST_WHOLE_TRANS(vl1re8_v, 1)
> +GEN_LDST_WHOLE_TRANS(vl1re16_v, 1)
> +GEN_LDST_WHOLE_TRANS(vl1re32_v, 1)
> +GEN_LDST_WHOLE_TRANS(vl1re64_v, 1)
> +GEN_LDST_WHOLE_TRANS(vl2re8_v, 2)
> +GEN_LDST_WHOLE_TRANS(vl2re16_v, 2)
> +GEN_LDST_WHOLE_TRANS(vl2re32_v, 2)
> +GEN_LDST_WHOLE_TRANS(vl2re64_v, 2)
> +GEN_LDST_WHOLE_TRANS(vl4re8_v, 4)
> +GEN_LDST_WHOLE_TRANS(vl4re16_v, 4)
> +GEN_LDST_WHOLE_TRANS(vl4re32_v, 4)
> +GEN_LDST_WHOLE_TRANS(vl4re64_v, 4)
> +GEN_LDST_WHOLE_TRANS(vl8re8_v, 8)
> +GEN_LDST_WHOLE_TRANS(vl8re16_v, 8)
> +GEN_LDST_WHOLE_TRANS(vl8re32_v, 8)
> +GEN_LDST_WHOLE_TRANS(vl8re64_v, 8)
>
> /*
> * The vector whole register store instructions are encoded similar to
> * unmasked unit-stride store of elements with EEW=8.
> */
> -GEN_LDST_WHOLE_TRANS(vs1r_v, 1, 1)
> -GEN_LDST_WHOLE_TRANS(vs2r_v, 2, 1)
> -GEN_LDST_WHOLE_TRANS(vs4r_v, 4, 1)
> -GEN_LDST_WHOLE_TRANS(vs8r_v, 8, 1)
> +GEN_LDST_WHOLE_TRANS(vs1r_v, 1)
> +GEN_LDST_WHOLE_TRANS(vs2r_v, 2)
> +GEN_LDST_WHOLE_TRANS(vs4r_v, 4)
> +GEN_LDST_WHOLE_TRANS(vs8r_v, 8)
>
> /*
> *** Vector Integer Arithmetic Instructions
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index bcc553c0e2..1f4c276b21 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -572,6 +572,11 @@ vext_ldst_whole(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
> uint32_t vlenb = riscv_cpu_cfg(env)->vlenb;
> uint32_t max_elems = vlenb >> log2_esz;
>
> + if (env->vstart >= ((vlenb * nf) >> log2_esz)) {
> + env->vstart = 0;
> + return;
> + }
> +
> k = env->vstart / max_elems;
> off = env->vstart % max_elems;
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH for 9.0 v15 05/10] target/riscv: always clear vstart for ldst_whole insns
2024-03-14 17:56 ` [PATCH for 9.0 v15 05/10] target/riscv: always clear vstart for ldst_whole insns Daniel Henrique Barboza
2024-03-15 11:25 ` Max Chou
@ 2024-03-18 9:10 ` Alistair Francis
1 sibling, 0 replies; 24+ messages in thread
From: Alistair Francis @ 2024-03-18 9:10 UTC (permalink / raw)
To: Daniel Henrique Barboza
Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
zhiwei_liu, palmer, max.chou, richard.henderson
On Fri, Mar 15, 2024 at 3:57 AM Daniel Henrique Barboza
<dbarboza@ventanamicro.com> wrote:
>
> Commit 8ff8ac6329 added a conditional to guard the vext_ldst_whole()
> helper if vstart >= evl. But by skipping the helper we're also not
> setting vstart = 0 at the end of the insns, which is incorrect.
>
> We'll move the conditional to vext_ldst_whole(), following in line with
> the removal of all brconds vstart >= vl that the next patch will do. The
> idea is to make the helpers responsible for their own vstart management.
>
> Fix ldst_whole isns by:
>
> - remove the brcond that skips the helper if vstart is >= evl;
>
> - vext_ldst_whole() now does an early exit with the same check, where
> evl = (vlenb * nf) >> log2_esz, but the early exit will also clear
> vstart.
>
> The 'width' param is now unneeded in ldst_whole_trans() and is also
> removed. It was used for the evl calculation for the brcond and has no
> other use now. The 'width' is reflected in vext_ldst_whole() via
> log2_esz, which is encoded by GEN_VEXT_LD_WHOLE() as
> "ctzl(sizeof(ETYPE))".
>
> Suggested-by: Max Chou <max.chou@sifive.com>
> Fixes: 8ff8ac6329 ("target/riscv: rvv: Add missing early exit condition for whole register load/store")
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Alistair
> ---
> target/riscv/insn_trans/trans_rvv.c.inc | 52 +++++++++++--------------
> target/riscv/vector_helper.c | 5 +++
> 2 files changed, 28 insertions(+), 29 deletions(-)
>
> diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc
> index 52c26a7834..1366445e1f 100644
> --- a/target/riscv/insn_trans/trans_rvv.c.inc
> +++ b/target/riscv/insn_trans/trans_rvv.c.inc
> @@ -1097,13 +1097,9 @@ GEN_VEXT_TRANS(vle64ff_v, MO_64, r2nfvm, ldff_op, ld_us_check)
> typedef void gen_helper_ldst_whole(TCGv_ptr, TCGv, TCGv_env, TCGv_i32);
>
> static bool ldst_whole_trans(uint32_t vd, uint32_t rs1, uint32_t nf,
> - uint32_t width, gen_helper_ldst_whole *fn,
> + gen_helper_ldst_whole *fn,
> DisasContext *s)
> {
> - uint32_t evl = s->cfg_ptr->vlenb * nf / width;
> - TCGLabel *over = gen_new_label();
> - tcg_gen_brcondi_tl(TCG_COND_GEU, cpu_vstart, evl, over);
> -
> TCGv_ptr dest;
> TCGv base;
> TCGv_i32 desc;
> @@ -1120,8 +1116,6 @@ static bool ldst_whole_trans(uint32_t vd, uint32_t rs1, uint32_t nf,
>
> fn(dest, base, tcg_env, desc);
>
> - gen_set_label(over);
> -
> return true;
> }
>
> @@ -1129,42 +1123,42 @@ static bool ldst_whole_trans(uint32_t vd, uint32_t rs1, uint32_t nf,
> * load and store whole register instructions ignore vtype and vl setting.
> * Thus, we don't need to check vill bit. (Section 7.9)
> */
> -#define GEN_LDST_WHOLE_TRANS(NAME, ARG_NF, WIDTH) \
> +#define GEN_LDST_WHOLE_TRANS(NAME, ARG_NF) \
> static bool trans_##NAME(DisasContext *s, arg_##NAME * a) \
> { \
> if (require_rvv(s) && \
> QEMU_IS_ALIGNED(a->rd, ARG_NF)) { \
> - return ldst_whole_trans(a->rd, a->rs1, ARG_NF, WIDTH, \
> + return ldst_whole_trans(a->rd, a->rs1, ARG_NF, \
> gen_helper_##NAME, s); \
> } \
> return false; \
> }
>
> -GEN_LDST_WHOLE_TRANS(vl1re8_v, 1, 1)
> -GEN_LDST_WHOLE_TRANS(vl1re16_v, 1, 2)
> -GEN_LDST_WHOLE_TRANS(vl1re32_v, 1, 4)
> -GEN_LDST_WHOLE_TRANS(vl1re64_v, 1, 8)
> -GEN_LDST_WHOLE_TRANS(vl2re8_v, 2, 1)
> -GEN_LDST_WHOLE_TRANS(vl2re16_v, 2, 2)
> -GEN_LDST_WHOLE_TRANS(vl2re32_v, 2, 4)
> -GEN_LDST_WHOLE_TRANS(vl2re64_v, 2, 8)
> -GEN_LDST_WHOLE_TRANS(vl4re8_v, 4, 1)
> -GEN_LDST_WHOLE_TRANS(vl4re16_v, 4, 2)
> -GEN_LDST_WHOLE_TRANS(vl4re32_v, 4, 4)
> -GEN_LDST_WHOLE_TRANS(vl4re64_v, 4, 8)
> -GEN_LDST_WHOLE_TRANS(vl8re8_v, 8, 1)
> -GEN_LDST_WHOLE_TRANS(vl8re16_v, 8, 2)
> -GEN_LDST_WHOLE_TRANS(vl8re32_v, 8, 4)
> -GEN_LDST_WHOLE_TRANS(vl8re64_v, 8, 8)
> +GEN_LDST_WHOLE_TRANS(vl1re8_v, 1)
> +GEN_LDST_WHOLE_TRANS(vl1re16_v, 1)
> +GEN_LDST_WHOLE_TRANS(vl1re32_v, 1)
> +GEN_LDST_WHOLE_TRANS(vl1re64_v, 1)
> +GEN_LDST_WHOLE_TRANS(vl2re8_v, 2)
> +GEN_LDST_WHOLE_TRANS(vl2re16_v, 2)
> +GEN_LDST_WHOLE_TRANS(vl2re32_v, 2)
> +GEN_LDST_WHOLE_TRANS(vl2re64_v, 2)
> +GEN_LDST_WHOLE_TRANS(vl4re8_v, 4)
> +GEN_LDST_WHOLE_TRANS(vl4re16_v, 4)
> +GEN_LDST_WHOLE_TRANS(vl4re32_v, 4)
> +GEN_LDST_WHOLE_TRANS(vl4re64_v, 4)
> +GEN_LDST_WHOLE_TRANS(vl8re8_v, 8)
> +GEN_LDST_WHOLE_TRANS(vl8re16_v, 8)
> +GEN_LDST_WHOLE_TRANS(vl8re32_v, 8)
> +GEN_LDST_WHOLE_TRANS(vl8re64_v, 8)
>
> /*
> * The vector whole register store instructions are encoded similar to
> * unmasked unit-stride store of elements with EEW=8.
> */
> -GEN_LDST_WHOLE_TRANS(vs1r_v, 1, 1)
> -GEN_LDST_WHOLE_TRANS(vs2r_v, 2, 1)
> -GEN_LDST_WHOLE_TRANS(vs4r_v, 4, 1)
> -GEN_LDST_WHOLE_TRANS(vs8r_v, 8, 1)
> +GEN_LDST_WHOLE_TRANS(vs1r_v, 1)
> +GEN_LDST_WHOLE_TRANS(vs2r_v, 2)
> +GEN_LDST_WHOLE_TRANS(vs4r_v, 4)
> +GEN_LDST_WHOLE_TRANS(vs8r_v, 8)
>
> /*
> *** Vector Integer Arithmetic Instructions
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index bcc553c0e2..1f4c276b21 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -572,6 +572,11 @@ vext_ldst_whole(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
> uint32_t vlenb = riscv_cpu_cfg(env)->vlenb;
> uint32_t max_elems = vlenb >> log2_esz;
>
> + if (env->vstart >= ((vlenb * nf) >> log2_esz)) {
> + env->vstart = 0;
> + return;
> + }
> +
> k = env->vstart / max_elems;
> off = env->vstart % max_elems;
>
> --
> 2.44.0
>
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH for 9.0 v15 06/10] target/riscv/vector_helpers: do early exit when vstart >= vl
2024-03-14 17:56 [PATCH for 9.0 v15 00/10] target/riscv: vector fixes Daniel Henrique Barboza
` (4 preceding siblings ...)
2024-03-14 17:56 ` [PATCH for 9.0 v15 05/10] target/riscv: always clear vstart for ldst_whole insns Daniel Henrique Barboza
@ 2024-03-14 17:57 ` Daniel Henrique Barboza
2024-03-20 4:44 ` Alistair Francis
2024-03-14 17:57 ` [PATCH for 9.0 v15 07/10] target/riscv: remove 'over' brconds from vector trans Daniel Henrique Barboza
` (4 subsequent siblings)
10 siblings, 1 reply; 24+ messages in thread
From: Daniel Henrique Barboza @ 2024-03-14 17:57 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
palmer, max.chou, richard.henderson, Daniel Henrique Barboza
We're going to make changes that will required each helper to be
responsible for the 'vstart' management, i.e. we will relieve the
'vstart < vl' assumption that helpers have today.
Helpers are usually able to deal with vstart >= vl, i.e. doing nothing
aside from setting vstart = 0 at the end, but the tail update functions
will update the tail regardless of vstart being valid or not. Unifying
the tail update process in a single function that would handle the
vstart >= vl case isn't trivial (see [1] for more info).
This patch takes a blunt approach: do an early exit in every single
vector helper if vstart >= vl, unless the helper is guarded with
vstart_eq_zero in the translation. For those cases the helper is ready
to deal with cases where vl might be zero, i.e. throwing exceptions
based on it like vcpop_m() and first_m().
Helpers that weren't changed:
- vcpop_m(), vfirst_m(), vmsetm(), GEN_VEXT_VIOTA_M(): these are guarded
directly with vstart_eq_zero;
- GEN_VEXT_VCOMPRESS_VM(): guarded with vcompress_vm_check() that checks
vstart_eq_zero;
- GEN_VEXT_RED(): guarded with either reduction_check() or
reduction_widen_check(), both check vstart_eq_zero;
- GEN_VEXT_FRED(): guarded with either freduction_check() or
freduction_widen_check(), both check vstart_eq_zero.
Another exception is vext_ldst_whole(), who operates on effective vector
length regardless of the current settings in vtype and vl.
[1] https://lore.kernel.org/qemu-riscv/1590234b-0291-432a-a0fa-c5a6876097bc@linux.alibaba.com/
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
---
target/riscv/vcrypto_helper.c | 32 ++++++++++++++++
target/riscv/vector_helper.c | 66 +++++++++++++++++++++++++++++++++
target/riscv/vector_internals.c | 4 ++
target/riscv/vector_internals.h | 9 +++++
4 files changed, 111 insertions(+)
diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index e2d719b13b..f7423df226 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -222,6 +222,8 @@ static inline void xor_round_key(AESState *round_state, AESState *round_key)
uint32_t total_elems = vext_get_total_elems(env, desc, 4); \
uint32_t vta = vext_vta(desc); \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) { \
AESState round_key; \
round_key.d[0] = *((uint64_t *)vs2 + H8(i * 2 + 0)); \
@@ -246,6 +248,8 @@ static inline void xor_round_key(AESState *round_state, AESState *round_key)
uint32_t total_elems = vext_get_total_elems(env, desc, 4); \
uint32_t vta = vext_vta(desc); \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) { \
AESState round_key; \
round_key.d[0] = *((uint64_t *)vs2 + H8(0)); \
@@ -305,6 +309,8 @@ void HELPER(vaeskf1_vi)(void *vd_vptr, void *vs2_vptr, uint32_t uimm,
uint32_t total_elems = vext_get_total_elems(env, desc, 4);
uint32_t vta = vext_vta(desc);
+ VSTART_CHECK_EARLY_EXIT(env);
+
uimm &= 0b1111;
if (uimm > 10 || uimm == 0) {
uimm ^= 0b1000;
@@ -351,6 +357,8 @@ void HELPER(vaeskf2_vi)(void *vd_vptr, void *vs2_vptr, uint32_t uimm,
uint32_t total_elems = vext_get_total_elems(env, desc, 4);
uint32_t vta = vext_vta(desc);
+ VSTART_CHECK_EARLY_EXIT(env);
+
uimm &= 0b1111;
if (uimm > 14 || uimm < 2) {
uimm ^= 0b1000;
@@ -457,6 +465,8 @@ void HELPER(vsha2ms_vv)(void *vd, void *vs1, void *vs2, CPURISCVState *env,
uint32_t total_elems;
uint32_t vta = vext_vta(desc);
+ VSTART_CHECK_EARLY_EXIT(env);
+
for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {
if (sew == MO_32) {
vsha2ms_e32(((uint32_t *)vd) + i * 4, ((uint32_t *)vs1) + i * 4,
@@ -572,6 +582,8 @@ void HELPER(vsha2ch32_vv)(void *vd, void *vs1, void *vs2, CPURISCVState *env,
uint32_t total_elems;
uint32_t vta = vext_vta(desc);
+ VSTART_CHECK_EARLY_EXIT(env);
+
for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {
vsha2c_32(((uint32_t *)vs2) + 4 * i, ((uint32_t *)vd) + 4 * i,
((uint32_t *)vs1) + 4 * i + 2);
@@ -590,6 +602,8 @@ void HELPER(vsha2ch64_vv)(void *vd, void *vs1, void *vs2, CPURISCVState *env,
uint32_t total_elems;
uint32_t vta = vext_vta(desc);
+ VSTART_CHECK_EARLY_EXIT(env);
+
for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {
vsha2c_64(((uint64_t *)vs2) + 4 * i, ((uint64_t *)vd) + 4 * i,
((uint64_t *)vs1) + 4 * i + 2);
@@ -608,6 +622,8 @@ void HELPER(vsha2cl32_vv)(void *vd, void *vs1, void *vs2, CPURISCVState *env,
uint32_t total_elems;
uint32_t vta = vext_vta(desc);
+ VSTART_CHECK_EARLY_EXIT(env);
+
for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {
vsha2c_32(((uint32_t *)vs2) + 4 * i, ((uint32_t *)vd) + 4 * i,
(((uint32_t *)vs1) + 4 * i));
@@ -626,6 +642,8 @@ void HELPER(vsha2cl64_vv)(void *vd, void *vs1, void *vs2, CPURISCVState *env,
uint32_t total_elems;
uint32_t vta = vext_vta(desc);
+ VSTART_CHECK_EARLY_EXIT(env);
+
for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {
vsha2c_64(((uint64_t *)vs2) + 4 * i, ((uint64_t *)vd) + 4 * i,
(((uint64_t *)vs1) + 4 * i));
@@ -658,6 +676,8 @@ void HELPER(vsm3me_vv)(void *vd_vptr, void *vs1_vptr, void *vs2_vptr,
uint32_t *vs1 = vs1_vptr;
uint32_t *vs2 = vs2_vptr;
+ VSTART_CHECK_EARLY_EXIT(env);
+
for (int i = env->vstart / 8; i < env->vl / 8; i++) {
uint32_t w[24];
for (int j = 0; j < 8; j++) {
@@ -757,6 +777,8 @@ void HELPER(vsm3c_vi)(void *vd_vptr, void *vs2_vptr, uint32_t uimm,
uint32_t *vs2 = vs2_vptr;
uint32_t v1[8], v2[8], v3[8];
+ VSTART_CHECK_EARLY_EXIT(env);
+
for (int i = env->vstart / 8; i < env->vl / 8; i++) {
for (int k = 0; k < 8; k++) {
v2[k] = bswap32(vd[H4(i * 8 + k)]);
@@ -780,6 +802,8 @@ void HELPER(vghsh_vv)(void *vd_vptr, void *vs1_vptr, void *vs2_vptr,
uint32_t vta = vext_vta(desc);
uint32_t total_elems = vext_get_total_elems(env, desc, 4);
+ VSTART_CHECK_EARLY_EXIT(env);
+
for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {
uint64_t Y[2] = {vd[i * 2 + 0], vd[i * 2 + 1]};
uint64_t H[2] = {brev8(vs2[i * 2 + 0]), brev8(vs2[i * 2 + 1])};
@@ -817,6 +841,8 @@ void HELPER(vgmul_vv)(void *vd_vptr, void *vs2_vptr, CPURISCVState *env,
uint32_t vta = vext_vta(desc);
uint32_t total_elems = vext_get_total_elems(env, desc, 4);
+ VSTART_CHECK_EARLY_EXIT(env);
+
for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {
uint64_t Y[2] = {brev8(vd[i * 2 + 0]), brev8(vd[i * 2 + 1])};
uint64_t H[2] = {brev8(vs2[i * 2 + 0]), brev8(vs2[i * 2 + 1])};
@@ -853,6 +879,8 @@ void HELPER(vsm4k_vi)(void *vd, void *vs2, uint32_t uimm5, CPURISCVState *env,
uint32_t esz = sizeof(uint32_t);
uint32_t total_elems = vext_get_total_elems(env, desc, esz);
+ VSTART_CHECK_EARLY_EXIT(env);
+
for (uint32_t i = group_start; i < group_end; ++i) {
uint32_t vstart = i * egs;
uint32_t vend = (i + 1) * egs;
@@ -909,6 +937,8 @@ void HELPER(vsm4r_vv)(void *vd, void *vs2, CPURISCVState *env, uint32_t desc)
uint32_t esz = sizeof(uint32_t);
uint32_t total_elems = vext_get_total_elems(env, desc, esz);
+ VSTART_CHECK_EARLY_EXIT(env);
+
for (uint32_t i = group_start; i < group_end; ++i) {
uint32_t vstart = i * egs;
uint32_t vend = (i + 1) * egs;
@@ -943,6 +973,8 @@ void HELPER(vsm4r_vs)(void *vd, void *vs2, CPURISCVState *env, uint32_t desc)
uint32_t esz = sizeof(uint32_t);
uint32_t total_elems = vext_get_total_elems(env, desc, esz);
+ VSTART_CHECK_EARLY_EXIT(env);
+
for (uint32_t i = group_start; i < group_end; ++i) {
uint32_t vstart = i * egs;
uint32_t vend = (i + 1) * egs;
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 1f4c276b21..63a1083f03 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -207,6 +207,8 @@ vext_ldst_stride(void *vd, void *v0, target_ulong base,
uint32_t esz = 1 << log2_esz;
uint32_t vma = vext_vma(desc);
+ VSTART_CHECK_EARLY_EXIT(env);
+
for (i = env->vstart; i < env->vl; i++, env->vstart++) {
k = 0;
while (k < nf) {
@@ -272,6 +274,8 @@ vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
uint32_t max_elems = vext_max_elems(desc, log2_esz);
uint32_t esz = 1 << log2_esz;
+ VSTART_CHECK_EARLY_EXIT(env);
+
/* load bytes from guest memory */
for (i = env->vstart; i < evl; i++, env->vstart++) {
k = 0;
@@ -386,6 +390,8 @@ vext_ldst_index(void *vd, void *v0, target_ulong base,
uint32_t esz = 1 << log2_esz;
uint32_t vma = vext_vma(desc);
+ VSTART_CHECK_EARLY_EXIT(env);
+
/* load bytes from guest memory */
for (i = env->vstart; i < env->vl; i++, env->vstart++) {
k = 0;
@@ -477,6 +483,8 @@ vext_ldff(void *vd, void *v0, target_ulong base,
target_ulong addr, offset, remain;
int mmu_index = riscv_env_mmu_index(env, false);
+ VSTART_CHECK_EARLY_EXIT(env);
+
/* probe every access */
for (i = env->vstart; i < env->vl; i++) {
if (!vm && !vext_elem_mask(v0, i)) {
@@ -882,6 +890,8 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, \
uint32_t vta = vext_vta(desc); \
uint32_t i; \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
for (i = env->vstart; i < vl; i++) { \
ETYPE s1 = *((ETYPE *)vs1 + H(i)); \
ETYPE s2 = *((ETYPE *)vs2 + H(i)); \
@@ -914,6 +924,8 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \
uint32_t vta = vext_vta(desc); \
uint32_t i; \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
for (i = env->vstart; i < vl; i++) { \
ETYPE s2 = *((ETYPE *)vs2 + H(i)); \
ETYPE carry = vext_elem_mask(v0, i); \
@@ -949,6 +961,8 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, \
uint32_t vta_all_1s = vext_vta_all_1s(desc); \
uint32_t i; \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
for (i = env->vstart; i < vl; i++) { \
ETYPE s1 = *((ETYPE *)vs1 + H(i)); \
ETYPE s2 = *((ETYPE *)vs2 + H(i)); \
@@ -987,6 +1001,8 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, \
uint32_t vta_all_1s = vext_vta_all_1s(desc); \
uint32_t i; \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
for (i = env->vstart; i < vl; i++) { \
ETYPE s2 = *((ETYPE *)vs2 + H(i)); \
ETYPE carry = !vm && vext_elem_mask(v0, i); \
@@ -1083,6 +1099,8 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, \
uint32_t vma = vext_vma(desc); \
uint32_t i; \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
for (i = env->vstart; i < vl; i++) { \
if (!vm && !vext_elem_mask(v0, i)) { \
/* set masked-off elements to 1s */ \
@@ -1130,6 +1148,8 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, \
uint32_t vma = vext_vma(desc); \
uint32_t i; \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
for (i = env->vstart; i < vl; i++) { \
if (!vm && !vext_elem_mask(v0, i)) { \
/* set masked-off elements to 1s */ \
@@ -1192,6 +1212,8 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, \
uint32_t vma = vext_vma(desc); \
uint32_t i; \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
for (i = env->vstart; i < vl; i++) { \
ETYPE s1 = *((ETYPE *)vs1 + H(i)); \
ETYPE s2 = *((ETYPE *)vs2 + H(i)); \
@@ -1257,6 +1279,8 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \
uint32_t vma = vext_vma(desc); \
uint32_t i; \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
for (i = env->vstart; i < vl; i++) { \
ETYPE s2 = *((ETYPE *)vs2 + H(i)); \
if (!vm && !vext_elem_mask(v0, i)) { \
@@ -1804,6 +1828,8 @@ void HELPER(NAME)(void *vd, void *vs1, CPURISCVState *env, \
uint32_t vta = vext_vta(desc); \
uint32_t i; \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
for (i = env->vstart; i < vl; i++) { \
ETYPE s1 = *((ETYPE *)vs1 + H(i)); \
*((ETYPE *)vd + H(i)) = s1; \
@@ -1828,6 +1854,8 @@ void HELPER(NAME)(void *vd, uint64_t s1, CPURISCVState *env, \
uint32_t vta = vext_vta(desc); \
uint32_t i; \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
for (i = env->vstart; i < vl; i++) { \
*((ETYPE *)vd + H(i)) = (ETYPE)s1; \
} \
@@ -1851,6 +1879,8 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, \
uint32_t vta = vext_vta(desc); \
uint32_t i; \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
for (i = env->vstart; i < vl; i++) { \
ETYPE *vt = (!vext_elem_mask(v0, i) ? vs2 : vs1); \
*((ETYPE *)vd + H(i)) = *(vt + H(i)); \
@@ -1875,6 +1905,8 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, \
uint32_t vta = vext_vta(desc); \
uint32_t i; \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
for (i = env->vstart; i < vl; i++) { \
ETYPE s2 = *((ETYPE *)vs2 + H(i)); \
ETYPE d = (!vext_elem_mask(v0, i) ? s2 : \
@@ -1920,6 +1952,8 @@ vext_vv_rm_1(void *vd, void *v0, void *vs1, void *vs2,
uint32_t vl, uint32_t vm, int vxrm,
opivv2_rm_fn *fn, uint32_t vma, uint32_t esz)
{
+ VSTART_CHECK_EARLY_EXIT(env);
+
for (uint32_t i = env->vstart; i < vl; i++) {
if (!vm && !vext_elem_mask(v0, i)) {
/* set masked-off elements to 1s */
@@ -2045,6 +2079,8 @@ vext_vx_rm_1(void *vd, void *v0, target_long s1, void *vs2,
uint32_t vl, uint32_t vm, int vxrm,
opivx2_rm_fn *fn, uint32_t vma, uint32_t esz)
{
+ VSTART_CHECK_EARLY_EXIT(env);
+
for (uint32_t i = env->vstart; i < vl; i++) {
if (!vm && !vext_elem_mask(v0, i)) {
/* set masked-off elements to 1s */
@@ -2842,6 +2878,8 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, \
uint32_t vma = vext_vma(desc); \
uint32_t i; \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
for (i = env->vstart; i < vl; i++) { \
if (!vm && !vext_elem_mask(v0, i)) { \
/* set masked-off elements to 1s */ \
@@ -2885,6 +2923,8 @@ void HELPER(NAME)(void *vd, void *v0, uint64_t s1, \
uint32_t vma = vext_vma(desc); \
uint32_t i; \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
for (i = env->vstart; i < vl; i++) { \
if (!vm && !vext_elem_mask(v0, i)) { \
/* set masked-off elements to 1s */ \
@@ -3471,6 +3511,8 @@ void HELPER(NAME)(void *vd, void *v0, void *vs2, \
uint32_t vma = vext_vma(desc); \
uint32_t i; \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
if (vl == 0) { \
return; \
} \
@@ -3992,6 +4034,8 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, \
uint32_t vma = vext_vma(desc); \
uint32_t i; \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
for (i = env->vstart; i < vl; i++) { \
ETYPE s1 = *((ETYPE *)vs1 + H(i)); \
ETYPE s2 = *((ETYPE *)vs2 + H(i)); \
@@ -4032,6 +4076,8 @@ void HELPER(NAME)(void *vd, void *v0, uint64_t s1, void *vs2, \
uint32_t vma = vext_vma(desc); \
uint32_t i; \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
for (i = env->vstart; i < vl; i++) { \
ETYPE s2 = *((ETYPE *)vs2 + H(i)); \
if (!vm && !vext_elem_mask(v0, i)) { \
@@ -4225,6 +4271,8 @@ void HELPER(NAME)(void *vd, void *v0, uint64_t s1, void *vs2, \
uint32_t vta = vext_vta(desc); \
uint32_t i; \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
for (i = env->vstart; i < vl; i++) { \
ETYPE s2 = *((ETYPE *)vs2 + H(i)); \
*((ETYPE *)vd + H(i)) = \
@@ -4549,6 +4597,8 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, \
uint32_t i; \
int a, b; \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
for (i = env->vstart; i < vl; i++) { \
a = vext_elem_mask(vs1, i); \
b = vext_elem_mask(vs2, i); \
@@ -4742,6 +4792,8 @@ void HELPER(NAME)(void *vd, void *v0, CPURISCVState *env, uint32_t desc) \
uint32_t vma = vext_vma(desc); \
int i; \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
for (i = env->vstart; i < vl; i++) { \
if (!vm && !vext_elem_mask(v0, i)) { \
/* set masked-off elements to 1s */ \
@@ -4777,6 +4829,8 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \
uint32_t vma = vext_vma(desc); \
target_ulong offset = s1, i_min, i; \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
i_min = MAX(env->vstart, offset); \
for (i = i_min; i < vl; i++) { \
if (!vm && !vext_elem_mask(v0, i)) { \
@@ -4810,6 +4864,8 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \
uint32_t vma = vext_vma(desc); \
target_ulong i_max, i_min, i; \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
i_min = MIN(s1 < vlmax ? vlmax - s1 : 0, vl); \
i_max = MAX(i_min, env->vstart); \
for (i = env->vstart; i < i_max; ++i) { \
@@ -4852,6 +4908,8 @@ static void vslide1up_##BITWIDTH(void *vd, void *v0, uint64_t s1, \
uint32_t vma = vext_vma(desc); \
uint32_t i; \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
for (i = env->vstart; i < vl; i++) { \
if (!vm && !vext_elem_mask(v0, i)) { \
/* set masked-off elements to 1s */ \
@@ -4901,6 +4959,8 @@ static void vslide1down_##BITWIDTH(void *vd, void *v0, uint64_t s1, \
uint32_t vma = vext_vma(desc); \
uint32_t i; \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
for (i = env->vstart; i < vl; i++) { \
if (!vm && !vext_elem_mask(v0, i)) { \
/* set masked-off elements to 1s */ \
@@ -4976,6 +5036,8 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, \
uint64_t index; \
uint32_t i; \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
for (i = env->vstart; i < vl; i++) { \
if (!vm && !vext_elem_mask(v0, i)) { \
/* set masked-off elements to 1s */ \
@@ -5019,6 +5081,8 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \
uint64_t index = s1; \
uint32_t i; \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
for (i = env->vstart; i < vl; i++) { \
if (!vm && !vext_elem_mask(v0, i)) { \
/* set masked-off elements to 1s */ \
@@ -5113,6 +5177,8 @@ void HELPER(NAME)(void *vd, void *v0, void *vs2, \
uint32_t vma = vext_vma(desc); \
uint32_t i; \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
for (i = env->vstart; i < vl; i++) { \
if (!vm && !vext_elem_mask(v0, i)) { \
/* set masked-off elements to 1s */ \
diff --git a/target/riscv/vector_internals.c b/target/riscv/vector_internals.c
index 12f5964fbb..996c21eb31 100644
--- a/target/riscv/vector_internals.c
+++ b/target/riscv/vector_internals.c
@@ -44,6 +44,8 @@ void do_vext_vv(void *vd, void *v0, void *vs1, void *vs2,
uint32_t vma = vext_vma(desc);
uint32_t i;
+ VSTART_CHECK_EARLY_EXIT(env);
+
for (i = env->vstart; i < vl; i++) {
if (!vm && !vext_elem_mask(v0, i)) {
/* set masked-off elements to 1s */
@@ -68,6 +70,8 @@ void do_vext_vx(void *vd, void *v0, target_long s1, void *vs2,
uint32_t vma = vext_vma(desc);
uint32_t i;
+ VSTART_CHECK_EARLY_EXIT(env);
+
for (i = env->vstart; i < vl; i++) {
if (!vm && !vext_elem_mask(v0, i)) {
/* set masked-off elements to 1s */
diff --git a/target/riscv/vector_internals.h b/target/riscv/vector_internals.h
index 842765f6c1..9e1e15b575 100644
--- a/target/riscv/vector_internals.h
+++ b/target/riscv/vector_internals.h
@@ -24,6 +24,13 @@
#include "tcg/tcg-gvec-desc.h"
#include "internals.h"
+#define VSTART_CHECK_EARLY_EXIT(env) do { \
+ if (env->vstart >= env->vl) { \
+ env->vstart = 0; \
+ return; \
+ } \
+} while (0)
+
static inline uint32_t vext_nf(uint32_t desc)
{
return FIELD_EX32(simd_data(desc), VDATA, NF);
@@ -151,6 +158,8 @@ void HELPER(NAME)(void *vd, void *v0, void *vs2, \
uint32_t vma = vext_vma(desc); \
uint32_t i; \
\
+ VSTART_CHECK_EARLY_EXIT(env); \
+ \
for (i = env->vstart; i < vl; i++) { \
if (!vm && !vext_elem_mask(v0, i)) { \
/* set masked-off elements to 1s */ \
--
2.44.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH for 9.0 v15 06/10] target/riscv/vector_helpers: do early exit when vstart >= vl
2024-03-14 17:57 ` [PATCH for 9.0 v15 06/10] target/riscv/vector_helpers: do early exit when vstart >= vl Daniel Henrique Barboza
@ 2024-03-20 4:44 ` Alistair Francis
0 siblings, 0 replies; 24+ messages in thread
From: Alistair Francis @ 2024-03-20 4:44 UTC (permalink / raw)
To: Daniel Henrique Barboza
Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
zhiwei_liu, palmer, max.chou, richard.henderson
On Fri, Mar 15, 2024 at 3:59 AM Daniel Henrique Barboza
<dbarboza@ventanamicro.com> wrote:
>
> We're going to make changes that will required each helper to be
> responsible for the 'vstart' management, i.e. we will relieve the
> 'vstart < vl' assumption that helpers have today.
>
> Helpers are usually able to deal with vstart >= vl, i.e. doing nothing
> aside from setting vstart = 0 at the end, but the tail update functions
> will update the tail regardless of vstart being valid or not. Unifying
> the tail update process in a single function that would handle the
> vstart >= vl case isn't trivial (see [1] for more info).
>
> This patch takes a blunt approach: do an early exit in every single
> vector helper if vstart >= vl, unless the helper is guarded with
> vstart_eq_zero in the translation. For those cases the helper is ready
> to deal with cases where vl might be zero, i.e. throwing exceptions
> based on it like vcpop_m() and first_m().
>
> Helpers that weren't changed:
>
> - vcpop_m(), vfirst_m(), vmsetm(), GEN_VEXT_VIOTA_M(): these are guarded
> directly with vstart_eq_zero;
>
> - GEN_VEXT_VCOMPRESS_VM(): guarded with vcompress_vm_check() that checks
> vstart_eq_zero;
>
> - GEN_VEXT_RED(): guarded with either reduction_check() or
> reduction_widen_check(), both check vstart_eq_zero;
>
> - GEN_VEXT_FRED(): guarded with either freduction_check() or
> freduction_widen_check(), both check vstart_eq_zero.
>
> Another exception is vext_ldst_whole(), who operates on effective vector
> length regardless of the current settings in vtype and vl.
>
> [1] https://lore.kernel.org/qemu-riscv/1590234b-0291-432a-a0fa-c5a6876097bc@linux.alibaba.com/
>
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Alistair
> ---
> target/riscv/vcrypto_helper.c | 32 ++++++++++++++++
> target/riscv/vector_helper.c | 66 +++++++++++++++++++++++++++++++++
> target/riscv/vector_internals.c | 4 ++
> target/riscv/vector_internals.h | 9 +++++
> 4 files changed, 111 insertions(+)
>
> diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
> index e2d719b13b..f7423df226 100644
> --- a/target/riscv/vcrypto_helper.c
> +++ b/target/riscv/vcrypto_helper.c
> @@ -222,6 +222,8 @@ static inline void xor_round_key(AESState *round_state, AESState *round_key)
> uint32_t total_elems = vext_get_total_elems(env, desc, 4); \
> uint32_t vta = vext_vta(desc); \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) { \
> AESState round_key; \
> round_key.d[0] = *((uint64_t *)vs2 + H8(i * 2 + 0)); \
> @@ -246,6 +248,8 @@ static inline void xor_round_key(AESState *round_state, AESState *round_key)
> uint32_t total_elems = vext_get_total_elems(env, desc, 4); \
> uint32_t vta = vext_vta(desc); \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) { \
> AESState round_key; \
> round_key.d[0] = *((uint64_t *)vs2 + H8(0)); \
> @@ -305,6 +309,8 @@ void HELPER(vaeskf1_vi)(void *vd_vptr, void *vs2_vptr, uint32_t uimm,
> uint32_t total_elems = vext_get_total_elems(env, desc, 4);
> uint32_t vta = vext_vta(desc);
>
> + VSTART_CHECK_EARLY_EXIT(env);
> +
> uimm &= 0b1111;
> if (uimm > 10 || uimm == 0) {
> uimm ^= 0b1000;
> @@ -351,6 +357,8 @@ void HELPER(vaeskf2_vi)(void *vd_vptr, void *vs2_vptr, uint32_t uimm,
> uint32_t total_elems = vext_get_total_elems(env, desc, 4);
> uint32_t vta = vext_vta(desc);
>
> + VSTART_CHECK_EARLY_EXIT(env);
> +
> uimm &= 0b1111;
> if (uimm > 14 || uimm < 2) {
> uimm ^= 0b1000;
> @@ -457,6 +465,8 @@ void HELPER(vsha2ms_vv)(void *vd, void *vs1, void *vs2, CPURISCVState *env,
> uint32_t total_elems;
> uint32_t vta = vext_vta(desc);
>
> + VSTART_CHECK_EARLY_EXIT(env);
> +
> for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {
> if (sew == MO_32) {
> vsha2ms_e32(((uint32_t *)vd) + i * 4, ((uint32_t *)vs1) + i * 4,
> @@ -572,6 +582,8 @@ void HELPER(vsha2ch32_vv)(void *vd, void *vs1, void *vs2, CPURISCVState *env,
> uint32_t total_elems;
> uint32_t vta = vext_vta(desc);
>
> + VSTART_CHECK_EARLY_EXIT(env);
> +
> for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {
> vsha2c_32(((uint32_t *)vs2) + 4 * i, ((uint32_t *)vd) + 4 * i,
> ((uint32_t *)vs1) + 4 * i + 2);
> @@ -590,6 +602,8 @@ void HELPER(vsha2ch64_vv)(void *vd, void *vs1, void *vs2, CPURISCVState *env,
> uint32_t total_elems;
> uint32_t vta = vext_vta(desc);
>
> + VSTART_CHECK_EARLY_EXIT(env);
> +
> for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {
> vsha2c_64(((uint64_t *)vs2) + 4 * i, ((uint64_t *)vd) + 4 * i,
> ((uint64_t *)vs1) + 4 * i + 2);
> @@ -608,6 +622,8 @@ void HELPER(vsha2cl32_vv)(void *vd, void *vs1, void *vs2, CPURISCVState *env,
> uint32_t total_elems;
> uint32_t vta = vext_vta(desc);
>
> + VSTART_CHECK_EARLY_EXIT(env);
> +
> for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {
> vsha2c_32(((uint32_t *)vs2) + 4 * i, ((uint32_t *)vd) + 4 * i,
> (((uint32_t *)vs1) + 4 * i));
> @@ -626,6 +642,8 @@ void HELPER(vsha2cl64_vv)(void *vd, void *vs1, void *vs2, CPURISCVState *env,
> uint32_t total_elems;
> uint32_t vta = vext_vta(desc);
>
> + VSTART_CHECK_EARLY_EXIT(env);
> +
> for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {
> vsha2c_64(((uint64_t *)vs2) + 4 * i, ((uint64_t *)vd) + 4 * i,
> (((uint64_t *)vs1) + 4 * i));
> @@ -658,6 +676,8 @@ void HELPER(vsm3me_vv)(void *vd_vptr, void *vs1_vptr, void *vs2_vptr,
> uint32_t *vs1 = vs1_vptr;
> uint32_t *vs2 = vs2_vptr;
>
> + VSTART_CHECK_EARLY_EXIT(env);
> +
> for (int i = env->vstart / 8; i < env->vl / 8; i++) {
> uint32_t w[24];
> for (int j = 0; j < 8; j++) {
> @@ -757,6 +777,8 @@ void HELPER(vsm3c_vi)(void *vd_vptr, void *vs2_vptr, uint32_t uimm,
> uint32_t *vs2 = vs2_vptr;
> uint32_t v1[8], v2[8], v3[8];
>
> + VSTART_CHECK_EARLY_EXIT(env);
> +
> for (int i = env->vstart / 8; i < env->vl / 8; i++) {
> for (int k = 0; k < 8; k++) {
> v2[k] = bswap32(vd[H4(i * 8 + k)]);
> @@ -780,6 +802,8 @@ void HELPER(vghsh_vv)(void *vd_vptr, void *vs1_vptr, void *vs2_vptr,
> uint32_t vta = vext_vta(desc);
> uint32_t total_elems = vext_get_total_elems(env, desc, 4);
>
> + VSTART_CHECK_EARLY_EXIT(env);
> +
> for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {
> uint64_t Y[2] = {vd[i * 2 + 0], vd[i * 2 + 1]};
> uint64_t H[2] = {brev8(vs2[i * 2 + 0]), brev8(vs2[i * 2 + 1])};
> @@ -817,6 +841,8 @@ void HELPER(vgmul_vv)(void *vd_vptr, void *vs2_vptr, CPURISCVState *env,
> uint32_t vta = vext_vta(desc);
> uint32_t total_elems = vext_get_total_elems(env, desc, 4);
>
> + VSTART_CHECK_EARLY_EXIT(env);
> +
> for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {
> uint64_t Y[2] = {brev8(vd[i * 2 + 0]), brev8(vd[i * 2 + 1])};
> uint64_t H[2] = {brev8(vs2[i * 2 + 0]), brev8(vs2[i * 2 + 1])};
> @@ -853,6 +879,8 @@ void HELPER(vsm4k_vi)(void *vd, void *vs2, uint32_t uimm5, CPURISCVState *env,
> uint32_t esz = sizeof(uint32_t);
> uint32_t total_elems = vext_get_total_elems(env, desc, esz);
>
> + VSTART_CHECK_EARLY_EXIT(env);
> +
> for (uint32_t i = group_start; i < group_end; ++i) {
> uint32_t vstart = i * egs;
> uint32_t vend = (i + 1) * egs;
> @@ -909,6 +937,8 @@ void HELPER(vsm4r_vv)(void *vd, void *vs2, CPURISCVState *env, uint32_t desc)
> uint32_t esz = sizeof(uint32_t);
> uint32_t total_elems = vext_get_total_elems(env, desc, esz);
>
> + VSTART_CHECK_EARLY_EXIT(env);
> +
> for (uint32_t i = group_start; i < group_end; ++i) {
> uint32_t vstart = i * egs;
> uint32_t vend = (i + 1) * egs;
> @@ -943,6 +973,8 @@ void HELPER(vsm4r_vs)(void *vd, void *vs2, CPURISCVState *env, uint32_t desc)
> uint32_t esz = sizeof(uint32_t);
> uint32_t total_elems = vext_get_total_elems(env, desc, esz);
>
> + VSTART_CHECK_EARLY_EXIT(env);
> +
> for (uint32_t i = group_start; i < group_end; ++i) {
> uint32_t vstart = i * egs;
> uint32_t vend = (i + 1) * egs;
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index 1f4c276b21..63a1083f03 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -207,6 +207,8 @@ vext_ldst_stride(void *vd, void *v0, target_ulong base,
> uint32_t esz = 1 << log2_esz;
> uint32_t vma = vext_vma(desc);
>
> + VSTART_CHECK_EARLY_EXIT(env);
> +
> for (i = env->vstart; i < env->vl; i++, env->vstart++) {
> k = 0;
> while (k < nf) {
> @@ -272,6 +274,8 @@ vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
> uint32_t max_elems = vext_max_elems(desc, log2_esz);
> uint32_t esz = 1 << log2_esz;
>
> + VSTART_CHECK_EARLY_EXIT(env);
> +
> /* load bytes from guest memory */
> for (i = env->vstart; i < evl; i++, env->vstart++) {
> k = 0;
> @@ -386,6 +390,8 @@ vext_ldst_index(void *vd, void *v0, target_ulong base,
> uint32_t esz = 1 << log2_esz;
> uint32_t vma = vext_vma(desc);
>
> + VSTART_CHECK_EARLY_EXIT(env);
> +
> /* load bytes from guest memory */
> for (i = env->vstart; i < env->vl; i++, env->vstart++) {
> k = 0;
> @@ -477,6 +483,8 @@ vext_ldff(void *vd, void *v0, target_ulong base,
> target_ulong addr, offset, remain;
> int mmu_index = riscv_env_mmu_index(env, false);
>
> + VSTART_CHECK_EARLY_EXIT(env);
> +
> /* probe every access */
> for (i = env->vstart; i < env->vl; i++) {
> if (!vm && !vext_elem_mask(v0, i)) {
> @@ -882,6 +890,8 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, \
> uint32_t vta = vext_vta(desc); \
> uint32_t i; \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> for (i = env->vstart; i < vl; i++) { \
> ETYPE s1 = *((ETYPE *)vs1 + H(i)); \
> ETYPE s2 = *((ETYPE *)vs2 + H(i)); \
> @@ -914,6 +924,8 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \
> uint32_t vta = vext_vta(desc); \
> uint32_t i; \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> for (i = env->vstart; i < vl; i++) { \
> ETYPE s2 = *((ETYPE *)vs2 + H(i)); \
> ETYPE carry = vext_elem_mask(v0, i); \
> @@ -949,6 +961,8 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, \
> uint32_t vta_all_1s = vext_vta_all_1s(desc); \
> uint32_t i; \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> for (i = env->vstart; i < vl; i++) { \
> ETYPE s1 = *((ETYPE *)vs1 + H(i)); \
> ETYPE s2 = *((ETYPE *)vs2 + H(i)); \
> @@ -987,6 +1001,8 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, \
> uint32_t vta_all_1s = vext_vta_all_1s(desc); \
> uint32_t i; \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> for (i = env->vstart; i < vl; i++) { \
> ETYPE s2 = *((ETYPE *)vs2 + H(i)); \
> ETYPE carry = !vm && vext_elem_mask(v0, i); \
> @@ -1083,6 +1099,8 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, \
> uint32_t vma = vext_vma(desc); \
> uint32_t i; \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> for (i = env->vstart; i < vl; i++) { \
> if (!vm && !vext_elem_mask(v0, i)) { \
> /* set masked-off elements to 1s */ \
> @@ -1130,6 +1148,8 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, \
> uint32_t vma = vext_vma(desc); \
> uint32_t i; \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> for (i = env->vstart; i < vl; i++) { \
> if (!vm && !vext_elem_mask(v0, i)) { \
> /* set masked-off elements to 1s */ \
> @@ -1192,6 +1212,8 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, \
> uint32_t vma = vext_vma(desc); \
> uint32_t i; \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> for (i = env->vstart; i < vl; i++) { \
> ETYPE s1 = *((ETYPE *)vs1 + H(i)); \
> ETYPE s2 = *((ETYPE *)vs2 + H(i)); \
> @@ -1257,6 +1279,8 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \
> uint32_t vma = vext_vma(desc); \
> uint32_t i; \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> for (i = env->vstart; i < vl; i++) { \
> ETYPE s2 = *((ETYPE *)vs2 + H(i)); \
> if (!vm && !vext_elem_mask(v0, i)) { \
> @@ -1804,6 +1828,8 @@ void HELPER(NAME)(void *vd, void *vs1, CPURISCVState *env, \
> uint32_t vta = vext_vta(desc); \
> uint32_t i; \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> for (i = env->vstart; i < vl; i++) { \
> ETYPE s1 = *((ETYPE *)vs1 + H(i)); \
> *((ETYPE *)vd + H(i)) = s1; \
> @@ -1828,6 +1854,8 @@ void HELPER(NAME)(void *vd, uint64_t s1, CPURISCVState *env, \
> uint32_t vta = vext_vta(desc); \
> uint32_t i; \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> for (i = env->vstart; i < vl; i++) { \
> *((ETYPE *)vd + H(i)) = (ETYPE)s1; \
> } \
> @@ -1851,6 +1879,8 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, \
> uint32_t vta = vext_vta(desc); \
> uint32_t i; \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> for (i = env->vstart; i < vl; i++) { \
> ETYPE *vt = (!vext_elem_mask(v0, i) ? vs2 : vs1); \
> *((ETYPE *)vd + H(i)) = *(vt + H(i)); \
> @@ -1875,6 +1905,8 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, \
> uint32_t vta = vext_vta(desc); \
> uint32_t i; \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> for (i = env->vstart; i < vl; i++) { \
> ETYPE s2 = *((ETYPE *)vs2 + H(i)); \
> ETYPE d = (!vext_elem_mask(v0, i) ? s2 : \
> @@ -1920,6 +1952,8 @@ vext_vv_rm_1(void *vd, void *v0, void *vs1, void *vs2,
> uint32_t vl, uint32_t vm, int vxrm,
> opivv2_rm_fn *fn, uint32_t vma, uint32_t esz)
> {
> + VSTART_CHECK_EARLY_EXIT(env);
> +
> for (uint32_t i = env->vstart; i < vl; i++) {
> if (!vm && !vext_elem_mask(v0, i)) {
> /* set masked-off elements to 1s */
> @@ -2045,6 +2079,8 @@ vext_vx_rm_1(void *vd, void *v0, target_long s1, void *vs2,
> uint32_t vl, uint32_t vm, int vxrm,
> opivx2_rm_fn *fn, uint32_t vma, uint32_t esz)
> {
> + VSTART_CHECK_EARLY_EXIT(env);
> +
> for (uint32_t i = env->vstart; i < vl; i++) {
> if (!vm && !vext_elem_mask(v0, i)) {
> /* set masked-off elements to 1s */
> @@ -2842,6 +2878,8 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, \
> uint32_t vma = vext_vma(desc); \
> uint32_t i; \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> for (i = env->vstart; i < vl; i++) { \
> if (!vm && !vext_elem_mask(v0, i)) { \
> /* set masked-off elements to 1s */ \
> @@ -2885,6 +2923,8 @@ void HELPER(NAME)(void *vd, void *v0, uint64_t s1, \
> uint32_t vma = vext_vma(desc); \
> uint32_t i; \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> for (i = env->vstart; i < vl; i++) { \
> if (!vm && !vext_elem_mask(v0, i)) { \
> /* set masked-off elements to 1s */ \
> @@ -3471,6 +3511,8 @@ void HELPER(NAME)(void *vd, void *v0, void *vs2, \
> uint32_t vma = vext_vma(desc); \
> uint32_t i; \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> if (vl == 0) { \
> return; \
> } \
> @@ -3992,6 +4034,8 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, \
> uint32_t vma = vext_vma(desc); \
> uint32_t i; \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> for (i = env->vstart; i < vl; i++) { \
> ETYPE s1 = *((ETYPE *)vs1 + H(i)); \
> ETYPE s2 = *((ETYPE *)vs2 + H(i)); \
> @@ -4032,6 +4076,8 @@ void HELPER(NAME)(void *vd, void *v0, uint64_t s1, void *vs2, \
> uint32_t vma = vext_vma(desc); \
> uint32_t i; \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> for (i = env->vstart; i < vl; i++) { \
> ETYPE s2 = *((ETYPE *)vs2 + H(i)); \
> if (!vm && !vext_elem_mask(v0, i)) { \
> @@ -4225,6 +4271,8 @@ void HELPER(NAME)(void *vd, void *v0, uint64_t s1, void *vs2, \
> uint32_t vta = vext_vta(desc); \
> uint32_t i; \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> for (i = env->vstart; i < vl; i++) { \
> ETYPE s2 = *((ETYPE *)vs2 + H(i)); \
> *((ETYPE *)vd + H(i)) = \
> @@ -4549,6 +4597,8 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, \
> uint32_t i; \
> int a, b; \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> for (i = env->vstart; i < vl; i++) { \
> a = vext_elem_mask(vs1, i); \
> b = vext_elem_mask(vs2, i); \
> @@ -4742,6 +4792,8 @@ void HELPER(NAME)(void *vd, void *v0, CPURISCVState *env, uint32_t desc) \
> uint32_t vma = vext_vma(desc); \
> int i; \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> for (i = env->vstart; i < vl; i++) { \
> if (!vm && !vext_elem_mask(v0, i)) { \
> /* set masked-off elements to 1s */ \
> @@ -4777,6 +4829,8 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \
> uint32_t vma = vext_vma(desc); \
> target_ulong offset = s1, i_min, i; \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> i_min = MAX(env->vstart, offset); \
> for (i = i_min; i < vl; i++) { \
> if (!vm && !vext_elem_mask(v0, i)) { \
> @@ -4810,6 +4864,8 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \
> uint32_t vma = vext_vma(desc); \
> target_ulong i_max, i_min, i; \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> i_min = MIN(s1 < vlmax ? vlmax - s1 : 0, vl); \
> i_max = MAX(i_min, env->vstart); \
> for (i = env->vstart; i < i_max; ++i) { \
> @@ -4852,6 +4908,8 @@ static void vslide1up_##BITWIDTH(void *vd, void *v0, uint64_t s1, \
> uint32_t vma = vext_vma(desc); \
> uint32_t i; \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> for (i = env->vstart; i < vl; i++) { \
> if (!vm && !vext_elem_mask(v0, i)) { \
> /* set masked-off elements to 1s */ \
> @@ -4901,6 +4959,8 @@ static void vslide1down_##BITWIDTH(void *vd, void *v0, uint64_t s1, \
> uint32_t vma = vext_vma(desc); \
> uint32_t i; \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> for (i = env->vstart; i < vl; i++) { \
> if (!vm && !vext_elem_mask(v0, i)) { \
> /* set masked-off elements to 1s */ \
> @@ -4976,6 +5036,8 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, \
> uint64_t index; \
> uint32_t i; \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> for (i = env->vstart; i < vl; i++) { \
> if (!vm && !vext_elem_mask(v0, i)) { \
> /* set masked-off elements to 1s */ \
> @@ -5019,6 +5081,8 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \
> uint64_t index = s1; \
> uint32_t i; \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> for (i = env->vstart; i < vl; i++) { \
> if (!vm && !vext_elem_mask(v0, i)) { \
> /* set masked-off elements to 1s */ \
> @@ -5113,6 +5177,8 @@ void HELPER(NAME)(void *vd, void *v0, void *vs2, \
> uint32_t vma = vext_vma(desc); \
> uint32_t i; \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> for (i = env->vstart; i < vl; i++) { \
> if (!vm && !vext_elem_mask(v0, i)) { \
> /* set masked-off elements to 1s */ \
> diff --git a/target/riscv/vector_internals.c b/target/riscv/vector_internals.c
> index 12f5964fbb..996c21eb31 100644
> --- a/target/riscv/vector_internals.c
> +++ b/target/riscv/vector_internals.c
> @@ -44,6 +44,8 @@ void do_vext_vv(void *vd, void *v0, void *vs1, void *vs2,
> uint32_t vma = vext_vma(desc);
> uint32_t i;
>
> + VSTART_CHECK_EARLY_EXIT(env);
> +
> for (i = env->vstart; i < vl; i++) {
> if (!vm && !vext_elem_mask(v0, i)) {
> /* set masked-off elements to 1s */
> @@ -68,6 +70,8 @@ void do_vext_vx(void *vd, void *v0, target_long s1, void *vs2,
> uint32_t vma = vext_vma(desc);
> uint32_t i;
>
> + VSTART_CHECK_EARLY_EXIT(env);
> +
> for (i = env->vstart; i < vl; i++) {
> if (!vm && !vext_elem_mask(v0, i)) {
> /* set masked-off elements to 1s */
> diff --git a/target/riscv/vector_internals.h b/target/riscv/vector_internals.h
> index 842765f6c1..9e1e15b575 100644
> --- a/target/riscv/vector_internals.h
> +++ b/target/riscv/vector_internals.h
> @@ -24,6 +24,13 @@
> #include "tcg/tcg-gvec-desc.h"
> #include "internals.h"
>
> +#define VSTART_CHECK_EARLY_EXIT(env) do { \
> + if (env->vstart >= env->vl) { \
> + env->vstart = 0; \
> + return; \
> + } \
> +} while (0)
> +
> static inline uint32_t vext_nf(uint32_t desc)
> {
> return FIELD_EX32(simd_data(desc), VDATA, NF);
> @@ -151,6 +158,8 @@ void HELPER(NAME)(void *vd, void *v0, void *vs2, \
> uint32_t vma = vext_vma(desc); \
> uint32_t i; \
> \
> + VSTART_CHECK_EARLY_EXIT(env); \
> + \
> for (i = env->vstart; i < vl; i++) { \
> if (!vm && !vext_elem_mask(v0, i)) { \
> /* set masked-off elements to 1s */ \
> --
> 2.44.0
>
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH for 9.0 v15 07/10] target/riscv: remove 'over' brconds from vector trans
2024-03-14 17:56 [PATCH for 9.0 v15 00/10] target/riscv: vector fixes Daniel Henrique Barboza
` (5 preceding siblings ...)
2024-03-14 17:57 ` [PATCH for 9.0 v15 06/10] target/riscv/vector_helpers: do early exit when vstart >= vl Daniel Henrique Barboza
@ 2024-03-14 17:57 ` Daniel Henrique Barboza
2024-03-14 17:57 ` [PATCH for 9.0 v15 08/10] trans_rvv.c.inc: remove redundant mark_vs_dirty() calls Daniel Henrique Barboza
` (3 subsequent siblings)
10 siblings, 0 replies; 24+ messages in thread
From: Daniel Henrique Barboza @ 2024-03-14 17:57 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
palmer, max.chou, richard.henderson, Daniel Henrique Barboza
All helpers that rely on vstart >= vl are now doing early exits using
the VSTART_CHECK_EARLY_EXIT() macro. This macro will not only exit the
helper but also clear vstart.
We're still left with brconds that are skipping the helper, which is the
only place where we're clearing vstart. The pattern goes like this:
tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
(... calls helper that clears vstart ...)
gen_set_label(over);
return true;
This means that every time we jump to 'over' we're not clearing vstart,
which is an oversight that we're doing across the board.
Instead of setting vstart = 0 manually after each 'over' jump, remove
those brconds that are skipping helpers. The exception will be
trans_vmv_s_x() and trans_vfmv_s_f(): they don't use a helper and are
already clearing vstart manually in the 'over' label.
While we're at it, remove the (vl == 0) brconds from trans_rvbf16.c.inc
too since they're unneeded.
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
---
target/riscv/insn_trans/trans_rvbf16.c.inc | 12 ---
target/riscv/insn_trans/trans_rvv.c.inc | 99 ----------------------
target/riscv/insn_trans/trans_rvvk.c.inc | 18 ----
3 files changed, 129 deletions(-)
diff --git a/target/riscv/insn_trans/trans_rvbf16.c.inc b/target/riscv/insn_trans/trans_rvbf16.c.inc
index 8ee99df3f3..a842e76a6b 100644
--- a/target/riscv/insn_trans/trans_rvbf16.c.inc
+++ b/target/riscv/insn_trans/trans_rvbf16.c.inc
@@ -71,11 +71,8 @@ static bool trans_vfncvtbf16_f_f_w(DisasContext *ctx, arg_vfncvtbf16_f_f_w *a)
if (opfv_narrow_check(ctx, a) && (ctx->sew == MO_16)) {
uint32_t data = 0;
- TCGLabel *over = gen_new_label();
gen_set_rm_chkfrm(ctx, RISCV_FRM_DYN);
- tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
data = FIELD_DP32(data, VDATA, VM, a->vm);
data = FIELD_DP32(data, VDATA, LMUL, ctx->lmul);
@@ -87,7 +84,6 @@ static bool trans_vfncvtbf16_f_f_w(DisasContext *ctx, arg_vfncvtbf16_f_f_w *a)
ctx->cfg_ptr->vlenb, data,
gen_helper_vfncvtbf16_f_f_w);
mark_vs_dirty(ctx);
- gen_set_label(over);
return true;
}
return false;
@@ -100,11 +96,8 @@ static bool trans_vfwcvtbf16_f_f_v(DisasContext *ctx, arg_vfwcvtbf16_f_f_v *a)
if (opfv_widen_check(ctx, a) && (ctx->sew == MO_16)) {
uint32_t data = 0;
- TCGLabel *over = gen_new_label();
gen_set_rm_chkfrm(ctx, RISCV_FRM_DYN);
- tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
data = FIELD_DP32(data, VDATA, VM, a->vm);
data = FIELD_DP32(data, VDATA, LMUL, ctx->lmul);
@@ -116,7 +109,6 @@ static bool trans_vfwcvtbf16_f_f_v(DisasContext *ctx, arg_vfwcvtbf16_f_f_v *a)
ctx->cfg_ptr->vlenb, data,
gen_helper_vfwcvtbf16_f_f_v);
mark_vs_dirty(ctx);
- gen_set_label(over);
return true;
}
return false;
@@ -130,11 +122,8 @@ static bool trans_vfwmaccbf16_vv(DisasContext *ctx, arg_vfwmaccbf16_vv *a)
if (require_rvv(ctx) && vext_check_isa_ill(ctx) && (ctx->sew == MO_16) &&
vext_check_dss(ctx, a->rd, a->rs1, a->rs2, a->vm)) {
uint32_t data = 0;
- TCGLabel *over = gen_new_label();
gen_set_rm_chkfrm(ctx, RISCV_FRM_DYN);
- tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
data = FIELD_DP32(data, VDATA, VM, a->vm);
data = FIELD_DP32(data, VDATA, LMUL, ctx->lmul);
@@ -147,7 +136,6 @@ static bool trans_vfwmaccbf16_vv(DisasContext *ctx, arg_vfwmaccbf16_vv *a)
ctx->cfg_ptr->vlenb, data,
gen_helper_vfwmaccbf16_vv);
mark_vs_dirty(ctx);
- gen_set_label(over);
return true;
}
return false;
diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc
index 1366445e1f..7931fb2f3f 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -616,9 +616,6 @@ static bool ldst_us_trans(uint32_t vd, uint32_t rs1, uint32_t data,
TCGv base;
TCGv_i32 desc;
- TCGLabel *over = gen_new_label();
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
-
dest = tcg_temp_new_ptr();
mask = tcg_temp_new_ptr();
base = get_gpr(s, rs1, EXT_NONE);
@@ -660,7 +657,6 @@ static bool ldst_us_trans(uint32_t vd, uint32_t rs1, uint32_t data,
tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ);
}
- gen_set_label(over);
return true;
}
@@ -802,9 +798,6 @@ static bool ldst_stride_trans(uint32_t vd, uint32_t rs1, uint32_t rs2,
TCGv base, stride;
TCGv_i32 desc;
- TCGLabel *over = gen_new_label();
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
-
dest = tcg_temp_new_ptr();
mask = tcg_temp_new_ptr();
base = get_gpr(s, rs1, EXT_NONE);
@@ -819,7 +812,6 @@ static bool ldst_stride_trans(uint32_t vd, uint32_t rs1, uint32_t rs2,
fn(dest, mask, base, stride, tcg_env, desc);
- gen_set_label(over);
return true;
}
@@ -906,9 +898,6 @@ static bool ldst_index_trans(uint32_t vd, uint32_t rs1, uint32_t vs2,
TCGv base;
TCGv_i32 desc;
- TCGLabel *over = gen_new_label();
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
-
dest = tcg_temp_new_ptr();
mask = tcg_temp_new_ptr();
index = tcg_temp_new_ptr();
@@ -924,7 +913,6 @@ static bool ldst_index_trans(uint32_t vd, uint32_t rs1, uint32_t vs2,
fn(dest, mask, base, index, tcg_env, desc);
- gen_set_label(over);
return true;
}
@@ -1044,9 +1032,6 @@ static bool ldff_trans(uint32_t vd, uint32_t rs1, uint32_t data,
TCGv base;
TCGv_i32 desc;
- TCGLabel *over = gen_new_label();
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
-
dest = tcg_temp_new_ptr();
mask = tcg_temp_new_ptr();
base = get_gpr(s, rs1, EXT_NONE);
@@ -1059,7 +1044,6 @@ static bool ldff_trans(uint32_t vd, uint32_t rs1, uint32_t data,
fn(dest, mask, base, tcg_env, desc);
mark_vs_dirty(s);
- gen_set_label(over);
return true;
}
@@ -1189,10 +1173,6 @@ static inline bool
do_opivv_gvec(DisasContext *s, arg_rmrr *a, GVecGen3Fn *gvec_fn,
gen_helper_gvec_4_ptr *fn)
{
- TCGLabel *over = gen_new_label();
-
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
-
if (a->vm && s->vl_eq_vlmax && !(s->vta && s->lmul < 0)) {
gvec_fn(s->sew, vreg_ofs(s, a->rd),
vreg_ofs(s, a->rs2), vreg_ofs(s, a->rs1),
@@ -1210,7 +1190,6 @@ do_opivv_gvec(DisasContext *s, arg_rmrr *a, GVecGen3Fn *gvec_fn,
s->cfg_ptr->vlenb, data, fn);
}
mark_vs_dirty(s);
- gen_set_label(over);
return true;
}
@@ -1242,9 +1221,6 @@ static bool opivx_trans(uint32_t vd, uint32_t rs1, uint32_t vs2, uint32_t vm,
TCGv_i32 desc;
uint32_t data = 0;
- TCGLabel *over = gen_new_label();
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
-
dest = tcg_temp_new_ptr();
mask = tcg_temp_new_ptr();
src2 = tcg_temp_new_ptr();
@@ -1265,7 +1241,6 @@ static bool opivx_trans(uint32_t vd, uint32_t rs1, uint32_t vs2, uint32_t vm,
fn(dest, mask, src1, src2, tcg_env, desc);
mark_vs_dirty(s);
- gen_set_label(over);
return true;
}
@@ -1404,9 +1379,6 @@ static bool opivi_trans(uint32_t vd, uint32_t imm, uint32_t vs2, uint32_t vm,
TCGv_i32 desc;
uint32_t data = 0;
- TCGLabel *over = gen_new_label();
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
-
dest = tcg_temp_new_ptr();
mask = tcg_temp_new_ptr();
src2 = tcg_temp_new_ptr();
@@ -1427,7 +1399,6 @@ static bool opivi_trans(uint32_t vd, uint32_t imm, uint32_t vs2, uint32_t vm,
fn(dest, mask, src1, src2, tcg_env, desc);
mark_vs_dirty(s);
- gen_set_label(over);
return true;
}
@@ -1489,8 +1460,6 @@ static bool do_opivv_widen(DisasContext *s, arg_rmrr *a,
{
if (checkfn(s, a)) {
uint32_t data = 0;
- TCGLabel *over = gen_new_label();
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
data = FIELD_DP32(data, VDATA, VM, a->vm);
data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
@@ -1503,7 +1472,6 @@ static bool do_opivv_widen(DisasContext *s, arg_rmrr *a,
s->cfg_ptr->vlenb,
data, fn);
mark_vs_dirty(s);
- gen_set_label(over);
return true;
}
return false;
@@ -1565,8 +1533,6 @@ static bool do_opiwv_widen(DisasContext *s, arg_rmrr *a,
{
if (opiwv_widen_check(s, a)) {
uint32_t data = 0;
- TCGLabel *over = gen_new_label();
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
data = FIELD_DP32(data, VDATA, VM, a->vm);
data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
@@ -1578,7 +1544,6 @@ static bool do_opiwv_widen(DisasContext *s, arg_rmrr *a,
tcg_env, s->cfg_ptr->vlenb,
s->cfg_ptr->vlenb, data, fn);
mark_vs_dirty(s);
- gen_set_label(over);
return true;
}
return false;
@@ -1637,8 +1602,6 @@ static bool opivv_trans(uint32_t vd, uint32_t vs1, uint32_t vs2, uint32_t vm,
gen_helper_gvec_4_ptr *fn, DisasContext *s)
{
uint32_t data = 0;
- TCGLabel *over = gen_new_label();
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
data = FIELD_DP32(data, VDATA, VM, vm);
data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
@@ -1649,7 +1612,6 @@ static bool opivv_trans(uint32_t vd, uint32_t vs1, uint32_t vs2, uint32_t vm,
vreg_ofs(s, vs2), tcg_env, s->cfg_ptr->vlenb,
s->cfg_ptr->vlenb, data, fn);
mark_vs_dirty(s);
- gen_set_label(over);
return true;
}
@@ -1828,8 +1790,6 @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \
gen_helper_##NAME##_h, \
gen_helper_##NAME##_w, \
}; \
- TCGLabel *over = gen_new_label(); \
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \
\
data = FIELD_DP32(data, VDATA, VM, a->vm); \
data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \
@@ -1842,7 +1802,6 @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \
s->cfg_ptr->vlenb, data, \
fns[s->sew]); \
mark_vs_dirty(s); \
- gen_set_label(over); \
return true; \
} \
return false; \
@@ -2039,14 +1998,11 @@ static bool trans_vmv_v_v(DisasContext *s, arg_vmv_v_v *a)
gen_helper_vmv_v_v_b, gen_helper_vmv_v_v_h,
gen_helper_vmv_v_v_w, gen_helper_vmv_v_v_d,
};
- TCGLabel *over = gen_new_label();
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
tcg_gen_gvec_2_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, a->rs1),
tcg_env, s->cfg_ptr->vlenb,
s->cfg_ptr->vlenb, data,
fns[s->sew]);
- gen_set_label(over);
}
mark_vs_dirty(s);
return true;
@@ -2062,8 +2018,6 @@ static bool trans_vmv_v_x(DisasContext *s, arg_vmv_v_x *a)
/* vmv.v.x has rs2 = 0 and vm = 1 */
vext_check_ss(s, a->rd, 0, 1)) {
TCGv s1;
- TCGLabel *over = gen_new_label();
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
s1 = get_gpr(s, a->rs1, EXT_SIGN);
@@ -2096,7 +2050,6 @@ static bool trans_vmv_v_x(DisasContext *s, arg_vmv_v_x *a)
}
mark_vs_dirty(s);
- gen_set_label(over);
return true;
}
return false;
@@ -2123,8 +2076,6 @@ static bool trans_vmv_v_i(DisasContext *s, arg_vmv_v_i *a)
gen_helper_vmv_v_x_b, gen_helper_vmv_v_x_h,
gen_helper_vmv_v_x_w, gen_helper_vmv_v_x_d,
};
- TCGLabel *over = gen_new_label();
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
s1 = tcg_constant_i64(simm);
dest = tcg_temp_new_ptr();
@@ -2134,7 +2085,6 @@ static bool trans_vmv_v_i(DisasContext *s, arg_vmv_v_i *a)
fns[s->sew](dest, s1, tcg_env, desc);
mark_vs_dirty(s);
- gen_set_label(over);
}
return true;
}
@@ -2269,9 +2219,7 @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \
gen_helper_##NAME##_w, \
gen_helper_##NAME##_d, \
}; \
- TCGLabel *over = gen_new_label(); \
gen_set_rm(s, RISCV_FRM_DYN); \
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \
\
data = FIELD_DP32(data, VDATA, VM, a->vm); \
data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \
@@ -2286,7 +2234,6 @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \
s->cfg_ptr->vlenb, data, \
fns[s->sew - 1]); \
mark_vs_dirty(s); \
- gen_set_label(over); \
return true; \
} \
return false; \
@@ -2304,9 +2251,6 @@ static bool opfvf_trans(uint32_t vd, uint32_t rs1, uint32_t vs2,
TCGv_i32 desc;
TCGv_i64 t1;
- TCGLabel *over = gen_new_label();
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
-
dest = tcg_temp_new_ptr();
mask = tcg_temp_new_ptr();
src2 = tcg_temp_new_ptr();
@@ -2324,7 +2268,6 @@ static bool opfvf_trans(uint32_t vd, uint32_t rs1, uint32_t vs2,
fn(dest, mask, t1, src2, tcg_env, desc);
mark_vs_dirty(s);
- gen_set_label(over);
return true;
}
@@ -2387,9 +2330,7 @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \
static gen_helper_gvec_4_ptr * const fns[2] = { \
gen_helper_##NAME##_h, gen_helper_##NAME##_w, \
}; \
- TCGLabel *over = gen_new_label(); \
gen_set_rm(s, RISCV_FRM_DYN); \
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);\
\
data = FIELD_DP32(data, VDATA, VM, a->vm); \
data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \
@@ -2402,7 +2343,6 @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \
s->cfg_ptr->vlenb, data, \
fns[s->sew - 1]); \
mark_vs_dirty(s); \
- gen_set_label(over); \
return true; \
} \
return false; \
@@ -2461,9 +2401,7 @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \
static gen_helper_gvec_4_ptr * const fns[2] = { \
gen_helper_##NAME##_h, gen_helper_##NAME##_w, \
}; \
- TCGLabel *over = gen_new_label(); \
gen_set_rm(s, RISCV_FRM_DYN); \
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \
\
data = FIELD_DP32(data, VDATA, VM, a->vm); \
data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \
@@ -2476,7 +2414,6 @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \
s->cfg_ptr->vlenb, data, \
fns[s->sew - 1]); \
mark_vs_dirty(s); \
- gen_set_label(over); \
return true; \
} \
return false; \
@@ -2578,9 +2515,7 @@ static bool do_opfv(DisasContext *s, arg_rmr *a,
{
if (checkfn(s, a)) {
uint32_t data = 0;
- TCGLabel *over = gen_new_label();
gen_set_rm_chkfrm(s, rm);
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
data = FIELD_DP32(data, VDATA, VM, a->vm);
data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
@@ -2591,7 +2526,6 @@ static bool do_opfv(DisasContext *s, arg_rmr *a,
s->cfg_ptr->vlenb,
s->cfg_ptr->vlenb, data, fn);
mark_vs_dirty(s);
- gen_set_label(over);
return true;
}
return false;
@@ -2690,8 +2624,6 @@ static bool trans_vfmv_v_f(DisasContext *s, arg_vfmv_v_f *a)
gen_helper_vmv_v_x_w,
gen_helper_vmv_v_x_d,
};
- TCGLabel *over = gen_new_label();
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
t1 = tcg_temp_new_i64();
/* NaN-box f[rs1] */
@@ -2705,7 +2637,6 @@ static bool trans_vfmv_v_f(DisasContext *s, arg_vfmv_v_f *a)
fns[s->sew - 1](dest, t1, tcg_env, desc);
mark_vs_dirty(s);
- gen_set_label(over);
}
return true;
}
@@ -2767,9 +2698,7 @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a) \
gen_helper_##HELPER##_h, \
gen_helper_##HELPER##_w, \
}; \
- TCGLabel *over = gen_new_label(); \
gen_set_rm_chkfrm(s, FRM); \
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \
\
data = FIELD_DP32(data, VDATA, VM, a->vm); \
data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \
@@ -2781,7 +2710,6 @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a) \
s->cfg_ptr->vlenb, data, \
fns[s->sew - 1]); \
mark_vs_dirty(s); \
- gen_set_label(over); \
return true; \
} \
return false; \
@@ -2818,9 +2746,7 @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a) \
gen_helper_##NAME##_h, \
gen_helper_##NAME##_w, \
}; \
- TCGLabel *over = gen_new_label(); \
gen_set_rm(s, RISCV_FRM_DYN); \
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \
\
data = FIELD_DP32(data, VDATA, VM, a->vm); \
data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \
@@ -2832,7 +2758,6 @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a) \
s->cfg_ptr->vlenb, data, \
fns[s->sew]); \
mark_vs_dirty(s); \
- gen_set_label(over); \
return true; \
} \
return false; \
@@ -2885,9 +2810,7 @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a) \
gen_helper_##HELPER##_h, \
gen_helper_##HELPER##_w, \
}; \
- TCGLabel *over = gen_new_label(); \
gen_set_rm_chkfrm(s, FRM); \
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \
\
data = FIELD_DP32(data, VDATA, VM, a->vm); \
data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \
@@ -2899,7 +2822,6 @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a) \
s->cfg_ptr->vlenb, data, \
fns[s->sew - 1]); \
mark_vs_dirty(s); \
- gen_set_label(over); \
return true; \
} \
return false; \
@@ -2934,9 +2856,7 @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a) \
gen_helper_##HELPER##_h, \
gen_helper_##HELPER##_w, \
}; \
- TCGLabel *over = gen_new_label(); \
gen_set_rm_chkfrm(s, FRM); \
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \
\
data = FIELD_DP32(data, VDATA, VM, a->vm); \
data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \
@@ -2948,7 +2868,6 @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a) \
s->cfg_ptr->vlenb, data, \
fns[s->sew]); \
mark_vs_dirty(s); \
- gen_set_label(over); \
return true; \
} \
return false; \
@@ -3025,8 +2944,6 @@ static bool trans_##NAME(DisasContext *s, arg_r *a) \
vext_check_isa_ill(s)) { \
uint32_t data = 0; \
gen_helper_gvec_4_ptr *fn = gen_helper_##NAME; \
- TCGLabel *over = gen_new_label(); \
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \
\
data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \
data = \
@@ -3037,7 +2954,6 @@ static bool trans_##NAME(DisasContext *s, arg_r *a) \
s->cfg_ptr->vlenb, \
s->cfg_ptr->vlenb, data, fn); \
mark_vs_dirty(s); \
- gen_set_label(over); \
return true; \
} \
return false; \
@@ -3125,8 +3041,6 @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a) \
s->vstart_eq_zero) { \
uint32_t data = 0; \
gen_helper_gvec_3_ptr *fn = gen_helper_##NAME; \
- TCGLabel *over = gen_new_label(); \
- tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); \
\
data = FIELD_DP32(data, VDATA, VM, a->vm); \
data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \
@@ -3139,7 +3053,6 @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a) \
s->cfg_ptr->vlenb, \
data, fn); \
mark_vs_dirty(s); \
- gen_set_label(over); \
return true; \
} \
return false; \
@@ -3165,8 +3078,6 @@ static bool trans_viota_m(DisasContext *s, arg_viota_m *a)
require_align(a->rd, s->lmul) &&
s->vstart_eq_zero) {
uint32_t data = 0;
- TCGLabel *over = gen_new_label();
- tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);
data = FIELD_DP32(data, VDATA, VM, a->vm);
data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
@@ -3181,7 +3092,6 @@ static bool trans_viota_m(DisasContext *s, arg_viota_m *a)
s->cfg_ptr->vlenb,
s->cfg_ptr->vlenb, data, fns[s->sew]);
mark_vs_dirty(s);
- gen_set_label(over);
return true;
}
return false;
@@ -3195,8 +3105,6 @@ static bool trans_vid_v(DisasContext *s, arg_vid_v *a)
require_align(a->rd, s->lmul) &&
require_vm(a->vm, a->rd)) {
uint32_t data = 0;
- TCGLabel *over = gen_new_label();
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
data = FIELD_DP32(data, VDATA, VM, a->vm);
data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
@@ -3211,7 +3119,6 @@ static bool trans_vid_v(DisasContext *s, arg_vid_v *a)
s->cfg_ptr->vlenb,
data, fns[s->sew]);
mark_vs_dirty(s);
- gen_set_label(over);
return true;
}
return false;
@@ -3624,8 +3531,6 @@ static bool trans_vcompress_vm(DisasContext *s, arg_r *a)
gen_helper_vcompress_vm_b, gen_helper_vcompress_vm_h,
gen_helper_vcompress_vm_w, gen_helper_vcompress_vm_d,
};
- TCGLabel *over = gen_new_label();
- tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);
data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
data = FIELD_DP32(data, VDATA, VTA, s->vta);
@@ -3635,7 +3540,6 @@ static bool trans_vcompress_vm(DisasContext *s, arg_r *a)
s->cfg_ptr->vlenb, data,
fns[s->sew]);
mark_vs_dirty(s);
- gen_set_label(over);
return true;
}
return false;
@@ -3689,8 +3593,6 @@ static bool int_ext_op(DisasContext *s, arg_rmr *a, uint8_t seq)
{
uint32_t data = 0;
gen_helper_gvec_3_ptr *fn;
- TCGLabel *over = gen_new_label();
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
static gen_helper_gvec_3_ptr * const fns[6][4] = {
{
@@ -3735,7 +3637,6 @@ static bool int_ext_op(DisasContext *s, arg_rmr *a, uint8_t seq)
s->cfg_ptr->vlenb, data, fn);
mark_vs_dirty(s);
- gen_set_label(over);
return true;
}
diff --git a/target/riscv/insn_trans/trans_rvvk.c.inc b/target/riscv/insn_trans/trans_rvvk.c.inc
index a5cdd1b67f..6d640e4596 100644
--- a/target/riscv/insn_trans/trans_rvvk.c.inc
+++ b/target/riscv/insn_trans/trans_rvvk.c.inc
@@ -164,8 +164,6 @@ GEN_OPIVX_GVEC_TRANS_CHECK(vandn_vx, andcs, zvkb_vx_check)
gen_helper_##NAME##_w, \
gen_helper_##NAME##_d, \
}; \
- TCGLabel *over = gen_new_label(); \
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \
\
data = FIELD_DP32(data, VDATA, VM, a->vm); \
data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \
@@ -177,7 +175,6 @@ GEN_OPIVX_GVEC_TRANS_CHECK(vandn_vx, andcs, zvkb_vx_check)
s->cfg_ptr->vlenb, s->cfg_ptr->vlenb, \
data, fns[s->sew]); \
mark_vs_dirty(s); \
- gen_set_label(over); \
return true; \
} \
return false; \
@@ -249,14 +246,12 @@ GEN_OPIVI_WIDEN_TRANS(vwsll_vi, IMM_ZX, vwsll_vx, vwsll_vx_check)
TCGv_ptr rd_v, rs2_v; \
TCGv_i32 desc, egs; \
uint32_t data = 0; \
- TCGLabel *over = gen_new_label(); \
\
if (!s->vstart_eq_zero || !s->vl_eq_vlmax) { \
/* save opcode for unwinding in case we throw an exception */ \
decode_save_opc(s); \
egs = tcg_constant_i32(EGS); \
gen_helper_egs_check(egs, tcg_env); \
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \
} \
\
data = FIELD_DP32(data, VDATA, VM, a->vm); \
@@ -272,7 +267,6 @@ GEN_OPIVI_WIDEN_TRANS(vwsll_vi, IMM_ZX, vwsll_vx, vwsll_vx_check)
tcg_gen_addi_ptr(rs2_v, tcg_env, vreg_ofs(s, a->rs2)); \
gen_helper_##NAME(rd_v, rs2_v, tcg_env, desc); \
mark_vs_dirty(s); \
- gen_set_label(over); \
return true; \
} \
return false; \
@@ -325,14 +319,12 @@ GEN_V_UNMASKED_TRANS(vaesem_vs, vaes_check_vs, ZVKNED_EGS)
TCGv_ptr rd_v, rs2_v; \
TCGv_i32 uimm_v, desc, egs; \
uint32_t data = 0; \
- TCGLabel *over = gen_new_label(); \
\
if (!s->vstart_eq_zero || !s->vl_eq_vlmax) { \
/* save opcode for unwinding in case we throw an exception */ \
decode_save_opc(s); \
egs = tcg_constant_i32(EGS); \
gen_helper_egs_check(egs, tcg_env); \
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \
} \
\
data = FIELD_DP32(data, VDATA, VM, a->vm); \
@@ -350,7 +342,6 @@ GEN_V_UNMASKED_TRANS(vaesem_vs, vaes_check_vs, ZVKNED_EGS)
tcg_gen_addi_ptr(rs2_v, tcg_env, vreg_ofs(s, a->rs2)); \
gen_helper_##NAME(rd_v, rs2_v, uimm_v, tcg_env, desc); \
mark_vs_dirty(s); \
- gen_set_label(over); \
return true; \
} \
return false; \
@@ -394,7 +385,6 @@ GEN_VI_UNMASKED_TRANS(vaeskf2_vi, vaeskf2_check, ZVKNED_EGS)
{ \
if (CHECK(s, a)) { \
uint32_t data = 0; \
- TCGLabel *over = gen_new_label(); \
TCGv_i32 egs; \
\
if (!s->vstart_eq_zero || !s->vl_eq_vlmax) { \
@@ -402,7 +392,6 @@ GEN_VI_UNMASKED_TRANS(vaeskf2_vi, vaeskf2_check, ZVKNED_EGS)
decode_save_opc(s); \
egs = tcg_constant_i32(EGS); \
gen_helper_egs_check(egs, tcg_env); \
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \
} \
\
data = FIELD_DP32(data, VDATA, VM, a->vm); \
@@ -417,7 +406,6 @@ GEN_VI_UNMASKED_TRANS(vaeskf2_vi, vaeskf2_check, ZVKNED_EGS)
data, gen_helper_##NAME); \
\
mark_vs_dirty(s); \
- gen_set_label(over); \
return true; \
} \
return false; \
@@ -448,7 +436,6 @@ static bool trans_vsha2cl_vv(DisasContext *s, arg_rmrr *a)
{
if (vsha_check(s, a)) {
uint32_t data = 0;
- TCGLabel *over = gen_new_label();
TCGv_i32 egs;
if (!s->vstart_eq_zero || !s->vl_eq_vlmax) {
@@ -456,7 +443,6 @@ static bool trans_vsha2cl_vv(DisasContext *s, arg_rmrr *a)
decode_save_opc(s);
egs = tcg_constant_i32(ZVKNH_EGS);
gen_helper_egs_check(egs, tcg_env);
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
}
data = FIELD_DP32(data, VDATA, VM, a->vm);
@@ -472,7 +458,6 @@ static bool trans_vsha2cl_vv(DisasContext *s, arg_rmrr *a)
gen_helper_vsha2cl32_vv : gen_helper_vsha2cl64_vv);
mark_vs_dirty(s);
- gen_set_label(over);
return true;
}
return false;
@@ -482,7 +467,6 @@ static bool trans_vsha2ch_vv(DisasContext *s, arg_rmrr *a)
{
if (vsha_check(s, a)) {
uint32_t data = 0;
- TCGLabel *over = gen_new_label();
TCGv_i32 egs;
if (!s->vstart_eq_zero || !s->vl_eq_vlmax) {
@@ -490,7 +474,6 @@ static bool trans_vsha2ch_vv(DisasContext *s, arg_rmrr *a)
decode_save_opc(s);
egs = tcg_constant_i32(ZVKNH_EGS);
gen_helper_egs_check(egs, tcg_env);
- tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
}
data = FIELD_DP32(data, VDATA, VM, a->vm);
@@ -506,7 +489,6 @@ static bool trans_vsha2ch_vv(DisasContext *s, arg_rmrr *a)
gen_helper_vsha2ch32_vv : gen_helper_vsha2ch64_vv);
mark_vs_dirty(s);
- gen_set_label(over);
return true;
}
return false;
--
2.44.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH for 9.0 v15 08/10] trans_rvv.c.inc: remove redundant mark_vs_dirty() calls
2024-03-14 17:56 [PATCH for 9.0 v15 00/10] target/riscv: vector fixes Daniel Henrique Barboza
` (6 preceding siblings ...)
2024-03-14 17:57 ` [PATCH for 9.0 v15 07/10] target/riscv: remove 'over' brconds from vector trans Daniel Henrique Barboza
@ 2024-03-14 17:57 ` Daniel Henrique Barboza
2024-03-14 17:57 ` [PATCH for 9.0 v15 09/10] target/riscv: enable 'vstart_eq_zero' in the end of insns Daniel Henrique Barboza
` (2 subsequent siblings)
10 siblings, 0 replies; 24+ messages in thread
From: Daniel Henrique Barboza @ 2024-03-14 17:57 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
palmer, max.chou, richard.henderson, Daniel Henrique Barboza,
Philippe Mathieu-Daudé
trans_vmv_v_i , trans_vfmv_v_f and the trans_##NAME macro from
GEN_VMV_WHOLE_TRANS() are calling mark_vs_dirty() in both branches of
their 'ifs'. conditionals.
Call it just once in the end like other functions are doing.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
target/riscv/insn_trans/trans_rvv.c.inc | 11 +++--------
1 file changed, 3 insertions(+), 8 deletions(-)
diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc
index 7931fb2f3f..401ee939b8 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -2065,7 +2065,6 @@ static bool trans_vmv_v_i(DisasContext *s, arg_vmv_v_i *a)
if (s->vl_eq_vlmax && !(s->vta && s->lmul < 0)) {
tcg_gen_gvec_dup_imm(s->sew, vreg_ofs(s, a->rd),
MAXSZ(s), MAXSZ(s), simm);
- mark_vs_dirty(s);
} else {
TCGv_i32 desc;
TCGv_i64 s1;
@@ -2083,9 +2082,8 @@ static bool trans_vmv_v_i(DisasContext *s, arg_vmv_v_i *a)
s->cfg_ptr->vlenb, data));
tcg_gen_addi_ptr(dest, tcg_env, vreg_ofs(s, a->rd));
fns[s->sew](dest, s1, tcg_env, desc);
-
- mark_vs_dirty(s);
}
+ mark_vs_dirty(s);
return true;
}
return false;
@@ -2612,7 +2610,6 @@ static bool trans_vfmv_v_f(DisasContext *s, arg_vfmv_v_f *a)
tcg_gen_gvec_dup_i64(s->sew, vreg_ofs(s, a->rd),
MAXSZ(s), MAXSZ(s), t1);
- mark_vs_dirty(s);
} else {
TCGv_ptr dest;
TCGv_i32 desc;
@@ -2635,9 +2632,8 @@ static bool trans_vfmv_v_f(DisasContext *s, arg_vfmv_v_f *a)
tcg_gen_addi_ptr(dest, tcg_env, vreg_ofs(s, a->rd));
fns[s->sew - 1](dest, t1, tcg_env, desc);
-
- mark_vs_dirty(s);
}
+ mark_vs_dirty(s);
return true;
}
return false;
@@ -3560,12 +3556,11 @@ static bool trans_##NAME(DisasContext *s, arg_##NAME * a) \
if (s->vstart_eq_zero) { \
tcg_gen_gvec_mov(s->sew, vreg_ofs(s, a->rd), \
vreg_ofs(s, a->rs2), maxsz, maxsz); \
- mark_vs_dirty(s); \
} else { \
tcg_gen_gvec_2_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, a->rs2), \
tcg_env, maxsz, maxsz, 0, gen_helper_vmvr_v); \
- mark_vs_dirty(s); \
} \
+ mark_vs_dirty(s); \
return true; \
} \
return false; \
--
2.44.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH for 9.0 v15 09/10] target/riscv: enable 'vstart_eq_zero' in the end of insns
2024-03-14 17:56 [PATCH for 9.0 v15 00/10] target/riscv: vector fixes Daniel Henrique Barboza
` (7 preceding siblings ...)
2024-03-14 17:57 ` [PATCH for 9.0 v15 08/10] trans_rvv.c.inc: remove redundant mark_vs_dirty() calls Daniel Henrique Barboza
@ 2024-03-14 17:57 ` Daniel Henrique Barboza
2024-03-14 17:57 ` [PATCH for 9.0 v15 10/10] target/riscv/vector_helper.c: optimize loops in ldst helpers Daniel Henrique Barboza
2024-03-20 4:55 ` [PATCH for 9.0 v15 00/10] target/riscv: vector fixes Alistair Francis
10 siblings, 0 replies; 24+ messages in thread
From: Daniel Henrique Barboza @ 2024-03-14 17:57 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
palmer, max.chou, richard.henderson, Ivan Klokov,
Daniel Henrique Barboza
From: Ivan Klokov <ivan.klokov@syntacore.com>
The vstart_eq_zero flag is updated at the beginning of the translation
phase from the env->vstart variable. During the execution phase all
functions will set env->vstart = 0 after a successful execution, but the
vstart_eq_zero flag remains the same as at the start of the block. This
will wrongly cause SIGILLs in translations that requires env->vstart = 0
and might be reading vstart_eq_zero = false.
This patch adds a new finalize_rvv_inst() helper that is called at the
end of each vector instruction that will both update vstart_eq_zero and
do a mark_vs_dirty().
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1976
Signed-off-by: Ivan Klokov <ivan.klokov@syntacore.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
---
target/riscv/insn_trans/trans_rvbf16.c.inc | 6 +-
target/riscv/insn_trans/trans_rvv.c.inc | 83 ++++++++++++----------
target/riscv/insn_trans/trans_rvvk.c.inc | 12 ++--
target/riscv/translate.c | 6 ++
4 files changed, 59 insertions(+), 48 deletions(-)
diff --git a/target/riscv/insn_trans/trans_rvbf16.c.inc b/target/riscv/insn_trans/trans_rvbf16.c.inc
index a842e76a6b..0a9cd1ec31 100644
--- a/target/riscv/insn_trans/trans_rvbf16.c.inc
+++ b/target/riscv/insn_trans/trans_rvbf16.c.inc
@@ -83,7 +83,7 @@ static bool trans_vfncvtbf16_f_f_w(DisasContext *ctx, arg_vfncvtbf16_f_f_w *a)
ctx->cfg_ptr->vlenb,
ctx->cfg_ptr->vlenb, data,
gen_helper_vfncvtbf16_f_f_w);
- mark_vs_dirty(ctx);
+ finalize_rvv_inst(ctx);
return true;
}
return false;
@@ -108,7 +108,7 @@ static bool trans_vfwcvtbf16_f_f_v(DisasContext *ctx, arg_vfwcvtbf16_f_f_v *a)
ctx->cfg_ptr->vlenb,
ctx->cfg_ptr->vlenb, data,
gen_helper_vfwcvtbf16_f_f_v);
- mark_vs_dirty(ctx);
+ finalize_rvv_inst(ctx);
return true;
}
return false;
@@ -135,7 +135,7 @@ static bool trans_vfwmaccbf16_vv(DisasContext *ctx, arg_vfwmaccbf16_vv *a)
ctx->cfg_ptr->vlenb,
ctx->cfg_ptr->vlenb, data,
gen_helper_vfwmaccbf16_vv);
- mark_vs_dirty(ctx);
+ finalize_rvv_inst(ctx);
return true;
}
return false;
diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc
index 401ee939b8..7d84e7d812 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -167,7 +167,7 @@ static bool do_vsetvl(DisasContext *s, int rd, int rs1, TCGv s2)
gen_helper_vsetvl(dst, tcg_env, s1, s2);
gen_set_gpr(s, rd, dst);
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
gen_update_pc(s, s->cur_insn_len);
lookup_and_goto_ptr(s);
@@ -187,7 +187,7 @@ static bool do_vsetivli(DisasContext *s, int rd, TCGv s1, TCGv s2)
gen_helper_vsetvl(dst, tcg_env, s1, s2);
gen_set_gpr(s, rd, dst);
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
gen_update_pc(s, s->cur_insn_len);
lookup_and_goto_ptr(s);
s->base.is_jmp = DISAS_NORETURN;
@@ -657,6 +657,7 @@ static bool ldst_us_trans(uint32_t vd, uint32_t rs1, uint32_t data,
tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ);
}
+ finalize_rvv_inst(s);
return true;
}
@@ -812,6 +813,7 @@ static bool ldst_stride_trans(uint32_t vd, uint32_t rs1, uint32_t rs2,
fn(dest, mask, base, stride, tcg_env, desc);
+ finalize_rvv_inst(s);
return true;
}
@@ -913,6 +915,7 @@ static bool ldst_index_trans(uint32_t vd, uint32_t rs1, uint32_t vs2,
fn(dest, mask, base, index, tcg_env, desc);
+ finalize_rvv_inst(s);
return true;
}
@@ -1043,7 +1046,7 @@ static bool ldff_trans(uint32_t vd, uint32_t rs1, uint32_t data,
fn(dest, mask, base, tcg_env, desc);
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
return true;
}
@@ -1100,6 +1103,7 @@ static bool ldst_whole_trans(uint32_t vd, uint32_t rs1, uint32_t nf,
fn(dest, base, tcg_env, desc);
+ finalize_rvv_inst(s);
return true;
}
@@ -1189,7 +1193,7 @@ do_opivv_gvec(DisasContext *s, arg_rmrr *a, GVecGen3Fn *gvec_fn,
tcg_env, s->cfg_ptr->vlenb,
s->cfg_ptr->vlenb, data, fn);
}
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
return true;
}
@@ -1240,7 +1244,7 @@ static bool opivx_trans(uint32_t vd, uint32_t rs1, uint32_t vs2, uint32_t vm,
fn(dest, mask, src1, src2, tcg_env, desc);
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
return true;
}
@@ -1265,7 +1269,7 @@ do_opivx_gvec(DisasContext *s, arg_rmrr *a, GVecGen2sFn *gvec_fn,
gvec_fn(s->sew, vreg_ofs(s, a->rd), vreg_ofs(s, a->rs2),
src1, MAXSZ(s), MAXSZ(s));
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
return true;
}
return opivx_trans(a->rd, a->rs1, a->rs2, a->vm, fn, s);
@@ -1398,7 +1402,7 @@ static bool opivi_trans(uint32_t vd, uint32_t imm, uint32_t vs2, uint32_t vm,
fn(dest, mask, src1, src2, tcg_env, desc);
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
return true;
}
@@ -1412,7 +1416,7 @@ do_opivi_gvec(DisasContext *s, arg_rmrr *a, GVecGen2iFn *gvec_fn,
if (a->vm && s->vl_eq_vlmax && !(s->vta && s->lmul < 0)) {
gvec_fn(s->sew, vreg_ofs(s, a->rd), vreg_ofs(s, a->rs2),
extract_imm(s, a->rs1, imm_mode), MAXSZ(s), MAXSZ(s));
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
return true;
}
return opivi_trans(a->rd, a->rs1, a->rs2, a->vm, fn, s, imm_mode);
@@ -1471,7 +1475,7 @@ static bool do_opivv_widen(DisasContext *s, arg_rmrr *a,
tcg_env, s->cfg_ptr->vlenb,
s->cfg_ptr->vlenb,
data, fn);
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
return true;
}
return false;
@@ -1543,7 +1547,7 @@ static bool do_opiwv_widen(DisasContext *s, arg_rmrr *a,
vreg_ofs(s, a->rs2),
tcg_env, s->cfg_ptr->vlenb,
s->cfg_ptr->vlenb, data, fn);
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
return true;
}
return false;
@@ -1611,7 +1615,7 @@ static bool opivv_trans(uint32_t vd, uint32_t vs1, uint32_t vs2, uint32_t vm,
tcg_gen_gvec_4_ptr(vreg_ofs(s, vd), vreg_ofs(s, 0), vreg_ofs(s, vs1),
vreg_ofs(s, vs2), tcg_env, s->cfg_ptr->vlenb,
s->cfg_ptr->vlenb, data, fn);
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
return true;
}
@@ -1744,7 +1748,7 @@ do_opivx_gvec_shift(DisasContext *s, arg_rmrr *a, GVecGen2sFn32 *gvec_fn,
gvec_fn(s->sew, vreg_ofs(s, a->rd), vreg_ofs(s, a->rs2),
src1, MAXSZ(s), MAXSZ(s));
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
return true;
}
return opivx_trans(a->rd, a->rs1, a->rs2, a->vm, fn, s);
@@ -1801,7 +1805,7 @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \
s->cfg_ptr->vlenb, \
s->cfg_ptr->vlenb, data, \
fns[s->sew]); \
- mark_vs_dirty(s); \
+ finalize_rvv_inst(s); \
return true; \
} \
return false; \
@@ -2004,7 +2008,7 @@ static bool trans_vmv_v_v(DisasContext *s, arg_vmv_v_v *a)
s->cfg_ptr->vlenb, data,
fns[s->sew]);
}
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
return true;
}
return false;
@@ -2049,7 +2053,7 @@ static bool trans_vmv_v_x(DisasContext *s, arg_vmv_v_x *a)
fns[s->sew](dest, s1_i64, tcg_env, desc);
}
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
return true;
}
return false;
@@ -2083,7 +2087,7 @@ static bool trans_vmv_v_i(DisasContext *s, arg_vmv_v_i *a)
tcg_gen_addi_ptr(dest, tcg_env, vreg_ofs(s, a->rd));
fns[s->sew](dest, s1, tcg_env, desc);
}
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
return true;
}
return false;
@@ -2231,7 +2235,7 @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \
s->cfg_ptr->vlenb, \
s->cfg_ptr->vlenb, data, \
fns[s->sew - 1]); \
- mark_vs_dirty(s); \
+ finalize_rvv_inst(s); \
return true; \
} \
return false; \
@@ -2265,7 +2269,7 @@ static bool opfvf_trans(uint32_t vd, uint32_t rs1, uint32_t vs2,
fn(dest, mask, t1, src2, tcg_env, desc);
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
return true;
}
@@ -2340,7 +2344,7 @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \
s->cfg_ptr->vlenb, \
s->cfg_ptr->vlenb, data, \
fns[s->sew - 1]); \
- mark_vs_dirty(s); \
+ finalize_rvv_inst(s); \
return true; \
} \
return false; \
@@ -2411,7 +2415,7 @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \
s->cfg_ptr->vlenb, \
s->cfg_ptr->vlenb, data, \
fns[s->sew - 1]); \
- mark_vs_dirty(s); \
+ finalize_rvv_inst(s); \
return true; \
} \
return false; \
@@ -2523,7 +2527,7 @@ static bool do_opfv(DisasContext *s, arg_rmr *a,
vreg_ofs(s, a->rs2), tcg_env,
s->cfg_ptr->vlenb,
s->cfg_ptr->vlenb, data, fn);
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
return true;
}
return false;
@@ -2633,7 +2637,7 @@ static bool trans_vfmv_v_f(DisasContext *s, arg_vfmv_v_f *a)
fns[s->sew - 1](dest, t1, tcg_env, desc);
}
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
return true;
}
return false;
@@ -2705,7 +2709,7 @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a) \
s->cfg_ptr->vlenb, \
s->cfg_ptr->vlenb, data, \
fns[s->sew - 1]); \
- mark_vs_dirty(s); \
+ finalize_rvv_inst(s); \
return true; \
} \
return false; \
@@ -2753,7 +2757,7 @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a) \
s->cfg_ptr->vlenb, \
s->cfg_ptr->vlenb, data, \
fns[s->sew]); \
- mark_vs_dirty(s); \
+ finalize_rvv_inst(s); \
return true; \
} \
return false; \
@@ -2817,7 +2821,7 @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a) \
s->cfg_ptr->vlenb, \
s->cfg_ptr->vlenb, data, \
fns[s->sew - 1]); \
- mark_vs_dirty(s); \
+ finalize_rvv_inst(s); \
return true; \
} \
return false; \
@@ -2863,7 +2867,7 @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a) \
s->cfg_ptr->vlenb, \
s->cfg_ptr->vlenb, data, \
fns[s->sew]); \
- mark_vs_dirty(s); \
+ finalize_rvv_inst(s); \
return true; \
} \
return false; \
@@ -2949,7 +2953,7 @@ static bool trans_##NAME(DisasContext *s, arg_r *a) \
vreg_ofs(s, a->rs2), tcg_env, \
s->cfg_ptr->vlenb, \
s->cfg_ptr->vlenb, data, fn); \
- mark_vs_dirty(s); \
+ finalize_rvv_inst(s); \
return true; \
} \
return false; \
@@ -3048,7 +3052,7 @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a) \
tcg_env, s->cfg_ptr->vlenb, \
s->cfg_ptr->vlenb, \
data, fn); \
- mark_vs_dirty(s); \
+ finalize_rvv_inst(s); \
return true; \
} \
return false; \
@@ -3087,7 +3091,7 @@ static bool trans_viota_m(DisasContext *s, arg_viota_m *a)
vreg_ofs(s, a->rs2), tcg_env,
s->cfg_ptr->vlenb,
s->cfg_ptr->vlenb, data, fns[s->sew]);
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
return true;
}
return false;
@@ -3114,7 +3118,7 @@ static bool trans_vid_v(DisasContext *s, arg_vid_v *a)
tcg_env, s->cfg_ptr->vlenb,
s->cfg_ptr->vlenb,
data, fns[s->sew]);
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
return true;
}
return false;
@@ -3271,7 +3275,7 @@ static bool trans_vmv_x_s(DisasContext *s, arg_vmv_x_s *a)
tcg_gen_trunc_i64_tl(dest, t1);
gen_set_gpr(s, a->rd, dest);
tcg_gen_movi_tl(cpu_vstart, 0);
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
return true;
}
return false;
@@ -3300,7 +3304,7 @@ static bool trans_vmv_s_x(DisasContext *s, arg_vmv_s_x *a)
vec_element_storei(s, a->rd, 0, t1);
gen_set_label(over);
tcg_gen_movi_tl(cpu_vstart, 0);
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
return true;
}
return false;
@@ -3328,7 +3332,7 @@ static bool trans_vfmv_f_s(DisasContext *s, arg_vfmv_f_s *a)
mark_fs_dirty(s);
tcg_gen_movi_tl(cpu_vstart, 0);
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
return true;
}
return false;
@@ -3354,9 +3358,10 @@ static bool trans_vfmv_s_f(DisasContext *s, arg_vfmv_s_f *a)
do_nanbox(s, t1, cpu_fpr[a->rs1]);
vec_element_storei(s, a->rd, 0, t1);
+
gen_set_label(over);
tcg_gen_movi_tl(cpu_vstart, 0);
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
return true;
}
return false;
@@ -3462,7 +3467,7 @@ static bool trans_vrgather_vx(DisasContext *s, arg_rmrr *a)
tcg_gen_gvec_dup_i64(s->sew, vreg_ofs(s, a->rd),
MAXSZ(s), MAXSZ(s), dest);
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
} else {
static gen_helper_opivx * const fns[4] = {
gen_helper_vrgather_vx_b, gen_helper_vrgather_vx_h,
@@ -3490,7 +3495,7 @@ static bool trans_vrgather_vi(DisasContext *s, arg_rmrr *a)
endian_ofs(s, a->rs2, a->rs1),
MAXSZ(s), MAXSZ(s));
}
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
} else {
static gen_helper_opivx * const fns[4] = {
gen_helper_vrgather_vx_b, gen_helper_vrgather_vx_h,
@@ -3535,7 +3540,7 @@ static bool trans_vcompress_vm(DisasContext *s, arg_r *a)
tcg_env, s->cfg_ptr->vlenb,
s->cfg_ptr->vlenb, data,
fns[s->sew]);
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
return true;
}
return false;
@@ -3560,7 +3565,7 @@ static bool trans_##NAME(DisasContext *s, arg_##NAME * a) \
tcg_gen_gvec_2_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, a->rs2), \
tcg_env, maxsz, maxsz, 0, gen_helper_vmvr_v); \
} \
- mark_vs_dirty(s); \
+ finalize_rvv_inst(s); \
return true; \
} \
return false; \
@@ -3631,7 +3636,7 @@ static bool int_ext_op(DisasContext *s, arg_rmr *a, uint8_t seq)
s->cfg_ptr->vlenb,
s->cfg_ptr->vlenb, data, fn);
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
return true;
}
diff --git a/target/riscv/insn_trans/trans_rvvk.c.inc b/target/riscv/insn_trans/trans_rvvk.c.inc
index 6d640e4596..ae1f40174a 100644
--- a/target/riscv/insn_trans/trans_rvvk.c.inc
+++ b/target/riscv/insn_trans/trans_rvvk.c.inc
@@ -174,7 +174,7 @@ GEN_OPIVX_GVEC_TRANS_CHECK(vandn_vx, andcs, zvkb_vx_check)
vreg_ofs(s, a->rs2), tcg_env, \
s->cfg_ptr->vlenb, s->cfg_ptr->vlenb, \
data, fns[s->sew]); \
- mark_vs_dirty(s); \
+ finalize_rvv_inst(s); \
return true; \
} \
return false; \
@@ -266,7 +266,7 @@ GEN_OPIVI_WIDEN_TRANS(vwsll_vi, IMM_ZX, vwsll_vx, vwsll_vx_check)
tcg_gen_addi_ptr(rd_v, tcg_env, vreg_ofs(s, a->rd)); \
tcg_gen_addi_ptr(rs2_v, tcg_env, vreg_ofs(s, a->rs2)); \
gen_helper_##NAME(rd_v, rs2_v, tcg_env, desc); \
- mark_vs_dirty(s); \
+ finalize_rvv_inst(s); \
return true; \
} \
return false; \
@@ -341,7 +341,7 @@ GEN_V_UNMASKED_TRANS(vaesem_vs, vaes_check_vs, ZVKNED_EGS)
tcg_gen_addi_ptr(rd_v, tcg_env, vreg_ofs(s, a->rd)); \
tcg_gen_addi_ptr(rs2_v, tcg_env, vreg_ofs(s, a->rs2)); \
gen_helper_##NAME(rd_v, rs2_v, uimm_v, tcg_env, desc); \
- mark_vs_dirty(s); \
+ finalize_rvv_inst(s); \
return true; \
} \
return false; \
@@ -405,7 +405,7 @@ GEN_VI_UNMASKED_TRANS(vaeskf2_vi, vaeskf2_check, ZVKNED_EGS)
s->cfg_ptr->vlenb, s->cfg_ptr->vlenb, \
data, gen_helper_##NAME); \
\
- mark_vs_dirty(s); \
+ finalize_rvv_inst(s); \
return true; \
} \
return false; \
@@ -457,7 +457,7 @@ static bool trans_vsha2cl_vv(DisasContext *s, arg_rmrr *a)
s->sew == MO_32 ?
gen_helper_vsha2cl32_vv : gen_helper_vsha2cl64_vv);
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
return true;
}
return false;
@@ -488,7 +488,7 @@ static bool trans_vsha2ch_vv(DisasContext *s, arg_rmrr *a)
s->sew == MO_32 ?
gen_helper_vsha2ch32_vv : gen_helper_vsha2ch64_vv);
- mark_vs_dirty(s);
+ finalize_rvv_inst(s);
return true;
}
return false;
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index ea5d52b2ef..9d57089fcc 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -676,6 +676,12 @@ static void mark_vs_dirty(DisasContext *ctx)
static inline void mark_vs_dirty(DisasContext *ctx) { }
#endif
+static void finalize_rvv_inst(DisasContext *ctx)
+{
+ mark_vs_dirty(ctx);
+ ctx->vstart_eq_zero = true;
+}
+
static void gen_set_rm(DisasContext *ctx, int rm)
{
if (ctx->frm == rm) {
--
2.44.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH for 9.0 v15 10/10] target/riscv/vector_helper.c: optimize loops in ldst helpers
2024-03-14 17:56 [PATCH for 9.0 v15 00/10] target/riscv: vector fixes Daniel Henrique Barboza
` (8 preceding siblings ...)
2024-03-14 17:57 ` [PATCH for 9.0 v15 09/10] target/riscv: enable 'vstart_eq_zero' in the end of insns Daniel Henrique Barboza
@ 2024-03-14 17:57 ` Daniel Henrique Barboza
2024-03-20 4:55 ` [PATCH for 9.0 v15 00/10] target/riscv: vector fixes Alistair Francis
10 siblings, 0 replies; 24+ messages in thread
From: Daniel Henrique Barboza @ 2024-03-14 17:57 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, alistair.francis, bmeng, liwei1518, zhiwei_liu,
palmer, max.chou, richard.henderson, Daniel Henrique Barboza
Change the for loops in ldst helpers to do a single increment in the
counter, and assign it env->vstart, to avoid re-reading from vstart
every time.
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
target/riscv/vector_helper.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 63a1083f03..fa139040f8 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -209,7 +209,7 @@ vext_ldst_stride(void *vd, void *v0, target_ulong base,
VSTART_CHECK_EARLY_EXIT(env);
- for (i = env->vstart; i < env->vl; i++, env->vstart++) {
+ for (i = env->vstart; i < env->vl; env->vstart = ++i) {
k = 0;
while (k < nf) {
if (!vm && !vext_elem_mask(v0, i)) {
@@ -277,7 +277,7 @@ vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
VSTART_CHECK_EARLY_EXIT(env);
/* load bytes from guest memory */
- for (i = env->vstart; i < evl; i++, env->vstart++) {
+ for (i = env->vstart; i < evl; env->vstart = ++i) {
k = 0;
while (k < nf) {
target_ulong addr = base + ((i * nf + k) << log2_esz);
@@ -393,7 +393,7 @@ vext_ldst_index(void *vd, void *v0, target_ulong base,
VSTART_CHECK_EARLY_EXIT(env);
/* load bytes from guest memory */
- for (i = env->vstart; i < env->vl; i++, env->vstart++) {
+ for (i = env->vstart; i < env->vl; env->vstart = ++i) {
k = 0;
while (k < nf) {
if (!vm && !vext_elem_mask(v0, i)) {
--
2.44.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH for 9.0 v15 00/10] target/riscv: vector fixes
2024-03-14 17:56 [PATCH for 9.0 v15 00/10] target/riscv: vector fixes Daniel Henrique Barboza
` (9 preceding siblings ...)
2024-03-14 17:57 ` [PATCH for 9.0 v15 10/10] target/riscv/vector_helper.c: optimize loops in ldst helpers Daniel Henrique Barboza
@ 2024-03-20 4:55 ` Alistair Francis
10 siblings, 0 replies; 24+ messages in thread
From: Alistair Francis @ 2024-03-20 4:55 UTC (permalink / raw)
To: Daniel Henrique Barboza
Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liwei1518,
zhiwei_liu, palmer, max.chou, richard.henderson
On Fri, Mar 15, 2024 at 3:59 AM Daniel Henrique Barboza
<dbarboza@ventanamicro.com> wrote:
>
> Hi,
>
> The series was renamed to reflect that at this point we're fixing more
> things than just vstart management.
>
> In this new version a couple fixes were added:
>
> - patch 3 (new) fixes the memcpy endianess in 'vmvr_v', as suggested by
> Richard;
>
> - patch 5 (new) fixes ldst_whole insns to now clear vstart in all cases.
> The fix was proposed by Max.
>
> Another notable change was made in patch 6 (patch 4 from v14). We're not
> doing early exits in helpers that are gated by vstart_eq_zero. This was
> found to cause side-effects with insns that wants to send faults if vl =
> 0, and for the rest it becomes a moot check since vstart is granted to
> be zero beforehand.
>
> Series based on master.
>
> Patches missing acks: 3, 4, 5
>
> Changes from v14:
> - patch 3 (new):
> - make 'vmvr_v' big endian compliant
> - patch 5 (new):
> - make ldst_whole insns clear vstart in all code paths
> - patch 6 (patch 4 from v14):
> - do not add early exits on helpers that are gated with vstart_eq_zero
> - v14 link: https://lore.kernel.org/qemu-riscv/20240313220141.427730-1-dbarboza@ventanamicro.com/
>
> Daniel Henrique Barboza (9):
> target/riscv/vector_helper.c: set vstart = 0 in GEN_VEXT_VSLIDEUP_VX()
> trans_rvv.c.inc: set vstart = 0 in int scalar move insns
> target/riscv/vector_helper.c: fix 'vmvr_v' memcpy endianess
> target/riscv: always clear vstart in whole vec move insns
> target/riscv: always clear vstart for ldst_whole insns
> target/riscv/vector_helpers: do early exit when vstart >= vl
> target/riscv: remove 'over' brconds from vector trans
> trans_rvv.c.inc: remove redundant mark_vs_dirty() calls
> target/riscv/vector_helper.c: optimize loops in ldst helpers
>
> Ivan Klokov (1):
> target/riscv: enable 'vstart_eq_zero' in the end of insns
Thanks!
Applied to riscv-to-apply.next
Alistair
>
> target/riscv/insn_trans/trans_rvbf16.c.inc | 18 +-
> target/riscv/insn_trans/trans_rvv.c.inc | 244 ++++++---------------
> target/riscv/insn_trans/trans_rvvk.c.inc | 30 +--
> target/riscv/translate.c | 6 +
> target/riscv/vcrypto_helper.c | 32 +++
> target/riscv/vector_helper.c | 93 +++++++-
> target/riscv/vector_internals.c | 4 +
> target/riscv/vector_internals.h | 9 +
> 8 files changed, 220 insertions(+), 216 deletions(-)
>
> --
> 2.44.0
>
>
^ permalink raw reply [flat|nested] 24+ messages in thread