* [PATCH 0/2] target/riscv: RVV 1-fill tail element changes
@ 2023-04-27 20:57 Daniel Henrique Barboza
2023-04-27 20:57 ` [PATCH 1/2] target/riscv/vector_helper.c: skip set tail when vta is zero Daniel Henrique Barboza
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Daniel Henrique Barboza @ 2023-04-27 20:57 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, alistair.francis, bmeng, liweiwei, zhiwei_liu, palmer,
Daniel Henrique Barboza
Hi,
This series makes changes in vext_set_tail_elements_1s() to be a little
nicer to the emulation.
First patch makes the function a no-op when vta == 0. Aside from the
logic simplification we also have a little performance boost.
Second patch makes the function debug only. The logic is explained in
the commit message, but long story short: we don't have to implement any
tail-agnostic policy at all to be spec compliant, but this function has
its uses for debug purposes, so keeping it as a debug option allow users
to disable it on demand.
Patches are based on top of Alistair's riscv-to-apply.next.
Daniel Henrique Barboza (2):
target/riscv/vector_helper.c: skip set tail when vta is zero
target/riscv/vector_helper.c: make vext_set_tail_elems_1s() debug only
target/riscv/vector_helper.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
--
2.40.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/2] target/riscv/vector_helper.c: skip set tail when vta is zero
2023-04-27 20:57 [PATCH 0/2] target/riscv: RVV 1-fill tail element changes Daniel Henrique Barboza
@ 2023-04-27 20:57 ` Daniel Henrique Barboza
2023-04-28 1:08 ` Weiwei Li
` (2 more replies)
2023-04-27 20:57 ` [PATCH 2/2] target/riscv/vector_helper.c: make vext_set_tail_elems_1s() debug only Daniel Henrique Barboza
` (2 subsequent siblings)
3 siblings, 3 replies; 10+ messages in thread
From: Daniel Henrique Barboza @ 2023-04-27 20:57 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, alistair.francis, bmeng, liweiwei, zhiwei_liu, palmer,
Daniel Henrique Barboza
The function is a no-op if 'vta' is zero but we're still doing a lot of
stuff in this function regardless. vext_set_elems_1s() will ignore every
single time (since vta is zero) and we just wasted time.
Skip it altogether in this case. Aside from the code simplification
there's a noticeable emulation performance gain by doing it. For a
regular C binary that does a vectors operation like this:
=======
#define SZ 10000000
int main ()
{
int *a = malloc (SZ * sizeof (int));
int *b = malloc (SZ * sizeof (int));
int *c = malloc (SZ * sizeof (int));
for (int i = 0; i < SZ; i++)
c[i] = a[i] + b[i];
return c[SZ - 1];
}
=======
Emulating it with qemu-riscv64 and RVV takes ~0.3 sec:
$ time ~/work/qemu/build/qemu-riscv64 \
-cpu rv64,debug=false,vext_spec=v1.0,v=true,vlen=128 ./foo.out
real 0m0.303s
user 0m0.281s
sys 0m0.023s
With this skip we take ~0.275 sec:
$ time ~/work/qemu/build/qemu-riscv64 \
-cpu rv64,debug=false,vext_spec=v1.0,v=true,vlen=128 ./foo.out
real 0m0.274s
user 0m0.252s
sys 0m0.019s
This performance gain adds up fast when executing heavy benchmarks like
SPEC.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
---
target/riscv/vector_helper.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index f4d0438988..8e6c99e573 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -268,12 +268,17 @@ static void vext_set_tail_elems_1s(CPURISCVState *env, target_ulong vl,
void *vd, uint32_t desc, uint32_t nf,
uint32_t esz, uint32_t max_elems)
{
- uint32_t total_elems = vext_get_total_elems(env, desc, esz);
- uint32_t vlenb = riscv_cpu_cfg(env)->vlen >> 3;
+ uint32_t total_elems, vlenb, registers_used;
uint32_t vta = vext_vta(desc);
- uint32_t registers_used;
int k;
+ if (vta == 0) {
+ return;
+ }
+
+ total_elems = vext_get_total_elems(env, desc, esz);
+ vlenb = riscv_cpu_cfg(env)->vlen >> 3;
+
for (k = 0; k < nf; ++k) {
vext_set_elems_1s(vd, vta, (k * max_elems + vl) * esz,
(k * max_elems + max_elems) * esz);
--
2.40.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/2] target/riscv/vector_helper.c: make vext_set_tail_elems_1s() debug only
2023-04-27 20:57 [PATCH 0/2] target/riscv: RVV 1-fill tail element changes Daniel Henrique Barboza
2023-04-27 20:57 ` [PATCH 1/2] target/riscv/vector_helper.c: skip set tail when vta is zero Daniel Henrique Barboza
@ 2023-04-27 20:57 ` Daniel Henrique Barboza
2023-04-28 1:22 ` Weiwei Li
2023-04-27 21:15 ` [PATCH 0/2] target/riscv: RVV 1-fill tail element changes Palmer Dabbelt
2023-04-28 9:30 ` Dickon Hood
3 siblings, 1 reply; 10+ messages in thread
From: Daniel Henrique Barboza @ 2023-04-27 20:57 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, alistair.francis, bmeng, liweiwei, zhiwei_liu, palmer,
Daniel Henrique Barboza
Commit 3479a814 ("target/riscv: rvv-1.0: add VMA and VTA") added vma and
vta fields in the vtype register, while also defining that QEMU doesn't
need to have a tail agnostic policy to be compliant with the RVV spec.
It ended up removing all tail handling code as well. Later, commit
752614ca ("target/riscv: rvv: Add tail agnostic for vector load / store
instructions") reintroduced the tail agnostic fill for vector load/store
instructions only.
This puts QEMU in a situation where some functions are 1-filling the
tail elements and others don't. This is still a valid implementation,
but the process of 1-filling the tail elements takes valuable emulation
time that can be used doing anything else. If the spec doesn't demand a
specific tail-agostic policy, a proper software wouldn't expect any
policy to be in place. This means that, more often than not, the work
we're doing by 1-filling tail elements is wasted. We would be better of
if vext_set_tail_elems_1s() is removed entirely from the code.
All this said, there's still a debug value associated with it. So,
instead of removing it, let's gate it with cpu->cfg.debug. This way
software can enable this code if desirable, but for the regular case we
shouldn't waste time with it.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
---
target/riscv/vector_helper.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 8e6c99e573..e0a292ac24 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -272,7 +272,7 @@ static void vext_set_tail_elems_1s(CPURISCVState *env, target_ulong vl,
uint32_t vta = vext_vta(desc);
int k;
- if (vta == 0) {
+ if (vta == 0 || !riscv_cpu_cfg(env)->debug) {
return;
}
--
2.40.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] target/riscv: RVV 1-fill tail element changes
2023-04-27 20:57 [PATCH 0/2] target/riscv: RVV 1-fill tail element changes Daniel Henrique Barboza
2023-04-27 20:57 ` [PATCH 1/2] target/riscv/vector_helper.c: skip set tail when vta is zero Daniel Henrique Barboza
2023-04-27 20:57 ` [PATCH 2/2] target/riscv/vector_helper.c: make vext_set_tail_elems_1s() debug only Daniel Henrique Barboza
@ 2023-04-27 21:15 ` Palmer Dabbelt
2023-04-28 9:30 ` Dickon Hood
3 siblings, 0 replies; 10+ messages in thread
From: Palmer Dabbelt @ 2023-04-27 21:15 UTC (permalink / raw)
To: dbarboza
Cc: qemu-devel, qemu-riscv, Alistair Francis, bmeng, liweiwei,
zhiwei_liu, dbarboza
On Thu, 27 Apr 2023 13:57:06 PDT (-0700), dbarboza@ventanamicro.com wrote:
> Hi,
>
> This series makes changes in vext_set_tail_elements_1s() to be a little
> nicer to the emulation.
>
> First patch makes the function a no-op when vta == 0. Aside from the
> logic simplification we also have a little performance boost.
>
> Second patch makes the function debug only. The logic is explained in
> the commit message, but long story short: we don't have to implement any
> tail-agnostic policy at all to be spec compliant, but this function has
> its uses for debug purposes, so keeping it as a debug option allow users
> to disable it on demand.
>
> Patches are based on top of Alistair's riscv-to-apply.next.
>
> Daniel Henrique Barboza (2):
> target/riscv/vector_helper.c: skip set tail when vta is zero
> target/riscv/vector_helper.c: make vext_set_tail_elems_1s() debug only
>
> target/riscv/vector_helper.c | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Though this made me think: it'd be nice to have some sort of
"aggressively do odd things for VTA/VMA" mode in QEMU, as that could
help shake out bugs in software.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] target/riscv/vector_helper.c: skip set tail when vta is zero
2023-04-27 20:57 ` [PATCH 1/2] target/riscv/vector_helper.c: skip set tail when vta is zero Daniel Henrique Barboza
@ 2023-04-28 1:08 ` Weiwei Li
2023-05-07 23:26 ` Alistair Francis
2023-05-07 23:30 ` Alistair Francis
2 siblings, 0 replies; 10+ messages in thread
From: Weiwei Li @ 2023-04-28 1:08 UTC (permalink / raw)
To: Daniel Henrique Barboza, qemu-devel
Cc: liweiwei, qemu-riscv, alistair.francis, bmeng, zhiwei_liu, palmer
On 2023/4/28 04:57, Daniel Henrique Barboza wrote:
> The function is a no-op if 'vta' is zero but we're still doing a lot of
> stuff in this function regardless. vext_set_elems_1s() will ignore every
> single time (since vta is zero) and we just wasted time.
>
> Skip it altogether in this case. Aside from the code simplification
> there's a noticeable emulation performance gain by doing it. For a
> regular C binary that does a vectors operation like this:
>
> =======
> #define SZ 10000000
>
> int main ()
> {
> int *a = malloc (SZ * sizeof (int));
> int *b = malloc (SZ * sizeof (int));
> int *c = malloc (SZ * sizeof (int));
>
> for (int i = 0; i < SZ; i++)
> c[i] = a[i] + b[i];
> return c[SZ - 1];
> }
> =======
>
> Emulating it with qemu-riscv64 and RVV takes ~0.3 sec:
>
> $ time ~/work/qemu/build/qemu-riscv64 \
> -cpu rv64,debug=false,vext_spec=v1.0,v=true,vlen=128 ./foo.out
>
> real 0m0.303s
> user 0m0.281s
> sys 0m0.023s
>
> With this skip we take ~0.275 sec:
>
> $ time ~/work/qemu/build/qemu-riscv64 \
> -cpu rv64,debug=false,vext_spec=v1.0,v=true,vlen=128 ./foo.out
>
> real 0m0.274s
> user 0m0.252s
> sys 0m0.019s
>
> This performance gain adds up fast when executing heavy benchmarks like
> SPEC.
>
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> ---
Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn>
Weiwei Li
> target/riscv/vector_helper.c | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index f4d0438988..8e6c99e573 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -268,12 +268,17 @@ static void vext_set_tail_elems_1s(CPURISCVState *env, target_ulong vl,
> void *vd, uint32_t desc, uint32_t nf,
> uint32_t esz, uint32_t max_elems)
> {
> - uint32_t total_elems = vext_get_total_elems(env, desc, esz);
> - uint32_t vlenb = riscv_cpu_cfg(env)->vlen >> 3;
> + uint32_t total_elems, vlenb, registers_used;
> uint32_t vta = vext_vta(desc);
> - uint32_t registers_used;
> int k;
>
> + if (vta == 0) {
> + return;
> + }
> +
> + total_elems = vext_get_total_elems(env, desc, esz);
> + vlenb = riscv_cpu_cfg(env)->vlen >> 3;
> +
> for (k = 0; k < nf; ++k) {
> vext_set_elems_1s(vd, vta, (k * max_elems + vl) * esz,
> (k * max_elems + max_elems) * esz);
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] target/riscv/vector_helper.c: make vext_set_tail_elems_1s() debug only
2023-04-27 20:57 ` [PATCH 2/2] target/riscv/vector_helper.c: make vext_set_tail_elems_1s() debug only Daniel Henrique Barboza
@ 2023-04-28 1:22 ` Weiwei Li
2023-04-28 9:16 ` Daniel Henrique Barboza
0 siblings, 1 reply; 10+ messages in thread
From: Weiwei Li @ 2023-04-28 1:22 UTC (permalink / raw)
To: Daniel Henrique Barboza, qemu-devel
Cc: liweiwei, qemu-riscv, alistair.francis, bmeng, zhiwei_liu, palmer
On 2023/4/28 04:57, Daniel Henrique Barboza wrote:
> Commit 3479a814 ("target/riscv: rvv-1.0: add VMA and VTA") added vma and
> vta fields in the vtype register, while also defining that QEMU doesn't
> need to have a tail agnostic policy to be compliant with the RVV spec.
> It ended up removing all tail handling code as well. Later, commit
> 752614ca ("target/riscv: rvv: Add tail agnostic for vector load / store
> instructions") reintroduced the tail agnostic fill for vector load/store
> instructions only.
>
> This puts QEMU in a situation where some functions are 1-filling the
> tail elements and others don't. This is still a valid implementation,
> but the process of 1-filling the tail elements takes valuable emulation
> time that can be used doing anything else. If the spec doesn't demand a
> specific tail-agostic policy, a proper software wouldn't expect any
> policy to be in place. This means that, more often than not, the work
> we're doing by 1-filling tail elements is wasted. We would be better of
> if vext_set_tail_elems_1s() is removed entirely from the code.
>
> All this said, there's still a debug value associated with it. So,
> instead of removing it, let's gate it with cpu->cfg.debug. This way
> software can enable this code if desirable, but for the regular case we
> shouldn't waste time with it.
>
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> ---
> target/riscv/vector_helper.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index 8e6c99e573..e0a292ac24 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -272,7 +272,7 @@ static void vext_set_tail_elems_1s(CPURISCVState *env, target_ulong vl,
> uint32_t vta = vext_vta(desc);
> int k;
>
> - if (vta == 0) {
> + if (vta == 0 || !riscv_cpu_cfg(env)->debug) {
I think this is not correct. 'debug' property is used for debug spec.
And this feature is controlled by another property 'rvv_ta_all_1s' .
By the way, cfg.rvv_ta_all_1s have been ANDed intovta value. So
additional check on it is also unnecessary here.
Regards,
Weiwei Li
> return;
> }
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] target/riscv/vector_helper.c: make vext_set_tail_elems_1s() debug only
2023-04-28 1:22 ` Weiwei Li
@ 2023-04-28 9:16 ` Daniel Henrique Barboza
0 siblings, 0 replies; 10+ messages in thread
From: Daniel Henrique Barboza @ 2023-04-28 9:16 UTC (permalink / raw)
To: Weiwei Li, qemu-devel
Cc: qemu-riscv, alistair.francis, bmeng, zhiwei_liu, palmer
On 4/27/23 22:22, Weiwei Li wrote:
>
> On 2023/4/28 04:57, Daniel Henrique Barboza wrote:
>> Commit 3479a814 ("target/riscv: rvv-1.0: add VMA and VTA") added vma and
>> vta fields in the vtype register, while also defining that QEMU doesn't
>> need to have a tail agnostic policy to be compliant with the RVV spec.
>> It ended up removing all tail handling code as well. Later, commit
>> 752614ca ("target/riscv: rvv: Add tail agnostic for vector load / store
>> instructions") reintroduced the tail agnostic fill for vector load/store
>> instructions only.
>>
>> This puts QEMU in a situation where some functions are 1-filling the
>> tail elements and others don't. This is still a valid implementation,
>> but the process of 1-filling the tail elements takes valuable emulation
>> time that can be used doing anything else. If the spec doesn't demand a
>> specific tail-agostic policy, a proper software wouldn't expect any
>> policy to be in place. This means that, more often than not, the work
>> we're doing by 1-filling tail elements is wasted. We would be better of
>> if vext_set_tail_elems_1s() is removed entirely from the code.
>>
>> All this said, there's still a debug value associated with it. So,
>> instead of removing it, let's gate it with cpu->cfg.debug. This way
>> software can enable this code if desirable, but for the regular case we
>> shouldn't waste time with it.
>>
>> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
>> ---
>> target/riscv/vector_helper.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
>> index 8e6c99e573..e0a292ac24 100644
>> --- a/target/riscv/vector_helper.c
>> +++ b/target/riscv/vector_helper.c
>> @@ -272,7 +272,7 @@ static void vext_set_tail_elems_1s(CPURISCVState *env, target_ulong vl,
>> uint32_t vta = vext_vta(desc);
>> int k;
>> - if (vta == 0) {
>> + if (vta == 0 || !riscv_cpu_cfg(env)->debug) {
>
> I think this is not correct. 'debug' property is used for debug spec. And this feature is controlled by another property 'rvv_ta_all_1s' .
You're right. I wasn't aware that this flag exists:
$ git grep 'rvv_ta_all_1s'
target/riscv/cpu.c: DEFINE_PROP_BOOL("rvv_ta_all_1s", RISCVCPU, cfg.rvv_ta_all_1s, false),
target/riscv/cpu.h: bool rvv_ta_all_1s;
target/riscv/translate.c: ctx->vta = FIELD_EX32(tb_flags, TB_FLAGS, VTA) && cpu->cfg.rvv_ta_all_1s;
target/riscv/translate.c: ctx->cfg_vta_all_1s = cpu->cfg.rvv_ta_all_1s;
>
> By the way, cfg.rvv_ta_all_1s have been ANDed intovta value. So additional check on it is also unnecessary here.
Yes. We can drop this patch then since 'vta' is already accounting for ta_all_1s.
Thanks,
Daniel
>
> Regards,
>
> Weiwei Li
>
>> return;
>> }
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] target/riscv: RVV 1-fill tail element changes
2023-04-27 20:57 [PATCH 0/2] target/riscv: RVV 1-fill tail element changes Daniel Henrique Barboza
` (2 preceding siblings ...)
2023-04-27 21:15 ` [PATCH 0/2] target/riscv: RVV 1-fill tail element changes Palmer Dabbelt
@ 2023-04-28 9:30 ` Dickon Hood
3 siblings, 0 replies; 10+ messages in thread
From: Dickon Hood @ 2023-04-28 9:30 UTC (permalink / raw)
To: Daniel Henrique Barboza
Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liweiwei,
zhiwei_liu, palmer
On Thu, Apr 27, 2023 at 17:57:06 -0300, Daniel Henrique Barboza wrote:
: Second patch makes the function debug only. The logic is explained in
: the commit message, but long story short: we don't have to implement any
: tail-agnostic policy at all to be spec compliant, but this function has
: its uses for debug purposes, so keeping it as a debug option allow users
: to disable it on demand.
Yes, please don't remove this functionality completely -- our internal
stress test tool for the vector crypto patches relies on it.
--
Dickon Hood Senior Software Engineer
Codethink Ltd.
3rd Floor, Dale House, 35 Dale Street,
https://www.codethink.co.uk/ Manchester, M1 2HF, United Kingdom.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] target/riscv/vector_helper.c: skip set tail when vta is zero
2023-04-27 20:57 ` [PATCH 1/2] target/riscv/vector_helper.c: skip set tail when vta is zero Daniel Henrique Barboza
2023-04-28 1:08 ` Weiwei Li
@ 2023-05-07 23:26 ` Alistair Francis
2023-05-07 23:30 ` Alistair Francis
2 siblings, 0 replies; 10+ messages in thread
From: Alistair Francis @ 2023-05-07 23:26 UTC (permalink / raw)
To: Daniel Henrique Barboza
Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liweiwei,
zhiwei_liu, palmer
On Fri, Apr 28, 2023 at 6:58 AM Daniel Henrique Barboza
<dbarboza@ventanamicro.com> wrote:
>
> The function is a no-op if 'vta' is zero but we're still doing a lot of
> stuff in this function regardless. vext_set_elems_1s() will ignore every
> single time (since vta is zero) and we just wasted time.
>
> Skip it altogether in this case. Aside from the code simplification
> there's a noticeable emulation performance gain by doing it. For a
> regular C binary that does a vectors operation like this:
>
> =======
> #define SZ 10000000
>
> int main ()
> {
> int *a = malloc (SZ * sizeof (int));
> int *b = malloc (SZ * sizeof (int));
> int *c = malloc (SZ * sizeof (int));
>
> for (int i = 0; i < SZ; i++)
> c[i] = a[i] + b[i];
> return c[SZ - 1];
> }
> =======
>
> Emulating it with qemu-riscv64 and RVV takes ~0.3 sec:
>
> $ time ~/work/qemu/build/qemu-riscv64 \
> -cpu rv64,debug=false,vext_spec=v1.0,v=true,vlen=128 ./foo.out
>
> real 0m0.303s
> user 0m0.281s
> sys 0m0.023s
>
> With this skip we take ~0.275 sec:
>
> $ time ~/work/qemu/build/qemu-riscv64 \
> -cpu rv64,debug=false,vext_spec=v1.0,v=true,vlen=128 ./foo.out
>
> real 0m0.274s
> user 0m0.252s
> sys 0m0.019s
>
> This performance gain adds up fast when executing heavy benchmarks like
> SPEC.
>
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Alistair
> ---
> target/riscv/vector_helper.c | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index f4d0438988..8e6c99e573 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -268,12 +268,17 @@ static void vext_set_tail_elems_1s(CPURISCVState *env, target_ulong vl,
> void *vd, uint32_t desc, uint32_t nf,
> uint32_t esz, uint32_t max_elems)
> {
> - uint32_t total_elems = vext_get_total_elems(env, desc, esz);
> - uint32_t vlenb = riscv_cpu_cfg(env)->vlen >> 3;
> + uint32_t total_elems, vlenb, registers_used;
> uint32_t vta = vext_vta(desc);
> - uint32_t registers_used;
> int k;
>
> + if (vta == 0) {
> + return;
> + }
> +
> + total_elems = vext_get_total_elems(env, desc, esz);
> + vlenb = riscv_cpu_cfg(env)->vlen >> 3;
> +
> for (k = 0; k < nf; ++k) {
> vext_set_elems_1s(vd, vta, (k * max_elems + vl) * esz,
> (k * max_elems + max_elems) * esz);
> --
> 2.40.0
>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] target/riscv/vector_helper.c: skip set tail when vta is zero
2023-04-27 20:57 ` [PATCH 1/2] target/riscv/vector_helper.c: skip set tail when vta is zero Daniel Henrique Barboza
2023-04-28 1:08 ` Weiwei Li
2023-05-07 23:26 ` Alistair Francis
@ 2023-05-07 23:30 ` Alistair Francis
2 siblings, 0 replies; 10+ messages in thread
From: Alistair Francis @ 2023-05-07 23:30 UTC (permalink / raw)
To: Daniel Henrique Barboza
Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liweiwei,
zhiwei_liu, palmer
On Fri, Apr 28, 2023 at 6:58 AM Daniel Henrique Barboza
<dbarboza@ventanamicro.com> wrote:
>
> The function is a no-op if 'vta' is zero but we're still doing a lot of
> stuff in this function regardless. vext_set_elems_1s() will ignore every
> single time (since vta is zero) and we just wasted time.
>
> Skip it altogether in this case. Aside from the code simplification
> there's a noticeable emulation performance gain by doing it. For a
> regular C binary that does a vectors operation like this:
>
> =======
> #define SZ 10000000
>
> int main ()
> {
> int *a = malloc (SZ * sizeof (int));
> int *b = malloc (SZ * sizeof (int));
> int *c = malloc (SZ * sizeof (int));
>
> for (int i = 0; i < SZ; i++)
> c[i] = a[i] + b[i];
> return c[SZ - 1];
> }
> =======
>
> Emulating it with qemu-riscv64 and RVV takes ~0.3 sec:
>
> $ time ~/work/qemu/build/qemu-riscv64 \
> -cpu rv64,debug=false,vext_spec=v1.0,v=true,vlen=128 ./foo.out
>
> real 0m0.303s
> user 0m0.281s
> sys 0m0.023s
>
> With this skip we take ~0.275 sec:
>
> $ time ~/work/qemu/build/qemu-riscv64 \
> -cpu rv64,debug=false,vext_spec=v1.0,v=true,vlen=128 ./foo.out
>
> real 0m0.274s
> user 0m0.252s
> sys 0m0.019s
>
> This performance gain adds up fast when executing heavy benchmarks like
> SPEC.
>
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Thanks!
Applied to riscv-to-apply.next
Alistair
> ---
> target/riscv/vector_helper.c | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index f4d0438988..8e6c99e573 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -268,12 +268,17 @@ static void vext_set_tail_elems_1s(CPURISCVState *env, target_ulong vl,
> void *vd, uint32_t desc, uint32_t nf,
> uint32_t esz, uint32_t max_elems)
> {
> - uint32_t total_elems = vext_get_total_elems(env, desc, esz);
> - uint32_t vlenb = riscv_cpu_cfg(env)->vlen >> 3;
> + uint32_t total_elems, vlenb, registers_used;
> uint32_t vta = vext_vta(desc);
> - uint32_t registers_used;
> int k;
>
> + if (vta == 0) {
> + return;
> + }
> +
> + total_elems = vext_get_total_elems(env, desc, esz);
> + vlenb = riscv_cpu_cfg(env)->vlen >> 3;
> +
> for (k = 0; k < nf; ++k) {
> vext_set_elems_1s(vd, vta, (k * max_elems + vl) * esz,
> (k * max_elems + max_elems) * esz);
> --
> 2.40.0
>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2023-05-07 23:31 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-04-27 20:57 [PATCH 0/2] target/riscv: RVV 1-fill tail element changes Daniel Henrique Barboza
2023-04-27 20:57 ` [PATCH 1/2] target/riscv/vector_helper.c: skip set tail when vta is zero Daniel Henrique Barboza
2023-04-28 1:08 ` Weiwei Li
2023-05-07 23:26 ` Alistair Francis
2023-05-07 23:30 ` Alistair Francis
2023-04-27 20:57 ` [PATCH 2/2] target/riscv/vector_helper.c: make vext_set_tail_elems_1s() debug only Daniel Henrique Barboza
2023-04-28 1:22 ` Weiwei Li
2023-04-28 9:16 ` Daniel Henrique Barboza
2023-04-27 21:15 ` [PATCH 0/2] target/riscv: RVV 1-fill tail element changes Palmer Dabbelt
2023-04-28 9:30 ` Dickon Hood
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).