qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] target/riscv: RVV 1-fill tail element changes
@ 2023-04-27 20:57 Daniel Henrique Barboza
  2023-04-27 20:57 ` [PATCH 1/2] target/riscv/vector_helper.c: skip set tail when vta is zero Daniel Henrique Barboza
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Daniel Henrique Barboza @ 2023-04-27 20:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, alistair.francis, bmeng, liweiwei, zhiwei_liu, palmer,
	Daniel Henrique Barboza

Hi,

This series makes changes in vext_set_tail_elements_1s() to be a little
nicer to the emulation.

First patch makes the function a no-op when vta == 0. Aside from the
logic simplification we also have a little performance boost.

Second patch makes the function debug only. The logic is explained in
the commit message, but long story short: we don't have to implement any
tail-agnostic policy at all to be spec compliant, but this function has
its uses for debug purposes, so keeping it as a debug option allow users
to disable it on demand.

Patches are based on top of Alistair's riscv-to-apply.next.
 
Daniel Henrique Barboza (2):
  target/riscv/vector_helper.c: skip set tail when vta is zero
  target/riscv/vector_helper.c: make vext_set_tail_elems_1s() debug only

 target/riscv/vector_helper.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

-- 
2.40.0



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] target/riscv/vector_helper.c: skip set tail when vta is zero
  2023-04-27 20:57 [PATCH 0/2] target/riscv: RVV 1-fill tail element changes Daniel Henrique Barboza
@ 2023-04-27 20:57 ` Daniel Henrique Barboza
  2023-04-28  1:08   ` Weiwei Li
                     ` (2 more replies)
  2023-04-27 20:57 ` [PATCH 2/2] target/riscv/vector_helper.c: make vext_set_tail_elems_1s() debug only Daniel Henrique Barboza
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 10+ messages in thread
From: Daniel Henrique Barboza @ 2023-04-27 20:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, alistair.francis, bmeng, liweiwei, zhiwei_liu, palmer,
	Daniel Henrique Barboza

The function is a no-op if 'vta' is zero but we're still doing a lot of
stuff in this function regardless. vext_set_elems_1s() will ignore every
single time (since vta is zero) and we just wasted time.

Skip it altogether in this case. Aside from the code simplification
there's a noticeable emulation performance gain by doing it. For a
regular C binary that does a vectors operation like this:

=======
 #define SZ 10000000

int main ()
{
  int *a = malloc (SZ * sizeof (int));
  int *b = malloc (SZ * sizeof (int));
  int *c = malloc (SZ * sizeof (int));

  for (int i = 0; i < SZ; i++)
    c[i] = a[i] + b[i];
  return c[SZ - 1];
}
=======

Emulating it with qemu-riscv64 and RVV takes ~0.3 sec:

$ time ~/work/qemu/build/qemu-riscv64 \
    -cpu rv64,debug=false,vext_spec=v1.0,v=true,vlen=128 ./foo.out

real    0m0.303s
user    0m0.281s
sys     0m0.023s

With this skip we take ~0.275 sec:

$ time ~/work/qemu/build/qemu-riscv64 \
    -cpu rv64,debug=false,vext_spec=v1.0,v=true,vlen=128 ./foo.out

real    0m0.274s
user    0m0.252s
sys     0m0.019s

This performance gain adds up fast when executing heavy benchmarks like
SPEC.

Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
---
 target/riscv/vector_helper.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index f4d0438988..8e6c99e573 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -268,12 +268,17 @@ static void vext_set_tail_elems_1s(CPURISCVState *env, target_ulong vl,
                                    void *vd, uint32_t desc, uint32_t nf,
                                    uint32_t esz, uint32_t max_elems)
 {
-    uint32_t total_elems = vext_get_total_elems(env, desc, esz);
-    uint32_t vlenb = riscv_cpu_cfg(env)->vlen >> 3;
+    uint32_t total_elems, vlenb, registers_used;
     uint32_t vta = vext_vta(desc);
-    uint32_t registers_used;
     int k;
 
+    if (vta == 0) {
+        return;
+    }
+
+    total_elems = vext_get_total_elems(env, desc, esz);
+    vlenb = riscv_cpu_cfg(env)->vlen >> 3;
+
     for (k = 0; k < nf; ++k) {
         vext_set_elems_1s(vd, vta, (k * max_elems + vl) * esz,
                           (k * max_elems + max_elems) * esz);
-- 
2.40.0



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] target/riscv/vector_helper.c: make vext_set_tail_elems_1s() debug only
  2023-04-27 20:57 [PATCH 0/2] target/riscv: RVV 1-fill tail element changes Daniel Henrique Barboza
  2023-04-27 20:57 ` [PATCH 1/2] target/riscv/vector_helper.c: skip set tail when vta is zero Daniel Henrique Barboza
@ 2023-04-27 20:57 ` Daniel Henrique Barboza
  2023-04-28  1:22   ` Weiwei Li
  2023-04-27 21:15 ` [PATCH 0/2] target/riscv: RVV 1-fill tail element changes Palmer Dabbelt
  2023-04-28  9:30 ` Dickon Hood
  3 siblings, 1 reply; 10+ messages in thread
From: Daniel Henrique Barboza @ 2023-04-27 20:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, alistair.francis, bmeng, liweiwei, zhiwei_liu, palmer,
	Daniel Henrique Barboza

Commit 3479a814 ("target/riscv: rvv-1.0: add VMA and VTA") added vma and
vta fields in the vtype register, while also defining that QEMU doesn't
need to have a tail agnostic policy to be compliant with the RVV spec.
It ended up removing all tail handling code as well. Later, commit
752614ca ("target/riscv: rvv: Add tail agnostic for vector load / store
instructions") reintroduced the tail agnostic fill for vector load/store
instructions only.

This puts QEMU in a situation where some functions are 1-filling the
tail elements and others don't. This is still a valid implementation,
but the process of 1-filling the tail elements takes valuable emulation
time that can be used doing anything else. If the spec doesn't demand a
specific tail-agostic policy, a proper software wouldn't expect any
policy to be in place. This means that, more often than not, the work
we're doing by 1-filling tail elements is wasted. We would be better of
if vext_set_tail_elems_1s() is removed entirely from the code.

All this said, there's still a debug value associated with it. So,
instead of removing it, let's gate it with cpu->cfg.debug. This way
software can enable this code if desirable, but for the regular case we
shouldn't waste time with it.

Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
---
 target/riscv/vector_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 8e6c99e573..e0a292ac24 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -272,7 +272,7 @@ static void vext_set_tail_elems_1s(CPURISCVState *env, target_ulong vl,
     uint32_t vta = vext_vta(desc);
     int k;
 
-    if (vta == 0) {
+    if (vta == 0 || !riscv_cpu_cfg(env)->debug)  {
         return;
     }
 
-- 
2.40.0



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/2] target/riscv: RVV 1-fill tail element changes
  2023-04-27 20:57 [PATCH 0/2] target/riscv: RVV 1-fill tail element changes Daniel Henrique Barboza
  2023-04-27 20:57 ` [PATCH 1/2] target/riscv/vector_helper.c: skip set tail when vta is zero Daniel Henrique Barboza
  2023-04-27 20:57 ` [PATCH 2/2] target/riscv/vector_helper.c: make vext_set_tail_elems_1s() debug only Daniel Henrique Barboza
@ 2023-04-27 21:15 ` Palmer Dabbelt
  2023-04-28  9:30 ` Dickon Hood
  3 siblings, 0 replies; 10+ messages in thread
From: Palmer Dabbelt @ 2023-04-27 21:15 UTC (permalink / raw)
  To: dbarboza
  Cc: qemu-devel, qemu-riscv, Alistair Francis, bmeng, liweiwei,
	zhiwei_liu, dbarboza

On Thu, 27 Apr 2023 13:57:06 PDT (-0700), dbarboza@ventanamicro.com wrote:
> Hi,
>
> This series makes changes in vext_set_tail_elements_1s() to be a little
> nicer to the emulation.
>
> First patch makes the function a no-op when vta == 0. Aside from the
> logic simplification we also have a little performance boost.
>
> Second patch makes the function debug only. The logic is explained in
> the commit message, but long story short: we don't have to implement any
> tail-agnostic policy at all to be spec compliant, but this function has
> its uses for debug purposes, so keeping it as a debug option allow users
> to disable it on demand.
>
> Patches are based on top of Alistair's riscv-to-apply.next.
>
> Daniel Henrique Barboza (2):
>   target/riscv/vector_helper.c: skip set tail when vta is zero
>   target/riscv/vector_helper.c: make vext_set_tail_elems_1s() debug only
>
>  target/riscv/vector_helper.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)

Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>

Though this made me think: it'd be nice to have some sort of 
"aggressively do odd things for VTA/VMA" mode in QEMU, as that could 
help shake out bugs in software.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] target/riscv/vector_helper.c: skip set tail when vta is zero
  2023-04-27 20:57 ` [PATCH 1/2] target/riscv/vector_helper.c: skip set tail when vta is zero Daniel Henrique Barboza
@ 2023-04-28  1:08   ` Weiwei Li
  2023-05-07 23:26   ` Alistair Francis
  2023-05-07 23:30   ` Alistair Francis
  2 siblings, 0 replies; 10+ messages in thread
From: Weiwei Li @ 2023-04-28  1:08 UTC (permalink / raw)
  To: Daniel Henrique Barboza, qemu-devel
  Cc: liweiwei, qemu-riscv, alistair.francis, bmeng, zhiwei_liu, palmer


On 2023/4/28 04:57, Daniel Henrique Barboza wrote:
> The function is a no-op if 'vta' is zero but we're still doing a lot of
> stuff in this function regardless. vext_set_elems_1s() will ignore every
> single time (since vta is zero) and we just wasted time.
>
> Skip it altogether in this case. Aside from the code simplification
> there's a noticeable emulation performance gain by doing it. For a
> regular C binary that does a vectors operation like this:
>
> =======
>   #define SZ 10000000
>
> int main ()
> {
>    int *a = malloc (SZ * sizeof (int));
>    int *b = malloc (SZ * sizeof (int));
>    int *c = malloc (SZ * sizeof (int));
>
>    for (int i = 0; i < SZ; i++)
>      c[i] = a[i] + b[i];
>    return c[SZ - 1];
> }
> =======
>
> Emulating it with qemu-riscv64 and RVV takes ~0.3 sec:
>
> $ time ~/work/qemu/build/qemu-riscv64 \
>      -cpu rv64,debug=false,vext_spec=v1.0,v=true,vlen=128 ./foo.out
>
> real    0m0.303s
> user    0m0.281s
> sys     0m0.023s
>
> With this skip we take ~0.275 sec:
>
> $ time ~/work/qemu/build/qemu-riscv64 \
>      -cpu rv64,debug=false,vext_spec=v1.0,v=true,vlen=128 ./foo.out
>
> real    0m0.274s
> user    0m0.252s
> sys     0m0.019s
>
> This performance gain adds up fast when executing heavy benchmarks like
> SPEC.
>
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> ---

Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn>

Weiwei Li

>   target/riscv/vector_helper.c | 11 ++++++++---
>   1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index f4d0438988..8e6c99e573 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -268,12 +268,17 @@ static void vext_set_tail_elems_1s(CPURISCVState *env, target_ulong vl,
>                                      void *vd, uint32_t desc, uint32_t nf,
>                                      uint32_t esz, uint32_t max_elems)
>   {
> -    uint32_t total_elems = vext_get_total_elems(env, desc, esz);
> -    uint32_t vlenb = riscv_cpu_cfg(env)->vlen >> 3;
> +    uint32_t total_elems, vlenb, registers_used;
>       uint32_t vta = vext_vta(desc);
> -    uint32_t registers_used;
>       int k;
>   
> +    if (vta == 0) {
> +        return;
> +    }
> +
> +    total_elems = vext_get_total_elems(env, desc, esz);
> +    vlenb = riscv_cpu_cfg(env)->vlen >> 3;
> +
>       for (k = 0; k < nf; ++k) {
>           vext_set_elems_1s(vd, vta, (k * max_elems + vl) * esz,
>                             (k * max_elems + max_elems) * esz);



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] target/riscv/vector_helper.c: make vext_set_tail_elems_1s() debug only
  2023-04-27 20:57 ` [PATCH 2/2] target/riscv/vector_helper.c: make vext_set_tail_elems_1s() debug only Daniel Henrique Barboza
@ 2023-04-28  1:22   ` Weiwei Li
  2023-04-28  9:16     ` Daniel Henrique Barboza
  0 siblings, 1 reply; 10+ messages in thread
From: Weiwei Li @ 2023-04-28  1:22 UTC (permalink / raw)
  To: Daniel Henrique Barboza, qemu-devel
  Cc: liweiwei, qemu-riscv, alistair.francis, bmeng, zhiwei_liu, palmer


On 2023/4/28 04:57, Daniel Henrique Barboza wrote:
> Commit 3479a814 ("target/riscv: rvv-1.0: add VMA and VTA") added vma and
> vta fields in the vtype register, while also defining that QEMU doesn't
> need to have a tail agnostic policy to be compliant with the RVV spec.
> It ended up removing all tail handling code as well. Later, commit
> 752614ca ("target/riscv: rvv: Add tail agnostic for vector load / store
> instructions") reintroduced the tail agnostic fill for vector load/store
> instructions only.
>
> This puts QEMU in a situation where some functions are 1-filling the
> tail elements and others don't. This is still a valid implementation,
> but the process of 1-filling the tail elements takes valuable emulation
> time that can be used doing anything else. If the spec doesn't demand a
> specific tail-agostic policy, a proper software wouldn't expect any
> policy to be in place. This means that, more often than not, the work
> we're doing by 1-filling tail elements is wasted. We would be better of
> if vext_set_tail_elems_1s() is removed entirely from the code.
>
> All this said, there's still a debug value associated with it. So,
> instead of removing it, let's gate it with cpu->cfg.debug. This way
> software can enable this code if desirable, but for the regular case we
> shouldn't waste time with it.
>
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> ---
>   target/riscv/vector_helper.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index 8e6c99e573..e0a292ac24 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -272,7 +272,7 @@ static void vext_set_tail_elems_1s(CPURISCVState *env, target_ulong vl,
>       uint32_t vta = vext_vta(desc);
>       int k;
>   
> -    if (vta == 0) {
> +    if (vta == 0 || !riscv_cpu_cfg(env)->debug)  {

I think this is not correct. 'debug' property is used for debug spec. 
And this feature is controlled by another property 'rvv_ta_all_1s' .

By the way, cfg.rvv_ta_all_1s have been ANDed intovta value. So 
additional check on it  is also unnecessary here.

Regards,

Weiwei Li

>           return;
>       }
>   



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] target/riscv/vector_helper.c: make vext_set_tail_elems_1s() debug only
  2023-04-28  1:22   ` Weiwei Li
@ 2023-04-28  9:16     ` Daniel Henrique Barboza
  0 siblings, 0 replies; 10+ messages in thread
From: Daniel Henrique Barboza @ 2023-04-28  9:16 UTC (permalink / raw)
  To: Weiwei Li, qemu-devel
  Cc: qemu-riscv, alistair.francis, bmeng, zhiwei_liu, palmer



On 4/27/23 22:22, Weiwei Li wrote:
> 
> On 2023/4/28 04:57, Daniel Henrique Barboza wrote:
>> Commit 3479a814 ("target/riscv: rvv-1.0: add VMA and VTA") added vma and
>> vta fields in the vtype register, while also defining that QEMU doesn't
>> need to have a tail agnostic policy to be compliant with the RVV spec.
>> It ended up removing all tail handling code as well. Later, commit
>> 752614ca ("target/riscv: rvv: Add tail agnostic for vector load / store
>> instructions") reintroduced the tail agnostic fill for vector load/store
>> instructions only.
>>
>> This puts QEMU in a situation where some functions are 1-filling the
>> tail elements and others don't. This is still a valid implementation,
>> but the process of 1-filling the tail elements takes valuable emulation
>> time that can be used doing anything else. If the spec doesn't demand a
>> specific tail-agostic policy, a proper software wouldn't expect any
>> policy to be in place. This means that, more often than not, the work
>> we're doing by 1-filling tail elements is wasted. We would be better of
>> if vext_set_tail_elems_1s() is removed entirely from the code.
>>
>> All this said, there's still a debug value associated with it. So,
>> instead of removing it, let's gate it with cpu->cfg.debug. This way
>> software can enable this code if desirable, but for the regular case we
>> shouldn't waste time with it.
>>
>> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
>> ---
>>   target/riscv/vector_helper.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
>> index 8e6c99e573..e0a292ac24 100644
>> --- a/target/riscv/vector_helper.c
>> +++ b/target/riscv/vector_helper.c
>> @@ -272,7 +272,7 @@ static void vext_set_tail_elems_1s(CPURISCVState *env, target_ulong vl,
>>       uint32_t vta = vext_vta(desc);
>>       int k;
>> -    if (vta == 0) {
>> +    if (vta == 0 || !riscv_cpu_cfg(env)->debug)  {
> 
> I think this is not correct. 'debug' property is used for debug spec. And this feature is controlled by another property 'rvv_ta_all_1s' .

You're right. I wasn't aware that this flag exists:


$ git grep 'rvv_ta_all_1s'
target/riscv/cpu.c:    DEFINE_PROP_BOOL("rvv_ta_all_1s", RISCVCPU, cfg.rvv_ta_all_1s, false),
target/riscv/cpu.h:    bool rvv_ta_all_1s;
target/riscv/translate.c:    ctx->vta = FIELD_EX32(tb_flags, TB_FLAGS, VTA) && cpu->cfg.rvv_ta_all_1s;
target/riscv/translate.c:    ctx->cfg_vta_all_1s = cpu->cfg.rvv_ta_all_1s;




> 
> By the way, cfg.rvv_ta_all_1s have been ANDed intovta value. So additional check on it  is also unnecessary here.


Yes. We can drop this patch then since 'vta' is already accounting for ta_all_1s.


Thanks,

Daniel


> 
> Regards,
> 
> Weiwei Li
> 
>>           return;
>>       }
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/2] target/riscv: RVV 1-fill tail element changes
  2023-04-27 20:57 [PATCH 0/2] target/riscv: RVV 1-fill tail element changes Daniel Henrique Barboza
                   ` (2 preceding siblings ...)
  2023-04-27 21:15 ` [PATCH 0/2] target/riscv: RVV 1-fill tail element changes Palmer Dabbelt
@ 2023-04-28  9:30 ` Dickon Hood
  3 siblings, 0 replies; 10+ messages in thread
From: Dickon Hood @ 2023-04-28  9:30 UTC (permalink / raw)
  To: Daniel Henrique Barboza
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liweiwei,
	zhiwei_liu, palmer

On Thu, Apr 27, 2023 at 17:57:06 -0300, Daniel Henrique Barboza wrote:

: Second patch makes the function debug only. The logic is explained in
: the commit message, but long story short: we don't have to implement any
: tail-agnostic policy at all to be spec compliant, but this function has
: its uses for debug purposes, so keeping it as a debug option allow users
: to disable it on demand.

Yes, please don't remove this functionality completely -- our internal
stress test tool for the vector crypto patches relies on it.

-- 
Dickon Hood                                    Senior Software Engineer
                                                         Codethink Ltd.
                                 3rd Floor, Dale House, 35 Dale Street,
https://www.codethink.co.uk/        Manchester, M1 2HF, United Kingdom.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] target/riscv/vector_helper.c: skip set tail when vta is zero
  2023-04-27 20:57 ` [PATCH 1/2] target/riscv/vector_helper.c: skip set tail when vta is zero Daniel Henrique Barboza
  2023-04-28  1:08   ` Weiwei Li
@ 2023-05-07 23:26   ` Alistair Francis
  2023-05-07 23:30   ` Alistair Francis
  2 siblings, 0 replies; 10+ messages in thread
From: Alistair Francis @ 2023-05-07 23:26 UTC (permalink / raw)
  To: Daniel Henrique Barboza
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liweiwei,
	zhiwei_liu, palmer

On Fri, Apr 28, 2023 at 6:58 AM Daniel Henrique Barboza
<dbarboza@ventanamicro.com> wrote:
>
> The function is a no-op if 'vta' is zero but we're still doing a lot of
> stuff in this function regardless. vext_set_elems_1s() will ignore every
> single time (since vta is zero) and we just wasted time.
>
> Skip it altogether in this case. Aside from the code simplification
> there's a noticeable emulation performance gain by doing it. For a
> regular C binary that does a vectors operation like this:
>
> =======
>  #define SZ 10000000
>
> int main ()
> {
>   int *a = malloc (SZ * sizeof (int));
>   int *b = malloc (SZ * sizeof (int));
>   int *c = malloc (SZ * sizeof (int));
>
>   for (int i = 0; i < SZ; i++)
>     c[i] = a[i] + b[i];
>   return c[SZ - 1];
> }
> =======
>
> Emulating it with qemu-riscv64 and RVV takes ~0.3 sec:
>
> $ time ~/work/qemu/build/qemu-riscv64 \
>     -cpu rv64,debug=false,vext_spec=v1.0,v=true,vlen=128 ./foo.out
>
> real    0m0.303s
> user    0m0.281s
> sys     0m0.023s
>
> With this skip we take ~0.275 sec:
>
> $ time ~/work/qemu/build/qemu-riscv64 \
>     -cpu rv64,debug=false,vext_spec=v1.0,v=true,vlen=128 ./foo.out
>
> real    0m0.274s
> user    0m0.252s
> sys     0m0.019s
>
> This performance gain adds up fast when executing heavy benchmarks like
> SPEC.
>
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>

Acked-by: Alistair Francis <alistair.francis@wdc.com>

Alistair

> ---
>  target/riscv/vector_helper.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index f4d0438988..8e6c99e573 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -268,12 +268,17 @@ static void vext_set_tail_elems_1s(CPURISCVState *env, target_ulong vl,
>                                     void *vd, uint32_t desc, uint32_t nf,
>                                     uint32_t esz, uint32_t max_elems)
>  {
> -    uint32_t total_elems = vext_get_total_elems(env, desc, esz);
> -    uint32_t vlenb = riscv_cpu_cfg(env)->vlen >> 3;
> +    uint32_t total_elems, vlenb, registers_used;
>      uint32_t vta = vext_vta(desc);
> -    uint32_t registers_used;
>      int k;
>
> +    if (vta == 0) {
> +        return;
> +    }
> +
> +    total_elems = vext_get_total_elems(env, desc, esz);
> +    vlenb = riscv_cpu_cfg(env)->vlen >> 3;
> +
>      for (k = 0; k < nf; ++k) {
>          vext_set_elems_1s(vd, vta, (k * max_elems + vl) * esz,
>                            (k * max_elems + max_elems) * esz);
> --
> 2.40.0
>
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] target/riscv/vector_helper.c: skip set tail when vta is zero
  2023-04-27 20:57 ` [PATCH 1/2] target/riscv/vector_helper.c: skip set tail when vta is zero Daniel Henrique Barboza
  2023-04-28  1:08   ` Weiwei Li
  2023-05-07 23:26   ` Alistair Francis
@ 2023-05-07 23:30   ` Alistair Francis
  2 siblings, 0 replies; 10+ messages in thread
From: Alistair Francis @ 2023-05-07 23:30 UTC (permalink / raw)
  To: Daniel Henrique Barboza
  Cc: qemu-devel, qemu-riscv, alistair.francis, bmeng, liweiwei,
	zhiwei_liu, palmer

On Fri, Apr 28, 2023 at 6:58 AM Daniel Henrique Barboza
<dbarboza@ventanamicro.com> wrote:
>
> The function is a no-op if 'vta' is zero but we're still doing a lot of
> stuff in this function regardless. vext_set_elems_1s() will ignore every
> single time (since vta is zero) and we just wasted time.
>
> Skip it altogether in this case. Aside from the code simplification
> there's a noticeable emulation performance gain by doing it. For a
> regular C binary that does a vectors operation like this:
>
> =======
>  #define SZ 10000000
>
> int main ()
> {
>   int *a = malloc (SZ * sizeof (int));
>   int *b = malloc (SZ * sizeof (int));
>   int *c = malloc (SZ * sizeof (int));
>
>   for (int i = 0; i < SZ; i++)
>     c[i] = a[i] + b[i];
>   return c[SZ - 1];
> }
> =======
>
> Emulating it with qemu-riscv64 and RVV takes ~0.3 sec:
>
> $ time ~/work/qemu/build/qemu-riscv64 \
>     -cpu rv64,debug=false,vext_spec=v1.0,v=true,vlen=128 ./foo.out
>
> real    0m0.303s
> user    0m0.281s
> sys     0m0.023s
>
> With this skip we take ~0.275 sec:
>
> $ time ~/work/qemu/build/qemu-riscv64 \
>     -cpu rv64,debug=false,vext_spec=v1.0,v=true,vlen=128 ./foo.out
>
> real    0m0.274s
> user    0m0.252s
> sys     0m0.019s
>
> This performance gain adds up fast when executing heavy benchmarks like
> SPEC.
>
> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>

Thanks!

Applied to riscv-to-apply.next

Alistair

> ---
>  target/riscv/vector_helper.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index f4d0438988..8e6c99e573 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -268,12 +268,17 @@ static void vext_set_tail_elems_1s(CPURISCVState *env, target_ulong vl,
>                                     void *vd, uint32_t desc, uint32_t nf,
>                                     uint32_t esz, uint32_t max_elems)
>  {
> -    uint32_t total_elems = vext_get_total_elems(env, desc, esz);
> -    uint32_t vlenb = riscv_cpu_cfg(env)->vlen >> 3;
> +    uint32_t total_elems, vlenb, registers_used;
>      uint32_t vta = vext_vta(desc);
> -    uint32_t registers_used;
>      int k;
>
> +    if (vta == 0) {
> +        return;
> +    }
> +
> +    total_elems = vext_get_total_elems(env, desc, esz);
> +    vlenb = riscv_cpu_cfg(env)->vlen >> 3;
> +
>      for (k = 0; k < nf; ++k) {
>          vext_set_elems_1s(vd, vta, (k * max_elems + vl) * esz,
>                            (k * max_elems + max_elems) * esz);
> --
> 2.40.0
>
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-05-07 23:31 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-04-27 20:57 [PATCH 0/2] target/riscv: RVV 1-fill tail element changes Daniel Henrique Barboza
2023-04-27 20:57 ` [PATCH 1/2] target/riscv/vector_helper.c: skip set tail when vta is zero Daniel Henrique Barboza
2023-04-28  1:08   ` Weiwei Li
2023-05-07 23:26   ` Alistair Francis
2023-05-07 23:30   ` Alistair Francis
2023-04-27 20:57 ` [PATCH 2/2] target/riscv/vector_helper.c: make vext_set_tail_elems_1s() debug only Daniel Henrique Barboza
2023-04-28  1:22   ` Weiwei Li
2023-04-28  9:16     ` Daniel Henrique Barboza
2023-04-27 21:15 ` [PATCH 0/2] target/riscv: RVV 1-fill tail element changes Palmer Dabbelt
2023-04-28  9:30 ` Dickon Hood

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).