qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] target/riscv: Use a direct cast for better performance
@ 2023-10-08 21:50 Richard W.M. Jones
  2023-10-08 21:50 ` Richard W.M. Jones
  0 siblings, 1 reply; 6+ messages in thread
From: Richard W.M. Jones @ 2023-10-08 21:50 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, bin.meng, liweiwei,
	dbarboza, zhiwei_liu, pbonzini

v1 was here:
https://lists.nongnu.org/archive/html/qemu-devel/2023-10/msg02021.html

v2 is functionally exactly the same, except I changed
s/qemu cast/QOM cast/ in the comment, and added the Reviewed-by tag
received for the first version.

Rich.




^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2] target/riscv: Use a direct cast for better performance
  2023-10-08 21:50 [PATCH v2] target/riscv: Use a direct cast for better performance Richard W.M. Jones
@ 2023-10-08 21:50 ` Richard W.M. Jones
  2023-10-09 10:23   ` Philippe Mathieu-Daudé
  2023-10-09 12:36   ` LIU Zhiwei
  0 siblings, 2 replies; 6+ messages in thread
From: Richard W.M. Jones @ 2023-10-08 21:50 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, bin.meng, liweiwei,
	dbarboza, zhiwei_liu, pbonzini

RISCV_CPU(cs) uses a checked cast.  When QOM cast debugging is enabled
this adds about 5% total overhead when emulating RV64 on x86-64 host.

Using a RISC-V guest with 16 vCPUs, 16 GB of guest RAM, virtio-blk
disk.  The guest has a copy of the qemu source tree.  The test
involves compiling the qemu source tree with 'make clean; time make -j16'.

Before making this change the compile step took 449 & 447 seconds over
two consecutive runs.

After making this change, 428 & 422 seconds.

The saving is about 5%.

Thanks: Paolo Bonzini
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
---
 target/riscv/cpu_helper.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 3a02079290..479d9863ae 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -66,7 +66,11 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc,
                           uint64_t *cs_base, uint32_t *pflags)
 {
     CPUState *cs = env_cpu(env);
-    RISCVCPU *cpu = RISCV_CPU(cs);
+    /*
+     * Using the checked cast RISCV_CPU(cs) imposes ~ 5% overhead when
+     * QOM cast debugging is enabled, so use a direct cast instead.
+     */
+    RISCVCPU *cpu = (RISCVCPU *)cs;
     RISCVExtStatus fs, vs;
     uint32_t flags = 0;
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] target/riscv: Use a direct cast for better performance
  2023-10-08 21:50 ` Richard W.M. Jones
@ 2023-10-09 10:23   ` Philippe Mathieu-Daudé
  2023-10-09 12:36   ` LIU Zhiwei
  1 sibling, 0 replies; 6+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-10-09 10:23 UTC (permalink / raw)
  To: Richard W.M. Jones, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, bin.meng, liweiwei,
	dbarboza, zhiwei_liu, pbonzini

Hi Richard,

On 8/10/23 23:50, Richard W.M. Jones wrote:
> RISCV_CPU(cs) uses a checked cast.  When QOM cast debugging is enabled
> this adds about 5% total overhead when emulating RV64 on x86-64 host.
> 
> Using a RISC-V guest with 16 vCPUs, 16 GB of guest RAM, virtio-blk
> disk.  The guest has a copy of the qemu source tree.  The test
> involves compiling the qemu source tree with 'make clean; time make -j16'.
> 
> Before making this change the compile step took 449 & 447 seconds over
> two consecutive runs.
> 
> After making this change, 428 & 422 seconds.
> 
> The saving is about 5%.
> 
> Thanks: Paolo Bonzini
> Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> ---
>   target/riscv/cpu_helper.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> index 3a02079290..479d9863ae 100644
> --- a/target/riscv/cpu_helper.c
> +++ b/target/riscv/cpu_helper.c
> @@ -66,7 +66,11 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc,
>                             uint64_t *cs_base, uint32_t *pflags)
>   {
>       CPUState *cs = env_cpu(env);
> -    RISCVCPU *cpu = RISCV_CPU(cs);

You might want to use:

        RISCVCPU *cpu =  env_archcpu(env);

Other occurences in target/riscv/internals.h.

> +    /*
> +     * Using the checked cast RISCV_CPU(cs) imposes ~ 5% overhead when
> +     * QOM cast debugging is enabled, so use a direct cast instead.
> +     */
> +    RISCVCPU *cpu = (RISCVCPU *)cs;
>       RISCVExtStatus fs, vs;
>       uint32_t flags = 0;
>   



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] target/riscv: Use a direct cast for better performance
  2023-10-08 21:50 ` Richard W.M. Jones
  2023-10-09 10:23   ` Philippe Mathieu-Daudé
@ 2023-10-09 12:36   ` LIU Zhiwei
  2023-10-09 12:53     ` Richard W.M. Jones
  1 sibling, 1 reply; 6+ messages in thread
From: LIU Zhiwei @ 2023-10-09 12:36 UTC (permalink / raw)
  To: Richard W.M. Jones, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, bin.meng, liweiwei,
	dbarboza, pbonzini


On 2023/10/9 5:50, Richard W.M. Jones wrote:
> RISCV_CPU(cs) uses a checked cast.  When QOM cast debugging is enabled
> this adds about 5% total overhead when emulating RV64 on x86-64 host.
>
> Using a RISC-V guest with 16 vCPUs, 16 GB of guest RAM, virtio-blk
> disk.  The guest has a copy of the qemu source tree.  The test
> involves compiling the qemu source tree with 'make clean; time make -j16'.
>
> Before making this change the compile step took 449 & 447 seconds over
> two consecutive runs.
>
> After making this change, 428 & 422 seconds.
>
> The saving is about 5%.
>
> Thanks: Paolo Bonzini
> Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> ---
>   target/riscv/cpu_helper.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> index 3a02079290..479d9863ae 100644
> --- a/target/riscv/cpu_helper.c
> +++ b/target/riscv/cpu_helper.c
> @@ -66,7 +66,11 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc,
>                             uint64_t *cs_base, uint32_t *pflags)
>   {
>       CPUState *cs = env_cpu(env);
> -    RISCVCPU *cpu = RISCV_CPU(cs);
> +    /*
> +     * Using the checked cast RISCV_CPU(cs) imposes ~ 5% overhead when
> +     * QOM cast debugging is enabled, so use a direct cast instead.
> +     */
> +    RISCVCPU *cpu = (RISCVCPU *)cs;

This function is very hot. Maybe we should cache the tbflags instead of 
calculate it here. Otherwise,

Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>

Zhiwei

>       RISCVExtStatus fs, vs;
>       uint32_t flags = 0;
>   


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] target/riscv: Use a direct cast for better performance
  2023-10-09 12:36   ` LIU Zhiwei
@ 2023-10-09 12:53     ` Richard W.M. Jones
  2023-10-09 12:58       ` LIU Zhiwei
  0 siblings, 1 reply; 6+ messages in thread
From: Richard W.M. Jones @ 2023-10-09 12:53 UTC (permalink / raw)
  To: LIU Zhiwei
  Cc: qemu-devel, qemu-riscv, palmer, alistair.francis, bin.meng,
	liweiwei, dbarboza, pbonzini

On Mon, Oct 09, 2023 at 08:36:28PM +0800, LIU Zhiwei wrote:
> 
> On 2023/10/9 5:50, Richard W.M. Jones wrote:
> >RISCV_CPU(cs) uses a checked cast.  When QOM cast debugging is enabled
> >this adds about 5% total overhead when emulating RV64 on x86-64 host.
> >
> >Using a RISC-V guest with 16 vCPUs, 16 GB of guest RAM, virtio-blk
> >disk.  The guest has a copy of the qemu source tree.  The test
> >involves compiling the qemu source tree with 'make clean; time make -j16'.
> >
> >Before making this change the compile step took 449 & 447 seconds over
> >two consecutive runs.
> >
> >After making this change, 428 & 422 seconds.
> >
> >The saving is about 5%.
> >
> >Thanks: Paolo Bonzini
> >Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
> >Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
> >---
> >  target/riscv/cpu_helper.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> >diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> >index 3a02079290..479d9863ae 100644
> >--- a/target/riscv/cpu_helper.c
> >+++ b/target/riscv/cpu_helper.c
> >@@ -66,7 +66,11 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc,
> >                            uint64_t *cs_base, uint32_t *pflags)
> >  {
> >      CPUState *cs = env_cpu(env);
> >-    RISCVCPU *cpu = RISCV_CPU(cs);
> >+    /*
> >+     * Using the checked cast RISCV_CPU(cs) imposes ~ 5% overhead when
> >+     * QOM cast debugging is enabled, so use a direct cast instead.
> >+     */
> >+    RISCVCPU *cpu = (RISCVCPU *)cs;
> 
> This function is very hot. Maybe we should cache the tbflags instead
> of calculate it here. Otherwise,

This function is indeed very hot, taking over 20% of total host time
in my guest stress test.

How would we cache the flags?  AIUI they simply depend on machine
state and we must recalculate them either when the machine state
changes (sprinkle "update_tbflags" everywhere) or here.  If you have
any suggestions I can try things.

> Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>

I posted a v3 based on Philippe's feedback.

Rich.

> Zhiwei
> 
> >      RISCVExtStatus fs, vs;
> >      uint32_t flags = 0;

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] target/riscv: Use a direct cast for better performance
  2023-10-09 12:53     ` Richard W.M. Jones
@ 2023-10-09 12:58       ` LIU Zhiwei
  0 siblings, 0 replies; 6+ messages in thread
From: LIU Zhiwei @ 2023-10-09 12:58 UTC (permalink / raw)
  To: Richard W.M. Jones, LIU Zhiwei
  Cc: qemu-devel, qemu-riscv, palmer, alistair.francis, bin.meng,
	liweiwei, dbarboza, pbonzini


On 2023/10/9 20:53, Richard W.M. Jones wrote:
> On Mon, Oct 09, 2023 at 08:36:28PM +0800, LIU Zhiwei wrote:
>> On 2023/10/9 5:50, Richard W.M. Jones wrote:
>>> RISCV_CPU(cs) uses a checked cast.  When QOM cast debugging is enabled
>>> this adds about 5% total overhead when emulating RV64 on x86-64 host.
>>>
>>> Using a RISC-V guest with 16 vCPUs, 16 GB of guest RAM, virtio-blk
>>> disk.  The guest has a copy of the qemu source tree.  The test
>>> involves compiling the qemu source tree with 'make clean; time make -j16'.
>>>
>>> Before making this change the compile step took 449 & 447 seconds over
>>> two consecutive runs.
>>>
>>> After making this change, 428 & 422 seconds.
>>>
>>> The saving is about 5%.
>>>
>>> Thanks: Paolo Bonzini
>>> Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
>>> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
>>> ---
>>>   target/riscv/cpu_helper.c | 6 +++++-
>>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
>>> index 3a02079290..479d9863ae 100644
>>> --- a/target/riscv/cpu_helper.c
>>> +++ b/target/riscv/cpu_helper.c
>>> @@ -66,7 +66,11 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc,
>>>                             uint64_t *cs_base, uint32_t *pflags)
>>>   {
>>>       CPUState *cs = env_cpu(env);
>>> -    RISCVCPU *cpu = RISCV_CPU(cs);
>>> +    /*
>>> +     * Using the checked cast RISCV_CPU(cs) imposes ~ 5% overhead when
>>> +     * QOM cast debugging is enabled, so use a direct cast instead.
>>> +     */
>>> +    RISCVCPU *cpu = (RISCVCPU *)cs;
>> This function is very hot. Maybe we should cache the tbflags instead
>> of calculate it here. Otherwise,
> This function is indeed very hot, taking over 20% of total host time
> in my guest stress test.
>
> How would we cache the flags?  AIUI they simply depend on machine
> state and we must recalculate them either when the machine state
> changes (sprinkle "update_tbflags" everywhere)
Yes, we should do in this way.
>   or here.  If you have
> any suggestions I can try things.
I think it exceeds this patch.
>
>> Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>
> I posted a v3 based on Philippe's feedback.

OK.

Thanks
Zhiwei

>
> Rich.
>
>> Zhiwei
>>
>>>       RISCVExtStatus fs, vs;
>>>       uint32_t flags = 0;


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-10-09 13:01 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-08 21:50 [PATCH v2] target/riscv: Use a direct cast for better performance Richard W.M. Jones
2023-10-08 21:50 ` Richard W.M. Jones
2023-10-09 10:23   ` Philippe Mathieu-Daudé
2023-10-09 12:36   ` LIU Zhiwei
2023-10-09 12:53     ` Richard W.M. Jones
2023-10-09 12:58       ` LIU Zhiwei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).