* [PATCH] target/riscv: Use a direct cast for better performance
@ 2023-10-07 9:02 Richard W.M. Jones
2023-10-07 9:10 ` Richard W.M. Jones
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Richard W.M. Jones @ 2023-10-07 9:02 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, bin.meng, liweiwei,
dbarboza, zhiwei_liu, pbonzini
RISCV_CPU(cs) uses a checked cast. When QOM cast debugging is enabled
this adds about 5% total overhead when emulating RV64 on an x86-64 host.
The test setup was a RISC-V guest with 16 vCPUs, 16 GB of guest RAM and
a virtio-blk disk. The guest has a copy of the qemu source tree, and the
test compiles it with 'make clean; time make -j16'.
Before this change the compile step took 449 & 447 seconds over
two consecutive runs.
After this change it took 428 & 422 seconds.
The saving is about 5% (mean of 448 s before, 425 s after).
Thanks: Paolo Bonzini
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
---
target/riscv/cpu_helper.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 3a02079290..6174d99fb2 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -66,7 +66,11 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc,
                           uint64_t *cs_base, uint32_t *pflags)
 {
     CPUState *cs = env_cpu(env);
-    RISCVCPU *cpu = RISCV_CPU(cs);
+    /*
+     * Using the checked cast RISCV_CPU(cs) imposes ~ 5% overhead when
+     * qemu cast debugging is enabled, so use a direct cast instead.
+     */
+    RISCVCPU *cpu = (RISCVCPU *)cs;
     RISCVExtStatus fs, vs;
     uint32_t flags = 0;
--
2.41.0
* Re: [PATCH] target/riscv: Use a direct cast for better performance
From: Richard W.M. Jones @ 2023-10-07 9:10 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, bin.meng, liweiwei,
dbarboza, zhiwei_liu, pbonzini
If you're interested in how I found this problem, it was done using
'perf record -a -g' & flamegraphs. This is the flamegraph of qemu (on
the host) when the guest is running the parallel compile:
http://oirase.annexia.org/tmp/qemu-riscv.svg
If you click into 'CPU_0/TCG' at the bottom left (all the vCPUs
basically act alike), and then go to 'cpu_get_tb_cpu_state' you can
see the call to 'object_dynamic_cast_assert' taking considerable time.
If you zoom out, press Ctrl-F and type 'object_dynamic_cast_assert' into
the search box, the flamegraph will tell you this call takes about
6.6% of total time (most of it, though not all, attributable to the call
from 'cpu_get_tb_cpu_state' -> 'object_dynamic_cast_assert').
There are several other issues in the flamegraph which I'm trying to
address, but this was the simplest one.
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
* Re: [PATCH] target/riscv: Use a direct cast for better performance
From: Daniel Henrique Barboza @ 2023-10-07 12:08 UTC (permalink / raw)
To: Richard W.M. Jones, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, bin.meng, liweiwei,
zhiwei_liu, pbonzini
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
* Re: [PATCH] target/riscv: Use a direct cast for better performance
From: Richard Henderson @ 2023-10-10 16:31 UTC (permalink / raw)
To: Richard W.M. Jones, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, bin.meng, liweiwei,
dbarboza, zhiwei_liu, pbonzini
On 10/7/23 02:02, Richard W.M. Jones wrote:
> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> index 3a02079290..6174d99fb2 100644
> --- a/target/riscv/cpu_helper.c
> +++ b/target/riscv/cpu_helper.c
> @@ -66,7 +66,11 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc,
>                           uint64_t *cs_base, uint32_t *pflags)
>  {
>      CPUState *cs = env_cpu(env);
> -    RISCVCPU *cpu = RISCV_CPU(cs);
> +    /*
> +     * Using the checked cast RISCV_CPU(cs) imposes ~ 5% overhead when
> +     * qemu cast debugging is enabled, so use a direct cast instead.
> +     */
> +    RISCVCPU *cpu = (RISCVCPU *)cs;

    RISCVCPU *cpu = env_archcpu(env);

and avoid "CPUState *cs" entirely.
r~