qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH] Do not enable QOM debugging by default
@ 2018-07-16 19:47 BALATON Zoltan
  2018-07-16 20:41 ` Peter Maydell
  0 siblings, 1 reply; 10+ messages in thread
From: BALATON Zoltan @ 2018-07-16 19:47 UTC (permalink / raw)
  To: qemu-devel, qemu-trivial; +Cc: Paolo Bonzini

Commit 3556c233d 5 years ago added an option to disable QOM debugging,
noting that it can have high performance cost but left the default to on.
Change the default to off and only enable it when debugging is requested
to avoid the performance penalty for those compiling without debug options.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
---
 configure | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index 2a7796e..a8fc1ed 100755
--- a/configure
+++ b/configure
@@ -415,7 +415,7 @@ bsd_user="no"
 blobs="yes"
 pkgversion=""
 pie=""
-qom_cast_debug="yes"
+qom_cast_debug="no"
 trace_backends="log"
 trace_file="trace"
 spice=""
@@ -1062,6 +1062,7 @@ for opt do
       debug_tcg="yes"
       debug_mutex="yes"
       debug="yes"
+      qom_cast_debug="yes"
       strip_opt="no"
       fortify_source="no"
   ;;
-- 
2.7.6

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH] Do not enable QOM debugging by default
  2018-07-16 19:47 [Qemu-devel] [PATCH] Do not enable QOM debugging by default BALATON Zoltan
@ 2018-07-16 20:41 ` Peter Maydell
  2018-07-16 21:08   ` BALATON Zoltan
                     ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Peter Maydell @ 2018-07-16 20:41 UTC (permalink / raw)
  To: BALATON Zoltan; +Cc: QEMU Developers, QEMU Trivial, Paolo Bonzini

On 16 July 2018 at 20:47, BALATON Zoltan <balaton@eik.bme.hu> wrote:
> Commit 3556c233d 5 years ago added an option to disable QOM debugging,
> noting that it can have high performance cost but left the default to on.
> Change the default to off and only enable it when debugging is requested
> to avoid the performance penalty for those compiling without debug options.
>
> Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>

Looking at the code, doesn't this disable all the type checking
on the QOM casts? That is not something I think we should drop,
in the same way that we do not disable assert()s even in the
non-debug build.

Is this coming up as significant in profiling? In the past we've
done some special casing if there are particular QOM casts in
hot paths that use significant runtime.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH] Do not enable QOM debugging by default
  2018-07-16 20:41 ` Peter Maydell
@ 2018-07-16 21:08   ` BALATON Zoltan
  2018-07-17  9:40   ` Daniel P. Berrangé
  2018-07-17 15:09   ` [Qemu-devel] Profiling results (was: Re: [PATCH] Do not enable QOM debugging by default) BALATON Zoltan
  2 siblings, 0 replies; 10+ messages in thread
From: BALATON Zoltan @ 2018-07-16 21:08 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, QEMU Trivial, Paolo Bonzini

On Mon, 16 Jul 2018, Peter Maydell wrote:
> On 16 July 2018 at 20:47, BALATON Zoltan <balaton@eik.bme.hu> wrote:
>> Commit 3556c233d 5 years ago added an option to disable QOM debugging,
>> noting that it can have high performance cost but left the default to on.
>> Change the default to off and only enable it when debugging is requested
>> to avoid the performance penalty for those compiling without debug options.
>>
>> Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
>
> Looking at the code, doesn't this disable all the type checking
> on the QOM casts? That is not something I think we should drop,
> in the same way that we do not disable assert()s even in the
> non-debug build.
>
> Is this coming up as significant in profiling? In the past we've
> done some special casing if there are particular QOM casts in
> hot paths that use significant runtime.

My profiling results are not very accurate (could only gather sample based 
results with oprofile as --enable-gprof haven't worked for me) so I don't 
have numbers but I did see object_class_dynamic_cast_assert coming out 
among the top few without disabling this option but I don't know which 
call sites are responsible.

Are there any recommendations for profiling QEMU (especially TCG) 
somewhere that I should be aware of?

Regards,
BALATON Zoltan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH] Do not enable QOM debugging by default
  2018-07-16 20:41 ` Peter Maydell
  2018-07-16 21:08   ` BALATON Zoltan
@ 2018-07-17  9:40   ` Daniel P. Berrangé
  2018-07-17 15:09   ` [Qemu-devel] Profiling results (was: Re: [PATCH] Do not enable QOM debugging by default) BALATON Zoltan
  2 siblings, 0 replies; 10+ messages in thread
From: Daniel P. Berrangé @ 2018-07-17  9:40 UTC (permalink / raw)
  To: Peter Maydell
  Cc: BALATON Zoltan, QEMU Trivial, Paolo Bonzini, QEMU Developers

On Mon, Jul 16, 2018 at 09:41:12PM +0100, Peter Maydell wrote:
> On 16 July 2018 at 20:47, BALATON Zoltan <balaton@eik.bme.hu> wrote:
> > Commit 3556c233d 5 years ago added an option to disable QOM debugging,
> > noting that it can have high performance cost but left the default to on.
> > Change the default to off and only enable it when debugging is requested
> > to avoid the performance penalty for those compiling without debug options.
> >
> > Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
> 
> Looking at the code, doesn't this disable all the type checking
> on the QOM casts? That is not something I think we should drop,
> in the same way that we do not disable assert()s even in the
> non-debug build.
> 
> Is this coming up as significant in profiling? In the past we've
> done some special casing if there are particular QOM casts in
> hot paths that use significant runtime.

There might also be scope for doing optimization of the
object_dynamic_cast_assert() implementation too.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Qemu-devel] Profiling results (was: Re: [PATCH] Do not enable QOM debugging by default)
  2018-07-16 20:41 ` Peter Maydell
  2018-07-16 21:08   ` BALATON Zoltan
  2018-07-17  9:40   ` Daniel P. Berrangé
@ 2018-07-17 15:09   ` BALATON Zoltan
  2018-07-17 17:09     ` [Qemu-devel] [Qemu-ppc] Profiling results Mark Cave-Ayland
  2 siblings, 1 reply; 10+ messages in thread
From: BALATON Zoltan @ 2018-07-17 15:09 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel, qemu-ppc

On Mon, 16 Jul 2018, Peter Maydell wrote:
> Is this coming up as significant in profiling? In the past we've

This seems to depend on the workload. From the cases I'm interested in 
AROS and AmigaOS on qemu-system-ppc -M sam460ex does not seem to be 
effected much (object_class_dynamic_cast_assert is not in top 10 with <2%) 
but for MorphOS on mac99 this seems to be significant. This is with 
default configure (--enable-qom-cast-debug):

%       cum. %     linenr info                 symbol name
9.7057   9.7057    exec-all.h:410              helper_lookup_tb_ptr
8.0330  17.7387    object.c:711                object_class_dynamic_cast_assert
6.9411  24.6798    cputlb.c:793                io_readx
6.3219  31.0017    sm501_template.h:62         draw_line16_32
5.3601  36.3617    cputlb.c:114                tlb_flush_nocheck
3.6170  39.9787    translate-all.c:749         page_trylock_add
3.1188  43.0975    translate-all.c:803         page_collection_lock
3.0405  46.1380    exec.c:3025                 iotlb_to_section
2.7044  48.8424    softmmu_template.h:112      helper_ret_ldub_mmu
2.4154  51.2578    memory.c:1350               memory_region_access_valid

and improves a bit (but not much) with --disable-qom-cast-debug

%        cum. %     linenr info                 symbol name
10.2063  10.2063    exec-all.h:410              helper_lookup_tb_ptr
  7.1581  17.3644    object.c:711                object_class_dynamic_cast_assert
  5.9297  23.2941    sm501_template.h:62         draw_line16_32
  5.9227  29.2168    cputlb.c:793                io_readx
  5.3030  34.5198    cputlb.c:114                tlb_flush_nocheck
  3.6445  38.1643    memory.c:1350               memory_region_access_valid
  3.5499  41.7142    softmmu_template.h:112      helper_ret_ldub_mmu
  3.0383  44.7525    translate-all.c:803         page_collection_lock
  2.9735  47.7260    memory.c:1415               memory_region_dispatch_read
  2.9503  50.6763    translate-all.c:749         page_trylock_add

But the workloads may not have been 100% identical so this is not 
conclusive, maybe this debug code is not that expensive at the moment.

AROS on sam460ex has a different profile:

%        cum. %     linenr info                 symbol name
8.9905   8.9905     translate-all.c:749         page_trylock_add
8.7658  17.7563     exec-all.h:410              helper_lookup_tb_ptr
7.7349  25.4911     translate-all.c:803         page_collection_lock
5.8246  31.3158     cputlb.c:924                victim_tlb_hit
3.1640  34.4797     cpus.c:347                  cpu_get_clock
3.1538  37.6335     translate-all.c:788         tb_page_addr_cmp
2.7969  40.4304     exec.c:435                  address_space_translate_internal
2.6647  43.0951     memory.c:571                access_with_adjusted_size
2.0615  45.1567     exec.c:569                  flatview_do_translate
1.9586  47.1153     memory.c:1350               memory_region_access_valid

Would anyone be able to guess what are the places that should be looked at 
or what to check to get more info on this?

Regards,
BALATON Zoltan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] Profiling results
  2018-07-17 15:09   ` [Qemu-devel] Profiling results (was: Re: [PATCH] Do not enable QOM debugging by default) BALATON Zoltan
@ 2018-07-17 17:09     ` Mark Cave-Ayland
  2018-07-17 19:35       ` BALATON Zoltan
  0 siblings, 1 reply; 10+ messages in thread
From: Mark Cave-Ayland @ 2018-07-17 17:09 UTC (permalink / raw)
  To: BALATON Zoltan, Peter Maydell; +Cc: qemu-ppc, qemu-devel

On 17/07/18 16:09, BALATON Zoltan wrote:

> On Mon, 16 Jul 2018, Peter Maydell wrote:
>> Is this coming up as significant in profiling? In the past we've
> 
> This seems to depend on the workload. From the cases I'm interested in 
> AROS and AmigaOS on qemu-system-ppc -M sam460ex does not seem to be 
> effected much (object_class_dynamic_cast_assert is not in top 10 with 
> <2%) but for MorphOS on mac99 this seems to be significant. This is with 
> default configure (--enable-qom-cast-debug):
> 
> %       cum. %     linenr info                 symbol name
> 9.7057   9.7057    exec-all.h:410              helper_lookup_tb_ptr
> 8.0330  17.7387    object.c:711                
> object_class_dynamic_cast_assert
> 6.9411  24.6798    cputlb.c:793                io_readx
> 6.3219  31.0017    sm501_template.h:62         draw_line16_32
> 5.3601  36.3617    cputlb.c:114                tlb_flush_nocheck
> 3.6170  39.9787    translate-all.c:749         page_trylock_add
> 3.1188  43.0975    translate-all.c:803         page_collection_lock
> 3.0405  46.1380    exec.c:3025                 iotlb_to_section
> 2.7044  48.8424    softmmu_template.h:112      helper_ret_ldub_mmu
> 2.4154  51.2578    memory.c:1350               memory_region_access_valid
> 
> and improves a bit (but not much) with --disable-qom-cast-debug
> 
> %        cum. %     linenr info                 symbol name
> 10.2063  10.2063    exec-all.h:410              helper_lookup_tb_ptr
>   7.1581  17.3644    object.c:711                
> object_class_dynamic_cast_assert
>   5.9297  23.2941    sm501_template.h:62         draw_line16_32
>   5.9227  29.2168    cputlb.c:793                io_readx
>   5.3030  34.5198    cputlb.c:114                tlb_flush_nocheck
>   3.6445  38.1643    memory.c:1350               memory_region_access_valid
>   3.5499  41.7142    softmmu_template.h:112      helper_ret_ldub_mmu
>   3.0383  44.7525    translate-all.c:803         page_collection_lock
>   2.9735  47.7260    memory.c:1415               
> memory_region_dispatch_read
>   2.9503  50.6763    translate-all.c:749         page_trylock_add
> 
> But the workloads may not have been 100% identical so this is not 
> conclusive, maybe this debug code is not that expensive at the moment.
> 
> AROS on sam460ex has a different profile:
> 
> %        cum. %     linenr info                 symbol name
> 8.9905   8.9905     translate-all.c:749         page_trylock_add
> 8.7658  17.7563     exec-all.h:410              helper_lookup_tb_ptr
> 7.7349  25.4911     translate-all.c:803         page_collection_lock
> 5.8246  31.3158     cputlb.c:924                victim_tlb_hit
> 3.1640  34.4797     cpus.c:347                  cpu_get_clock
> 3.1538  37.6335     translate-all.c:788         tb_page_addr_cmp
> 2.7969  40.4304     exec.c:435                  
> address_space_translate_internal
> 2.6647  43.0951     memory.c:571                access_with_adjusted_size
> 2.0615  45.1567     exec.c:569                  flatview_do_translate
> 1.9586  47.1153     memory.c:1350               memory_region_access_valid
> 
> Would anyone be able to guess what are the places that should be looked 
> at or what to check to get more info on this?

My first thought is that there is a QOM cast somewhere in a hot path on 
-M mac99 - can you show us the call stack information from the profile?

I had a similar issue with SPARC32 whereby each DMA request needs to be 
manually word-swapped, so instead of adding the QOM cast into these 
routines I did a direct C cast from the opaque to ensure that the 
overhead was as little as possible.


ATB,

Mark.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] Profiling results
  2018-07-17 17:09     ` [Qemu-devel] [Qemu-ppc] Profiling results Mark Cave-Ayland
@ 2018-07-17 19:35       ` BALATON Zoltan
  2018-07-17 19:57         ` Mark Cave-Ayland
  0 siblings, 1 reply; 10+ messages in thread
From: BALATON Zoltan @ 2018-07-17 19:35 UTC (permalink / raw)
  To: Mark Cave-Ayland; +Cc: Peter Maydell, qemu-ppc, qemu-devel

On Tue, 17 Jul 2018, Mark Cave-Ayland wrote:
>> MorphOS on mac99 this seems to be significant. This is with default 
>> configure (--enable-qom-cast-debug):
>> 
>> %?????? cum. %???? linenr info???????????????? symbol name
>> 9.7057?? 9.7057??? exec-all.h:410????????????? helper_lookup_tb_ptr
>> 8.0330? 17.7387??? object.c:711 
>> object_class_dynamic_cast_assert
>> 6.9411? 24.6798??? cputlb.c:793??????????????? io_readx
>> 6.3219? 31.0017??? sm501_template.h:62???????? draw_line16_32
>> 5.3601? 36.3617??? cputlb.c:114??????????????? tlb_flush_nocheck
>> 3.6170? 39.9787??? translate-all.c:749???????? page_trylock_add
>> 3.1188? 43.0975??? translate-all.c:803???????? page_collection_lock
>> 3.0405? 46.1380??? exec.c:3025???????????????? iotlb_to_section
>> 2.7044? 48.8424??? softmmu_template.h:112????? helper_ret_ldub_mmu
>> 2.4154? 51.2578??? memory.c:1350?????????????? memory_region_access_valid
[...]
> My first thought is that there is a QOM cast somewhere in a hot path on -M 
> mac99 - can you show us the call stack information from the profile?

Not really. Oprofile that gave me this profile could not display call 
graph for this call. I've tried again with the perf tool but I'm not quite 
sure how to interpret its results. If I'm not mistaken it points me in the 
direction of cpu_asidx_from_attrs, called from iotlb_to_section, called 
from io_readx. The object_class_dynamic_cast_assert call likely comes from 
OBJECT_CLASS_CHECK, expanded from OBJECT_GET_CLASS, expanded from 
CPU_GET_CLASS. Would that make any sense? Any idea what to do about it?

Thank you,
BALATON Zoltan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] Profiling results
  2018-07-17 19:35       ` BALATON Zoltan
@ 2018-07-17 19:57         ` Mark Cave-Ayland
  2018-07-17 20:46           ` BALATON Zoltan
  0 siblings, 1 reply; 10+ messages in thread
From: Mark Cave-Ayland @ 2018-07-17 19:57 UTC (permalink / raw)
  To: BALATON Zoltan; +Cc: Peter Maydell, qemu-ppc, qemu-devel

On 17/07/18 20:35, BALATON Zoltan wrote:

> On Tue, 17 Jul 2018, Mark Cave-Ayland wrote:
>>> MorphOS on mac99 this seems to be significant. This is with default 
>>> configure (--enable-qom-cast-debug):
>>>
>>> %?????? cum. %???? linenr info???????????????? symbol name
>>> 9.7057?? 9.7057??? exec-all.h:410????????????? helper_lookup_tb_ptr
>>> 8.0330? 17.7387??? object.c:711 object_class_dynamic_cast_assert
>>> 6.9411? 24.6798??? cputlb.c:793??????????????? io_readx
>>> 6.3219? 31.0017??? sm501_template.h:62???????? draw_line16_32
>>> 5.3601? 36.3617??? cputlb.c:114??????????????? tlb_flush_nocheck
>>> 3.6170? 39.9787??? translate-all.c:749???????? page_trylock_add
>>> 3.1188? 43.0975??? translate-all.c:803???????? page_collection_lock
>>> 3.0405? 46.1380??? exec.c:3025???????????????? iotlb_to_section
>>> 2.7044? 48.8424??? softmmu_template.h:112????? helper_ret_ldub_mmu
>>> 2.4154? 51.2578??? memory.c:1350?????????????? 
>>> memory_region_access_valid
> [...]
>> My first thought is that there is a QOM cast somewhere in a hot path 
>> on -M mac99 - can you show us the call stack information from the 
>> profile?
> 
> Not really. Oprofile that gave me this profile could not display call 
> graph for this call. I've tried again with the perf tool but I'm not 
> quite sure how to interpret its results. If I'm not mistaken it points 
> me in the direction of cpu_asidx_from_attrs, called from 
> iotlb_to_section, called from io_readx. The 
> object_class_dynamic_cast_assert call likely comes from 
> OBJECT_CLASS_CHECK, expanded from OBJECT_GET_CLASS, expanded from 
> CPU_GET_CLASS. Would that make any sense? Any idea what to do about it?

Good question. A quick grep for 'asidx_from_attrs' shows that 
cc->asidx_from_attrs() isn't set for PPC targets, so as a quick test 
does replacing the inline function cpu_asidx_from_attrs() in 
include/qom/cpu.h with a simple "return 0" change the profile at all?


ATB,

Mark.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] Profiling results
  2018-07-17 19:57         ` Mark Cave-Ayland
@ 2018-07-17 20:46           ` BALATON Zoltan
  2018-07-17 21:53             ` Peter Maydell
  0 siblings, 1 reply; 10+ messages in thread
From: BALATON Zoltan @ 2018-07-17 20:46 UTC (permalink / raw)
  To: Mark Cave-Ayland; +Cc: Peter Maydell, qemu-ppc, qemu-devel

On Tue, 17 Jul 2018, Mark Cave-Ayland wrote:
> On 17/07/18 20:35, BALATON Zoltan wrote:
>> On Tue, 17 Jul 2018, Mark Cave-Ayland wrote:
>>>> MorphOS on mac99 this seems to be significant. This is with default 
>>>> configure (--enable-qom-cast-debug):
>>>> 
>>>> %?????? cum. %???? linenr info???????????????? symbol name
>>>> 9.7057?? 9.7057??? exec-all.h:410????????????? helper_lookup_tb_ptr
>>>> 8.0330? 17.7387??? object.c:711 object_class_dynamic_cast_assert
>>>> 6.9411? 24.6798??? cputlb.c:793??????????????? io_readx
>>>> 6.3219? 31.0017??? sm501_template.h:62???????? draw_line16_32
>>>> 5.3601? 36.3617??? cputlb.c:114??????????????? tlb_flush_nocheck
>>>> 3.6170? 39.9787??? translate-all.c:749???????? page_trylock_add
>>>> 3.1188? 43.0975??? translate-all.c:803???????? page_collection_lock
>>>> 3.0405? 46.1380??? exec.c:3025???????????????? iotlb_to_section
>>>> 2.7044? 48.8424??? softmmu_template.h:112????? helper_ret_ldub_mmu
>>>> 2.4154? 51.2578??? memory.c:1350?????????????? memory_region_access_valid
>> [...]
>>> My first thought is that there is a QOM cast somewhere in a hot path on -M 
>>> mac99 - can you show us the call stack information from the profile?
>> 
>> Not really. Oprofile that gave me this profile could not display call graph 
>> for this call. I've tried again with the perf tool but I'm not quite sure 
>> how to interpret its results. If I'm not mistaken it points me in the 
>> direction of cpu_asidx_from_attrs, called from iotlb_to_section, called 
>> from io_readx. The object_class_dynamic_cast_assert call likely comes from 
>> OBJECT_CLASS_CHECK, expanded from OBJECT_GET_CLASS, expanded from 
>> CPU_GET_CLASS. Would that make any sense? Any idea what to do about it?
>
> Good question. A quick grep for 'asidx_from_attrs' shows that 
> cc->asidx_from_attrs() isn't set for PPC targets, so as a quick test does 
> replacing the inline function cpu_asidx_from_attrs() in include/qom/cpu.h 
> with a simple "return 0" change the profile at all?

It does seem to lessen its impact but it's still higher than I expected:

%        cum. %     linenr info                 symbol name
10.7949  10.7949    exec-all.h:410              helper_lookup_tb_ptr
  7.8663  18.6612    cputlb.c:793                io_readx
  6.0265  24.6878    cputlb.c:114                tlb_flush_nocheck
  4.0671  28.7548    sm501_template.h:62         draw_line16_32
  4.0559  32.8107    object.c:765                object_class_dynamic_cast_assert
  3.3780  36.1887    memory.c:1350               memory_region_access_valid
  2.8920  39.0808    qemu-thread-posix.c:61      qemu_mutex_lock_impl
  2.7187  41.7995    memory.c:1415               memory_region_dispatch_read
  2.6011  44.4006    qht.c:487                   qht_lookup_custom
  2.5356  46.9362    softmmu_template.h:112      helper_ret_ldub_mmu

Maybe it's called from somewhere else too? I know draw_line16_32 but I 
wonder where could helper_lookup_tb_ptr and tlb flushes come from? Those 
seem to be significant. And io_readx in itself seems to be too high on the 
list too. I wonder if it may have something to do with the background task 
trying to read non-implemented i2c stuff frequently (as discussed in point 
2. in http://zero.eik.bme.hu/~balaton/qemu/amiga/#morphos).

Regards,
BALATON Zoltan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] Profiling results
  2018-07-17 20:46           ` BALATON Zoltan
@ 2018-07-17 21:53             ` Peter Maydell
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Maydell @ 2018-07-17 21:53 UTC (permalink / raw)
  To: BALATON Zoltan; +Cc: Mark Cave-Ayland, qemu-ppc, QEMU Developers

On 17 July 2018 at 21:46, BALATON Zoltan <balaton@eik.bme.hu> wrote:
> On Tue, 17 Jul 2018, Mark Cave-Ayland wrote:
>> Good question. A quick grep for 'asidx_from_attrs' shows that
>> cc->asidx_from_attrs() isn't set for PPC targets, so as a quick test does
>> replacing the inline function cpu_asidx_from_attrs() in include/qom/cpu.h
>> with a simple "return 0" change the profile at all?
>
>
> It does seem to lessen its impact but it's still higher than I expected:

It may be worth special-casing the CPU method lookups (or at
least that one) if we can, then...

> %        cum. %     linenr info                 symbol name
> 10.7949  10.7949    exec-all.h:410              helper_lookup_tb_ptr
>  7.8663  18.6612    cputlb.c:793                io_readx
>  6.0265  24.6878    cputlb.c:114                tlb_flush_nocheck
>  4.0671  28.7548    sm501_template.h:62         draw_line16_32
>  4.0559  32.8107    object.c:765
> object_class_dynamic_cast_assert
>  3.3780  36.1887    memory.c:1350               memory_region_access_valid
>  2.8920  39.0808    qemu-thread-posix.c:61      qemu_mutex_lock_impl
>  2.7187  41.7995    memory.c:1415               memory_region_dispatch_read
>  2.6011  44.4006    qht.c:487                   qht_lookup_custom
>  2.5356  46.9362    softmmu_template.h:112      helper_ret_ldub_mmu
>
> Maybe it's called from somewhere else too? I know draw_line16_32 but I
> wonder where could helper_lookup_tb_ptr and tlb flushes come from? Those
> seem to be significant. And io_readx in itself seems to be too high on the
> list too.

helper_lookup_tb_ptr is part of TCG -- it's where we look for
the next TB to go to. Any non-computed branch to a different page
will result in our calling this. So it's high on the profile
because we do it a lot, I think, but that's not necessarily a
problem as such.

io_readx is the slow path for guest memory accesses -- any
guest access to something that's not RAM will have to go through
here. My first guess (given the other things in the profile,
especially helper_ret_ldub_mmu, memory_region_dispatch_read
and memory_region_access_valid) is that the guest is in a tight
loop doing a read on a device register a lot of the time.

> I wonder if it may have something to do with the background task
> trying to read non-implemented i2c stuff frequently (as discussed in point
> 2. in http://zero.eik.bme.hu/~balaton/qemu/amiga/#morphos).

Could be, or some similar thing. If you suspect the i2c you
could try putting in an unimplemented-device stub in the
right place and see how often -d unimp yells about reads to it.

So overall I'd be a little wary of optimizing based on this
profile, because I suspect it's atypical -- the guest is sat
in a tight polling loop and the profile says "all the functions
in the code path for doing device access are really hot".
The fix is to improve our model so the guest doesn't get
stuck like that, not to try to slightly improve the speed
of device accesses (we call it the "slow path" for a reason :-))

(But places like asidx_from_attrs are likely to be on hot
paths in general, so having the QOM class lookup there be
overly heavyweight is maybe worth fixing anyhow.)

thanks
-- PMM

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-07-17 21:54 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-07-16 19:47 [Qemu-devel] [PATCH] Do not enable QOM debugging by default BALATON Zoltan
2018-07-16 20:41 ` Peter Maydell
2018-07-16 21:08   ` BALATON Zoltan
2018-07-17  9:40   ` Daniel P. Berrangé
2018-07-17 15:09   ` [Qemu-devel] Profiling results (was: Re: [PATCH] Do not enable QOM debugging by default) BALATON Zoltan
2018-07-17 17:09     ` [Qemu-devel] [Qemu-ppc] Profiling results Mark Cave-Ayland
2018-07-17 19:35       ` BALATON Zoltan
2018-07-17 19:57         ` Mark Cave-Ayland
2018-07-17 20:46           ` BALATON Zoltan
2018-07-17 21:53             ` Peter Maydell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).