* [Qemu-devel] [PATCH] Do not enable QOM debugging by default
@ 2018-07-16 19:47 BALATON Zoltan
2018-07-16 20:41 ` Peter Maydell
0 siblings, 1 reply; 10+ messages in thread
From: BALATON Zoltan @ 2018-07-16 19:47 UTC (permalink / raw)
To: qemu-devel, qemu-trivial; +Cc: Paolo Bonzini
Commit 3556c233d 5 years ago added an option to disable QOM debugging,
noting that it can have high performance cost but left the default to on.
Change the default to off and only enable it when debugging is requested
to avoid the performance penalty for those compiling without debug options.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
---
configure | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/configure b/configure
index 2a7796e..a8fc1ed 100755
--- a/configure
+++ b/configure
@@ -415,7 +415,7 @@ bsd_user="no"
blobs="yes"
pkgversion=""
pie=""
-qom_cast_debug="yes"
+qom_cast_debug="no"
trace_backends="log"
trace_file="trace"
spice=""
@@ -1062,6 +1062,7 @@ for opt do
debug_tcg="yes"
debug_mutex="yes"
debug="yes"
+ qom_cast_debug="yes"
strip_opt="no"
fortify_source="no"
;;
--
2.7.6
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [Qemu-devel] [PATCH] Do not enable QOM debugging by default
2018-07-16 19:47 [Qemu-devel] [PATCH] Do not enable QOM debugging by default BALATON Zoltan
@ 2018-07-16 20:41 ` Peter Maydell
2018-07-16 21:08 ` BALATON Zoltan
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Peter Maydell @ 2018-07-16 20:41 UTC (permalink / raw)
To: BALATON Zoltan; +Cc: QEMU Developers, QEMU Trivial, Paolo Bonzini
On 16 July 2018 at 20:47, BALATON Zoltan <balaton@eik.bme.hu> wrote:
> Commit 3556c233d 5 years ago added an option to disable QOM debugging,
> noting that it can have high performance cost but left the default to on.
> Change the default to off and only enable it when debugging is requested
> to avoid the performance penalty for those compiling without debug options.
>
> Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Looking at the code, doesn't this disable all the type checking
on the QOM casts? That is not something I think we should drop,
in the same way that we do not disable assert()s even in the
non-debug build.
Is this coming up as significant in profiling? In the past we've
done some special casing if there are particular QOM casts in
hot paths that use significant runtime.
thanks
-- PMM
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Qemu-devel] [PATCH] Do not enable QOM debugging by default
2018-07-16 20:41 ` Peter Maydell
@ 2018-07-16 21:08 ` BALATON Zoltan
2018-07-17 9:40 ` Daniel P. Berrangé
2018-07-17 15:09 ` [Qemu-devel] Profiling results (was: Re: [PATCH] Do not enable QOM debugging by default) BALATON Zoltan
2 siblings, 0 replies; 10+ messages in thread
From: BALATON Zoltan @ 2018-07-16 21:08 UTC (permalink / raw)
To: Peter Maydell; +Cc: QEMU Developers, QEMU Trivial, Paolo Bonzini
On Mon, 16 Jul 2018, Peter Maydell wrote:
> On 16 July 2018 at 20:47, BALATON Zoltan <balaton@eik.bme.hu> wrote:
>> Commit 3556c233d 5 years ago added an option to disable QOM debugging,
>> noting that it can have high performance cost but left the default to on.
>> Change the default to off and only enable it when debugging is requested
>> to avoid the performance penalty for those compiling without debug options.
>>
>> Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
>
> Looking at the code, doesn't this disable all the type checking
> on the QOM casts? That is not something I think we should drop,
> in the same way that we do not disable assert()s even in the
> non-debug build.
>
> Is this coming up as significant in profiling? In the past we've
> done some special casing if there are particular QOM casts in
> hot paths that use significant runtime.
My profiling results are not very accurate (could only gather sample based
results with oprofile as --enable-gprof haven't worked for me) so I don't
have numbers but I did see object_class_dynamic_cast_assert coming out
among the top few without disabling this option but I don't know which
call sites are responsible.
Are there any recommendations for profiling QEMU (especially TCG)
somewhere that I should be aware of?
Regards,
BALATON Zoltan
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Qemu-devel] [PATCH] Do not enable QOM debugging by default
2018-07-16 20:41 ` Peter Maydell
2018-07-16 21:08 ` BALATON Zoltan
@ 2018-07-17 9:40 ` Daniel P. Berrangé
2018-07-17 15:09 ` [Qemu-devel] Profiling results (was: Re: [PATCH] Do not enable QOM debugging by default) BALATON Zoltan
2 siblings, 0 replies; 10+ messages in thread
From: Daniel P. Berrangé @ 2018-07-17 9:40 UTC (permalink / raw)
To: Peter Maydell
Cc: BALATON Zoltan, QEMU Trivial, Paolo Bonzini, QEMU Developers
On Mon, Jul 16, 2018 at 09:41:12PM +0100, Peter Maydell wrote:
> On 16 July 2018 at 20:47, BALATON Zoltan <balaton@eik.bme.hu> wrote:
> > Commit 3556c233d 5 years ago added an option to disable QOM debugging,
> > noting that it can have high performance cost but left the default to on.
> > Change the default to off and only enable it when debugging is requested
> > to avoid the performance penalty for those compiling without debug options.
> >
> > Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
>
> Looking at the code, doesn't this disable all the type checking
> on the QOM casts? That is not something I think we should drop,
> in the same way that we do not disable assert()s even in the
> non-debug build.
>
> Is this coming up as significant in profiling? In the past we've
> done some special casing if there are particular QOM casts in
> hot paths that use significant runtime.
There might also be scope for doing optimization of the
object_dynamic_cast_assert() implementation too.
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Qemu-devel] Profiling results (was: Re: [PATCH] Do not enable QOM debugging by default)
2018-07-16 20:41 ` Peter Maydell
2018-07-16 21:08 ` BALATON Zoltan
2018-07-17 9:40 ` Daniel P. Berrangé
@ 2018-07-17 15:09 ` BALATON Zoltan
2018-07-17 17:09 ` [Qemu-devel] [Qemu-ppc] Profiling results Mark Cave-Ayland
2 siblings, 1 reply; 10+ messages in thread
From: BALATON Zoltan @ 2018-07-17 15:09 UTC (permalink / raw)
To: Peter Maydell; +Cc: qemu-devel, qemu-ppc
On Mon, 16 Jul 2018, Peter Maydell wrote:
> Is this coming up as significant in profiling? In the past we've
This seems to depend on the workload. From the cases I'm interested in
AROS and AmigaOS on qemu-system-ppc -M sam460ex does not seem to be
effected much (object_class_dynamic_cast_assert is not in top 10 with <2%)
but for MorphOS on mac99 this seems to be significant. This is with
default configure (--enable-qom-cast-debug):
% cum. % linenr info symbol name
9.7057 9.7057 exec-all.h:410 helper_lookup_tb_ptr
8.0330 17.7387 object.c:711 object_class_dynamic_cast_assert
6.9411 24.6798 cputlb.c:793 io_readx
6.3219 31.0017 sm501_template.h:62 draw_line16_32
5.3601 36.3617 cputlb.c:114 tlb_flush_nocheck
3.6170 39.9787 translate-all.c:749 page_trylock_add
3.1188 43.0975 translate-all.c:803 page_collection_lock
3.0405 46.1380 exec.c:3025 iotlb_to_section
2.7044 48.8424 softmmu_template.h:112 helper_ret_ldub_mmu
2.4154 51.2578 memory.c:1350 memory_region_access_valid
and improves a bit (but not much) with --disable-qom-cast-debug
% cum. % linenr info symbol name
10.2063 10.2063 exec-all.h:410 helper_lookup_tb_ptr
7.1581 17.3644 object.c:711 object_class_dynamic_cast_assert
5.9297 23.2941 sm501_template.h:62 draw_line16_32
5.9227 29.2168 cputlb.c:793 io_readx
5.3030 34.5198 cputlb.c:114 tlb_flush_nocheck
3.6445 38.1643 memory.c:1350 memory_region_access_valid
3.5499 41.7142 softmmu_template.h:112 helper_ret_ldub_mmu
3.0383 44.7525 translate-all.c:803 page_collection_lock
2.9735 47.7260 memory.c:1415 memory_region_dispatch_read
2.9503 50.6763 translate-all.c:749 page_trylock_add
But the workloads may not have been 100% identical so this is not
conclusive, maybe this debug code is not that expensive at the moment.
AROS on sam460ex has a different profile:
% cum. % linenr info symbol name
8.9905 8.9905 translate-all.c:749 page_trylock_add
8.7658 17.7563 exec-all.h:410 helper_lookup_tb_ptr
7.7349 25.4911 translate-all.c:803 page_collection_lock
5.8246 31.3158 cputlb.c:924 victim_tlb_hit
3.1640 34.4797 cpus.c:347 cpu_get_clock
3.1538 37.6335 translate-all.c:788 tb_page_addr_cmp
2.7969 40.4304 exec.c:435 address_space_translate_internal
2.6647 43.0951 memory.c:571 access_with_adjusted_size
2.0615 45.1567 exec.c:569 flatview_do_translate
1.9586 47.1153 memory.c:1350 memory_region_access_valid
Would anyone be able to guess what are the places that should be looked at
or what to check to get more info on this?
Regards,
BALATON Zoltan
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] Profiling results
2018-07-17 15:09 ` [Qemu-devel] Profiling results (was: Re: [PATCH] Do not enable QOM debugging by default) BALATON Zoltan
@ 2018-07-17 17:09 ` Mark Cave-Ayland
2018-07-17 19:35 ` BALATON Zoltan
0 siblings, 1 reply; 10+ messages in thread
From: Mark Cave-Ayland @ 2018-07-17 17:09 UTC (permalink / raw)
To: BALATON Zoltan, Peter Maydell; +Cc: qemu-ppc, qemu-devel
On 17/07/18 16:09, BALATON Zoltan wrote:
> On Mon, 16 Jul 2018, Peter Maydell wrote:
>> Is this coming up as significant in profiling? In the past we've
>
> This seems to depend on the workload. From the cases I'm interested in
> AROS and AmigaOS on qemu-system-ppc -M sam460ex does not seem to be
> effected much (object_class_dynamic_cast_assert is not in top 10 with
> <2%) but for MorphOS on mac99 this seems to be significant. This is with
> default configure (--enable-qom-cast-debug):
>
> % cum. % linenr info symbol name
> 9.7057 9.7057 exec-all.h:410 helper_lookup_tb_ptr
> 8.0330 17.7387 object.c:711
> object_class_dynamic_cast_assert
> 6.9411 24.6798 cputlb.c:793 io_readx
> 6.3219 31.0017 sm501_template.h:62 draw_line16_32
> 5.3601 36.3617 cputlb.c:114 tlb_flush_nocheck
> 3.6170 39.9787 translate-all.c:749 page_trylock_add
> 3.1188 43.0975 translate-all.c:803 page_collection_lock
> 3.0405 46.1380 exec.c:3025 iotlb_to_section
> 2.7044 48.8424 softmmu_template.h:112 helper_ret_ldub_mmu
> 2.4154 51.2578 memory.c:1350 memory_region_access_valid
>
> and improves a bit (but not much) with --disable-qom-cast-debug
>
> % cum. % linenr info symbol name
> 10.2063 10.2063 exec-all.h:410 helper_lookup_tb_ptr
> 7.1581 17.3644 object.c:711
> object_class_dynamic_cast_assert
> 5.9297 23.2941 sm501_template.h:62 draw_line16_32
> 5.9227 29.2168 cputlb.c:793 io_readx
> 5.3030 34.5198 cputlb.c:114 tlb_flush_nocheck
> 3.6445 38.1643 memory.c:1350 memory_region_access_valid
> 3.5499 41.7142 softmmu_template.h:112 helper_ret_ldub_mmu
> 3.0383 44.7525 translate-all.c:803 page_collection_lock
> 2.9735 47.7260 memory.c:1415
> memory_region_dispatch_read
> 2.9503 50.6763 translate-all.c:749 page_trylock_add
>
> But the workloads may not have been 100% identical so this is not
> conclusive, maybe this debug code is not that expensive at the moment.
>
> AROS on sam460ex has a different profile:
>
> % cum. % linenr info symbol name
> 8.9905 8.9905 translate-all.c:749 page_trylock_add
> 8.7658 17.7563 exec-all.h:410 helper_lookup_tb_ptr
> 7.7349 25.4911 translate-all.c:803 page_collection_lock
> 5.8246 31.3158 cputlb.c:924 victim_tlb_hit
> 3.1640 34.4797 cpus.c:347 cpu_get_clock
> 3.1538 37.6335 translate-all.c:788 tb_page_addr_cmp
> 2.7969 40.4304 exec.c:435
> address_space_translate_internal
> 2.6647 43.0951 memory.c:571 access_with_adjusted_size
> 2.0615 45.1567 exec.c:569 flatview_do_translate
> 1.9586 47.1153 memory.c:1350 memory_region_access_valid
>
> Would anyone be able to guess what are the places that should be looked
> at or what to check to get more info on this?
My first thought is that there is a QOM cast somewhere in a hot path on
-M mac99 - can you show us the call stack information from the profile?
I had a similar issue with SPARC32 whereby each DMA request needs to be
manually word-swapped, so instead of adding the QOM cast into these
routines I did a direct C cast from the opaque to ensure that the
overhead was as little as possible.
ATB,
Mark.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] Profiling results
2018-07-17 17:09 ` [Qemu-devel] [Qemu-ppc] Profiling results Mark Cave-Ayland
@ 2018-07-17 19:35 ` BALATON Zoltan
2018-07-17 19:57 ` Mark Cave-Ayland
0 siblings, 1 reply; 10+ messages in thread
From: BALATON Zoltan @ 2018-07-17 19:35 UTC (permalink / raw)
To: Mark Cave-Ayland; +Cc: Peter Maydell, qemu-ppc, qemu-devel
On Tue, 17 Jul 2018, Mark Cave-Ayland wrote:
>> MorphOS on mac99 this seems to be significant. This is with default
>> configure (--enable-qom-cast-debug):
>>
>> %?????? cum. %???? linenr info???????????????? symbol name
>> 9.7057?? 9.7057??? exec-all.h:410????????????? helper_lookup_tb_ptr
>> 8.0330? 17.7387??? object.c:711
>> object_class_dynamic_cast_assert
>> 6.9411? 24.6798??? cputlb.c:793??????????????? io_readx
>> 6.3219? 31.0017??? sm501_template.h:62???????? draw_line16_32
>> 5.3601? 36.3617??? cputlb.c:114??????????????? tlb_flush_nocheck
>> 3.6170? 39.9787??? translate-all.c:749???????? page_trylock_add
>> 3.1188? 43.0975??? translate-all.c:803???????? page_collection_lock
>> 3.0405? 46.1380??? exec.c:3025???????????????? iotlb_to_section
>> 2.7044? 48.8424??? softmmu_template.h:112????? helper_ret_ldub_mmu
>> 2.4154? 51.2578??? memory.c:1350?????????????? memory_region_access_valid
[...]
> My first thought is that there is a QOM cast somewhere in a hot path on -M
> mac99 - can you show us the call stack information from the profile?
Not really. Oprofile that gave me this profile could not display call
graph for this call. I've tried again with the perf tool but I'm not quite
sure how to interpret its results. If I'm not mistaken it points me in the
direction of cpu_asidx_from_attrs, called from iotlb_to_section, called
from io_readx. The object_class_dynamic_cast_assert call likely comes from
OBJECT_CLASS_CHECK, expanded from OBJECT_GET_CLASS, expanded from
CPU_GET_CLASS. Would that make any sense? Any idea what to do about it?
Thank you,
BALATON Zoltan
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] Profiling results
2018-07-17 19:35 ` BALATON Zoltan
@ 2018-07-17 19:57 ` Mark Cave-Ayland
2018-07-17 20:46 ` BALATON Zoltan
0 siblings, 1 reply; 10+ messages in thread
From: Mark Cave-Ayland @ 2018-07-17 19:57 UTC (permalink / raw)
To: BALATON Zoltan; +Cc: Peter Maydell, qemu-ppc, qemu-devel
On 17/07/18 20:35, BALATON Zoltan wrote:
> On Tue, 17 Jul 2018, Mark Cave-Ayland wrote:
>>> MorphOS on mac99 this seems to be significant. This is with default
>>> configure (--enable-qom-cast-debug):
>>>
>>> %?????? cum. %???? linenr info???????????????? symbol name
>>> 9.7057?? 9.7057??? exec-all.h:410????????????? helper_lookup_tb_ptr
>>> 8.0330? 17.7387??? object.c:711 object_class_dynamic_cast_assert
>>> 6.9411? 24.6798??? cputlb.c:793??????????????? io_readx
>>> 6.3219? 31.0017??? sm501_template.h:62???????? draw_line16_32
>>> 5.3601? 36.3617??? cputlb.c:114??????????????? tlb_flush_nocheck
>>> 3.6170? 39.9787??? translate-all.c:749???????? page_trylock_add
>>> 3.1188? 43.0975??? translate-all.c:803???????? page_collection_lock
>>> 3.0405? 46.1380??? exec.c:3025???????????????? iotlb_to_section
>>> 2.7044? 48.8424??? softmmu_template.h:112????? helper_ret_ldub_mmu
>>> 2.4154? 51.2578??? memory.c:1350??????????????
>>> memory_region_access_valid
> [...]
>> My first thought is that there is a QOM cast somewhere in a hot path
>> on -M mac99 - can you show us the call stack information from the
>> profile?
>
> Not really. Oprofile that gave me this profile could not display call
> graph for this call. I've tried again with the perf tool but I'm not
> quite sure how to interpret its results. If I'm not mistaken it points
> me in the direction of cpu_asidx_from_attrs, called from
> iotlb_to_section, called from io_readx. The
> object_class_dynamic_cast_assert call likely comes from
> OBJECT_CLASS_CHECK, expanded from OBJECT_GET_CLASS, expanded from
> CPU_GET_CLASS. Would that make any sense? Any idea what to do about it?
Good question. A quick grep for 'asidx_from_attrs' shows that
cc->asidx_from_attrs() isn't set for PPC targets, so as a quick test
does replacing the inline function cpu_asidx_from_attrs() in
include/qom/cpu.h with a simple "return 0" change the profile at all?
ATB,
Mark.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] Profiling results
2018-07-17 19:57 ` Mark Cave-Ayland
@ 2018-07-17 20:46 ` BALATON Zoltan
2018-07-17 21:53 ` Peter Maydell
0 siblings, 1 reply; 10+ messages in thread
From: BALATON Zoltan @ 2018-07-17 20:46 UTC (permalink / raw)
To: Mark Cave-Ayland; +Cc: Peter Maydell, qemu-ppc, qemu-devel
On Tue, 17 Jul 2018, Mark Cave-Ayland wrote:
> On 17/07/18 20:35, BALATON Zoltan wrote:
>> On Tue, 17 Jul 2018, Mark Cave-Ayland wrote:
>>>> MorphOS on mac99 this seems to be significant. This is with default
>>>> configure (--enable-qom-cast-debug):
>>>>
>>>> %?????? cum. %???? linenr info???????????????? symbol name
>>>> 9.7057?? 9.7057??? exec-all.h:410????????????? helper_lookup_tb_ptr
>>>> 8.0330? 17.7387??? object.c:711 object_class_dynamic_cast_assert
>>>> 6.9411? 24.6798??? cputlb.c:793??????????????? io_readx
>>>> 6.3219? 31.0017??? sm501_template.h:62???????? draw_line16_32
>>>> 5.3601? 36.3617??? cputlb.c:114??????????????? tlb_flush_nocheck
>>>> 3.6170? 39.9787??? translate-all.c:749???????? page_trylock_add
>>>> 3.1188? 43.0975??? translate-all.c:803???????? page_collection_lock
>>>> 3.0405? 46.1380??? exec.c:3025???????????????? iotlb_to_section
>>>> 2.7044? 48.8424??? softmmu_template.h:112????? helper_ret_ldub_mmu
>>>> 2.4154? 51.2578??? memory.c:1350?????????????? memory_region_access_valid
>> [...]
>>> My first thought is that there is a QOM cast somewhere in a hot path on -M
>>> mac99 - can you show us the call stack information from the profile?
>>
>> Not really. Oprofile that gave me this profile could not display call graph
>> for this call. I've tried again with the perf tool but I'm not quite sure
>> how to interpret its results. If I'm not mistaken it points me in the
>> direction of cpu_asidx_from_attrs, called from iotlb_to_section, called
>> from io_readx. The object_class_dynamic_cast_assert call likely comes from
>> OBJECT_CLASS_CHECK, expanded from OBJECT_GET_CLASS, expanded from
>> CPU_GET_CLASS. Would that make any sense? Any idea what to do about it?
>
> Good question. A quick grep for 'asidx_from_attrs' shows that
> cc->asidx_from_attrs() isn't set for PPC targets, so as a quick test does
> replacing the inline function cpu_asidx_from_attrs() in include/qom/cpu.h
> with a simple "return 0" change the profile at all?
It does seem to lessen its impact but it's still higher than I expected:
% cum. % linenr info symbol name
10.7949 10.7949 exec-all.h:410 helper_lookup_tb_ptr
7.8663 18.6612 cputlb.c:793 io_readx
6.0265 24.6878 cputlb.c:114 tlb_flush_nocheck
4.0671 28.7548 sm501_template.h:62 draw_line16_32
4.0559 32.8107 object.c:765 object_class_dynamic_cast_assert
3.3780 36.1887 memory.c:1350 memory_region_access_valid
2.8920 39.0808 qemu-thread-posix.c:61 qemu_mutex_lock_impl
2.7187 41.7995 memory.c:1415 memory_region_dispatch_read
2.6011 44.4006 qht.c:487 qht_lookup_custom
2.5356 46.9362 softmmu_template.h:112 helper_ret_ldub_mmu
Maybe it's called from somewhere else too? I know draw_line16_32 but I
wonder where could helper_lookup_tb_ptr and tlb flushes come from? Those
seem to be significant. And io_readx in itself seems to be too high on the
list too. I wonder if it may have something to do with the background task
trying to read non-implemented i2c stuff frequently (as discussed in point
2. in http://zero.eik.bme.hu/~balaton/qemu/amiga/#morphos).
Regards,
BALATON Zoltan
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] Profiling results
2018-07-17 20:46 ` BALATON Zoltan
@ 2018-07-17 21:53 ` Peter Maydell
0 siblings, 0 replies; 10+ messages in thread
From: Peter Maydell @ 2018-07-17 21:53 UTC (permalink / raw)
To: BALATON Zoltan; +Cc: Mark Cave-Ayland, qemu-ppc, QEMU Developers
On 17 July 2018 at 21:46, BALATON Zoltan <balaton@eik.bme.hu> wrote:
> On Tue, 17 Jul 2018, Mark Cave-Ayland wrote:
>> Good question. A quick grep for 'asidx_from_attrs' shows that
>> cc->asidx_from_attrs() isn't set for PPC targets, so as a quick test does
>> replacing the inline function cpu_asidx_from_attrs() in include/qom/cpu.h
>> with a simple "return 0" change the profile at all?
>
>
> It does seem to lessen its impact but it's still higher than I expected:
It may be worth special-casing the CPU method lookups (or at
least that one) if we can, then...
> % cum. % linenr info symbol name
> 10.7949 10.7949 exec-all.h:410 helper_lookup_tb_ptr
> 7.8663 18.6612 cputlb.c:793 io_readx
> 6.0265 24.6878 cputlb.c:114 tlb_flush_nocheck
> 4.0671 28.7548 sm501_template.h:62 draw_line16_32
> 4.0559 32.8107 object.c:765
> object_class_dynamic_cast_assert
> 3.3780 36.1887 memory.c:1350 memory_region_access_valid
> 2.8920 39.0808 qemu-thread-posix.c:61 qemu_mutex_lock_impl
> 2.7187 41.7995 memory.c:1415 memory_region_dispatch_read
> 2.6011 44.4006 qht.c:487 qht_lookup_custom
> 2.5356 46.9362 softmmu_template.h:112 helper_ret_ldub_mmu
>
> Maybe it's called from somewhere else too? I know draw_line16_32 but I
> wonder where could helper_lookup_tb_ptr and tlb flushes come from? Those
> seem to be significant. And io_readx in itself seems to be too high on the
> list too.
helper_lookup_tb_ptr is part of TCG -- it's where we look for
the next TB to go to. Any non-computed branch to a different page
will result in our calling this. So it's high on the profile
because we do it a lot, I think, but that's not necessarily a
problem as such.
io_readx is the slow path for guest memory accesses -- any
guest access to something that's not RAM will have to go through
here. My first guess (given the other things in the profile,
especially helper_ret_ldub_mmu, memory_region_dispatch_read
and memory_region_access_valid) is that the guest is in a tight
loop doing a read on a device register a lot of the time.
> I wonder if it may have something to do with the background task
> trying to read non-implemented i2c stuff frequently (as discussed in point
> 2. in http://zero.eik.bme.hu/~balaton/qemu/amiga/#morphos).
Could be, or some similar thing. If you suspect the i2c you
could try putting in an unimplemented-device stub in the
right place and see how often -d unimp yells about reads to it.
So overall I'd be a little wary of optimizing based on this
profile, because I suspect it's atypical -- the guest is sat
in a tight polling loop and the profile says "all the functions
in the code path for doing device access are really hot".
The fix is to improve our model so the guest doesn't get
stuck like that, not to try to slightly improve the speed
of device accesses (we call it the "slow path" for a reason :-))
(But places like asidx_from_attrs are likely to be on hot
paths in general, so having the QOM class lookup there be
overly heavyweight is maybe worth fixing anyhow.)
thanks
-- PMM
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2018-07-17 21:54 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-07-16 19:47 [Qemu-devel] [PATCH] Do not enable QOM debugging by default BALATON Zoltan
2018-07-16 20:41 ` Peter Maydell
2018-07-16 21:08 ` BALATON Zoltan
2018-07-17 9:40 ` Daniel P. Berrangé
2018-07-17 15:09 ` [Qemu-devel] Profiling results (was: Re: [PATCH] Do not enable QOM debugging by default) BALATON Zoltan
2018-07-17 17:09 ` [Qemu-devel] [Qemu-ppc] Profiling results Mark Cave-Ayland
2018-07-17 19:35 ` BALATON Zoltan
2018-07-17 19:57 ` Mark Cave-Ayland
2018-07-17 20:46 ` BALATON Zoltan
2018-07-17 21:53 ` Peter Maydell
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).