* [Qemu-devel] Profiling sparc64 emulation
@ 2013-05-08 21:02 Artyom Tarasenko
2013-05-09 18:30 ` Aurelien Jarno
0 siblings, 1 reply; 6+ messages in thread
From: Artyom Tarasenko @ 2013-05-08 21:02 UTC
To: Aurelien Jarno, qemu-devel, Torbjorn Granlund
On Wed, May 8, 2013 at 12:57 AM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> On Tue, May 07, 2013 at 11:29:20PM +0200, Artyom Tarasenko wrote:
>> On Tue, May 7, 2013 at 1:38 PM, Torbjorn Granlund <tg@gmplib.org> wrote:
>> > The 2nd table of http://gmplib.org/devel/testsystems.html shows all
>> > emulated systems I am using, most of which are qemu-based.
>>
>> Do I read it correctly that qemu-system-ppc64 with the slowdown factor
>> of 33 is ~3 times faster than qemu-system-sparc64 with the slowdown
>> factor of 96?
>> Do they both use a Debian Wheezy guest? You have a remark that ppc64 has
>> problems with its clock. Was it taken into account when the slowdown
>> factors were calculated?
>>
>
> Clock or not, it should be noted that qemu-system-sparc64 is undoubtedly
> slower (at least 5 to 10 times) than qemu-system-{arm,ppc,mips,...} on
> some types of load, like perl scripts.
That's interesting. Actually, it should be possible to launch perl under user
mode qemu-sparc32plus. Is it possible to launch perl under user mode
qemu-ppc{32,64} too?
That would help us understand whether the bad performance has to do
with TCG or with the rest of the system emulation.
Artyom
--
Regards,
Artyom Tarasenko
linux/sparc and solaris/sparc under qemu blog:
http://tyom.blogspot.com/search/label/qemu
* Re: [Qemu-devel] Profiling sparc64 emulation
From: Aurelien Jarno @ 2013-05-09 18:30 UTC
To: Artyom Tarasenko; +Cc: qemu-devel, Torbjorn Granlund

On Wed, May 08, 2013 at 11:02:24PM +0200, Artyom Tarasenko wrote:
> [...]
> That's interesting. Actually, it should be possible to launch perl under user
> mode qemu-sparc32plus. Is it possible to launch perl under user mode
> qemu-ppc{32,64} too?
>
> That would help us understand whether the bad performance has to do
> with TCG or with the rest of the system emulation.

I haven't done that yet, but I have run perf top while running a perl
script (lintian) on both qemu-system-sparc64 and qemu-system-ppc64. The
results are quite different:

qemu-system-ppc64
-----------------
49,73%  perf-10672.map           [.] 0x7f7853ab4e0f
13,23%  qemu-system-ppc64        [.] cpu_ppc_exec
13,16%  libglib-2.0.so.0.3200.4  [.] g_hash_table_lookup
 8,18%  libglib-2.0.so.0.3200.4  [.] g_str_hash
 2,47%  qemu-system-ppc64        [.] object_class_dynamic_cast
 1,97%  qemu-system-ppc64        [.] type_is_ancestor
 1,05%  libglib-2.0.so.0.3200.4  [.] g_str_equal
 0,91%  qemu-system-ppc64        [.] ppc_cpu_do_interrupt
 0,90%  qemu-system-ppc64        [.] object_dynamic_cast_assert
 0,79%  libc-2.13.so             [.] __sigsetjmp
 0,62%  qemu-system-ppc64        [.] type_get_parent.isra.3
 0,58%  qemu-system-ppc64        [.] type_get_by_name
 0,57%  qemu-system-ppc64        [.] qemu_log_mask
 0,54%  qemu-system-ppc64        [.] object_dynamic_cast

qemu-system-sparc64
-------------------
17,43%  perf-8154.map            [.] 0x7f6ac10245c8
10,46%  qemu-system-sparc64      [.] tcg_optimize
10,36%  qemu-system-sparc64      [.] cpu_sparc_exec
 6,35%  qemu-system-sparc64      [.] tb_flush_jmp_cache
 4,75%  qemu-system-sparc64      [.] get_physical_address_data
 4,45%  qemu-system-sparc64      [.] tcg_liveness_analysis
 4,35%  qemu-system-sparc64      [.] tcg_reg_alloc_op
 2,90%  qemu-system-sparc64      [.] tlb_flush_page
 2,35%  qemu-system-sparc64      [.] disas_sparc_insn
 2,28%  qemu-system-sparc64      [.] get_physical_address_code
 2,21%  qemu-system-sparc64      [.] tlb_flush
 1,64%  qemu-system-sparc64      [.] tcg_out_opc
 1,22%  qemu-system-sparc64      [.] tcg_out_modrm_sib_offset.constprop.41
 1,20%  qemu-system-sparc64      [.] helper_ld_asi
 1,14%  qemu-system-sparc64      [.] gen_intermediate_code_pc
 1,04%  qemu-system-sparc64      [.] helper_st_asi
 1,00%  qemu-system-sparc64      [.] object_class_dynamic_cast
 0,98%  qemu-system-sparc64      [.] tb_find_pc
 0,94%  qemu-system-sparc64      [.] get_page_addr_code
 0,92%  qemu-system-sparc64      [.] tcg_gen_code_search_pc
 0,91%  qemu-system-sparc64      [.] tlb_set_page
 0,83%  qemu-system-sparc64      [.] reset_temp
 0,82%  qemu-system-sparc64      [.] tcg_reg_alloc_start

The perf-xxxx.map entries correspond to the execution of the translated
code. As you can see, that share is a lot lower on sparc64, while many
smaller code generation and MMU functions appear. It seems that the
optimization effort has to be focused on the system part, not the TCG
part, at least for now.
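The shift described above can be quantified from the report itself. A minimal sketch in Python, assuming a hand-picked grouping of symbols (`CATEGORY_PREFIXES` and the category names are this sketch's own, not anything defined by perf or QEMU): it parses the qemu-system-sparc64 lines of the perf top output, converts the decimal commas, and sums the shares per group. Only the listed entries are counted, so the groups cover about 80% of the samples.

```python
from collections import defaultdict

# The qemu-system-sparc64 lines from the report, copied verbatim
# (perf printed the percentages with a decimal comma).
SPARC64_PERF_TOP = """\
17,43%  perf-8154.map            [.] 0x7f6ac10245c8
10,46%  qemu-system-sparc64      [.] tcg_optimize
10,36%  qemu-system-sparc64      [.] cpu_sparc_exec
 6,35%  qemu-system-sparc64      [.] tb_flush_jmp_cache
 4,75%  qemu-system-sparc64      [.] get_physical_address_data
 4,45%  qemu-system-sparc64      [.] tcg_liveness_analysis
 4,35%  qemu-system-sparc64      [.] tcg_reg_alloc_op
 2,90%  qemu-system-sparc64      [.] tlb_flush_page
 2,35%  qemu-system-sparc64      [.] disas_sparc_insn
 2,28%  qemu-system-sparc64      [.] get_physical_address_code
 2,21%  qemu-system-sparc64      [.] tlb_flush
 1,64%  qemu-system-sparc64      [.] tcg_out_opc
 1,22%  qemu-system-sparc64      [.] tcg_out_modrm_sib_offset.constprop.41
 1,20%  qemu-system-sparc64      [.] helper_ld_asi
 1,14%  qemu-system-sparc64      [.] gen_intermediate_code_pc
 1,04%  qemu-system-sparc64      [.] helper_st_asi
 1,00%  qemu-system-sparc64      [.] object_class_dynamic_cast
 0,98%  qemu-system-sparc64      [.] tb_find_pc
 0,94%  qemu-system-sparc64      [.] get_page_addr_code
 0,92%  qemu-system-sparc64      [.] tcg_gen_code_search_pc
 0,91%  qemu-system-sparc64      [.] tlb_set_page
 0,83%  qemu-system-sparc64      [.] reset_temp
 0,82%  qemu-system-sparc64      [.] tcg_reg_alloc_start
"""

# Hypothetical grouping rules for this sketch: the first matching prefix
# wins, and anything unmatched falls into "other".
CATEGORY_PREFIXES = [
    ("code generation", ("tcg_", "disas_", "gen_intermediate_code", "reset_temp")),
    ("mmu/tlb", ("tlb_", "tb_flush_jmp_cache", "get_physical_address",
                 "get_page_addr_code")),
]


def parse_perf_top(text):
    """Yield (percent, dso, symbol) from perf top lines like the ones above."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].endswith("%"):
            continue
        percent = float(fields[0].rstrip("%").replace(",", "."))
        yield percent, fields[1], fields[3]


def categorize(dso, symbol):
    # Samples attributed to the perf-<pid>.map pseudo-DSO are translated code.
    if dso.startswith("perf-") and dso.endswith(".map"):
        return "translated code"
    for name, prefixes in CATEGORY_PREFIXES:
        if symbol.startswith(prefixes):
            return name
    return "other"


def share_by_category(text):
    """Sum the percentages per category, rounded to two decimals."""
    totals = defaultdict(float)
    for percent, dso, symbol in parse_perf_top(text):
        totals[categorize(dso, symbol)] += percent
    return {name: round(total, 2) for name, total in totals.items()}


if __name__ == "__main__":
    shares = share_by_category(SPARC64_PERF_TOP)
    for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{name:16} {share:6.2f}%")
```

With this grouping, code generation comes to 28.18% and MMU/TLB handling to 20.34%, both above the 17.43% spent executing translated code; in the ppc64 run, translated code alone accounts for 49.73%. Moving a symbol such as helper_ld_asi from "other" into "mmu/tlb" changes the split, so the rules are worth adjusting before drawing firm conclusions.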
A quick look at the MMU seems to show a performance issue there, due
to the split code/data MMU on SPARC64, while the QEMU TLB is a unified
one. As a consequence, one can see a lot of ping-pong: a given page is
set to read or read/write, then to execute, and later to read or
read/write again. My guess is that it's related to constant tables
located in the same page as the code.

It should also be noted that tcg_optimize starts to take a
non-negligible amount of time in both cases. The code has grown quite
a lot recently, and it might be time to rework it. It's nice to have
optimized code, but not if the gain is lower than the optimization
time.

I am also surprised to see glib code that high in the
qemu-system-ppc64 perf report.

--
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                    http://www.aurel32.net
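The ping-pong hypothesis can be illustrated with a toy model. A minimal sketch in Python, under stated assumptions: `ToyUnifiedTlb` and `run_loop` are hypothetical and far simpler than QEMU's softmmu TLB. They only encode the behavior described in the message, namely that a refill after a miss records the permissions granted by whichever guest MMU (instruction or data) handled that miss, replacing what was there before. The loop alternates an instruction fetch with a load from a constant placed in the same 8 KiB page.

```python
from dataclasses import dataclass, field

PAGE_BITS = 13                  # 8 KiB pages, the base page size on SPARC64
READ, WRITE, EXEC = 1, 2, 4     # permission bits kept in a toy TLB entry


@dataclass
class ToyUnifiedTlb:
    """Toy joint TLB: one entry per guest virtual page, holding a permission mask.

    merge_on_fill=False models the suspected behavior: a refill after a miss
    replaces the entry with whatever the consulted guest MMU granted.
    """
    merge_on_fill: bool
    entries: dict = field(default_factory=dict)
    misses: int = 0

    def access(self, vaddr: int, need: int) -> None:
        page = vaddr >> PAGE_BITS
        if (self.entries.get(page, 0) & need) == need:
            return                              # TLB hit
        self.misses += 1
        # Split MMU: instruction fetches go through the I-MMU (EXEC), loads and
        # stores through the D-MMU (READ|WRITE, simplified); only one is consulted.
        granted = EXEC if need & EXEC else READ | WRITE
        if self.merge_on_fill:
            granted |= self.entries.get(page, 0)
        self.entries[page] = granted


def run_loop(tlb: ToyUnifiedTlb, iterations: int, data_addr: int = 0x10100) -> int:
    """Alternate an instruction fetch with a load from data_addr; return the misses."""
    code_addr = 0x10000             # the default data_addr lies in the same 8 KiB page
    for _ in range(iterations):
        tlb.access(code_addr, EXEC)
        tlb.access(data_addr, READ)
    return tlb.misses


if __name__ == "__main__":
    print("same page, replace on fill:", run_loop(ToyUnifiedTlb(False), 1000))
    print("same page, merge on fill:  ", run_loop(ToyUnifiedTlb(True), 1000))
    print("other page, replace:       ", run_loop(ToyUnifiedTlb(False), 1000, 0x20000))
```

With replace-on-fill, 1000 iterations cost 2000 misses, because every access evicts the permission the next one needs; moving the constants to another page, or merging permissions on refill, brings it down to 2. Merging is shown only for contrast: a real change would also have to respect that the two guest MMUs may translate the same virtual page differently.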
* Re: [Qemu-devel] Profiling sparc64 emulation
From: Paolo Bonzini @ 2013-05-09 19:14 UTC
To: Aurelien Jarno; +Cc: qemu-devel, Artyom Tarasenko, Torbjorn Granlund

On 09/05/2013 20:30, Aurelien Jarno wrote:
> 13,16%  libglib-2.0.so.0.3200.4  [.] g_hash_table_lookup
>  8,18%  libglib-2.0.so.0.3200.4  [.] g_str_hash
>  2,47%  qemu-system-ppc64        [.] object_class_dynamic_cast
>  1,97%  qemu-system-ppc64        [.] type_is_ancestor

That's worrisome, but should be easy to fix... can you make a callgraph
profile?

Paolo
* Re: [Qemu-devel] Profiling sparc64 emulation
From: Anthony Liguori @ 2013-05-10 14:22 UTC
To: Paolo Bonzini, Aurelien Jarno
Cc: qemu-devel, Artyom Tarasenko, Torbjorn Granlund

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 09/05/2013 20:30, Aurelien Jarno wrote:
>> 13,16%  libglib-2.0.so.0.3200.4  [.] g_hash_table_lookup
>>  8,18%  libglib-2.0.so.0.3200.4  [.] g_str_hash
>>  2,47%  qemu-system-ppc64        [.] object_class_dynamic_cast
>>  1,97%  qemu-system-ppc64        [.] type_is_ancestor
>
> That's worrisome, but should be easy to fix... can you make a callgraph
> profile?

A function's percentage in a profiling run doesn't, by itself, imply a
performance regression. Are there real performance numbers here?

Regards,

Anthony Liguori
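Anthony's objection can be made concrete: a perf percentage is a share of one run, not an absolute cost. A minimal sketch in Python, assuming nothing beyond Amdahl's law (the function names are hypothetical): `absolute_seconds` turns a share into time for a run of known length, and `amdahl_speedup` bounds the whole-run gain from speeding up the profiled code. `QUOTED_SHARE` is simply the sum of the four ppc64 entries Paolo quoted.

```python
import math


def absolute_seconds(share_percent: float, total_seconds: float) -> float:
    """Convert a profile share into absolute time for a run of known length."""
    return share_percent / 100.0 * total_seconds


def amdahl_speedup(share_percent: float, factor: float = math.inf) -> float:
    """Amdahl's law: whole-run speedup if the code holding share_percent of the
    samples becomes `factor` times faster (math.inf means removed entirely)."""
    p = share_percent / 100.0
    return 1.0 / ((1.0 - p) + p / factor)


# The four qemu-system-ppc64 entries quoted by Paolo: g_hash_table_lookup,
# g_str_hash, object_class_dynamic_cast and type_is_ancestor.
QUOTED_SHARE = 13.16 + 8.18 + 2.47 + 1.97

if __name__ == "__main__":
    # The same 10 s of work: 10% of a 100 s run, 20% of a 50 s run.
    print(absolute_seconds(10, 100), absolute_seconds(20, 50))
    print(f"upper bound if removed: {amdahl_speedup(QUOTED_SHARE):.2f}x")
    print(f"if made twice as fast:  {amdahl_speedup(QUOTED_SHARE, 2):.2f}x")
```

The same 10 s of work shows up as 10% of a 100 s run and as 20% of a 50 s run, so a rising percentage alone says nothing about a regression. Conversely, removing the four quoted entries entirely would speed that run up by at most about 1.35x, and halving their cost by about 1.15x; only wall-clock measurements show what that is worth in practice.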
* Re: [Qemu-devel] Profiling sparc64 emulation
From: Artyom Tarasenko @ 2013-05-09 20:11 UTC
To: Aurelien Jarno; +Cc: qemu-devel, Torbjorn Granlund

On Thu, May 9, 2013 at 8:30 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> [...]
> It should also be noted that tcg_optimize starts to take a
> non-negligible amount of time in both cases. The code has grown quite
> a lot recently, and it might be time to rework it. It's nice to have
> optimized code, but not if the gain is lower than the optimization
> time.

Is it possible to disable some optimisations, or the whole
optimisation completely?
I see no command line switches for that.
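Aurelien's criterion, quoted above, amounts to a break-even condition. A minimal sketch in Python, assuming a translation block is optimized once and then executed `runs` times before it is discarded; the function names and the abstract cost units are hypothetical, and none of this is QEMU code. `required_slowdown` applies the same idea to whole-profile shares, assuming everything else in the profile stays unchanged.

```python
import math


def pays_off(opt_cost: float, saving_per_run: float, runs: int) -> bool:
    """True if optimizing a translation block is a net win: the time saved over
    `runs` executions must exceed the one-time cost of optimizing it."""
    return saving_per_run * runs > opt_cost


def break_even_runs(opt_cost: float, saving_per_run: float) -> float:
    """Smallest number of executions for which pays_off() holds
    (math.inf if the optimized code is not faster at all)."""
    if saving_per_run <= 0:
        return math.inf
    return math.floor(opt_cost / saving_per_run) + 1


def required_slowdown(opt_share: float, exec_share: float) -> float:
    """Profile-level version. With the optimizer at opt_share and translated
    code at exec_share of the samples, and everything else unchanged, the
    optimizer is a net win only if unoptimized code would run more than this
    fraction slower than the optimized code."""
    return opt_share / exec_share


if __name__ == "__main__":
    # Abstract cost units chosen for illustration, not measurements.
    print(break_even_runs(2000, 5))                  # 401
    # tcg_optimize and perf-8154.map shares from the sparc64 profile above.
    print(f"{required_slowdown(10.46, 17.43):.0%}")  # 60%
```

For the sparc64 profile, 10.46% in tcg_optimize against 17.43% in translated code gives about 0.60: under this simplified model, code generated without the optimizer would have to run more than about 60% slower for the optimizer to pay for itself on that workload. The model ignores any effect the optimizer has on the later code generation passes, which also show up in the profile.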
* Re: [Qemu-devel] Profiling sparc64 emulation
From: Richard Henderson @ 2013-05-09 20:53 UTC
To: Artyom Tarasenko; +Cc: qemu-devel, Aurelien Jarno, Torbjorn Granlund

On 05/09/2013 01:11 PM, Artyom Tarasenko wrote:
> Is it possible to disable some optimisations, or the whole
> optimisation completely?
> I see no command line switches for that.

No command-line switches. See the top lines of tcg/tcg.c for compile-time
disabling.

r~