[Qemu-devel] Profiling sparc64 emulation

* [Qemu-devel] Profiling sparc64 emulation
@ 2013-05-08 21:02 Artyom Tarasenko
  2013-05-09 18:30 ` Aurelien Jarno
  0 siblings, 1 reply; 6+ messages in thread
From: Artyom Tarasenko @ 2013-05-08 21:02 UTC
  To: Aurelien Jarno, qemu-devel, Torbjorn Granlund

On Wed, May 8, 2013 at 12:57 AM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> On Tue, May 07, 2013 at 11:29:20PM +0200, Artyom Tarasenko wrote:
>> On Tue, May 7, 2013 at 1:38 PM, Torbjorn Granlund <tg@gmplib.org> wrote:
>> > The 2nd table of http://gmplib.org/devel/testsystems.html shows all
>> > emulated systems I am using, most of which are qemu-based.
>>
>> Do I read it correct that qemu-system-ppc64 with the slowdown factor
>> of 33 is ~3 times faster than qemu-system-sparc64 with the slowdown
>> factor of 96 ?
>> Do they both use Debian Wheezy guest? You have a remark that ppc64 has
>> problems with its clock. Was it taken into account when the slowdown
>> factors were calculated?
>>
>
> Clock or not, it should be noted that qemu-system-sparc64 is undoubtedly
> slower (at least 5 to 10 times) than qemu-system-{arm,ppc,mips,...} on
> some type of load like perl scripts.

That's interesting. Actually it should be possible to launch perl under user
mode qemu-sparc32plus. Is it possible to launch perl under user mode
qemu-ppc{32,64} too?

That would let us understand whether the bad performance has to do
with TCG or the rest of the system emulation.

Artyom

--
Regards,
Artyom Tarasenko

linux/sparc and solaris/sparc under qemu blog:
http://tyom.blogspot.com/search/label/qemu

* Re: [Qemu-devel] Profiling sparc64 emulation
  2013-05-08 21:02 [Qemu-devel] Profiling sparc64 emulation Artyom Tarasenko
@ 2013-05-09 18:30 ` Aurelien Jarno
  2013-05-09 19:14   ` Paolo Bonzini
  2013-05-09 20:11   ` Artyom Tarasenko
  0 siblings, 2 replies; 6+ messages in thread
From: Aurelien Jarno @ 2013-05-09 18:30 UTC
  To: Artyom Tarasenko; +Cc: qemu-devel, Torbjorn Granlund

On Wed, May 08, 2013 at 11:02:24PM +0200, Artyom Tarasenko wrote:
> That's interesting. Actually it should be possible to launch perl under user
> mode qemu-sparc32plus. Is it possible to launch perl under user mode
> qemu-ppc{32,64} too?
>
> That would let us understand whether the bad performance has to do
> with TCG or the rest of the system emulation.

I haven't done that yet, but I have run perf top while running a perl
script (lintian) on both qemu-system-sparc64 and qemu-system-ppc64. The
results are quite different:

qemu-system-ppc64
-----------------
 49,73%  perf-10672.map           [.] 0x7f7853ab4e0f
 13,23%  qemu-system-ppc64        [.] cpu_ppc_exec
 13,16%  libglib-2.0.so.0.3200.4  [.] g_hash_table_lookup
  8,18%  libglib-2.0.so.0.3200.4  [.] g_str_hash
  2,47%  qemu-system-ppc64        [.] object_class_dynamic_cast
  1,97%  qemu-system-ppc64        [.] type_is_ancestor
  1,05%  libglib-2.0.so.0.3200.4  [.] g_str_equal
  0,91%  qemu-system-ppc64        [.] ppc_cpu_do_interrupt
  0,90%  qemu-system-ppc64        [.] object_dynamic_cast_assert
  0,79%  libc-2.13.so             [.] __sigsetjmp
  0,62%  qemu-system-ppc64        [.] type_get_parent.isra.3
  0,58%  qemu-system-ppc64        [.] type_get_by_name
  0,57%  qemu-system-ppc64        [.] qemu_log_mask
  0,54%  qemu-system-ppc64        [.] object_dynamic_cast

qemu-system-sparc64
-------------------
 17,43%  perf-8154.map            [.] 0x7f6ac10245c8
 10,46%  qemu-system-sparc64      [.] tcg_optimize
 10,36%  qemu-system-sparc64      [.] cpu_sparc_exec
  6,35%  qemu-system-sparc64      [.] tb_flush_jmp_cache
  4,75%  qemu-system-sparc64      [.] get_physical_address_data
  4,45%  qemu-system-sparc64      [.] tcg_liveness_analysis
  4,35%  qemu-system-sparc64      [.] tcg_reg_alloc_op
  2,90%  qemu-system-sparc64      [.] tlb_flush_page
  2,35%  qemu-system-sparc64      [.] disas_sparc_insn
  2,28%  qemu-system-sparc64      [.] get_physical_address_code
  2,21%  qemu-system-sparc64      [.] tlb_flush
  1,64%  qemu-system-sparc64      [.] tcg_out_opc
  1,22%  qemu-system-sparc64      [.] tcg_out_modrm_sib_offset.constprop.41
  1,20%  qemu-system-sparc64      [.] helper_ld_asi
  1,14%  qemu-system-sparc64      [.] gen_intermediate_code_pc
  1,04%  qemu-system-sparc64      [.] helper_st_asi
  1,00%  qemu-system-sparc64      [.] object_class_dynamic_cast
  0,98%  qemu-system-sparc64      [.] tb_find_pc
  0,94%  qemu-system-sparc64      [.] get_page_addr_code
  0,92%  qemu-system-sparc64      [.] tcg_gen_code_search_pc
  0,91%  qemu-system-sparc64      [.] tlb_set_page
  0,83%  qemu-system-sparc64      [.] reset_temp
  0,82%  qemu-system-sparc64      [.] tcg_reg_alloc_start


The perf-xxxx.map entries correspond to execution of the translated code.
As you can see, their share is much lower on sparc64 (17.43% versus 49.73%
on ppc64), while many smaller code generation and MMU entries appear. It
seems that optimization work should focus on the system part, not the TCG
part, at least for now.
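
To make the comparison easier to read, here is a rough Python sketch that
sums the rows of the two listings above into coarse groups. The grouping by
symbol prefix (CODEGEN_PREFIXES, MMU_PREFIXES) is only an assumption about
what each function does, not an official classification, and the helper
names are made up for this example.

```python
import re
from collections import defaultdict

# Rows copied from the two perf top listings above (comma decimals as printed).
PPC64 = """
 49,73%  perf-10672.map           [.] 0x7f7853ab4e0f
 13,23%  qemu-system-ppc64        [.] cpu_ppc_exec
 13,16%  libglib-2.0.so.0.3200.4  [.] g_hash_table_lookup
  8,18%  libglib-2.0.so.0.3200.4  [.] g_str_hash
  2,47%  qemu-system-ppc64        [.] object_class_dynamic_cast
  1,97%  qemu-system-ppc64        [.] type_is_ancestor
  1,05%  libglib-2.0.so.0.3200.4  [.] g_str_equal
  0,91%  qemu-system-ppc64        [.] ppc_cpu_do_interrupt
  0,90%  qemu-system-ppc64        [.] object_dynamic_cast_assert
  0,79%  libc-2.13.so             [.] __sigsetjmp
  0,62%  qemu-system-ppc64        [.] type_get_parent.isra.3
  0,58%  qemu-system-ppc64        [.] type_get_by_name
  0,57%  qemu-system-ppc64        [.] qemu_log_mask
  0,54%  qemu-system-ppc64        [.] object_dynamic_cast
"""

SPARC64 = """
 17,43%  perf-8154.map            [.] 0x7f6ac10245c8
 10,46%  qemu-system-sparc64      [.] tcg_optimize
 10,36%  qemu-system-sparc64      [.] cpu_sparc_exec
  6,35%  qemu-system-sparc64      [.] tb_flush_jmp_cache
  4,75%  qemu-system-sparc64      [.] get_physical_address_data
  4,45%  qemu-system-sparc64      [.] tcg_liveness_analysis
  4,35%  qemu-system-sparc64      [.] tcg_reg_alloc_op
  2,90%  qemu-system-sparc64      [.] tlb_flush_page
  2,35%  qemu-system-sparc64      [.] disas_sparc_insn
  2,28%  qemu-system-sparc64      [.] get_physical_address_code
  2,21%  qemu-system-sparc64      [.] tlb_flush
  1,64%  qemu-system-sparc64      [.] tcg_out_opc
  1,22%  qemu-system-sparc64      [.] tcg_out_modrm_sib_offset.constprop.41
  1,20%  qemu-system-sparc64      [.] helper_ld_asi
  1,14%  qemu-system-sparc64      [.] gen_intermediate_code_pc
  1,04%  qemu-system-sparc64      [.] helper_st_asi
  1,00%  qemu-system-sparc64      [.] object_class_dynamic_cast
  0,98%  qemu-system-sparc64      [.] tb_find_pc
  0,94%  qemu-system-sparc64      [.] get_page_addr_code
  0,92%  qemu-system-sparc64      [.] tcg_gen_code_search_pc
  0,91%  qemu-system-sparc64      [.] tlb_set_page
  0,83%  qemu-system-sparc64      [.] reset_temp
  0,82%  qemu-system-sparc64      [.] tcg_reg_alloc_start
"""

# One row: percentage with a comma decimal, object name, "[.]", symbol.
ROW = re.compile(r"^\s*(\d+),(\d+)%\s+(\S+)\s+\[\.\]\s+(\S+)\s*$")

# Assumed grouping by symbol prefix (a guess, not an official classification).
CODEGEN_PREFIXES = ("tcg_", "disas_", "gen_intermediate_code", "reset_temp")
MMU_PREFIXES = ("tlb_", "get_physical_address", "get_page_addr",
                "tb_flush_jmp_cache")


def classify(obj, symbol):
    """Map one perf row to a coarse group name."""
    if obj.startswith("perf-") and obj.endswith(".map"):
        return "translated code"
    if symbol.startswith(CODEGEN_PREFIXES):
        return "code generation"
    if symbol.startswith(MMU_PREFIXES):
        return "mmu/tlb"
    return "other"


def summarize(report):
    """Sum the percentages of every parsable row, per group."""
    totals = defaultdict(float)
    for line in report.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # blank or unrecognized line
        percent = float(f"{match.group(1)}.{match.group(2)}")
        totals[classify(match.group(3), match.group(4))] += percent
    return dict(totals)


for name, report in (("ppc64", PPC64), ("sparc64", SPARC64)):
    groups = summarize(report)
    print(name, {group: round(value, 2) for group, value in groups.items()})
```

The totals only cover the rows listed, so they do not add up to 100%.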

A quick look at the MMU suggests a performance issue there: SPARC64 has
split code/data MMUs, while the QEMU TLB is a joint one. As a consequence
one can see a lot of ping-pong: a given page is set up for read or
read/write, then for execution, and later for read or read/write again.
My guess is that it's related to constant tables in the same page as the
code.
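
The ping-pong can be illustrated with a toy model in Python. This is not
QEMU's softmmu code; it only assumes that a joint TLB entry is installed for
one kind of access at a time (code or data) and has to be refilled when the
other kind touches the same page. The function names are hypothetical.

```python
def interleaved_trace(code_page, data_page, rounds):
    """Alternate one instruction fetch and one data access, `rounds` times."""
    return [(code_page, "code"), (data_page, "data")] * rounds


def count_refills(trace, split_entries):
    """Count TLB refills for a trace of (page, kind) accesses.

    split_entries=True models separate code and data entries per page.
    split_entries=False models one joint entry per page that remembers
    only the kind of access it was last installed for (toy assumption).
    """
    tlb = {}
    refills = 0
    for page, kind in trace:
        key = (page, kind) if split_entries else page
        if tlb.get(key) != kind:
            tlb[key] = kind  # install or replace the entry
            refills += 1
    return refills


# Constants in the same page as the code: the joint entry is replaced on
# every access, while split entries are filled once each.
same_page = interleaved_trace(0x1000, 0x1000, rounds=1000)
print("joint, same page:", count_refills(same_page, split_entries=False))
print("split, same page:", count_refills(same_page, split_entries=True))

# Constants in a different page: no ping-pong even with a joint entry.
other_page = interleaved_trace(0x1000, 0x2000, rounds=1000)
print("joint, separate pages:", count_refills(other_page, split_entries=False))
```

In this model the refill count grows with the number of accesses only when
code and data share a page and the entry is joint.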

It should also be noted that tcg_optimize is starting to take a
non-negligible amount of time, in both cases. The code has grown quite a
lot recently, and it might be time to rework it. It's nice to have optimized
code, but not if the gain is lower than the optimization time.
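
The trade-off can be written down as a small break-even calculation. The
Python sketch below uses made-up cost units and hypothetical function names;
it only states the arithmetic, it does not measure anything in QEMU.

```python
import math


def net_gain(optimize_cost, saved_per_run, runs):
    """Time saved by optimizing a block, minus the time spent optimizing it.

    All arguments share one arbitrary time unit (hypothetical values).
    """
    return saved_per_run * runs - optimize_cost


def break_even_runs(optimize_cost, saved_per_run):
    """Smallest number of executions at which the net gain is not negative."""
    if saved_per_run <= 0:
        raise ValueError("the optimization saves nothing per run")
    return math.ceil(optimize_cost / saved_per_run)


# Example with invented numbers: optimizing costs 100 units and saves
# 2 units each time the translated block runs.
print(break_even_runs(100, 2))
print(net_gain(100, 2, runs=10))    # block discarded early: a net loss
print(net_gain(100, 2, runs=1000))  # long-lived block: a net win
```

If translated blocks are thrown away and regenerated often, each one runs
fewer times before it is discarded, which pushes it below the break-even
point.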

I am also surprised to see glib code that high on the qemu-system-ppc64
perf report.

-- 
Aurelien Jarno	                        GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

* Re: [Qemu-devel] Profiling sparc64 emulation
  2013-05-09 18:30 ` Aurelien Jarno
@ 2013-05-09 19:14   ` Paolo Bonzini
  2013-05-10 14:22     ` Anthony Liguori
  2013-05-09 20:11   ` Artyom Tarasenko
  1 sibling, 1 reply; 6+ messages in thread
From: Paolo Bonzini @ 2013-05-09 19:14 UTC
  To: Aurelien Jarno; +Cc: qemu-devel, Artyom Tarasenko, Torbjorn Granlund

Il 09/05/2013 20:30, Aurelien Jarno ha scritto:
>  13,16%  libglib-2.0.so.0.3200.4  [.] g_hash_table_lookup
>   8,18%  libglib-2.0.so.0.3200.4  [.] g_str_hash
>   2,47%  qemu-system-ppc64        [.] object_class_dynamic_cast
>   1,97%  qemu-system-ppc64        [.] type_is_ancestor

That's worrisome, but should be easy to fix... can you make a callgraph
profile?

Paolo

* Re: [Qemu-devel] Profiling sparc64 emulation
  2013-05-09 19:14   ` Paolo Bonzini
@ 2013-05-10 14:22     ` Anthony Liguori
  0 siblings, 0 replies; 6+ messages in thread
From: Anthony Liguori @ 2013-05-10 14:22 UTC
  To: Paolo Bonzini, Aurelien Jarno
  Cc: qemu-devel, Artyom Tarasenko, Torbjorn Granlund

Paolo Bonzini <pbonzini@redhat.com> writes:

> Il 09/05/2013 20:30, Aurelien Jarno ha scritto:
>>  13,16%  libglib-2.0.so.0.3200.4  [.] g_hash_table_lookup
>>   8,18%  libglib-2.0.so.0.3200.4  [.] g_str_hash
>>   2,47%  qemu-system-ppc64        [.] object_class_dynamic_cast
>>   1,97%  qemu-system-ppc64        [.] type_is_ancestor
>
> That's worrisome, but should be easy to fix... can you make a callgraph
> profile?

A percentage in a profiling run doesn't by itself imply a performance regression.

Are there real performance numbers here?

Regards,

Anthony Liguori

>
> Paolo

* Re: [Qemu-devel] Profiling sparc64 emulation
  2013-05-09 18:30 ` Aurelien Jarno
  2013-05-09 19:14   ` Paolo Bonzini
@ 2013-05-09 20:11   ` Artyom Tarasenko
  2013-05-09 20:53     ` Richard Henderson
  1 sibling, 1 reply; 6+ messages in thread
From: Artyom Tarasenko @ 2013-05-09 20:11 UTC
  To: Aurelien Jarno; +Cc: qemu-devel, Torbjorn Granlund

On Thu, May 9, 2013 at 8:30 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> It should also be noted that tcg_optimize is starting to take a
> non-negligible amount of time, in both cases. The code has grown quite a
> lot recently, and it might be time to rework it. It's nice to have optimized
> code, but not if the gain is lower than the optimization time.

Is it possible to disable some optimisations, or the whole
optimisation completely?
I see no command line switches for that.

* Re: [Qemu-devel] Profiling sparc64 emulation
  2013-05-09 20:11   ` Artyom Tarasenko
@ 2013-05-09 20:53     ` Richard Henderson
  0 siblings, 0 replies; 6+ messages in thread
From: Richard Henderson @ 2013-05-09 20:53 UTC
  To: Artyom Tarasenko; +Cc: qemu-devel, Aurelien Jarno, Torbjorn Granlund

On 05/09/2013 01:11 PM, Artyom Tarasenko wrote:
> Is it possible to disable some optimisations, or the whole
> optimisation completely?
> I see no command line switches for that.

No command-line switches.

See the top lines of tcg/tcg.c for compile-time disabling.


r~
