qemu-devel.nongnu.org archive mirror
* [Qemu-devel] qemu-user-linux: how could I measure performance for aarch64 and arm?
From: Matwey V. Kornilov @ 2019-01-10 19:31 UTC
  To: qemu-devel

Hello,

I am running the same application, compiled for aarch64 and for armv7l,
on an x86_64 host using the qemu-user-linux tools.

I see a dramatic performance difference (about 30 times) between the
emulated architectures: the aarch64 build runs for ~4 minutes, while the
armv7l build runs for ~2 hours. I do understand that CPU architecture
emulation is an inherently slow thing, but my question is about the
difference.

How could I debug this to understand the reason for such a big
difference? I've already tried running stress-ng compiled for these two
architectures, but it shows the same performance per second on both.

I am running qemu 2.11; should I try another version?


* Re: [Qemu-devel] qemu-user-linux: how could I measure performance for aarch64 and arm?
From: Peter Maydell @ 2019-01-11  9:52 UTC
  To: Matwey V. Kornilov; +Cc: QEMU Developers

On Thu, 10 Jan 2019 at 19:33, Matwey V. Kornilov
<matwey.kornilov@gmail.com> wrote:
> I am running the same application, compiled for aarch64 and for armv7l,
> on an x86_64 host using the qemu-user-linux tools.
>
> I see a dramatic performance difference (about 30 times) between the
> emulated architectures: the aarch64 build runs for ~4 minutes, while the
> armv7l build runs for ~2 hours. I do understand that CPU architecture
> emulation is an inherently slow thing, but my question is about the
> difference.
>
> How could I debug this to understand the reason for such a big
> difference? I've already tried running stress-ng compiled for these two
> architectures, but it shows the same performance per second on both.
>
> I am running qemu 2.11; should I try another version?

Yes, do try 3.1 -- we have done some overall TCG performance
improvements.

For a big difference between target architectures like that,
I would start by using some host performance tools on
the two runs to see where all the time is being taken in
the armv7l guest run -- is it all in translated guest code,
or is there more time (proportionally) spent in particular
parts of the QEMU C code? Does the armv7l version do
many more or different syscalls (check with the QEMU -strace
option)?
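
One way to compare the syscall mix of the two runs is to capture the
-strace output of each into a file and count syscall names. The sketch
below is only an illustration: it assumes each log line starts with an
optional PID followed by the syscall name and an opening parenthesis,
and the function names are made up for this example.

```python
import re
from collections import Counter

# Assumed line shape: optional PID, then the syscall name and "(".
SYSCALL_RE = re.compile(r"^\s*(?:\d+\s+)?([A-Za-z_][A-Za-z0-9_]*)\(")


def count_syscalls(lines):
    """Count syscall names in an iterable of strace-style log lines."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def compare(counts_a, counts_b):
    """Return (name, count_a, count_b) rows, largest difference first."""
    names = set(counts_a) | set(counts_b)
    rows = [(n, counts_a.get(n, 0), counts_b.get(n, 0)) for n in names]
    # Sort by absolute difference (descending), then by name for stability.
    return sorted(rows, key=lambda r: (-abs(r[1] - r[2]), r[0]))
```

Feeding it the two open log files, for example
compare(count_syscalls(open("arm.log")), count_syscalls(open("aarch64.log")))
with hypothetical file names, puts the syscalls whose counts differ most
at the top of the list.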

Also you should check performance on 32-bit vs 64-bit Arm
hardware if you can, to confirm that it's not just
that the guest application runs much slower there.
(If you don't have the Arm hardware you could at least
check x86 32-bit vs 64-bit.)

thanks
-- PMM


* Re: [Qemu-devel] qemu-user-linux: how could I measure performance for aarch64 and arm?
From: Matwey V. Kornilov @ 2019-01-11 19:24 UTC
  To: Peter Maydell; +Cc: QEMU Developers

On Fri, 11 Jan 2019 at 12:52, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Thu, 10 Jan 2019 at 19:33, Matwey V. Kornilov
> <matwey.kornilov@gmail.com> wrote:
> > I am running qemu 2.11; should I try another version?
>
> Yes, do try 3.1 -- we have done some overall TCG performance
> improvements.

Indeed, qemu-arm from master runs for 4 minutes where 2.11 runs for 2
hours for me. It is an impressive improvement.


-- 
With best regards,
Matwey V. Kornilov


* Re: [Qemu-devel] qemu-user-linux: how could I measure performance for aarch64 and arm?
From: Matwey V. Kornilov @ 2019-01-13  7:31 UTC
  To: Peter Maydell; +Cc: QEMU Developers

On Fri, 11 Jan 2019 at 22:24, Matwey V. Kornilov <matwey.kornilov@gmail.com> wrote:
>
> Indeed, qemu-arm from master runs for 4 minutes where 2.11 runs for 2
> hours for me. It is an impressive improvement.

I've managed to bisect the first good (fast) commit:

commit 2a53535af471f4bee9d6cb5b363746b8d5ed21dd
Author: Luke Shumaker <lukeshu@parabola.nu>
Date:   Thu Dec 28 13:08:13 2017 -0500

    linux-user: init_guest_space: Try to make ARM space+commpage continuous

Though I am not sure how it helps.
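
For anyone repeating this kind of search for the first fast commit, a
small helper for `git bisect run` could look roughly like the sketch
below. It is an assumption-laden illustration, not what was actually
used here: it presumes the bisect was started with
`git bisect start --term-old=slow --term-new=fast`, that the tree has
already been built for the commit under test, and that the threshold
and the timed command are chosen by hand.

```python
import subprocess
import sys
import time


def classify(elapsed_seconds, threshold_seconds):
    """Map a run time to an exit status for `git bisect run`.

    With custom terms old=slow and new=fast, status 0 marks the commit
    as old ("slow") and status 1 marks it as new ("fast").
    """
    return 1 if elapsed_seconds < threshold_seconds else 0


def time_command(argv, timeout_seconds):
    """Run argv once; return wall-clock seconds, capped at the timeout."""
    start = time.perf_counter()
    try:
        subprocess.run(argv, check=False, timeout=timeout_seconds,
                       stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        # A run that hits the timeout is treated as "at least this slow".
        return timeout_seconds
    return time.perf_counter() - start
```

A wrapper script would then end with something like
sys.exit(classify(time_command(cmd, 600), 600)), where cmd is the
(hypothetical) qemu-arm invocation of a short workload and 600 seconds
is an arbitrary cut-off between the fast and slow behaviour.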

-- 
With best regards,
Matwey V. Kornilov


* Re: [Qemu-devel] qemu-user-linux: how could I measure performance for aarch64 and arm?
From: Peter Maydell @ 2019-01-14 10:23 UTC
  To: Matwey V. Kornilov; +Cc: QEMU Developers

On Sun, 13 Jan 2019 at 07:31, Matwey V. Kornilov
<matwey.kornilov@gmail.com> wrote:
>
> On Fri, 11 Jan 2019 at 22:24, Matwey V. Kornilov <matwey.kornilov@gmail.com> wrote:
> > Indeed, qemu-arm from master runs for 4 minutes where 2.11 runs for 2
> > hours for me. It is an impressive improvement.
>
> I've managed to bisect the first good (fast) commit:
>
> commit 2a53535af471f4bee9d6cb5b363746b8d5ed21dd
> Author: Luke Shumaker <lukeshu@parabola.nu>
> Date:   Thu Dec 28 13:08:13 2017 -0500
>
>     linux-user: init_guest_space: Try to make ARM space+commpage continuous
>
> Though I am not sure how it helps.

Oh, right, you were running into that bug:
https://bugs.launchpad.net/qemu/+bug/1740219

The problem was that we were not putting things in the right
place in memory for 32-bit Arm guest binaries in particular,
which could mean that we spent a long time trying a lot of
placements for memory mappings that failed instead of getting
an arrangement that worked first time. This meant startup
time for the guest binary was pretty slow. I guess your
application does a lot of exec()ing of new processes?
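
To check whether per-process startup is where the time goes, one could
time a trivial guest binary several times under each QEMU build. The
following is a rough sketch under stated assumptions: the function
names are invented for illustration, and the command passed in (for
example qemu-arm running a tiny statically linked Arm binary) is
whatever is available locally.

```python
import statistics
import subprocess
import sys
import time


def startup_times(argv, repeats=5):
    """Run argv `repeats` times and return each wall-clock duration."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(argv, check=False,
                       stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return times


def summarize(times):
    """Summarize durations; the median is less noisy than the mean."""
    return {
        "runs": len(times),
        "median": statistics.median(times),
        "max": max(times),
    }
```

If the median for a do-nothing binary is already large on the older
build and small on the newer one, a workload that exec()s many short
processes would multiply that cost, which would fit the explanation
above.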

thanks
-- PMM


* Re: [Qemu-devel] qemu-user-linux: how could I measure performance for aarch64 and arm?
From: Matwey V. Kornilov @ 2019-01-14 10:30 UTC
  To: Peter Maydell; +Cc: QEMU Developers

On Mon, 14 Jan 2019 at 13:24, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> Oh, right, you were running into that bug:
> https://bugs.launchpad.net/qemu/+bug/1740219
>
> The problem was that we were not putting things in the right
> place in memory for 32-bit Arm guest binaries in particular,
> which could mean that we spent a long time trying a lot of
> placements for memory mappings that failed instead of getting
> an arrangement that worked first time. This meant startup
> time for the guest binary was pretty slow. I guess your
> application does a lot of exec()ing of new processes?

Indeed. Thank you.


-- 
With best regards,
Matwey V. Kornilov
