* [Qemu-devel] emulated ARM performance vs real processor ?
@ 2011-09-01 7:32 Julien Heyman
2011-09-02 14:31 ` David Gilbert
2011-09-04 17:42 ` Antti P Miettinen
0 siblings, 2 replies; 7+ messages in thread
From: Julien Heyman @ 2011-09-01 7:32 UTC (permalink / raw)
To: qemu-devel
Hi,
I was wondering if anyone had some data regarding the relative performance
of any given ARM board emulated in QEMU versus the real thing. Yes, I do
know this depends a lot on the host PC running qemu, but some
ballpark/example figures would help. Say, I emulate a 400 MHz ARM9 processor
on a Core2Duo laptop @ 2 GHz, what kind of performance/timing ratio should I
expect, one way or the other? For example, for boot time.
I have no idea whether the overhead of emulation is over-compensated by the
huge processing power of the host compared to the real HW target, and by
which factor.
Regards,
Julien
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] emulated ARM performance vs real processor ?
2011-09-01 7:32 [Qemu-devel] emulated ARM performance vs real processor ? Julien Heyman
@ 2011-09-02 14:31 ` David Gilbert
2011-09-02 16:04 ` Julien Heyman
2011-09-04 17:42 ` Antti P Miettinen
1 sibling, 1 reply; 7+ messages in thread
From: David Gilbert @ 2011-09-02 14:31 UTC (permalink / raw)
To: Julien Heyman; +Cc: qemu-devel
On 1 September 2011 08:32, Julien Heyman <bidsomail@gmail.com> wrote:
> Hi,
>
> I was wondering if anyone had some data regarding the relative performance
> of any given ARM board emulated in QEMU versus the real thing. Yes, I do
> know this depends a lot on the host PC running qemu, but some
> ballpark/example figures would help. Say, I emulate a 400 MHz ARM9 processor
> on a Core2Duo laptop @ 2 GHz, what kind of performance/timing ratio should I
> expect, one way or the other? For example, for boot time.
> I have no idea whether the overhead of emulation is over-compensated by the
> huge processing power of the host compared to the real HW target, and by
> which factor.
Comparing performance is always a bit tricky, and I've not really got a
solid set of benchmarks ready to run to try it, but to give some numbers:
1) Boot times
Comparing the Linaro 11.08 Ubuntu desktop images, time to boot to desktop:
Real Panda board (dual-core A9 at 1GHz, 1GB RAM, running off SD card)
- 2 minutes to desktop
QEMU vexpress (2x A9 cores, 1GB RAM, emulated SD card, running on a
Core2 Duo T9400 2.53GHz laptop) - 3 minutes to desktop
(The times are scarily close to exact minutes - timeout somewhere?)
Now, QEMU system mode only ever uses one host core when emulating
multiple cores, so there is a factor of 2 disadvantage there, but
on the plus side the memory bandwidth and the disk speed of the host
are probably much higher than on the Panda.
2) Simple md5sum benchmark
As a really simple benchmark the test:
time (dd if=/dev/zero bs=1024k count=1000 | md5sum)
Panda board - 14.5s real, 10.7s user, 3.8s system
Emulated Overo board (single A8 processor on same laptop as above)
- 41s real, 24.7s user, 16.4s system
User mode emulated - 14.2s real, 14s user, 0.5s system
Native on x86 host - 3.2s real, 2.5s user, 1.2s system
So, that's two sets of pretty bogus dummy simple benchmarks!
I suppose one observation is that the boot time isn't that bad
compared to the real (different) hardware, the user mode emulation
was comparable to the Panda, but the system emulation on a simple test
seems a lot slower.
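As a rough sketch (not part of the original measurements), the ratios implied by the md5sum timings above can be worked out as follows. Python is used purely for illustration; the dictionary labels and the `slowdown` helper are made-up names, and only the numbers themselves come from the figures quoted above.

```python
# Timings in seconds, copied from the md5sum benchmark figures above.
# The labels are illustrative names, not anything used in the thread.
TIMINGS = {
    "Panda board (real A9)": {"real": 14.5, "user": 10.7, "sys": 3.8},
    "QEMU system mode (Overo, A8)": {"real": 41.0, "user": 24.7, "sys": 16.4},
    "QEMU user mode": {"real": 14.2, "user": 14.0, "sys": 0.5},
    "Native x86 host": {"real": 3.2, "user": 2.5, "sys": 1.2},
}


def slowdown(name, baseline, field="real"):
    """Return how many times longer `name` took than `baseline` for one field."""
    return TIMINGS[name][field] / TIMINGS[baseline][field]


if __name__ == "__main__":
    for name in TIMINGS:
        vs_panda = slowdown(name, "Panda board (real A9)")
        vs_host = slowdown(name, "Native x86 host")
        print(f"{name:30s} {vs_panda:5.2f}x Panda  {vs_host:5.2f}x native host")
```

On wall-clock time this puts system mode at roughly 2.8x the Panda and about 12.8x the native host, while user mode sits at about 4.4x the native host. These are ratios from a single run of one trivial test, so they should be read as ballpark figures only.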
These things will vary wildly depending on what your benchmark is; but as
a summary I'd say that the ARM system mode emulation is fast enough to
use interactively but CPU-wise is noticeably slower than user mode
emulation.
Dave
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] emulated ARM performance vs real processor ?
2011-09-02 14:31 ` David Gilbert
@ 2011-09-02 16:04 ` Julien Heyman
2011-09-02 16:10 ` David Gilbert
2011-09-02 16:56 ` M P
0 siblings, 2 replies; 7+ messages in thread
From: Julien Heyman @ 2011-09-02 16:04 UTC (permalink / raw)
To: David Gilbert; +Cc: qemu-devel
[-- Attachment #1: Type: text/plain, Size: 3172 bytes --]
Thanks Dave.
I use system emulation, and my main concern is "just" to know that the
actual board will run faster than the emulation. So based on your example,
and even though my target board (mini2440) is nowhere as fast as a Panda
board, this should be the case by a comfortable margin. Now, as I am
focusing on boot time, the time to read from flash (i.e. much faster in the
emulated context than on the real flash) will counter-balance this a lot.
Hopefully these two factors will even out and what I measure now will not be
dramatically different than what I will get on the real board, but...we'll
see.
Regards,
Julien
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] emulated ARM performance vs real processor ?
2011-09-02 16:04 ` Julien Heyman
@ 2011-09-02 16:10 ` David Gilbert
2011-09-02 16:56 ` M P
1 sibling, 0 replies; 7+ messages in thread
From: David Gilbert @ 2011-09-02 16:10 UTC (permalink / raw)
To: Julien Heyman; +Cc: qemu-devel
On 2 September 2011 17:04, Julien Heyman <bidsomail@gmail.com> wrote:
> Thanks Dave.
> I use system emulation, and my main concern is "just" to know that the
> actual board will run faster than the emulation. So based on your example,
> and even though my target board (mini2440) is nowhere as fast as a Panda
> board, this should be the case by a comfortable margin.
OK, but be careful - you will occasionally trip over something where the
emulation of it is particularly dire and the real board might be faster;
for example with the default flags SD card writes can be a factor of 10 slower
than real hardware, so relying on the real hardware always being faster
is dangerous. You'll probably get similar CPU emulation artefacts where
there are some instructions that are particularly nasty to emulate but
really cheap on the hardware.
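One way to spot such a case for storage is to time the same synced write under emulation and on the real board and compare the two numbers. The sketch below is only an illustration: it assumes a Python 3 interpreter is available on both root filesystems (which may well not hold on a small board), and the function name, the 4 MiB size and the 64 KiB block size are arbitrary choices rather than anything from this thread.

```python
import os
import tempfile
import time


def time_synced_write(directory, total_bytes=4 * 1024 * 1024, block_size=64 * 1024):
    """Write zeros to a temporary file in `directory`, fsync it, and return
    the elapsed wall-clock time in seconds (including closing the file)."""
    block = bytes(block_size)  # a block of zero bytes
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            written = 0
            while written < total_bytes:
                f.write(block)
                written += block_size
            f.flush()
            os.fsync(f.fileno())  # force the data out to the storage device
        return time.perf_counter() - start
    finally:
        os.remove(path)  # always clean up the temporary file


if __name__ == "__main__":
    elapsed = time_synced_write(tempfile.gettempdir())
    print(f"4 MiB synced write: {elapsed:.3f}s")
```

Pointing `directory` at a mount backed by the SD card (or flash) on both sides gives a like-for-like figure; a large gap in either direction is a hint that boot-time measurements taken under emulation will not transfer directly to the hardware.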
Dave
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] emulated ARM performance vs real processor ?
2011-09-02 16:04 ` Julien Heyman
2011-09-02 16:10 ` David Gilbert
@ 2011-09-02 16:56 ` M P
1 sibling, 0 replies; 7+ messages in thread
From: M P @ 2011-09-02 16:56 UTC (permalink / raw)
To: Julien Heyman; +Cc: David Gilbert, qemu-devel
On Fri, Sep 2, 2011 at 5:04 PM, Julien Heyman <bidsomail@gmail.com> wrote:
> Thanks Dave.
> I use system emulation, and my main concern is "just" to know that the
> actual board will run faster than the emulation. So based on your example,
> and even though my target board (mini2440) is nowhere as fast as a Panda
> board, this should be the case by a comfortable margin. Now, as I am
> focusing on boot time, the time to read from flash (i.e. much faster in the
> emulated context than on the real flash) will counter-balance this a lot.
> Hopefully these two factors will even out and what I measure now will not be
> dramatically different than what I will get on the real board, but...we'll
> see.
I wrote the mini2440 support for qemu and used it a LOT, and it can
pretty easily emulate full speed on a core2. Some stuff is a bit
slower, but most is quite a bit faster somehow.
Note that it emulates more than an ARMv4T, so if the code you run is not
compiled properly, it might just work in qemu, and fail miserably on
the real hardware.
Michael
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] emulated ARM performance vs real processor ?
2011-09-01 7:32 [Qemu-devel] emulated ARM performance vs real processor ? Julien Heyman
2011-09-02 14:31 ` David Gilbert
@ 2011-09-04 17:42 ` Antti P Miettinen
2011-09-04 18:44 ` Peter Maydell
1 sibling, 1 reply; 7+ messages in thread
From: Antti P Miettinen @ 2011-09-04 17:42 UTC (permalink / raw)
To: qemu-devel
Julien Heyman <bidsomail@gmail.com> writes:
> Hi,
>
> I was wondering if anyone had some data regarding the relative performance of
> any given ARM board emulated in QEMU versus the real thing. Yes, I do know
> this depends a lot on the host PC running qemu, but some ballpark/example
> figures would help. Say, I emulate a 400 MHz ARM9 processor on a Core2Duo
> laptop @ 2 GHz, what kind of performance/timing ratio should I expect, one way
> or the other? For example, for boot time.
> I have no idea whether the overhead of emulation is over-compensated by the
> huge processing power of the host compared to the real HW target, and by which
> factor.
>
> Regards,
> Julien
>
Taking a look at:
http://adt.cs.upb.de/quf/quf2011_proceedings.pdf
page 20 (24th page in the PDF), figure 1b, the noprof bars, I'd expect
a 2GHz host to be on average faster than the native target. The emulation
speed depends on how core intensive vs memory intensive your workload
is. Workloads that are memory bound in the target (e.g. gzip ASCII
compression) can be emulated much faster (e.g. factor of two) than core
bound workloads (e.g. mcrypt encryption).
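To get a feel for this on a particular setup, one could time two contrasting workloads natively and under emulation and compare how the ratio between them shifts. The following is a minimal sketch under stated assumptions: `zlib` compression of ASCII text stands in for gzip, and SHA-256 hashing stands in for mcrypt, which is not the same algorithm and may not sit on the same side of the memory-bound/core-bound divide. The helper names are invented for this example.

```python
import hashlib
import time
import zlib


def time_call(fn, data, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls of fn(data)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - start)
    return best


def make_ascii(size):
    """Build exactly `size` bytes of repetitive ASCII text as compression input."""
    line = b"The quick brown fox jumps over the lazy dog.\n"
    return (line * (size // len(line) + 1))[:size]


if __name__ == "__main__":
    data = make_ascii(1024 * 1024)
    # Stand-in for the gzip ASCII compression workload.
    t_zlib = time_call(zlib.compress, data)
    # Stand-in for a core-bound crypto workload (SHA-256, not mcrypt).
    t_hash = time_call(lambda d: hashlib.sha256(d).digest(), data)
    print(f"zlib compress: {t_zlib:.4f}s  sha256: {t_hash:.4f}s")
```

Running the same script on the target and inside QEMU and dividing the emulated time by the native time for each workload gives two slowdown factors; if the claim above holds for these stand-ins, the compression factor should come out smaller than the hashing one.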
--
http://www.iki.fi/~ananaza/
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] emulated ARM performance vs real processor ?
2011-09-04 17:42 ` Antti P Miettinen
@ 2011-09-04 18:44 ` Peter Maydell
0 siblings, 0 replies; 7+ messages in thread
From: Peter Maydell @ 2011-09-04 18:44 UTC (permalink / raw)
To: Antti P Miettinen; +Cc: qemu-devel
On 4 September 2011 18:42, Antti P Miettinen <ananaza@iki.fi> wrote:
> The emulation
> speed depends on how core intensive vs memory intensive your workload
> is. Workloads that are memory bound in the target (e.g. gzip ASCII
> compression) can be emulated much faster (e.g. factor of two) than core
> bound workloads (e.g. mcrypt encryption).
Another factor is that if the workload makes heavy use of floating point
or SIMD instructions (VFP and Neon) then QEMU will do comparatively worse
than for a pure integer workload, because we have to emulate all the
fp calculations in software.
-- PMM
^ permalink raw reply [flat|nested] 7+ messages in thread