* [Qemu-devel] performance monitor
@ 2008-01-03 20:36 Clemens Kolbitsch
2008-01-03 21:29 ` Paul Brook
0 siblings, 1 reply; 10+ messages in thread
From: Clemens Kolbitsch @ 2008-01-03 20:36 UTC (permalink / raw)
To: qemu-devel
hi!
has anyone ever used some "real" performance monitoring tools (like papiex,
perfex, pfmon, etc.) on qemu? i'm running a debian linux and would like to
time some applications inside qemu and have tried the perfmon2 kernel-patch
(http://perfmon2.sourceforge.net/) for testing.
sadly, it does not work... dmesg tells me that the CPU is not identified
correctly ("unsupported family=6"). Now i am not really sure what type of
hardware-support the monitor relies on (i think PMU is the correct term, but
I'm not sure about that) and what CPUs are supported (dmesg tells me that
qemu simulates a Pentium M, but that's probably because I've compiled the
kernel on my *real* Pentium M).
... Ok, to cut a long question short: Is there any hardware support in qemu
for doing monitoring (that goes deeper than using "time") and has anyone ever
tested something that could work?
Thanks!
Clemens
* Re: [Qemu-devel] performance monitor
2008-01-03 20:36 [Qemu-devel] performance monitor Clemens Kolbitsch
@ 2008-01-03 21:29 ` Paul Brook
2008-01-03 21:38 ` Clemens Kolbitsch
0 siblings, 1 reply; 10+ messages in thread
From: Paul Brook @ 2008-01-03 21:29 UTC (permalink / raw)
To: qemu-devel; +Cc: Clemens Kolbitsch
> ... Ok, to cut a long question short: Is there any hardware support in qemu
> for doing monitoring (that goes deeper than using "time") and has anyone
> ever tested something that could work?
Probably your application wants the performance counters. Qemu doesn't emulate
those.
Besides which, qemu is not cycle accurate. Any performance measurements you
make are pretty much meaningless, and bear absolutely no relationship to real
hardware.
Paul
* Re: [Qemu-devel] performance monitor
2008-01-03 21:29 ` Paul Brook
@ 2008-01-03 21:38 ` Clemens Kolbitsch
2008-01-03 22:07 ` Paul Brook
2008-01-04 8:49 ` Rob Landley
0 siblings, 2 replies; 10+ messages in thread
From: Clemens Kolbitsch @ 2008-01-03 21:38 UTC (permalink / raw)
To: Paul Brook; +Cc: qemu-devel
On Thursday 03 January 2008 22:29:06 Paul Brook wrote:
> > ... Ok, to cut a long question short: Is there any hardware support in
> > qemu for doing monitoring (that goes deeper than using "time") and has
> > anyone ever tested something that could work?
>
> Probably your application wants the performance counters. Qemu doesn't
> emulate those.
>
> Besides which, qemu is not cycle accurate. Any performance measurements
> you make are pretty much meaningless, and bear absolutely no relationship
> to real hardware.
Thanks for the quick answer Paul! Not really what I wanted to hear, but
probably true ;-)
Does anyone have an idea on how I can measure performance in qemu to a
somewhat accurate level? I have modified qemu (the memory handling) and the
linux kernel and want to find out the penalty this introduced... does anyone
have any comments / ideas on this?
Thanks!
* Re: [Qemu-devel] performance monitor
2008-01-03 21:38 ` Clemens Kolbitsch
@ 2008-01-03 22:07 ` Paul Brook
2008-01-03 22:11 ` Clemens Kolbitsch
2008-01-04 8:49 ` Rob Landley
1 sibling, 1 reply; 10+ messages in thread
From: Paul Brook @ 2008-01-03 22:07 UTC (permalink / raw)
To: qemu-devel; +Cc: Clemens Kolbitsch
> Does anyone have an idea on how I can measure performance in qemu to a
> somewhat accurate level? I have modified qemu (the memory handling) and the
> linux kernel and want to find out the penalty this introduced... does
> anyone have any comments / ideas on this?
Short answer is you probably can't. And even if you can I won't believe your
results unless you've verified them on real hardware :-)
With the exception of some very small embedded cores, modern CPUs have complex
out of order execution pipelines and multi-level cache hierarchies. It's
common for performance to be dominated by these secondary factors rather than
raw instruction throughput.
Exactly what features dominate performance is very application specific.
Determining which factor dominates is unlikely to be something qemu can help
with.
However if e.g. you know that for your application there's a good correlation
between performance and L2 cache misses you could instrument qemu to add
an L1/L2 cache model. The overhead will be fairly severe (easily 10x slower),
and completely screw up any realtime measurements. However it would produce
some useful cache use statistics that you could use to guesstimate actual
performance. This is similar to how cachegrind works. Obviously if your
application isn't cache bound then these figures will be meaningless.
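A minimal sketch of the kind of L1/L2 cache model described above, written in Python for brevity rather than as a qemu patch. The class name, the `simulate` helper and the cache geometry are illustrative assumptions; a real instrumented qemu would feed it the guest's memory accesses, which is not shown here.

```python
from collections import OrderedDict


class CacheModel:
    """Set-associative cache with LRU replacement; it only counts hits and misses."""

    def __init__(self, size_bytes, line_bytes, ways):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One ordered dict per set: keys are tags, oldest entry first.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Record one access to a byte address; return True on a hit."""
        line = addr // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(entries) >= self.ways:
            entries.popitem(last=False)  # evict the least recently used line
        entries[tag] = True
        return False


def simulate(trace, l1, l2):
    """Feed addresses to L1; only L1 misses are forwarded to L2."""
    for addr in trace:
        if not l1.access(addr):
            l2.access(addr)
    return l1.misses, l2.misses
```

The model has no notion of time, so it yields miss counts only; turning those into a performance estimate still needs a per-miss cost taken from real hardware.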
Paul
* Re: [Qemu-devel] performance monitor
2008-01-03 22:07 ` Paul Brook
@ 2008-01-03 22:11 ` Clemens Kolbitsch
2008-01-03 22:18 ` Paul Brook
2008-01-03 22:19 ` Laurent Desnogues
0 siblings, 2 replies; 10+ messages in thread
From: Clemens Kolbitsch @ 2008-01-03 22:11 UTC (permalink / raw)
To: qemu-devel
On Thursday 03 January 2008 23:07:07 you wrote:
> > Does anyone have an idea on how I can measure performance in qemu to a
> > somewhat accurate level? I have modified qemu (the memory handling) and
> > the linux kernel and want to find out the penalty this introduced... does
> > anyone have any comments / ideas on this?
>
> Short answer is you probably can't. And even if you can I won't believe
> your results unless you've verified them on real hardware :-)
>
> With the exception of some very small embedded cores, modern CPUs have
> complex out of order execution pipelines and multi-level cache hierarchies.
> It's common for performance to be dominated by these secondary factors
> rather than raw instruction throughput.
>
> Exactly what features dominate performance is very application specific.
> Determining which factor dominates is unlikely to be something qemu can
> help with.
>
> However if e.g. you know that for your application there's a good
> correlation between performance and L2 cache misses you could
> instrument qemu to add an L1/L2 cache model. The overhead will be fairly
> severe (easily 10x slower), and completely screw up any realtime
> measurements. However it would produce some useful cache use statistics
> that you could use to guesstimate actual performance. This is similar to
> how cachegrind works. Obviously if your application isn't cache bound then
> these figures will be meaningless.
Well, the measuring I had in mind partly concentrates on TLB misses, page
faults, etc. (in addition to the cycle measuring). guess i'll have to
implement something for myself in qemu :-/
But thanks a lot for helping me out!
* Re: [Qemu-devel] performance monitor
2008-01-03 22:11 ` Clemens Kolbitsch
@ 2008-01-03 22:18 ` Paul Brook
2008-01-03 22:21 ` Clemens Kolbitsch
2008-01-03 22:19 ` Laurent Desnogues
1 sibling, 1 reply; 10+ messages in thread
From: Paul Brook @ 2008-01-03 22:18 UTC (permalink / raw)
To: qemu-devel; +Cc: Clemens Kolbitsch
> Well, the measuring I had in mind partly concentrates on TLB misses, page
> faults, etc. (in addition to the cycle measuring). guess i'll have to
> implement something for myself in qemu :-/
Be aware that the TLB qemu uses behaves very differently to a real CPU TLB. If
you want to get TLB miss statistics you'll need to model a "real" TLB for
that separately.
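As a rough illustration of modelling a "real" TLB separately, here is a small fully associative LRU model in Python. The entry count, the 4 KiB page size and all names are assumptions made for the sketch; they do not describe any particular CPU or qemu's own softmmu TLB.

```python
from collections import OrderedDict

PAGE_SHIFT = 12  # assumes 4 KiB pages


class TlbModel:
    """Fully associative TLB with LRU replacement, keyed by virtual page number."""

    def __init__(self, entries):
        self.entries = entries
        self.pages = OrderedDict()  # oldest translation first
        self.hits = 0
        self.misses = 0

    def access(self, vaddr):
        """Record one access to a virtual address; return True on a TLB hit."""
        vpn = vaddr >> PAGE_SHIFT
        if vpn in self.pages:
            self.pages.move_to_end(vpn)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(self.pages) >= self.entries:
            self.pages.popitem(last=False)  # evict the least recently used page
        self.pages[vpn] = True
        return False

    def flush(self):
        """Drop every entry, as an untagged TLB would on an address-space switch."""
        self.pages.clear()
```

Calling `flush()` wherever the guest reloads its page-table base would approximate context-switch behaviour; whether that matches the hardware of interest has to be checked against that hardware's documentation.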
Page faults should be straightforward, but any half-decent guest OS would be
able to tell you those anyway.
Paul
* Re: [Qemu-devel] performance monitor
2008-01-03 22:11 ` Clemens Kolbitsch
2008-01-03 22:18 ` Paul Brook
@ 2008-01-03 22:19 ` Laurent Desnogues
1 sibling, 0 replies; 10+ messages in thread
From: Laurent Desnogues @ 2008-01-03 22:19 UTC (permalink / raw)
To: qemu-devel
On Jan 3, 2008 11:11 PM, Clemens Kolbitsch <clemens.kol@gmx.at> wrote:
>
> Well, the measuring I had in mind partly concentrates on TLB misses, page
> faults, etc. (in addition to the cycle measuring). guess i'll have to
> implement something for myself in qemu :-/
There's something not clear here: do you want to measure your kernel
changes or do you want to profile Qemu?
As Paul clearly explained you can't do both :)
If you want to measure kernel performance oprofile is probably worth
looking at. But you will need the real hardware.
Another option, though much more intrusive, would be to add explicit
performance counters in places you need to look at (this method can
be applied to both the kernel and Qemu).
And to say it again: nobody can expect to measure OS performance
on a simulator, unless the simulator is directly derived from the HDL
code written by designers. At least I would never trust such a
result ;)
Laurent
* Re: [Qemu-devel] performance monitor
2008-01-03 22:18 ` Paul Brook
@ 2008-01-03 22:21 ` Clemens Kolbitsch
0 siblings, 0 replies; 10+ messages in thread
From: Clemens Kolbitsch @ 2008-01-03 22:21 UTC (permalink / raw)
To: Paul Brook; +Cc: qemu-devel
On Thursday 03 January 2008 23:18:58 Paul Brook wrote:
> > Well, the measuring I had in mind partly concentrates on TLB misses, page
> > faults, etc. (in addition to the cycle measuring). guess i'll have to
> > implement something for myself in qemu :-/
>
> Be aware that the TLB qemu uses behaves very differently to a real CPU TLB.
> If you want to get TLB miss statistics you'll need to model a "real" TLB
> for that separately.
Sure, yes. But I don't even care what it would be like on a real CPU. I just
want to know the impact it has on the emulated CPU ;-)
> Page faults should be straightforward, but any half-decent guest OS would
> be able to tell you those anyway.
True *g*
* Re: [Qemu-devel] performance monitor
2008-01-03 21:38 ` Clemens Kolbitsch
2008-01-03 22:07 ` Paul Brook
@ 2008-01-04 8:49 ` Rob Landley
2008-01-04 15:09 ` Clemens Kolbitsch
1 sibling, 1 reply; 10+ messages in thread
From: Rob Landley @ 2008-01-04 8:49 UTC (permalink / raw)
To: qemu-devel; +Cc: Clemens Kolbitsch, Paul Brook
On Thursday 03 January 2008 15:38:02 Clemens Kolbitsch wrote:
> Does anyone have an idea on how I can measure performance in qemu to a
> somewhat accurate level?
hwclock --show > time1
tar xvjf linux-2.6.23.tar.bz2 && cd linux-2.6.23 && make allnoconfig && make
cd ..
hwclock --show > time2
Do that on host and client, and you've got a ratio of the performance of qemu
to your host that should be good to within a few percent.
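To turn the two pairs of timestamps into the ratio mentioned above, something like the following Python sketch could be used. It assumes the contents of time1 and time2 have already been converted to ISO 8601 strings, since the text printed by hwclock differs between versions; the function names are made up for this example.

```python
from datetime import datetime


def elapsed_seconds(start, end):
    """Seconds between two ISO 8601 timestamps such as '2008-01-04 08:00:00'."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds()


def slowdown(host_times, guest_times):
    """Ratio of guest wall-clock time to host wall-clock time for the same job."""
    host = elapsed_seconds(*host_times)
    guest = elapsed_seconds(*guest_times)
    if host <= 0 or guest <= 0:
        raise ValueError("end timestamp must be later than start timestamp")
    return guest / host
```

Each argument is a `(start, end)` pair; a result of 10.0 would mean the build took ten times as long inside qemu as on the host.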
> I have modified qemu (the memory handling) and the
> linux kernel and want to find out the penalty this introduced... does
> anyone have any comments / ideas on this?
If it's something big, you can compare the result in minutes and seconds.
That's probably the best you're going to do. (Although really you want
hwclock --show before and after, and then do the math. That tunnels out to
the host system to get its idea of the time, which doesn't get thrown off by
timer interrupt delivery (as a signal) getting deferred by the host system's
scheduler. Of course the fact that hwclock _takes_ a second or so to read
the clock is a bit of a downer, but anything that takes less than a minute or
so to run isn't going to give you a very accurate time because the
performance of qemu isn't constant, and your results are going to skew all
over the place.)
Especially for small things, the performance varies from run to run. Start by
imagining qemu as having the mother of all page fault latencies. The cost of
faulting code into the L2 cache includes dynamic recompilation, which is
expensive.
Worse, when the dynamic recompilation buffer fills up it blanks the whole
thing, and recompiles every new page it hits one at a time until the buffer
fills up again. (What is it these days, 16 megs of translated code before it
resets?) No LRU or anything, no cache management at _all_, just "when the
bucket fills up, dump it and start over". (Well, that's what it did back
around the last stable release anyway. It has been almost a year since then,
so maybe it's changed. I've been busy with other things and not really
keeping track of changes that didn't affect what I could and couldn't get to
run.)
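A toy Python model of the "dump it and start over" policy described above, useful only for seeing how sensitive the amount of retranslation is to buffer size and execution order. It counts translated blocks rather than bytes and is an assumption-laden sketch, not a description of qemu's actual translation cache code.

```python
def run_translation_cache(block_trace, capacity):
    """Simulate a translation cache that is emptied completely when it fills up.

    block_trace is the sequence of guest block ids in execution order and
    capacity is the number of translated blocks the buffer can hold.
    Returns (translations, flushes).
    """
    cache = set()
    translations = 0
    flushes = 0
    for block in block_trace:
        if block in cache:
            continue  # already translated, runs at full speed
        if len(cache) >= capacity:
            cache.clear()  # no LRU: throw everything away
            flushes += 1
        cache.add(block)
        translations += 1
    return translations, flushes
```

Looping three times over four blocks costs 4 translations when the buffer holds four blocks, but 12 translations and 3 flushes when it holds only three, which is the kind of cliff that makes small benchmarks unstable.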
So anyway, depending on what code you run in what order, the performance can
_differ_ from one run to the next due to when the cache gets blanked and
stuff gets retranslated. By a lot. There's no obvious way to predict this
or control it. And the "software" clock inside your emulated system can lie
to you about it if timer interrupts get deferred.
All this should pretty much average out if you do something big with lots of
execs (like build a linux kernel from source). But if you do something small
expect serious butterfly effects. Expect microbenchmarks to swing around
wildly.
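One way to see how much a benchmark swings is to repeat it and look at the spread, not just the average. The Python sketch below does that; the helper names are invented for illustration, and if it runs inside the guest its clock is subject to the deferred-timer caveat mentioned above, so timing from the host side is probably the safer assumption.

```python
import statistics
import subprocess
import time


def time_command(argv):
    """Run one command to completion and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


def summarize(samples):
    """Return (mean, sample standard deviation, coefficient of variation)."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # needs at least two samples
    return mean, stdev, stdev / mean
```

For example, `summarize([time_command(["make"]) for _ in range(5)])` gives the mean build time together with its relative spread; a large coefficient of variation suggests the workload is too small to compare meaningfully.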
Quick analogy: you know the performance difference faulting your executable in
from disk vs running it out of cache? Imagine a daemon that makes random
intermittent calls to "echo 1 > /proc/sys/vm/drop_caches", and now try to do
a sane benchmark. No matter what you use to measure, what you're measuring
isn't going to be consistent from one run to the next.
Performance should be better (and more stable) with kqemu or kvm. Maybe that
you can benchmark sanely, I wouldn't know. Ask somebody else. :)
P.S. Take the above with a large grain of salt, I'm not close to an expert in
this area...
Rob
--
"One of my most productive days was throwing away 1000 lines of code."
- Ken Thompson.
* Re: [Qemu-devel] performance monitor
2008-01-04 8:49 ` Rob Landley
@ 2008-01-04 15:09 ` Clemens Kolbitsch
0 siblings, 0 replies; 10+ messages in thread
From: Clemens Kolbitsch @ 2008-01-04 15:09 UTC (permalink / raw)
To: Rob Landley; +Cc: qemu-devel, Paul Brook
On Friday 04 January 2008 09:49:22 Rob Landley wrote:
> On Thursday 03 January 2008 15:38:02 Clemens Kolbitsch wrote:
> > Does anyone have an idea on how I can measure performance in qemu to a
> > somewhat accurate level?
>
> hwclock --show > time1
> tar xvjf linux-2.6.23.tar.bz2 && cd linux-2.6.23 && make allnoconfig && make
> cd ..
> hwclock --show > time2
>
> Do that on host and client, and you've got a ratio of the performance of
> qemu to your host that should be good to within a few percent.
>
> > I have modified qemu (the memory handling) and the
> > linux kernel and want to find out the penalty this introduced... does
> > anyone have any comments / ideas on this?
>
> If it's something big, you can compare the result in minutes and seconds.
> That's probably the best you're going to do. (Although really you want
> hwclock --show before and after, and then do the math. That tunnels out to
> the host system to get its idea of the time, which doesn't get thrown off
> by timer interrupt delivery (as a signal) getting deferred by the host
> system's scheduler. Of course the fact that hwclock _takes_ a second or so
> to read the clock is a bit of a downer, but anything that takes less than a
> minute or so to run isn't going to give you a very accurate time because
> the performance of qemu isn't constant, and your results are going to skew
> all over the place.)
>
> Especially for small things, the performance varies from run to run. Start
> by imagining qemu as having the mother of all page fault latencies. The
> cost of faulting code into the L2 cache includes dynamic recompilation,
> which is expensive.
>
> Worse, when the dynamic recompilation buffer fills up it blanks the whole
> thing, and recompiles every new page it hits one at a time until the buffer
> fills up again. (What is it these days, 16 megs of translated code before
> it resets?) No LRU or anything, no cache management at _all_, just "when
> the bucket fills up, dump it and start over". (Well, that's what it did
> back around the last stable release anyway. It has been almost a year
> since then, so maybe it's changed. I've been busy with other things and
> not really keeping track of changes that didn't affect what I could and
> couldn't get to run.)
>
> So anyway, depending on what code you run in what order, the performance
> can _differ_ from one run to the next due to when the cache gets blanked
> and stuff gets retranslated. By a lot. There's no obvious way to predict
> this or control it. And the "software" clock inside your emulated system
> can lie to you about it if timer interrupts get deferred.
>
> All this should pretty much average out if you do something big with lots
> of execs (like build a linux kernel from source). But if you do something
> small expect serious butterfly effects. Expect microbenchmarks to swing
> around wildly.
>
> Quick analogy: you know the performance difference faulting your executable
> in from disk vs running it out of cache? Imagine a daemon that makes random
> intermittent calls to "echo 1 > /proc/sys/vm/drop_caches", and now try to
> do a sane benchmark. No matter what you use to measure, what you're
> measuring isn't going to be consistent from one run to the next.
>
> Performance should be better (and more stable) with kqemu or kvm. Maybe
> that you can benchmark sanely, I wouldn't know. Ask somebody else. :)
>
> P.S. Take the above with a large grain of salt, I'm not close to an expert
> in this area...
:-)
Ok. What you've said pretty much covers how I've made up my mind in the last
couple of hours trying to think about the problem *g*
Guess I'll have to be happy counting TLB misses and page faults, adding up
executed instructions (in user/kernel mode) per process and doing some timing
stuff... then running the examples a lot of times, making an average of all
numbers and finally just ignoring them since I *know* that they are bogus ;-)
No, seriously... I understand the problem, but I think the above is the best I
can do since I'm really only interested in the effect it has on QEMU for the
moment :-)
Thanks again for your ideas!!