* Re: [Qemu-devel] ARM CPU Speed simulated by Qemu?
@ 2006-09-12 15:08 Laurent DESNOGUES
2006-09-12 15:19 ` Markus Schiltknecht
0 siblings, 1 reply; 13+ messages in thread
From: Laurent DESNOGUES @ 2006-09-12 15:08 UTC
To: Markus Schiltknecht, qemu-devel
> > By far the most complex part of accurately simulating a
> > modern CPU (including ARMs) is the data cache.
>
> Hm... you have to elaborate on that one. Aren't those caches like other
> caches, too? With well known algorithms like LRU?
Data caches typically do many things in one cycle; for
instance, if you issue a load, it could start looking in
a small area between the cache and the core (called the
write buffer or store buffer) and at the same time look into
the real cache to find the data; then depending on the
outcome, an external request could be started, this
cycle or later depending on previous requests still
pending. And this is a simple example ;)
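A toy sketch of the load path described above, assuming a single cache level, fixed hit and miss latencies, a store buffer matched on exact addresses, and only one external request on the bus at a time. The class and field names are hypothetical and not taken from QEMU or any ARM core; real data sides do far more, which is the point being made.

```python
from dataclasses import dataclass, field


@dataclass
class LoadResult:
    source: str  # "store_buffer", "cache" or "memory"
    cycles: int  # cycles from issue until the data is available


@dataclass
class DataSide:
    store_buffer: dict = field(default_factory=dict)  # exact address -> pending store value
    cache_lines: set = field(default_factory=set)     # line numbers currently cached
    line_size: int = 32
    hit_cycles: int = 1
    miss_cycles: int = 20
    bus_free_at: int = 0  # cycle at which the external bus can take a new request

    def load(self, addr: int, now: int) -> LoadResult:
        line = addr // self.line_size
        # In hardware the store buffer and the cache are probed in the same
        # cycle, so either hit costs hit_cycles. The store buffer is checked
        # first because it holds data newer than the cache.
        if addr in self.store_buffer:
            return LoadResult("store_buffer", self.hit_cycles)
        if line in self.cache_lines:
            return LoadResult("cache", self.hit_cycles)
        # Miss: the external request starts this cycle, or later if an
        # earlier request is still occupying the bus.
        start = max(now, self.bus_free_at)
        self.bus_free_at = start + self.miss_cycles
        self.cache_lines.add(line)  # line fill; no eviction in this toy
        return LoadResult("memory", self.bus_free_at - now)
```

Even this toy shows that a load's cost depends on state left behind by earlier requests, not just on the load itself.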
On top of that try to find a specification for data
side behaviour, these beasts are not documented for
two reasons:
- they are heavily optimized and so not easily
described
- they often define the efficiency of a CPU and
so are considered as secret.
> Simulating branch prediction seems more complex to me (probably because
> I'm thinking x86, not ARM).
Branch prediction has become very complex on ARM,
but not as complex as the data side.
Laurent
* Re: [Qemu-devel] ARM CPU Speed simulated by Qemu?
2006-09-12 15:08 [Qemu-devel] ARM CPU Speed simulated by Qemu? Laurent DESNOGUES
@ 2006-09-12 15:19 ` Markus Schiltknecht
0 siblings, 0 replies; 13+ messages in thread
From: Markus Schiltknecht @ 2006-09-12 15:19 UTC
To: laurent.desnogues; +Cc: qemu-devel
Laurent DESNOGUES wrote:
> On top of that try to find a specification for data
> side behaviour, these beasts are not documented for
> two reasons:
> - they are heavily optimized and so not easily
> described
> - they often define the efficiency of a CPU and
> so are considered as secret.
That might be the hardest part. Simulating any level of caches should
not be _that_ hard. (And a write or store buffer looks exactly like
yet another cache.)
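As a sketch of the "well known algorithms like LRU" view, one level of a set-associative cache with LRU replacement can be modelled in a few lines. The set count, associativity and line size are parameters chosen for illustration, and under Markus's suggestion a store buffer would just be another small instance; as Laurent points out, this ignores everything that happens in parallel within a cycle.

```python
from collections import OrderedDict


class LRUCache:
    """One set-associative cache level with LRU replacement (toy model)."""

    def __init__(self, num_sets: int, ways: int, line_size: int):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # One OrderedDict per set: keys are tags, ordered least -> most recently used.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, fill the line and return False."""
        line = addr // self.line_size
        index = line % self.num_sets
        tag = line // self.num_sets
        s = self.sets[index]
        if tag in s:
            s.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(s) >= self.ways:
            s.popitem(last=False)  # evict the least recently used line
        s[tag] = True
        return False
```

For example, with one set of two ways and 32-byte lines, the address sequence 0, 32, 0, 64, 32, 64 gives miss, miss, hit, miss (evicting 32), miss (evicting 0), hit.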
>> Simulating branch prediction seems more complex to me (probably because
>> I'm thinking x86, not ARM).
>
> Branch prediction has become very complex on ARM,
> but not as complex as the data side.
Sorry, I meant to refer to the pipeline, which is significantly shorter
on ARM than on the NetBurst CPUs from Intel... :-)
Thank you for your help.
Markus
* Re: [Qemu-devel] ARM CPU Speed simulated by Qemu?
@ 2006-09-12 15:26 Laurent DESNOGUES
0 siblings, 0 replies; 13+ messages in thread
From: Laurent DESNOGUES @ 2006-09-12 15:26 UTC
To: Markus Schiltknecht; +Cc: qemu-devel
> > On top of that try to find a specification for data
* Re: [Qemu-devel] ARM CPU Speed simulated by Qemu?
@ 2006-09-12 14:44 Laurent DESNOGUES
2006-09-12 14:58 ` Markus Schiltknecht
2006-09-12 17:42 ` K. Richard Pixley
0 siblings, 2 replies; 13+ messages in thread
From: Laurent DESNOGUES @ 2006-09-12 14:44 UTC
To: qemu-devel
> Now, CPUs are where I have only a vague idea of what would be needed to
> simulate. I know there are up to three levels of caches and main memory,
> which all have different access times. The CPU itself has a pipeline and
> branch prediction and such, which could invalidate the contents of the
> pipeline up to a given point (of branching).
>
> I think the most time-consuming operation that should be properly
> simulated is memory access. For this to work properly, all levels of
> caches must be emulated, too.
>
> How much do branch prediction misses cost? How much do pipeline
> interlocks cost? I don't think those would be _that_ dramatic, since
> today's compilers are said to optimize quite well...
By far the most complex part of accurately simulating a
modern CPU (including ARMs) is the data cache. In
comparison, getting accurate core pipeline simulation
is *very* easy.
There is a company that claims to be able to accurately
simulate an ARM at 200 MHz (http://www.vastsystems.com). I
bet they are using statistical cycle counting and so
are probably very wrong :)
Laurent
* Re: [Qemu-devel] ARM CPU Speed simulated by Qemu?
2006-09-12 14:44 Laurent DESNOGUES
@ 2006-09-12 14:58 ` Markus Schiltknecht
2006-09-12 17:42 ` K. Richard Pixley
1 sibling, 0 replies; 13+ messages in thread
From: Markus Schiltknecht @ 2006-09-12 14:58 UTC
To: laurent.desnogues, qemu-devel
Hi,
Laurent DESNOGUES wrote:
> By far the most complex part of accurately simulating a
> modern CPU (including ARMs) is the data cache.
Hm... you have to elaborate on that one. Aren't those caches like other
caches, too? With well known algorithms like LRU?
> In
> comparison, getting accurate core pipeline simulation
> is *very* easy.
Simulating branch prediction seems more complex to me (probably because
I'm thinking x86, not ARM).
What do others think about the network and hard disk simulation I
mentioned in my previous post?
Regards
Markus
* Re: [Qemu-devel] ARM CPU Speed simulated by Qemu?
2006-09-12 14:44 Laurent DESNOGUES
2006-09-12 14:58 ` Markus Schiltknecht
@ 2006-09-12 17:42 ` K. Richard Pixley
1 sibling, 0 replies; 13+ messages in thread
From: K. Richard Pixley @ 2006-09-12 17:42 UTC
To: qemu-devel
I have at least one 450MHz K6 in my spare bedroom. I'll be happy to
sell it to you as a platform for running Debian and qemu. I'm sure its
performance would be lower than most of the current AMD processors,
though it might not be slower than some of the current Intel chips
(*grin*).
--rich
* Re: [Qemu-devel] Re: qemu-system-sparc video problem on 16 bitdisplays
@ 2006-09-12 0:17 Stuart Brady
2006-09-12 7:20 ` [Qemu-devel] ARM CPU Speed simulated by Qemu? Tieu Ma Dau
0 siblings, 1 reply; 13+ messages in thread
From: Stuart Brady @ 2006-09-12 0:17 UTC
To: qemu-devel
On Fri, Sep 08, 2006 at 09:26:52PM +0200, Blue Swirl wrote:
> Implementing full audio support for CS4231 would not be too difficult. Is
> this chip used anywhere else? The data sheet mentions some ISA card and
> Windows 3.1.
The "Windows Sound System" cards apparently used it.
--
Stuart Brady
* [Qemu-devel] ARM CPU Speed simulated by Qemu?
2006-09-12 0:17 [Qemu-devel] Re: qemu-system-sparc video problem on 16 bitdisplays Stuart Brady
@ 2006-09-12 7:20 ` Tieu Ma Dau
2006-09-12 11:03 ` nyos
2006-09-12 12:43 ` Paul Brook
0 siblings, 2 replies; 13+ messages in thread
From: Tieu Ma Dau @ 2006-09-12 7:20 UTC
To: qemu-devel
Hi all,
I found that the Qemu ARM system simulates the ARM926EJ-S and
ARM1026EJ-S processors. And I found on the ARM website that
the speed of these CPUs vary from 266 to 540 MHz.
Could you tell me the exact speed of the ARM926EJ-S
and ARM1026EJ-S processor simulated by Qemu? It's very
important for me to finish my report.
Thanks and best regards,
Tieu
* Re: [Qemu-devel] ARM CPU Speed simulated by Qemu?
2006-09-12 7:20 ` [Qemu-devel] ARM CPU Speed simulated by Qemu? Tieu Ma Dau
@ 2006-09-12 11:03 ` nyos
2006-09-12 12:43 ` Paul Brook
1 sibling, 0 replies; 13+ messages in thread
From: nyos @ 2006-09-12 11:03 UTC
To: qemu-devel
Tieu Ma Dau wrote:
> Hi all,
> I found that the Qemu ARM system simulates the ARM926EJ-S and
> ARM1026EJ-S processors. And I found on the ARM website that
> the speed of these CPUs vary from 266 to 540 MHz.
> Could you tell me the exact speed of the ARM926EJ-S
> and ARM1026EJ-S processor simulated by Qemu? It's very
> important for me to finish my report.
> Thanks and best regards,
> Tieu
Hi,
You can estimate the CPU speed by running some benchmarks, but it's
impossible to tell exactly.
Qemu translates blocks of ARM code into host CPU instructions dynamically
at run-time, and executes them. So the host processor doesn't run the same
instructions, only instructions whose effect is the same on the host
architecture.
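A toy illustration of the translate-once, run-many idea, using a made-up four-opcode guest ISA written as Python tuples (the opcodes, tuple format and function names are all hypothetical). QEMU's real translator generates host machine code rather than Python closures; the sketch only shows why the guest's cycle timing is lost: each guest block is decoded once into host-side operations that are then reused, and nothing in them records how long the original instructions would take.

```python
from typing import Callable, Dict, List, Optional, Tuple

Regs = Dict[str, int]
HostOp = Callable[[Regs], Optional[int]]  # returns a new guest pc for a taken branch


def compile_op(op: Tuple) -> HostOp:
    """Decode one guest instruction once and return a specialised host closure."""
    kind = op[0]
    if kind == "mov":
        _, rd, imm = op
        def host(regs: Regs) -> Optional[int]:
            regs[rd] = imm
            return None
    elif kind == "add":
        _, rd, rn, imm = op
        def host(regs: Regs) -> Optional[int]:
            regs[rd] = regs[rn] + imm
            return None
    elif kind == "bne":
        _, rn, rm, target = op
        def host(regs: Regs) -> Optional[int]:
            return target if regs[rn] != regs[rm] else None
    elif kind == "halt":
        def host(regs: Regs) -> Optional[int]:
            return -1
    else:
        raise ValueError(f"unknown opcode {kind}")
    return host


def translate_block(program: List[Tuple], pc: int) -> Tuple[List[HostOp], int]:
    """Translate from pc up to and including the next control-flow instruction."""
    host_ops = []
    while True:
        op = program[pc]
        host_ops.append(compile_op(op))
        pc += 1
        if op[0] in ("bne", "halt"):
            return host_ops, pc  # pc is now the fall-through address


def run(program: List[Tuple], regs: Regs) -> Tuple[Regs, int]:
    cache: Dict[int, Tuple[List[HostOp], int]] = {}  # guest pc -> translated block
    pc = 0
    while pc != -1:
        if pc not in cache:
            cache[pc] = translate_block(program, pc)  # translate once...
        host_ops, next_pc = cache[pc]                 # ...then reuse
        for host in host_ops:
            result = host(regs)
            if result is not None:
                next_pc = result  # only the last op of a block can branch
        pc = next_pc
    return regs, len(cache)
```

For example, a loop that counts r0 up to 3 executes its body three times but is translated only once:

```python
program = [("mov", "r0", 0), ("mov", "r1", 3),
           ("add", "r0", "r0", 1), ("bne", "r0", "r1", 2), ("halt",)]
regs, blocks = run(program, {})  # regs["r0"] == 3, three blocks translated
```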
Nyos
* Re: [Qemu-devel] ARM CPU Speed simulated by Qemu?
2006-09-12 7:20 ` [Qemu-devel] ARM CPU Speed simulated by Qemu? Tieu Ma Dau
2006-09-12 11:03 ` nyos
@ 2006-09-12 12:43 ` Paul Brook
2006-09-12 13:19 ` Markus Schiltknecht
1 sibling, 1 reply; 13+ messages in thread
From: Paul Brook @ 2006-09-12 12:43 UTC
To: qemu-devel
On Tuesday 12 September 2006 08:20, Tieu Ma Dau wrote:
> Hi all,
> I found that the Qemu ARM system simulates the ARM926EJ-S and
> ARM1026EJ-S processors. And I found on the ARM website that
> the speed of these CPUs vary from 266 to 540 MHz.
> Could you tell me the exact speed of the ARM926EJ-S
> and ARM1026EJ-S processor simulated by Qemu? It's very
> important for me to finish my report.
Qemu is not cycle accurate. There is no particularly meaningful way to
translate between qemu performance and how fast something will run on real
hardware.
Modern CPUs are complicated, with many factors affecting execution speed
(pipeline interlocks, multiple levels of cache). A cycle accurate simulator
will generally be orders of magnitude slower than qemu.
Paul
* Re: [Qemu-devel] ARM CPU Speed simulated by Qemu?
2006-09-12 12:43 ` Paul Brook
@ 2006-09-12 13:19 ` Markus Schiltknecht
2006-09-12 13:39 ` Paul Brook
0 siblings, 1 reply; 13+ messages in thread
From: Markus Schiltknecht @ 2006-09-12 13:19 UTC
To: qemu-devel
Paul Brook wrote:
> Modern CPUs are complicated, with many factors affecting execution speed
> (pipeline interlocks, multiple levels of cache). A cycle accurate simulator
> will generally be orders of magnitude slower than qemu.
Would it be feasible to just limit the number of instructions qemu
simulates? Of course that's not very precise and most probably far from
cycle accurate. But it would allow one to easily scale down a virtual
machine, which in turn would allow comparable benchmarks and the like.
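The simplest form of "limit the number of instructions" is a wall-clock throttle: after each translated block runs, sleep if the guest is ahead of a target instruction rate. This is a sketch under that assumption; the class and function names are hypothetical, the emulator is assumed to report how many guest instructions each block executed, and, as Paul notes below, it caps an average rate without modelling cycles at all.

```python
import time
from typing import Callable


def throttle_delay(executed: int, target_ips: float, elapsed_s: float) -> float:
    """Seconds to sleep so that `executed` guest instructions take at least
    executed / target_ips seconds of wall-clock time in total."""
    ideal_elapsed = executed / target_ips
    return max(0.0, ideal_elapsed - elapsed_s)


class InstructionThrottle:
    """Caps the average guest instruction rate; it does NOT model cycles."""

    def __init__(self, target_ips: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.target_ips = target_ips
        self.clock = clock
        self.sleep = sleep
        self.start = clock()
        self.executed = 0

    def account(self, instructions: int) -> float:
        """Call after each block runs; sleeps if the guest is ahead of schedule."""
        self.executed += instructions
        delay = throttle_delay(self.executed, self.target_ips,
                               self.clock() - self.start)
        if delay > 0:
            self.sleep(delay)
        return delay
```

Injecting the clock and sleep functions keeps the logic testable without real waiting; a host that is already slower than the target simply never sleeps.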
A somewhat more accurate step would be to weight all the different
assembler instructions with known (measured) average execution times,
taking into account, for example, a 40% cache miss rate. (Hm.. but since
cache misses are very expensive, that might not lead to a much better
approximation, I guess)
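The weighting idea amounts to a weighted instruction count plus an average miss-stall term. A minimal sketch, assuming per-class base costs and a single flat miss penalty; every number in the example is invented for illustration, not measured on any ARM core.

```python
from typing import Dict


def estimate_cycles(instr_counts: Dict[str, int],
                    base_cycles: Dict[str, float],
                    memory_ops: int,
                    miss_rate: float,
                    miss_penalty: float) -> float:
    """Weighted instruction count plus an average cache-miss stall term."""
    base = sum(count * base_cycles[kind] for kind, count in instr_counts.items())
    stalls = memory_ops * miss_rate * miss_penalty
    return base + stalls


counts = {"alu": 800, "load": 150, "branch": 50}  # hypothetical profile
costs = {"alu": 1, "load": 1, "branch": 3}        # hypothetical cycles per class
total = estimate_cycles(counts, costs, memory_ops=150, miss_rate=0.4, miss_penalty=50)
```

Here the base term is 1100 cycles and the stall term 3000, so roughly 73% of the estimate comes from the assumed miss rate and penalty. That is exactly Markus's worry: the result is only as good as those two guesses.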
Any documents or discussion threads related to that topic? Other
projects trying to achieve that?
Since my benchmarking application is mostly disk- and network-I/O bound,
I'll first try to scale that down, which seems far easier to do
(much less performance penalty for that).
Regards
Markus
* Re: [Qemu-devel] ARM CPU Speed simulated by Qemu?
2006-09-12 13:19 ` Markus Schiltknecht
@ 2006-09-12 13:39 ` Paul Brook
2006-09-12 14:21 ` Markus Schiltknecht
0 siblings, 1 reply; 13+ messages in thread
From: Paul Brook @ 2006-09-12 13:39 UTC
To: qemu-devel
On Tuesday 12 September 2006 14:19, Markus Schiltknecht wrote:
> Paul Brook wrote:
> > Modern CPUs are complicated, with many factors affecting execution speed
> > (pipeline interlocks, multiple levels of cache). A cycle accurate
> > simulator will generally be orders of magnitude slower than qemu.
>
> Would it be feasible to just limit the number of instructions qemu
> simulates? Of course that's not very precise and most probably far from
> cycle accurate. But it would allow one to easily scale down a virtual
> machine, which in turn would allow comparable benchmarks and the like.
>
> A somewhat more accurate step would be to weight all the different
> assembler instructions with known (measured) average execution times,
> taking into account, for example, a 40% cache miss rate. (Hm.. but since
> cache misses are very expensive, that might not lead to a much better
> approximation, I guess)
I'd be surprised if you managed to get any sort of reliable results.
Most of what you described can be achieved with profiling and static analysis.
You could maybe get order-of-magnitude estimates (i.e. do you need a 20MHz CPU
or a 2GHz CPU), but I certainly wouldn't trust the results for deciding
between e.g. 500MHz and 200MHz cores.
IMHO a benchmarking setup that doesn't reliably correspond to real system
performance is worse than useless.
Paul
* Re: [Qemu-devel] ARM CPU Speed simulated by Qemu?
2006-09-12 13:39 ` Paul Brook
@ 2006-09-12 14:21 ` Markus Schiltknecht
2006-09-12 14:34 ` Paul Brook
0 siblings, 1 reply; 13+ messages in thread
From: Markus Schiltknecht @ 2006-09-12 14:21 UTC
To: Paul Brook; +Cc: qemu-devel
Paul Brook wrote:
> IMHO a benchmarking setup that doesn't reliably correspond to real system
> performance is worse than useless.
Agreed. So let's see what's needed to get a reliably corresponding
system. I'm interested in three layers: CPU, hard disk and network.
Networking is the simplest, I think. I would need to be able to define
throughput and latency and be able to simulate network collisions (i.e.
packet loss percentage).
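A minimal link model with the three knobs Markus lists: throughput, latency and a loss percentage. It assumes collisions can be approximated as independent random packet loss and that packets queue back to back on the sending side; the class and field names are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LinkModel:
    bandwidth_bps: float      # throughput, bits per second
    latency_s: float          # one-way latency, seconds
    loss_rate: float = 0.0    # probability that a packet is lost
    rng: random.Random = field(default_factory=random.Random)
    busy_until: float = 0.0   # time at which the sending side becomes free

    def send(self, now: float, size_bytes: int) -> Optional[float]:
        """Return the packet's arrival time, or None if it is lost."""
        # Packets queue behind each other: serialisation starts when the link is free.
        start = max(now, self.busy_until)
        self.busy_until = start + size_bytes * 8 / self.bandwidth_bps
        if self.rng.random() < self.loss_rate:
            return None  # transmitted, but lost in transit
        return self.busy_until + self.latency_s
```

On an 8 Mbit/s link with 10 ms latency, two 1000-byte packets sent at the same moment arrive at about 11 ms and 12 ms. Passing a seeded `random.Random` makes lossy runs reproducible between benchmark runs.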
Emulating a hard disk is already more complicated. One could likewise
define throughput and latency (= avg. seek time). But in most cases that
would not be sufficient, because depending on the head position, seek
time may vary wildly. Thus something like rotation-speed and head-position
emulation is necessary. Most disks today also have different zones with
varying data density (more sectors on the outer tracks). And last but
not least, there are caches in most drives...
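A sketch of the disk timing model described above: head position, a distance-dependent seek, rotational latency from the spindle speed, and zoned recording with fewer sectors per track on inner cylinders. The square-root seek curve, the average half-revolution wait and all default numbers are assumptions for illustration, and the drive's own cache is left out.

```python
import math
from dataclasses import dataclass


@dataclass
class DiskModel:
    cylinders: int = 10_000
    rpm: int = 7_200
    min_seek_ms: float = 1.0      # fixed overhead of any non-zero seek
    full_stroke_ms: float = 15.0  # seek across the whole disk
    outer_spt: int = 1_000        # sectors per track, outermost zone
    inner_spt: int = 500          # sectors per track, innermost zone
    zones: int = 10
    head: int = 0                 # current head cylinder (0 = outer edge)

    def seek_ms(self, target: int) -> float:
        distance = abs(target - self.head)
        if distance == 0:
            return 0.0
        # Assumption: seek time grows with the square root of the distance.
        frac = math.sqrt(distance / (self.cylinders - 1))
        return self.min_seek_ms + (self.full_stroke_ms - self.min_seek_ms) * frac

    def sectors_per_track(self, cylinder: int) -> int:
        # Zoned recording: outer zones hold more sectors per track.
        zone = cylinder * self.zones // self.cylinders
        step = (self.outer_spt - self.inner_spt) / (self.zones - 1)
        return round(self.outer_spt - zone * step)

    def access_ms(self, cylinder: int, sectors: int) -> float:
        revolution_ms = 60_000 / self.rpm
        seek = self.seek_ms(cylinder)
        rotational = revolution_ms / 2  # average wait: half a revolution
        transfer = revolution_ms * sectors / self.sectors_per_track(cylinder)
        self.head = cylinder            # the head stays where it landed
        return seek + rotational + transfer
```

Because the head position is kept between calls, a second read on the same cylinder costs only rotation and transfer, which is the variation a fixed average seek time cannot capture.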
Now, CPUs are where I have only a vague idea of what would be needed to
simulate. I know there are up to three levels of caches and main memory,
which all have different access times. The CPU itself has a pipeline and
branch prediction and such, which could invalidate the contents of the
pipeline up to a given point (of branching).
I think the most time-consuming operation that should be properly
simulated is memory access. For this to work properly, all levels of
caches must be emulated, too.
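For a multi-level hierarchy, the textbook average-memory-access-time recurrence gives a first estimate: each level costs its hit time plus its local miss rate times the cost of the next level. This is an averaged model that ignores the overlapping and queued requests Laurent describes, and the latencies and miss rates in the example are invented.

```python
from typing import List, Tuple


def amat(levels: List[Tuple[float, float]], memory_cycles: float) -> float:
    """Average memory access time for a cache hierarchy.

    levels: (hit_time_cycles, local_miss_rate) for L1, L2, ... in order.
    Uses AMAT_i = hit_i + miss_i * AMAT_{i+1}, with main memory as the last level.
    """
    t = memory_cycles
    for hit_time, miss_rate in reversed(levels):
        t = hit_time + miss_rate * t
    return t


# Hypothetical L1 / L2 / L3 figures and a 200-cycle main memory.
example = amat([(1, 0.05), (10, 0.2), (40, 0.5)], 200)
```

Working outwards in: L3 costs 40 + 0.5 * 200 = 140 cycles, L2 costs 10 + 0.2 * 140 = 38, and L1 costs 1 + 0.05 * 38 = 2.9 cycles per access on average.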
How much do branch prediction misses cost? How much do pipeline
interlocks cost? I don't think those would be _that_ dramatic, since
today's compilers are said to optimize quite well...
I agree that implementing all of that would be a significant amount of
work already. What else am I missing?
Regards
Markus
* Re: [Qemu-devel] ARM CPU Speed simulated by Qemu?
2006-09-12 14:21 ` Markus Schiltknecht
@ 2006-09-12 14:34 ` Paul Brook
0 siblings, 0 replies; 13+ messages in thread
From: Paul Brook @ 2006-09-12 14:34 UTC
To: qemu-devel
> How much do branch prediction misses cost? How much do pipeline
> interlocks cost? I don't think those would be _that_ dramatic, since
> today's compilers are said to optimize quite well...
It all depends on the code you're running. Certainly branch prediction can
have a major effect if the hardware consistently gets it wrong. Even ARM
hardware has 5/7-stage pipelines that need flushing on a mispredict.
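To see how much the code itself matters, a branch trace can be run through a predictor and each misprediction charged a fixed flush penalty. The sketch below uses a textbook bimodal predictor (a table of 2-bit saturating counters), not the predictor of any particular ARM core; the flush penalty is a parameter, and the 4 cycles used in the example is an arbitrary stand-in for a short pipeline.

```python
from typing import Dict, Iterable, Tuple


def simulate_bimodal(trace: Iterable[Tuple[int, bool]],
                     flush_penalty: int) -> Tuple[int, int]:
    """Run 2-bit saturating counters over (branch_pc, taken) pairs.
    Returns (mispredictions, penalty_cycles)."""
    counters: Dict[int, int] = {}  # 0, 1 = predict not taken; 2, 3 = predict taken
    mispredicts = 0
    for pc, taken in trace:
        c = counters.get(pc, 1)    # start weakly not-taken
        if (c >= 2) != taken:
            mispredicts += 1
        counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)
    return mispredicts, mispredicts * flush_penalty


# A loop branch taken 9 times then falling through, executed twice...
loop = ([(0x40, True)] * 9 + [(0x40, False)]) * 2
# ...versus a branch that alternates, which this predictor always gets wrong.
alternating = [(0x80, i % 2 == 0) for i in range(8)]
```

The loop branch costs 3 mispredictions (12 penalty cycles) over 20 executions, while the alternating branch is mispredicted all 8 times (32 cycles): the same hardware, very different costs depending on the code.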
Pipeline interlocks are likely to be relatively small, for most ARM hardware
at least (IA-64 is a completely different story :-). As a compiler author I'd
generally expect a few % performance improvement from a good scheduling
model.
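What a scheduling model buys can be sketched with the classic load-use interlock of a simple in-order pipeline. The sketch assumes a one-cycle stall when an instruction reads the register loaded by the instruction immediately before it, and no other hazards; the instruction tuples are a hypothetical format, not ARM encodings.

```python
from typing import List, Optional, Tuple

# (opcode, destination register or None, source registers); hypothetical format
Instr = Tuple[str, Optional[str], Tuple[str, ...]]


def count_cycles(program: List[Instr], load_use_penalty: int = 1) -> int:
    """Cycles on an idealised in-order pipeline that issues one instruction per
    cycle and stalls only when an instruction reads the register loaded by the
    instruction immediately before it."""
    cycles = 0
    prev: Optional[Instr] = None
    for instr in program:
        _, _, srcs = instr
        if prev is not None and prev[0] == "ldr" and prev[1] in srcs:
            cycles += load_use_penalty  # interlock: wait for the load result
        cycles += 1
        prev = instr
    return cycles


naive = [
    ("ldr", "r0", ("r4",)),
    ("add", "r1", ("r0", "r2")),  # uses r0 right after the load: stall
    ("ldr", "r3", ("r5",)),
    ("add", "r6", ("r3", "r2")),  # stall again
    ("sub", "r7", ("r8", "r9")),
]
# Same instructions, reordered so that no result is used right after its load.
scheduled = [naive[0], naive[2], naive[1], naive[3], naive[4]]
```

The naive order takes 7 cycles and the scheduled one 5. This deliberately load-heavy five-instruction example exaggerates the gain; on real code, the share of back-to-back load-use pairs determines how much scheduling helps.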
I wouldn't assume the compiler gets everything right, as it's quite common
for systems to be built for the lowest common denominator target.
Paul
end of thread, other threads:[~2006-09-12 17:43 UTC | newest]
Thread overview: 13+ messages
2006-09-12 15:08 [Qemu-devel] ARM CPU Speed simulated by Qemu? Laurent DESNOGUES
2006-09-12 15:19 ` Markus Schiltknecht
-- strict thread matches above, loose matches on Subject: below --
2006-09-12 15:26 Laurent DESNOGUES
2006-09-12 14:44 Laurent DESNOGUES
2006-09-12 14:58 ` Markus Schiltknecht
2006-09-12 17:42 ` K. Richard Pixley
2006-09-12 0:17 [Qemu-devel] Re: qemu-system-sparc video problem on 16 bitdisplays Stuart Brady
2006-09-12 7:20 ` [Qemu-devel] ARM CPU Speed simulated by Qemu? Tieu Ma Dau
2006-09-12 11:03 ` nyos
2006-09-12 12:43 ` Paul Brook
2006-09-12 13:19 ` Markus Schiltknecht
2006-09-12 13:39 ` Paul Brook
2006-09-12 14:21 ` Markus Schiltknecht
2006-09-12 14:34 ` Paul Brook