qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* Re: [Qemu-devel] Profiling Qemu for speed?
@ 2005-04-18  8:35 Daniel J Guinan
  2005-04-18  9:51 ` Ian Rogers
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel J Guinan @ 2005-04-18  8:35 UTC (permalink / raw)
  To: qemu-devel

This conversation, below, is very interesting.  It is precisely this
part of QEMU that fascinates me and potentially holds the most promise
for performance gains.  I have even imagined using a genetic algorithm
to discover optimal block-sizes and instruction re-ordering and
whatnot.  This could be done in order to generate translation tables of
guest instruction sequences and host translated instruction sequences.
Even if  only a handful of very common sequences were translated in
this fashion, the potential speedups are enormous.

Before even discussing the exotic possibilities, however, we need to
figure out what is possible within the framework of the current QEMU
translation system.  Rewiring QEMU to support translating sequences
(blocks of instructions) rather than single instructions may or may not
be necessary.  It should be rather simple to instrument QEMU to keep
track of the most common sequences in order to figure out if there are,
in fact, sequences that show up with a high enough frequency to make
this endeavor worthwhile (I would think the answer would be yes). 
Then, someone skilled in machine code for the host and guest could take
a stab at hand-coding the translation for the most common couple of
sequences to see how the performance gains come out.

I would love to see some work in this direction and would be willing to
help, although my skills are limited in x86 machine.

-Daniel

> One thought would be to have a peephole optimizer that looks back
> over
> the just translated basic block (or a state machine that matches such
> sequences as an on-line algorithm) and match against common, known
> primitive sequences, and replaces them with optimized versions.
> 
> The kind of profiling you would want to do here is to run, say,
> windows
> and take a snapshot of the dynamic code cache, and look for common
> instruction sequences. Ideally, you could write some software to do
> this
> automatically.
> 
> Anyway, I'm sure there are lots of other ideas laying around.
> 
> 
> -- John.
> 
> Another thing I've thought about is checking what sequences of 
> instructions often appear in x86 programs (such as e.g. "push %ebp; 
> movl %esp, %ebp") and then creating C-functions which emulate such an
> 
> antire block, so they can be optimized as a whole by gcc. That would 
> give a similar performance gain on all supported targets, and not
> just 
> on the one you created the peephole optimizer for (+ less work to 
> debug).
> 
> The only possible downside is that you can't jump to a particular 
> instruction in such a block (the same goes for several kinds of 
> peephole optimizations though). I don't know yet how Qemu exactly
> keeps 
> track of the translations it has already performed, whether it
> supports 
> multiple existing translations of the same instruction and/or whether
> 
> it can already automatically invalidate the old block in case it
> turns 
> out it needs to be splitted and thus re-translated (I guess it should
> 
> at least some of these things, since it theory an x86 could jump into
> 
> the middle of an instruction in order to reinterpret the bytes as 
> another instruction stream).
> 
> 
> Jonas
> 
> 
> Unfortunately it's not that simple. The push instruction may cause an
> 
> exception. Whatever optimizations you apply you've got to make sure
> that the 
> guest state is still consistent when the exception occurs.
> 
> Paul
> 
> 
> If we just concatenate the C code of the two procedures, won't gcc
> take 
> care of that for us? Or could scheduling mess this up? Maybe there's
> a 
> switch to avoid having it reschedule instructions in a way that side 
> effects happen in a different order? (that would still give us the 
> advantage of CSE and peephole optimizations)
> 
> 
> Jonas
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [Qemu-devel] Profiling Qemu for speed?
@ 2005-04-18 11:24 Daniel J Guinan
  0 siblings, 0 replies; 18+ messages in thread
From: Daniel J Guinan @ 2005-04-18 11:24 UTC (permalink / raw)
  To: qemu-devel


> > I was up until 3:00am studying Qemu, and I came to the conclusion
> that
> > it doesn't make sense to try speeding up the output code, at least
> not
> > yet. A peephole optimizer or hand-coded sequences made to handle
> common
> > combinations of instructions would lead to the problems discussed
> here:
> > exceptions happening at the right time, self-modifying code, etc.
> 
> Well yeah... I didn't say it would be easy :-)

I would like to suggest that perhaps it would be useful to experiment
anyway.  Self-modifying code cannot be that prevalent (unless you are a
virus writer - which, yes, means that probably any standard Windows
installation of any maturity would stop working ;-), exceptions are
potentially more dangerous and branching into a translated blocks would
need to be dealt with by (probably) retranslating on an
instruction-by-instruction level and then allowing the branch to occur
into the traditional translation.

At any rate, making this a compile-time (or even runtime) option would
allow for basic experimentation.  Until we can go from instruction
translation to logic translation, this system will never fullfill it's
potential.  Instruction translation can maintain the property of system
correctness, which is indeed broken by logic translation (e.g. perhaps
exceptions don't occur when they normally would, or need to be handled
and translated resulting in slightly different behavior).  However, I
would like to assert that with the proper balance and approach, exact
system correctness may not be strictly neccessary (I am thinking of
certain exception cases) and logic translation has the potential to be
wickidly faster (if the PPC can do something in 6 steps that takes an
x86 15 steps, but there is an outside chance of a division by zero
exception, why do it in 87 steps? Shouldn't we at least try to see if
we can make it work in 6? or 12?).

> > Worse, the translator might have to spend so much time doing this
> that
> > the result would actually be slower execution.
> 
> Not if you do it right. Remember. by and large you only incur a
> one-time
> hit, and after that you're going as fast as your dynamic translator
> is
> smart.
> 
> Here's a heuristic for you: optimize for the general case, not corner
> cases. You still have to handle the corner cases, but they are corner
> cases for a reason: you don't run into them as often.

Exactly, as I hinted at in a previous post, it may turn out that some
sequence of (for example) 5 instructions is so common in all software
that the act of hand coding a replacement for those 5 instructions
could result in a noticible overall performance improvement.  And then
you can start scaling the performance up from there.  I will guarantee
that we can find some sequences that do not have problems with
exceptions that would be candidates (guaranteeing on branch and
self-modifying is not as easy).  But not to try? 

-Daniel

^ permalink raw reply	[flat|nested] 18+ messages in thread
* [Qemu-devel] Profiling Qemu for speed?
@ 2005-04-17  5:58 Joe Luser
  2005-04-17  8:21 ` John R. Hogerhuis
  2005-04-17 10:36 ` Paul Brook
  0 siblings, 2 replies; 18+ messages in thread
From: Joe Luser @ 2005-04-17  5:58 UTC (permalink / raw)
  To: qemu-devel

Hi folks,

I've been using Qemu on the Mac for a few days now; Several OSes
running (including Windoze), and I'm impressed. The source looks pretty
clean, too.

Has anyone done any profiling work to see where Qemu spends most of its
processing time? It is fast running x86 code on the PPC, but I'd like
to find ways to speed things up even more.

If you know of areas that will make a big difference in speed, please
let me know.

-Nathaniel G H



		
__________________________________ 
Do you Yahoo!? 
Make Yahoo! your home page 
http://www.yahoo.com/r/hs

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2005-04-18 14:50 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-04-18  8:35 [Qemu-devel] Profiling Qemu for speed? Daniel J Guinan
2005-04-18  9:51 ` Ian Rogers
2005-04-18 13:44   ` Daniel Egger
2005-04-18 14:12     ` Christian MICHON
2005-04-18 14:29       ` Ian Rogers
2005-04-18 14:19     ` Ian Rogers
2005-04-18 14:40     ` Paul Brook
  -- strict thread matches above, loose matches on Subject: below --
2005-04-18 11:24 Daniel J Guinan
2005-04-17  5:58 Joe Luser
2005-04-17  8:21 ` John R. Hogerhuis
2005-04-17  8:59   ` Jonas Maebe
2005-04-17 10:27     ` Paul Brook
2005-04-17 10:46       ` Jonas Maebe
2005-04-18  1:36         ` Nathaniel G H
2005-04-18  2:11           ` John R. Hogerhuis
2005-04-18  2:39           ` André Braga
2005-04-18  4:31             ` Karl Magdsick
2005-04-17 10:36 ` Paul Brook

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).