Re: [Qemu-devel] TCG flow vs dyngen

qemu-devel.nongnu.org archive mirror

From: Stefano Bonifazi <stefboombastic@gmail.com>
To: Rob Landley <rob@landley.net>
Cc: "Raphaël Lefèvre" <taylor.lefevre@gmail.com>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] TCG flow vs dyngen
Date: Sun, 23 Jan 2011 23:25:55 +0100
Message-ID: <4D3CAAF3.2080600@gmail.com>
In-Reply-To: <4D3CA28C.5080907@landley.net>

On 01/23/2011 10:50 PM, Rob Landley wrote:
> On 01/16/2011 10:01 AM, Raphaël Lefèvre wrote:
>> On Sun, Jan 16, 2011 at 11:21 PM, Stefano Bonifazi
>> <stefboombastic@gmail.com>  wrote:
>> 2. "how can I check the number of target CPU cycles or target
>> instructions executed inside qemu-user (i.e. qemu-ppc)?
>> Is there any variable I can inspect for such information?" in Dec 2010
> Keep in mind I'm a bit rusty and not an expert, but I'll give a stab at
> answering:
>
> You can't, because QEMU doesn't work that way.  QEMU isn't an
> instruction level emulator, it's closer to a Java JIT.  It doesn't
> translate one instruction at a time but instead translates large blocks
> of code all at once, and keeps a cache of translated blocks around.
> Execution jumps into each block and either waits for it to exit again
> (meaning it jumped out of that page and QEMU's main execution loop has
> to look up what page to execute next, possibly translating it first if
> it's not in the cache yet), or else QEMU interrupts it after a while to
> fake an IRQ of some kind (such as a timer interrupt).
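A minimal sketch of that loop in C, assuming a made-up toy guest ISA (`enum op`, `struct insn`, `struct tb`, `translate()`, `tb_find()` and `cpu_exec()` are all hypothetical names, not QEMU's): the main loop looks up a cached translation for the current guest PC, runs the translator only on a miss, "executes" the block, and comes back for the next one. Each toy block ends at a jump, which is roughly how QEMU's translation blocks are delimited in practice.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy guest ISA: an instruction either falls through,
 * jumps, or halts.  Programs must end every path in OP_JMP or OP_HALT. */
enum op { OP_NOP, OP_JMP, OP_HALT };
struct insn { enum op op; int target; };

/* Toy translation block: covers guest code from pc up to and
 * including the first jump or halt. */
struct tb {
    int pc;          /* guest PC the block starts at */
    int n_insns;     /* guest instructions in the block */
    int next_pc;     /* where the block exits to; -1 means halt */
    struct tb *next; /* hash-bucket chain */
};

#define TB_HASH_SIZE 64
static struct tb *tb_hash[TB_HASH_SIZE];
static int translations;   /* how often the "translator" ran */

/* Stand-in for the translator: scan forward to the end of the block. */
static struct tb *translate(const struct insn *prog, int pc)
{
    struct tb *tb = calloc(1, sizeof *tb);
    assert(tb != NULL);    /* toy code: no real error handling */
    tb->pc = pc;
    for (int i = pc;; i++) {
        tb->n_insns++;
        if (prog[i].op == OP_JMP) {
            tb->next_pc = prog[i].target;
            break;
        }
        if (prog[i].op == OP_HALT) {
            tb->next_pc = -1;
            break;
        }
    }
    translations++;
    return tb;
}

/* Look the PC up in the cache; translate only on a miss. */
static struct tb *tb_find(const struct insn *prog, int pc)
{
    unsigned h = (unsigned)pc % TB_HASH_SIZE;
    for (struct tb *tb = tb_hash[h]; tb != NULL; tb = tb->next)
        if (tb->pc == pc)
            return tb;
    struct tb *tb = translate(prog, pc);
    tb->next = tb_hash[h];
    tb_hash[h] = tb;
    return tb;
}

/* Main loop: find a block, "run" it, come back for the next one.
 * Returns the number of blocks executed. */
static int cpu_exec(const struct insn *prog, int pc, int max_blocks)
{
    int executed = 0;
    while (pc >= 0 && executed < max_blocks) {
        struct tb *tb = tb_find(prog, pc);
        pc = tb->next_pc;  /* a real JIT would jump into host code here */
        executed++;
    }
    return executed;
}
```

For example, a two-instruction loop (a NOP followed by a jump back to 0) run for 10 blocks is translated only once; the other nine executions come straight from the cache.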
>
> You may want to read Fabrice Bellard's original paper on the QEMU design:
>
> http://www.usenix.org/event/usenix05/tech/freenix/full_papers/bellard/bellard.pdf
>
> Since that was written, dyngen was replaced with tcg, but that does the
> same thing in a slightly different way.
>
> Building a QEMU with dyngen support used to use the host compiler to
> compile chunks of code corresponding to the target operations it would
> see at runtime, and then strip the machine language out of the resulting
> .o files and save them in a table.  Then at runtime dyngen could
> generate translated pages by gluing together the resulting saved machine
> language snippets the host compiler had produced when qemu was built.
> The problem was, beating the right kind of machine language snippets out
> of the .o files the compiler produced from the example code turned out
> to be VERY COMPILER DEPENDENT.  This is why you couldn't build qemu with
> gcc 4.x for the longest time, gcc's code generator and the layout of the
> .o files changed in a bunch of subtle ways which broke dyngen's ability
> to extract usable machine code snippets to put 'em into the table so it
> could translate pages at runtime.
>
> TCG stands for "Tiny Code Generator".  It just hardwires a code
> generator into QEMU.  They wrote a mini-compiler in C, which knows what
> instructions to output for each host qemu supports.  If QEMU understands
> target instructions well enough to _read_ them, it's not a big stretch
> to be able to _write_ them when running on that kind of host.  (It's
> more or less the same operation in reverse.)  This means that QEMU can
> no longer run on a type of host it can't generate code for, but
> the solution is to just add support for all the interesting machines out
> there, on both sides.
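To make "writing host instructions" concrete, here is a hypothetical mini emitter (`struct emitter` and the `emit_*` helpers are made-up names, not TCG's API) that appends raw x86 machine code bytes to a buffer. It only produces bytes; actually running them would additionally need executable memory, which this sketch deliberately leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mini emitter (not TCG's API): appends raw x86 machine
 * code bytes to a caller-supplied buffer.  Nothing here runs them. */
struct emitter {
    uint8_t *buf;
    size_t len;   /* bytes emitted so far */
    size_t cap;   /* buffer capacity */
};

static void emit8(struct emitter *e, uint8_t b)
{
    assert(e->len < e->cap);   /* toy code: no buffer growth */
    e->buf[e->len++] = b;
}

static void emit32(struct emitter *e, uint32_t v)
{
    for (int i = 0; i < 4; i++)   /* x86 immediates are little-endian */
        emit8(e, (uint8_t)(v >> (8 * i)));
}

/* mov eax, imm32  ->  B8 id */
static void emit_mov_eax_imm(struct emitter *e, uint32_t imm)
{
    emit8(e, 0xB8);
    emit32(e, imm);
}

/* add eax, imm32  ->  05 id */
static void emit_add_eax_imm(struct emitter *e, uint32_t imm)
{
    emit8(e, 0x05);
    emit32(e, imm);
}

/* ret  ->  C3 */
static void emit_ret(struct emitter *e)
{
    emit8(e, 0xC3);
}
```

Emitting `mov eax, 40` / `add eax, 2` / `ret` this way yields the 11 bytes `B8 28 00 00 00 05 02 00 00 00 C3`, i.e. a function that would return 42 if it were placed in executable memory and called.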
>
> So, when QEMU executes code, the virtual MMU faults a new page into the
> virtual TLB, and goes "I can't execute this, fix it up!"  And the fixup
> handler looks for a translation of the page in the cache of translated
> pages, and if it can't find it it calls the translator to convert the
> target code into a page of corresponding host code.  Which may involve
> discarding an existing entry out of the cache, but this is how
> instruction caches work on real hardware too, so the delays in QEMU
> are where they'd be on real hardware anyway, and optimizing for one is
> pretty close to optimizing for the other, so life is good.
>
> The chunk you found earlier is a function pointer typecast:
>
> #define tcg_qemu_tb_exec(tb_ptr) \
>    ((long REGPARM (*)(void *))code_gen_prologue)(tb_ptr)
>
> Which looks like it's calling code_gen_prologue() with tb_ptr as its
> argument (typecast to a void *), and it returns a long.  That calls a
> translated page, and when the function returns that means the page of
> code needs to jump to code somewhere outside of that page, and we go
> back to the main loop to figure out where to go next.
>
> QEMU is as fast as it is because once it has a page of
> translated code, actually _running_ it is entirely native.  It jumps
> into the page, and executes natively until it leaves the page.  Control
> only goes back to QEMU to switch pages or to handle I/O and interrupts
> and such.  So when you ask "how many clock cycles did that instruction
> take", the answer is "it doesn't work that way".  QEMU emulates at
> memory page level (generally 4k of target code), not at individual
> instruction level.
>
> (Oh, and the worst thing you can do to QEMU from a performance
> perspective is self-modifying code.  Because the virtual MMU has to
> strip the executable bit off the TLB entry and re-translate the entire
> page next time something tries to execute it.  It _works_, it's just
> slow.  But again, real hardware can hiccup a bit on this too.)
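A toy model of that invalidation, assuming 4 KiB pages and at most one cached translation per page (`guest_mem`, `tb_valid`, `exec_page()` and `guest_store8()` are hypothetical): any store into a page drops its cached translation, so the next execution of that page pays for a fresh translation. Checking every store like this is just the simplest version; a real implementation would rather trap only the writes that hit pages holding translated code, for instance by write-protecting those pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not QEMU's data structures: guest memory split into
 * 4 KiB pages, with at most one cached translation per page. */
#define PAGE_BITS 12
#define NUM_PAGES 16

static uint8_t guest_mem[NUM_PAGES << PAGE_BITS];
static bool tb_valid[NUM_PAGES];  /* page has a cached translation */
static int translations;          /* how often the "translator" ran */

/* Execute the page containing addr, translating it only if no valid
 * translation is cached. */
static void exec_page(uint32_t addr)
{
    uint32_t page = addr >> PAGE_BITS;
    assert(page < NUM_PAGES);
    if (!tb_valid[page]) {
        translations++;           /* stand-in for the translator */
        tb_valid[page] = true;
    }
}

/* Guest byte store: writing into a page drops its cached translation,
 * so the (possibly modified) code is re-translated on next execution. */
static void guest_store8(uint32_t addr, uint8_t val)
{
    assert(addr < sizeof guest_mem);
    guest_mem[addr] = val;
    tb_valid[addr >> PAGE_BITS] = false;
}
```

Executing page 1 twice costs one translation; a single byte store into page 1 forces a second translation, while a store into an unrelated page does not.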
>
> Does that answer your question?
>
> Rob
Wow! Thank you! That's an ANSWER!
Gold for anyone studying all of this! Though at this stage of my work I
already had to "understand" almost all of it, your perfect summary makes
everything much clearer.
About counting instructions: I found that counting the instructions of
each executed TB was a very good approximation. Of course the cache is a
major problem, since TBs that were already translated can't be counted
that way. I'd like to disable the cache, but the singlestep parameter
doesn't seem to work for qemu-user.
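One way around the cache problem, sketched under the assumption that a block's guest instruction count is known when it is translated (`struct counted_tb`, `translate_tb()` and `exec_tb()` are hypothetical names): store the count in the block once, then add it to a global counter every time the block is executed, whether it was freshly translated or found in the cache.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-block record: the guest instruction count is
 * filled in once, when the block is translated. */
struct counted_tb {
    uint32_t pc;        /* guest PC the block starts at */
    uint32_t icount;    /* guest instructions in the block */
    int translated;     /* non-zero once translated */
};

static uint64_t guest_insns_executed;

/* Called only on a cache miss. */
static void translate_tb(struct counted_tb *tb, uint32_t n_guest_insns)
{
    tb->icount = n_guest_insns;
    tb->translated = 1;
}

/* Called on every execution, including cache hits, so blocks that
 * come from the cache are still counted. */
static void exec_tb(struct counted_tb *tb)
{
    assert(tb->translated);
    guest_insns_executed += tb->icount;
}
```

A block translated once with 5 guest instructions and executed 3 times contributes 15. This still counts whole blocks, so it over-counts when a block is left early, e.g. because of an exception in the middle of it.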
Right now I am stuck with another problem; maybe with your experience
you can tell me whether it is possible at all.
I am trying to shift the target executable in memory. Right now the code
is "supposed" to be loaded by the elfloader at the exact start address
set at link time.
Inside the elfloader there is even a check that verifies whether that
address range is already in use, but no action is taken in that case o.O
Maybe I'll post a new thread about this problem (bug?). Anyway, if you
think you can help me, I'll give you further details.
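A sketch of acting on that check on Linux, under stated assumptions (`reserve_exact()` is a hypothetical helper, not code from QEMU's elfloader): pass the link-time address to mmap() as a hint without MAP_FIXED, and treat getting any other address back as "range busy" instead of carrying on. Whether a non-PIE executable can then be moved elsewhere at all is a separate question, since its absolute addresses were fixed at link time.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper (not QEMU code): try to reserve exactly
 * [addr, addr + len).  Without MAP_FIXED the address is only a hint,
 * so getting a different address back means the range is taken. */
static bool reserve_exact(void *addr, size_t len, void **out)
{
    assert(out != NULL);
    void *p = mmap(addr, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED)
        return false;
    if (p != addr) {
        munmap(p, len);   /* kernel picked another range: give it back */
        return false;
    }
    *out = p;
    return true;
}
```

Newer Linux kernels (4.17 and later) also offer MAP_FIXED_NOREPLACE, which fails with EEXIST instead of silently placing the mapping somewhere else.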
Thank you very much again for your great explanation!
Best Regards!
Stefano B.


Thread overview: 43+ messages
2011-01-16 14:46 [Qemu-devel] TCG flow vs dyngen Raphael Lefevre
2011-01-16 15:21 ` Stefano Bonifazi
2011-01-16 16:01   ` Raphaël Lefèvre
2011-01-16 16:43     ` Stefano Bonifazi
2011-01-16 18:29       ` Peter Maydell
2011-01-16 19:02         ` Stefano Bonifazi
2011-01-16 19:24           ` Peter Maydell
2011-01-24 13:20             ` [Qemu-devel] " Stefano Bonifazi
2011-01-16 20:50           ` [Qemu-devel] " Stefano Bonifazi
2011-01-16 21:08             ` Raphaël Lefèvre
2011-01-24 12:35               ` [Qemu-devel] " Stefano Bonifazi
2011-01-17 11:59             ` [Qemu-devel] " Lluís
2011-01-24 12:31               ` [Qemu-devel] " Stefano Bonifazi
2011-01-24 13:36                 ` Lluís
2011-01-24 14:00                   ` Stefano Bonifazi
2011-01-24 15:06                     ` Lluís
2011-01-24 17:23                       ` Stefano Bonifazi
2011-01-24 18:12                         ` Lluís
2011-01-16 19:16       ` [Qemu-devel] " Raphaël Lefèvre
2011-01-23 21:50     ` Rob Landley
2011-01-23 22:25       ` Stefano Bonifazi [this message]
2011-01-23 23:40         ` Rob Landley
2011-01-24 10:17           ` Stefano Bonifazi
2011-01-24 18:20             ` Rob Landley
2011-01-24 21:16               ` Stefano Bonifazi
2011-01-25  1:19                 ` Rob Landley
2011-01-25  8:53                   ` Stefano Bonifazi
2011-01-24 14:32       ` Peter Maydell
2011-01-24 14:56         ` Stefano Bonifazi
2011-01-24 15:15           ` Lluís
2011-01-24 18:02           ` Dushyant Bansal
2011-01-24 19:38             ` Stefano Bonifazi
2011-01-25  7:56               ` Dushyant Bansal
2011-01-25  9:04                 ` Stefano Bonifazi
2011-01-25  9:05                   ` Edgar E. Iglesias
2011-01-25  9:28                     ` Stefano Bonifazi
  -- strict thread matches above, loose matches on Subject: below --
2010-12-10 21:26 Stefano Bonifazi
2010-12-11 11:02 ` Blue Swirl
2010-12-11 12:29   ` Stefano Bonifazi
2010-12-11 13:11     ` Blue Swirl
2010-12-11 14:32       ` Stefano Bonifazi
2010-12-11 14:44         ` Blue Swirl
2010-12-14 20:17           ` Stefano Bonifazi
