Hi all,
In the technical documentation
(http://www.usenix.org/publications/library/proceedings/usenix05/tech/freenix/bellard.html)
I read:

> The first step is to split each target CPU instruction into fewer 
> simpler instructions called /micro operations/. Each micro operation 
> is implemented by a small piece of C code. This small C source code is 
> compiled by GCC to an object file. The micro operations are chosen so 
> that their number is much smaller (typically a few hundreds) than all 
> the combinations of instructions and operands of the target CPU. The 
> translation from target CPU instructions to micro operations is done 
> entirely with hand coded code. 
> A compile time tool called dyngen uses the object file containing the 
> micro operations as input to generate a dynamic code generator. This 
> dynamic code generator is invoked at runtime to generate a complete 
> host function which concatenates several micro operations. 
On Wikipedia (http://en.wikipedia.org/wiki/QEMU) and in other sources,
however, I read:

> The Tiny Code Generator (TCG) aims to remove the shortcoming of 
> relying on a particular version of GCC 
> <http://en.wikipedia.org/wiki/GNU_Compiler_Collection> or any 
> compiler, instead incorporating the compiler (code generator) into 
> other tasks performed by QEMU in run-time. The whole translation task 
> thus consists of two parts: blocks of target code (/TBs/) being 
> rewritten in *TCG ops* - a kind of machine-independent intermediate 
> notation, and subsequently this notation being compiled for the host's 
> architecture by TCG. Optional optimisation passes are performed 
> between them.
- So I take it that the technical documentation is now obsolete. Is
that right?

- The "old way" did much of its work offline (at compile time),
compiling the micro operations into host machine code, whereas, if I
understand correctly, TCG does everything at run time (please correct
me if I am wrong!). So I wonder: how can it be as fast as the previous
method, or even faster?
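
To make the contrast concrete, here is how I picture the two approaches,
as a toy Python sketch. Nothing in it is QEMU code: the micro-op names,
the byte values standing in for host machine code, and all three
functions are made up for illustration, and my reading of both designs
may well be wrong.

```python
# Toy model only: byte strings stand in for host machine code, and the
# micro-op names are hypothetical.

# "Old way" as I read the paper: every micro operation was compiled ahead
# of time, so run-time generation is mostly copying ready-made snippets.
PRECOMPILED = {
    "load_imm": b"\x01",
    "add": b"\x02",
    "store": b"\x03",
}

def old_style_generate(micro_ops):
    """Concatenate precompiled snippets; no encoding work at run time."""
    return b"".join(PRECOMPILED[op] for op in micro_ops)

# "TCG way" as I understand it: nothing is precompiled per micro-op; a
# host-specific encoder turns each intermediate op into host code at
# run time.
def toy_host_encoder(op):
    """Hypothetical backend: encode one intermediate op for the host."""
    opcodes = {"load_imm": 0x01, "add": 0x02, "store": 0x03}
    return bytes([opcodes[op]])

def tcg_style_generate(ir_ops, encode=toy_host_encoder):
    return b"".join(encode(op) for op in ir_ops)
```

Both functions produce the same bytes in this toy; the only difference
is when the per-op encoding work happens, which is exactly the part I am
asking about.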

- If I understand correctly, the TCG runtime flow is the following:
     - TCG takes the target binary and splits it into blocks of target
       code (TBs),
     - if the TB is not cached, TCG translates it (or rather the target
       instructions it is composed of) into TCG micro ops,
     - TCG compiles the TCG micro ops into host object code,
     - TCG caches the TB,
     - TCG tries to chain the block with others,
     - TCG copies the TB into the execution buffer,
     - TCG runs it.
Am I right? Please correct me if I am wrong, as I want to use this
flow as a guide while trying to understand the code.
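
To show what I mean by that flow, here is a minimal Python sketch of the
translate, cache and run loop as I currently picture it. It is only my
mental model: the toy instruction set ("add", "jmp"), the function names
and the accumulator state are all invented, a Python closure stands in
for host code, and block chaining and the copy into the execution buffer
are left out.

```python
# Toy model of the flow described above; none of these names come from QEMU.

def split_block(program, pc):
    """Collect target instructions from pc up to and including the next jump."""
    block = []
    while pc < len(program):
        insn = program[pc]
        block.append(insn)
        pc += 1
        if insn[0] == "jmp":
            break
    return block

def to_ops(block):
    """Stand-in for 'translate target instructions into intermediate ops'."""
    return [("op_" + name, arg) for name, arg in block]

def compile_ops(ops):
    """Stand-in for 'compile ops to host code': build a Python closure."""
    def host_code(state):
        for name, arg in ops:
            if name == "op_add":
                state["acc"] += arg
            elif name == "op_jmp":
                return arg           # next target pc
        return None                  # fell off the end of the program
    return host_code

def run(program, state, max_blocks=100):
    """Execute up to max_blocks blocks; return how many were translated."""
    cache = {}                       # target pc -> translated block
    translations = 0
    pc = 0
    for _ in range(max_blocks):
        if pc is None:
            break
        if pc not in cache:          # translate only on a cache miss
            cache[pc] = compile_ops(to_ops(split_block(program, pc)))
            translations += 1
        pc = cache[pc](state)        # run the cached translation
    return translations
```

For example, running the two-instruction loop `[("add", 1), ("jmp", 0)]`
with `max_blocks=5` translates the block once and then reuses the cached
translation for the other four executions.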
Thank you very much in advance!
Stefano B.