Message-ID: <3EC22D10.5050506@free.fr>
Date: Wed, 14 May 2003 13:48:32 +0200
From: Fabrice Bellard
MIME-Version: 1.0
Subject: Re: [Qemu-devel] Re: [PATCH] Updated Sparc support
References: <20030514014546.3FA722C090@lists.samba.org>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Reply-To: qemu-devel@nongnu.org
To: qemu-devel@nongnu.org

Rusty Russell wrote:
> In message <3EC0E59A.5070907@free.fr> you write:
>
>>I also plan to add direct block chaining. I will try to make it portable
>>by using the 'goto *' gcc extension, but I don't know yet if it will
>>work on every CPU. The direct block chaining will generate something like:
>>
>>   'goto *addr'
>>
>>at the end of some translated blocks to jump either to the CPU core or
>>directly to the next translated block. 'addr' will be a global 'void *'
>>variable. Since no code will be patched to change block chaining, it
>>will simplify the instruction cache invalidation issues and the
>>threading issues.
>
>
> Hmm, I had a more ambitious idea, and that was to keep simple stats on
> which block last followed each block: if it goes to the same block
> more than N times in a row, coalesce/chain them.
>
> As blocks get longer, you have more opportunities for register
> lifetime analysis, which could eliminate redundant stores to registers
> in particular.
>
> I haven't got actual code, so I haven't mentioned it before...
>
> Thoughts?

It could be interesting to avoid some condition code computations.
Currently it is not possible to do more, because qemu has no generic IR
and I don't think I will have the time to add one. Julian Seward (of the
valgrind project) is thinking about adding a more generic IR to valgrind
to allow cross debugging, so it might be interesting for valgrind.

BUT, I have a much simpler approach "a la FX!32", which needs very
little modification in qemu:

You can launch your executable a first time to record statistics. Then
you launch a special tool, 'qemuopt', which uses gcc to statically
generate a dynamic library containing the host CPU code of the most used
basic block chains.

'qemuopt' is very easy to write: I discovered this by noting that gcc
optimizes 'static inline' local functions very well. So you just have to
generate a C source containing approximately:

void genfunc(CPUX86State *env)
{
    uint32_t T0, EAX, EBX, ...;

    /* load the guest registers into local variables */
    EAX = env->regs[R_EAX];
    EBX = env->regs[R_EBX];

    /* pull in the micro-operations as inlinable local functions */
#define OPPROTO static inline
#include "op-i386.c"

    /* the recorded chain of micro-operations */
    op_movl_T0_EAX();
    op_movl_EBX_T0();

    /* write the guest registers back */
    env->regs[R_EAX] = EAX;
    env->regs[R_EBX] = EBX;
}

Then gcc does all the hard work for us :-)

Fabrice.
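
For illustration, here is a minimal self-contained sketch of the kind of source 'qemuopt' could emit. It is not qemu code: the CPUState and Locals structures, the register indices, and the two micro-op bodies are hypothetical stand-ins, and the micro-ops take a pointer to the local values instead of being included from "op-i386.c", so that the file compiles on its own. The point it tries to show is only that, once the 'static inline' micro-ops are inlined, gcc is free to keep T0, EAX and EBX in host registers and drop the redundant moves.

```c
#include <stdint.h>

/* Hypothetical stand-ins for qemu's CPUX86State and register indices. */
enum { R_EAX, R_EBX, R_NB };

typedef struct CPUState {
    uint32_t regs[R_NB];
} CPUState;

/* Guest registers and the temporary, cached in one local structure
   (an assumption of this sketch; the original uses plain locals). */
typedef struct Locals {
    uint32_t T0, EAX, EBX;
} Locals;

#define OPPROTO static inline

/* Micro-operations: each one is tiny, so gcc can inline it. */
OPPROTO void op_movl_T0_EAX(Locals *l) { l->T0 = l->EAX; }
OPPROTO void op_movl_EBX_T0(Locals *l) { l->EBX = l->T0; }

/* One generated function per hot chain of basic blocks. */
void genfunc(CPUState *env)
{
    Locals l;

    /* load the guest registers once, at the start of the chain */
    l.T0 = 0;
    l.EAX = env->regs[R_EAX];
    l.EBX = env->regs[R_EBX];

    /* the recorded sequence of micro-operations */
    op_movl_T0_EAX(&l);
    op_movl_EBX_T0(&l);

    /* write the guest registers back once, at the end */
    env->regs[R_EAX] = l.EAX;
    env->regs[R_EBX] = l.EBX;
}
```

With optimization enabled, one would expect gcc to reduce genfunc to little more than copying regs[R_EAX] into regs[R_EBX]; looking at the output of 'gcc -O2 -S' is a simple way to check this on a given host.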