From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.33)
	id 1Cme42-0007l6-HT
	for qemu-devel@nongnu.org; Thu, 06 Jan 2005 15:16:58 -0500
Received: from [199.232.76.173] (helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.33) id 1Cme41-0007kr-WF
	for qemu-devel@nongnu.org; Thu, 06 Jan 2005 15:16:58 -0500
Received: from [129.104.30.34] (helo=mx1.polytechnique.org)
	by monty-python.gnu.org with esmtp (Exim 4.34) id 1Cmdsd-0003KN-Gh
	for qemu-devel@nongnu.org; Thu, 06 Jan 2005 15:05:11 -0500
Message-ID: <41DD9A1A.3070608@bellard.org>
Date: Thu, 06 Jan 2005 21:05:46 +0100
From: Fabrice Bellard <fabrice@bellard.org>
MIME-Version: 1.0
Subject: Re: [Qemu-devel] conditional branch implementation using dyngen labels
References: <2e5a35a6050105175820db00e5@mail.gmail.com>
In-Reply-To: <2e5a35a6050105175820db00e5@mail.gmail.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: dmh23@cornell.edu, qemu-devel@nongnu.org

Hi,

You are right: the new code is less efficient than the previous one, but 
I knew when I wrote it that it would be easy to optimize (the idea is 
simply to add platform-specific asm macros to test expressions).

Support for conditional branches was added to dyngen for several 
reasons:

1) I was confronted with an explosion in the number of micro operations 
for the 64 bit case, especially where EIP has to be passed as a 
parameter (3 versions are needed to handle 32 bit, 32 bit sign extended 
to 64 bit, and 64 bit). I also want to remove 
'target-i386/ops_template_mem.h', which was only needed because I could 
not perform a test in the generated code.

2) I needed to simplify the micro operations because gcc behaves very 
badly when the micro operations become more complicated, especially for 
64 bit ops on a 32 bit host.

3) This is a step towards a hand coded code generator.
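
To make point 1 concrete, here is a minimal sketch in C. It is not the 
actual QEMU code: the type and function names (SketchCPUState, 
sketch_set_eip_*, sketch_*_op_count) are hypothetical and chosen only for 
illustration. It shows the three ways an EIP immediate can be stored on a 
64 bit target, and why an op that both tests a condition and stores EIP 
has to exist in three versions, whereas a test that branches to a label 
lets the EIP stores be shared.

```c
#include <stdint.h>

/* Hypothetical, minimal CPU state: only the field this sketch needs. */
typedef struct {
    uint64_t eip;
} SketchCPUState;

/* 32 bit immediate, zero extended to 64 bit. */
void sketch_set_eip_im32(SketchCPUState *s, uint32_t imm)
{
    s->eip = imm;
}

/* 32 bit immediate, sign extended to 64 bit. */
void sketch_set_eip_ims32(SketchCPUState *s, int32_t imm)
{
    s->eip = (uint64_t)(int64_t)imm;
}

/* Full 64 bit immediate. */
void sketch_set_eip_im64(SketchCPUState *s, uint64_t imm)
{
    s->eip = imm;
}

/* Ops needed if every conditional jump op also stores EIP itself:
 * one copy per EIP form (assumption: exactly the 3 forms above). */
int sketch_fused_op_count(int n_cond_ops)
{
    return n_cond_ops * 3;
}

/* Ops needed if the test only branches to a label and the 3 EIP
 * stores are separate ops shared by all conditions. */
int sketch_split_op_count(int n_cond_ops)
{
    return n_cond_ops + 3;
}
```

With, say, 10 conditional ops this gives 30 fused ops against 13 split 
ones. The figure 10 is purely illustrative and is not a count taken from 
the QEMU sources.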

I will soon commit the patches needed to fix the Windows and PPC 
builds.

Fabrice.

Dan Hecht wrote:
> Hi,
> 
> I am wondering why you changed the code generation for conditional
> branches (i386) in gen_jcc() to use dyngen labels?  It seems the new
> code will be lower performing than the old, since there is an extra
> jump instruction along one of the paths.
> 
> For example, prior to the recent CVS commit, the code generated for a
> conditional jump would be something like:
> 
> 0x087666ce:  mov    0x2c(%ebp),%eax
> 0x087666d1:  test   %eax,%eax
> 0x087666d3:  jne    0x87666ea
> 0x087666d5:  jmp    0x95e3502
> 0x087666da:  mov    $0x82e98ac,%ebx
> 0x087666df:  movl   $0x6ca,0x20(%ebp)
> 0x087666e6:  jmp    0x87666fb
> 0x087666e8:  mov    %esi,%esi
> 0x087666ea:  jmp    0x95e3f26
> 0x087666ef:  movl   $0x6b3,0x20(%ebp)
> 0x087666f6:  mov    $0x82e98ad,%ebx
> 0x087666fb:  ret 
> 
> with jmp at 0x087666d5 and 0x087666ea being chained.
> 
> Now, the code is something like:
> 
> 0x08bd2085:  mov    0x2c(%ebp),%eax
> 0x08bd2088:  test   %eax,%eax
> 0x08bd208a:  jne    0x8bd2091
> 0x08bd208c:  jmp    0x8bd20a3
> 0x08bd2091:  jmp    0x9a4f35d
> 0x08bd2096:  movl   $0x80555c30,0x20(%ebp)
> 0x08bd209d:  mov    $0x8406390,%ebx
> 0x08bd20a2:  ret
> 0x08bd20a3:  jmp    0x9a4fd8f
> 0x08bd20a8:  movl   $0x80555c8c,0x20(%ebp)
> 0x08bd20af:  mov    $0x8406391,%ebx
> 0x08bd20b4:  ret
> 
> with the jmp at 0x08bd2091 and 0x08bd20a3 being chained.  Notice the
> extra jmp in the path to the later part of the block.
> 
> Locally, I have optimized the i386 target to generate something like:
> 
> 0x087686dd:  cmpl   $0x0,0x2c(%ebp)
> 0x087686e1:  jne    0x95e7981
> 0x087686e7:  jmp    0x95e6f5d
> 0x087686ec:  movl   $0x6ca,0x20(%ebp)
> 0x087686f3:  mov    $0x82eba54,%ebx
> 0x087686f8:  ret
> 0x087686f9:  movl   $0x6b3,0x20(%ebp)
> 0x08768700:  mov    $0x82eba55,%ebx
> 0x08768705:  ret 
> 
> with the jne at 0x087686e1 and the jmp at 0x087686e7 getting chained,
> as an optimization (but haven't had time to clean it up enough to send
> in for a patch).  However, this target specific code is harder to
> implement with the new broken down micro operations.
> 
> So, I'm wondering about the reasoning behind this change and whether
> there's another way I should have gone about implementing this
> optimization. Previously, I just rewrote op_jz_subx, etc., all in assembly.
> 
> Thanks in advance,
> Dan