From mboxrd@z Thu Jan 1 00:00:00 1970
From: Magnus Damm
To: qemu-devel@nongnu.org
Subject: [Qemu-devel] instruction optimization thoughts
Date: Sun, 22 Aug 2004 17:01:54 +0200
Message-Id: <1093186895.1266.223.camel@kubu.opensource.se>
List-Id: qemu-devel.nongnu.org

Hi all,

Today I've been playing around with qemu, trying to understand how the emulation works. I've tried some debug flags and looked at the log files.

This is how I believe the translation between x86 opcodes and micro operations is performed today; please correct me if I am wrong: gen_intermediate_code_internal() in target-i386/translate.c builds the intermediate code. For each x86 opcode, disas_insn() emits several micro operations. When the block is finished, optimize_flags() removes some of the flag-related micro operations.
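As a toy model of that flow (everything below is invented for illustration; only the function names mirror target-i386/translate.c, the real code looks nothing like this):

```c
/* Toy model of the translation flow: disas_insn() expands each guest
 * instruction into micro ops, then optimize_flags() prunes redundant
 * flag updates from the finished block.  Invented for illustration. */
#include <assert.h>

#define MAX_OPS  64
#define OP_FLAGS 0xFF            /* stand-in "update EFLAGS" micro op */

typedef struct {
    int opc[MAX_OPS];            /* micro operation opcodes */
    int num_ops;
} Block;

/* Pretend every guest instruction expands into one "real" micro op
 * followed by one flag-update micro op. */
static void disas_insn(Block *b, int insn)
{
    b->opc[b->num_ops++] = insn;
    b->opc[b->num_ops++] = OP_FLAGS;
}

/* Keep only the last flag update: the earlier ones are overwritten
 * before anything can observe them. */
static void optimize_flags(Block *b)
{
    int i, out = 0, last = -1;
    for (i = 0; i < b->num_ops; i++)
        if (b->opc[i] == OP_FLAGS)
            last = i;
    for (i = 0; i < b->num_ops; i++)
        if (b->opc[i] != OP_FLAGS || i == last)
            b->opc[out++] = b->opc[i];
    b->num_ops = out;
}

static int translate_block(const int *code, int n, Block *b)
{
    int i;
    b->num_ops = 0;
    for (i = 0; i < n; i++)
        disas_insn(b, code[i]);
    optimize_flags(b);
    return b->num_ops;
}
```

In this toy model a three-instruction block first expands into six micro ops and shrinks to four after the flag pass.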
After looking at some log files, I wonder if it would be possible to reduce the number of micro operations (especially the ones involved in flag handling) by analyzing the resources used and set by each x86 instruction, and then feeding that information into the code that converts x86 opcodes into micro operations. Have a look at the following example:

----------------
IN:
0x300a8b99:  pop    ebx
0x300a8b9a:  add    ebx,0x11927
0x300a8ba0:  mov    DWORD PTR [ebp-684],eax
0x300a8ba6:  xor    edx,edx
0x300a8ba8:  lea    eax,[ebp-528]
0x300a8bae:  mov    esi,esi
0x300a8bb0:  inc    edx
0x300a8bb1:  mov    DWORD PTR [eax],0
0x300a8bb7:  add    eax,4
0x300a8bba:  cmp    edx,0x4a
0x300a8bbd:  jbe    0x300a8bb0

If we analyze the x86 instructions and keep track of resources first, instead of generating the micro operations directly, we would come up with a table containing resource information for each x86 instruction: the resources it requires and the resources it will set. The table could also quite easily be extended with flags that mark whether a resource is constant or not, which leads to further optimization possibilities later.

instruction         | resources required | resources set
--------------------+--------------------+----------------------
pop ebx             | ESP                | EBX
add ebx,0x11927     | EBX                | EBX OF SF ZF AF PF CF
mov ..ebp-684],eax  | EBP EAX            | IO
xor edx,edx         | EDX                | EDX OF SF ZF AF PF CF
lea eax,[ebp-528]   | EBP                | EAX
mov esi,esi         | ESI                | ESI
inc edx             | EDX                | EDX OF SF ZF AF PF
mov ..[eax], 0      | EAX                | IO
add eax, 4          | EAX                | EAX OF SF ZF AF PF CF
cmp edx, 0x4a       | EDX                | OF SF ZF AF PF CF
jbe ..              | EIP CF ZF          | EIP

Then we perform an optimization step. This step removes "resources set" entries that are redundant. Maybe the code for this step could be shared by many target processors; think of it as some kind of generic resource optimizer.
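Such a resource table could be encoded compactly as bitmasks, one bit per register or EFLAGS bit. A sketch (the encoding and all names are invented for illustration; this is not QEMU code):

```c
#include <assert.h>

/* One bit per trackable resource (invented encoding). */
enum {
    R_EAX = 1 << 0,  R_EBX = 1 << 1,  R_EDX = 1 << 2,
    R_ESI = 1 << 3,  R_EBP = 1 << 4,  R_ESP = 1 << 5,
    R_EIP = 1 << 6,  R_IO  = 1 << 7,   /* memory/IO side effect */
    F_OF  = 1 << 8,  F_SF  = 1 << 9,  F_ZF = 1 << 10,
    F_AF  = 1 << 11, F_PF  = 1 << 12, F_CF = 1 << 13
};
#define F_ALL (F_OF | F_SF | F_ZF | F_AF | F_PF | F_CF)

typedef struct {
    const char *name;
    unsigned required;   /* resources read by the instruction */
    unsigned set;        /* resources written by the instruction */
} InsnInfo;

/* The example block above, encoded as resource information. */
static const InsnInfo example[] = {
    { "pop ebx",           R_ESP,               R_EBX },
    { "add ebx,0x11927",   R_EBX,               R_EBX | F_ALL },
    { "mov [ebp-684],eax", R_EBP | R_EAX,       R_IO },
    { "xor edx,edx",       R_EDX,               R_EDX | F_ALL },
    { "lea eax,[ebp-528]", R_EBP,               R_EAX },
    { "mov esi,esi",       R_ESI,               R_ESI },
    { "inc edx",           R_EDX,               R_EDX | (F_ALL & ~F_CF) },
    { "mov [eax],0",       R_EAX,               R_IO },
    { "add eax,4",         R_EAX,               R_EAX | F_ALL },
    { "cmp edx,0x4a",      R_EDX,               F_ALL },
    { "jbe ..",            R_EIP | F_CF | F_ZF, R_EIP }
};
```

The "is this resource constant" extension mentioned above would just be one more bitmask per entry.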
After optimization:

instruction         | resources required | resources set
--------------------+--------------------+----------------------
pop ebx             | ESP                | EBX
add ebx,0x11927     | EBX                | EBX
mov ..ebp-684],eax  | EBP EAX            | IO
xor edx,edx         | EDX                | EDX
lea eax,[ebp-528]   | EBP                | EAX
mov esi,esi         | ESI                | ESI
inc edx             | EDX                | EDX
mov ..[eax], 0      | EAX                | IO
add eax, 4          | EAX                | EAX
cmp edx, 0x4a       | EDX                | OF SF ZF AF PF CF
jbe ..              | EIP CF ZF          | EIP

Several flag-related resources have been removed above. No registers have been removed, but that would also be possible. The information left in the table is then fed into the code that translates the x86 opcodes into micro operations, and it is up to that code to generate as few micro operations as possible.

I guess what I am trying to say is that it would be cool to add a generic optimization step before the opcode-to-micro-operation translation. But would it be useful? Or just slow? Any thoughts? Maybe the flag handling code is fast enough today?

/ magnus
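The pruning shown in the optimized table falls out of a standard backward liveness pass over the flag bits: a flag write is redundant if a later instruction overwrites that flag before anything reads it. A self-contained sketch (the bitmask encoding is repeated here so the snippet stands alone; again an invented illustration, not QEMU code, and flags are conservatively assumed live at the end of the block):

```c
#include <assert.h>
#include <stddef.h>

/* One bit per resource (invented encoding, illustration only). */
enum {
    R_EAX = 1 << 0,  R_EDX = 1 << 1,  R_EIP = 1 << 2,
    F_OF  = 1 << 8,  F_SF  = 1 << 9,  F_ZF  = 1 << 10,
    F_AF  = 1 << 11, F_PF  = 1 << 12, F_CF  = 1 << 13
};
#define F_ALL (F_OF | F_SF | F_ZF | F_AF | F_PF | F_CF)

typedef struct {
    unsigned required;   /* resources read */
    unsigned set;        /* resources written */
} InsnInfo;

/* Walk the block backwards, tracking which flags are still live.
 * A flag write that is dead (overwritten before any read) is dropped
 * from the "set" mask.  Register writes are left alone here; removing
 * them would need the same pass extended beyond the flag bits. */
static void prune_dead_flags(InsnInfo *insn, size_t n)
{
    unsigned live = F_ALL;   /* flags may be read after the block */
    size_t i = n;
    while (i-- > 0) {
        unsigned set = insn[i].set;
        insn[i].set &= ~(F_ALL & ~live);  /* drop dead flag writes */
        live = (live & ~set) | (insn[i].required & F_ALL);
    }
}
```

On the tail of the example block this reproduces the table above: only cmp keeps its flag writes, because jbe reads CF and ZF and the remaining flags may still be read after the block ends.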