qemu-devel.nongnu.org archive mirror
* [Qemu-devel] instruction optimization thoughts
@ 2004-08-22 15:01 Magnus Damm
  2004-08-24  1:20 ` dguinan
  0 siblings, 1 reply; 4+ messages in thread
From: Magnus Damm @ 2004-08-22 15:01 UTC (permalink / raw)
  To: qemu-devel

Hi all,

Today I've been playing around with qemu trying to understand how the
emulation works. I've tried some debug flags and looked at log files.

This is how I believe the translation from x86 opcodes to micro
operations is performed today; please correct me if I am wrong:

gen_intermediate_code_internal() in target-i386/translate.c is used to
build intermediate code. The function disas_insn() is used to convert
each opcode into several micro operations. When the block is finished, 
the function optimize_flags() is used to optimize away some flag related
micro operations.

After looking at some log files I wonder if it would be possible to
reduce the number of micro operations (especially the ones involved in
flag handling) by analyzing resources used and set by each x86
instruction and then feed that information into the code that converts
x86 opcodes into micro operations.

Have a look at the following example:

----------------
IN:
0x300a8b99:  pop    ebx
0x300a8b9a:  add    ebx,0x11927
0x300a8ba0:  mov    DWORD PTR [ebp-684],eax
0x300a8ba6:  xor    edx,edx
0x300a8ba8:  lea    eax,[ebp-528]
0x300a8bae:  mov    esi,esi
0x300a8bb0:  inc    edx
0x300a8bb1:  mov    DWORD PTR [eax],0x0
0x300a8bb7:  add    eax,0x4
0x300a8bba:  cmp    edx,0x4a
0x300a8bbd:  jbe    0x300a8bb0

If we analyze the x86 instructions and keep track of resources first,
instead of generating the micro operations directly, we would come up
with a table containing resource information related to each x86
instruction. This table contains data about required resources and
resources that will be set by each instruction. 

The table could also quite easily be extended with flags that mark
whether resources are constant or not, which opens up further
optimization possibilities later.

instruction        | resources required | resources set

pop ebx            | ESP                |    EBX
add ebx,0x11927    | EBX                |    EBX OF SF ZF AF PF CF
mov [ebp-684],eax  | EBP EAX            | IO
xor edx,edx        | EDX                |    EDX OF SF ZF AF PF CF
lea eax,[ebp-528]  | EBP                |    EAX
mov esi,esi        | ESI                |    ESI

inc edx            | EDX                |    EDX OF SF ZF AF PF
mov [eax],0x0      | EAX                | IO
add eax,0x4        | EAX                |    EAX OF SF ZF AF PF CF
cmp edx,0x4a       | EDX                |        OF SF ZF AF PF CF
jbe 0x300a8bb0     | EIP CF ZF          |    EIP

Then we perform an optimization step. This step removes set resources
that are redundant. Maybe the code for this step could be shared by
many target processors; think of it as some kind of generic resource
optimizer.
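The backward pass just described can be sketched in C. The names and
bit layout below are illustrative only, not qemu's actual code; the
idea is simply that a flag which is overwritten before anything reads
it can be dropped from the "set" column:

```c
/* Hypothetical resource bits -- one per flag/register the table
 * tracks.  These names are illustrative, not qemu's encoding. */
enum {
    R_OF = 1 << 0, R_SF = 1 << 1, R_ZF = 1 << 2,
    R_AF = 1 << 3, R_PF = 1 << 4, R_CF = 1 << 5,
    R_EAX = 1 << 6, R_EBX = 1 << 7, R_EDX = 1 << 8, R_EIP = 1 << 9,
};
#define R_FLAGS (R_OF | R_SF | R_ZF | R_AF | R_PF | R_CF)

struct insn_res {
    const char *name;
    unsigned required;  /* resources this instruction reads */
    unsigned set;       /* resources this instruction writes */
};

/* Walk the block backwards.  A flag set here that is overwritten
 * before any later instruction reads it is dead, so drop it from
 * the "set" column.  At block exit we conservatively assume all
 * flags may still be read by the next block. */
static void optimize_flags_table(struct insn_res *tab, int n)
{
    unsigned live = R_FLAGS;
    for (int i = n - 1; i >= 0; i--) {
        unsigned set = tab[i].set & R_FLAGS;
        tab[i].set &= ~(set & ~live);   /* set but never read: drop */
        live = (live & ~set) | (tab[i].required & R_FLAGS);
    }
}
```

Running this over the flag-relevant rows of the table above yields
exactly the "after optimization" table that follows.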

After optimization:

instruction        | resources required | resources set

pop ebx            | ESP                |    EBX
add ebx,0x11927    | EBX                |    EBX
mov [ebp-684],eax  | EBP EAX            | IO
xor edx,edx        | EDX                |    EDX
lea eax,[ebp-528]  | EBP                |    EAX
mov esi,esi        | ESI                |    ESI

inc edx            | EDX                |    EDX
mov [eax],0x0      | EAX                | IO
add eax,0x4        | EAX                |    EAX
cmp edx,0x4a       | EDX                |        OF SF ZF AF PF CF
jbe 0x300a8bb0     | EIP CF ZF          |    EIP

Several flag-related resources have been removed above. No other
registers have been removed, but that would also be possible. The
information left in the table is fed into the code that translates the
x86 opcodes into micro operations and it is up to that code to generate
as few micro operations as possible.

I guess what I am trying to say is that it would be cool to add a
generic optimization step before the opcode-to-micro-operation
translation. But would it be useful? Or just slow?

Any thoughts? Maybe the flag handling code is fast enough today?

/ magnus

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Qemu-devel] instruction optimization thoughts
  2004-08-22 15:01 [Qemu-devel] instruction optimization thoughts Magnus Damm
@ 2004-08-24  1:20 ` dguinan
  2004-08-24 11:45   ` Elefterios Stamatogiannakis
  0 siblings, 1 reply; 4+ messages in thread
From: dguinan @ 2004-08-24  1:20 UTC (permalink / raw)
  To: qemu-devel

I have been thinking along these lines and believe that it could be 
taken even further.  Let's assume that your pre-translation table 
optimizer could be made to work.  As a second step after eliminating 
redundancy and removing effective NOPs, entries in the intermediate 
instruction stream could be considered keys into a database of 
translations.  Depending on the amount of memory we want to spend, 
these keys could be small or even quite large - I imagine that we could 
go through a process of using something along the lines of a genetic 
algorithm to build highly optimized translation blocks for the most 
commonly occurring streams of instructions.  The rudimentary building 
blocks that the genetic algorithm would work with would be all the 
various translations that a particular instruction stream could result 
in (e.g. different ordering of order independent instructions, etc..).  
The tables would, of course, be built off-line (a datafile could be 
placed in CVS, allowing us to contribute a large amount of upfront 
compute time to build highly optimized tables for a variety of 
different platforms).
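As a rough illustration of the key/lookup part of this idea (setting
the genetic-algorithm search aside), here is a minimal C sketch with
invented names and sizes -- not an actual qemu structure. The key is
a hash of the raw guest opcode bytes of a block; the value is the
pretranslated host code for that stream:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical translation database entry: key = hash of the guest
 * opcode bytes, value = pointer to a pretranslated host block. */
struct tb_entry {
    uint64_t key;
    const void *host_code;   /* NULL when the slot is empty */
};

#define TB_DB_BITS 16
#define TB_DB_SIZE (1u << TB_DB_BITS)

static struct tb_entry tb_db[TB_DB_SIZE];

/* FNV-1a over the raw instruction bytes: cheap, and hashing a longer
 * byte span (a bigger "key") trades memory for fewer collisions. */
static uint64_t tb_hash(const uint8_t *code, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ull;
    for (size_t i = 0; i < len; i++)
        h = (h ^ code[i]) * 0x100000001b3ull;
    return h;
}

static void tb_store(const uint8_t *code, size_t len,
                     const void *host_code)
{
    uint64_t key = tb_hash(code, len);
    struct tb_entry *e = &tb_db[key & (TB_DB_SIZE - 1)];
    e->key = key;
    e->host_code = host_code;
}

static const void *tb_lookup(const uint8_t *code, size_t len)
{
    uint64_t key = tb_hash(code, len);
    struct tb_entry *e = &tb_db[key & (TB_DB_SIZE - 1)];
    return (e->key == key && e->host_code) ? e->host_code : NULL;
}
```

An offline table builder would fill `tb_db` ahead of time; at run time
the translator would try `tb_lookup()` before translating from scratch.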

Thinking along these lines is important.  This project not only 
provides a fantastic emulator, but also the "possibility" of 
eventually reaching par or better performance (if we are smart 
about it).

-Daniel

On Aug 22, 2004, at 5:01 AM, Magnus Damm wrote:

> Hi all,
>
> Today I've been playing around with qemu trying to understand how the
> emulation works. I've tried some debug flags and looked at log files.
>
> This is how I believe the translation between x86 opcodes and micro
> operations is performed today, please correct me if I am wrong:
>
> gen_intermediate_code_internal() in target-i386/translate.c is used to
> build intermediate code. The function disas_insn() is used to convert
> each opcode into several micro operations. When the block is finished,
> the function optimize_flags() is used to optimize away some flag 
> related
> micro operations.
>
> After looking at some log files I wonder if it would be possible to
> reduce the number of micro operations (especially the ones involved in
> flag handling) by analyzing resources used and set by each x86
> instruction and then feed that information into the code that converts
> x86 opcodes into micro operations.
>
> Have a look at the following example:
>
> ----------------
> IN:
> 0x300a8b99:  pop    ebx
> 0x300a8b9a:  add    ebx
> 0x300a8ba0:  mov    DWORD PTR [ebp-684],eax
> 0x300a8ba6:  xor    edx,edx
> 0x300a8ba8:  lea    eax,[ebp-528]
> 0x300a8bae:  mov    esi,esi
> 0x300a8bb0:  inc    edx
> 0x300a8bb1:  mov    DWORD PTR [eax]
> 0x300a8bb7:  add    eax
> 0x300a8bba:  cmp    edx
> 0x300a8bbd:  jbe    0x300a8bb0
>
> If we analyze the x86 instructions and keep track of resources first,
> instead of generating the micro operations directly, we would come up
> with a table containing resource information related to each x86
> instruction. This table contains data about required resources and
> resources that will be set by each instruction.
>
> The table could also quite easily be extended to contain flags that 
> mark
> if resources are constant or not which leads to further optimization
> possibilities later.
>
> instruction        | resources required | resources set
>
> pop ebx            | ESP                |    EBX
> add ebx,0x11927    | EBX                |    EBX OF SF ZF AF PF CF
> mov ..ebp-684],eax | EBP EAX            | IO
> xor edx,edx        | EDX                |    EDX OF SF ZF AF PF CF
> lea eax,[ebp-528]  | EBP                |    EAX
> mov esi,esi        | ESI                |    ESI
>
> inc edx            | EDX                |    EDX OF SF ZF AF PF
> mov ..[eax], 0     | EAX                | IO
> add eax, 4         | EAX                |    EAX OF SF ZF AF PF CF
> cmp edx, 0x4a      | EDX                |        OF SF ZF AF PF CF
> jbe ..             | EIP CF ZF          |    EIP
>
> Then we perform a optimization step. This step removes resources marked
> as set that are redundant. Maybe the code for this step could be shared
> by many target processors, think of it as some kind of generic resource
> optimizer.
>
> After optimization:
>
> instruction        | resources required | resources set
>
> pop ebx            | ESP                |    EBX
> add ebx,0x11927    | EBX                |    EBX
> mov ..ebp-684],eax | EBP EAX            | IO
> xor edx,edx        | EDX                |    EDX
> lea eax,[ebp-528]  | EBP                |    EAX
> mov esi,esi        | ESI                |    ESI
>
> inc edx            | EDX                |    EDX
> mov ..[eax], 0     | EAX                | IO
> add eax, 4         | EAX                |    EAX
> cmp edx, 0x4a      | EDX                |        OF SF ZF AF PF CF
> jbe ..             | EIP CF ZF          |    EIP
>
> Several flag-related resources have been removed above. No other
> registers have been removed, but that would also be possible. The
> information left in the table is fed into the code that translates the
> x86 opcodes into micro operations and it is up to that code to generate
> as few micro operations as possible.
>
> I guess what I am trying to say is that it would be cool to add a
> generic optimization step before the opcode to micro operations
> translation. But would it be useful? Or just slow?
>
> Any thoughts? Maybe the flag handling code is fast enough today?
>
> / magnus
>
>
>
>
> _______________________________________________
> Qemu-devel mailing list
> Qemu-devel@nongnu.org
> http://lists.nongnu.org/mailman/listinfo/qemu-devel
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Qemu-devel] instruction optimization thoughts
  2004-08-24  1:20 ` dguinan
@ 2004-08-24 11:45   ` Elefterios Stamatogiannakis
  2004-08-24 15:36     ` Piotr Krysik
  0 siblings, 1 reply; 4+ messages in thread
From: Elefterios Stamatogiannakis @ 2004-08-24 11:45 UTC (permalink / raw)
  To: qemu-devel

  I don't think the database solution would work. First of all, the 
database would have to be very big in order to be effective. That means 
that looking up the streams of instructions in the database would 
thrash the cache (lots and lots of memory reads to various places).
  Magnus's idea is very interesting because it looks like a very 
simple peephole optimizer. That is why it would be effective for blocks 
that get executed more than once: for a little more work qemu would have 
more efficient code to execute.
  Nevertheless I don't know how much of Magnus's idea qemu implements 
right now. Maybe "Condition code optimisations" + "CPU state 
optimisations" from ' http://fabrice.bellard.free.fr/qemu/qemu-tech.html ' 
come very close to what Magnus suggests.

  Fabrice or anyone else with qemu internals knowledge would be more 
qualified to answer.

  teris.

ps All these code optimization ideas pale in effectiveness compared 
with what MMU optimization work would produce. There is a reason why 
there is a qemu-fast and a qemu-soft. If somehow these two could be 
consolidated, then the performance gain would be considerable... I think.

dguinan@mac.com wrote:
> I have been thinking along these lines and believe that it could be 
> taken even further.  Let's assume that your pre-translation table 
> optimizer could be made to work.  As a second step after eliminating 
> redundancy and removing effective NOPs, entries in the intermediate 
> instruction stream could be considered keys into a database of 
> translations.  Depending on the amount of memory we want to spend, these 
> keys could be small or even quite large - I imagine that we could go 
> through a process of using something along the lines of a genetic 
> algorithm to build highly optimized translation blocks for the most 
> commonly occurring streams of instructions.  The rudimentary building 
> blocks that the genetic algorithm would work with would be all the 
> various translations that a particular instruction stream could result 
> in (e.g. different ordering of order independent instructions, etc..).  
> The tables would, of course, be built off-line (a datafile could be 
> placed in CVS, allowing us to contribute a large amount of upfront 
> compute time to build highly optimized tables for a variety of different 
> platforms).
> 
> Thinking along these lines is important.  This project not only presents 
> a fantastic emulator, but the "possibility" of eventually reaching par 
> or better performance (if we are smart about it).
> 
> -Daniel
> 
> On Aug 22, 2004, at 5:01 AM, Magnus Damm wrote:
> 
>> Hi all,
>>
>> Today I've been playing around with qemu trying to understand how the
>> emulation works. I've tried some debug flags and looked at log files.
>>
>> This is how I believe the translation between x86 opcodes and micro
>> operations is performed today, please correct me if I am wrong:
>>
>> gen_intermediate_code_internal() in target-i386/translate.c is used to
>> build intermediate code. The function disas_insn() is used to convert
>> each opcode into several micro operations. When the block is finished,
>> the function optimize_flags() is used to optimize away some flag related
>> micro operations.
>>
>> After looking at some log files I wonder if it would be possible to
>> reduce the number of micro operations (especially the ones involved in
>> flag handling) by analyzing resources used and set by each x86
>> instruction and then feed that information into the code that converts
>> x86 opcodes into micro operations.
>>
>> Have a look at the following example:
>>
>> ----------------
>> IN:
>> 0x300a8b99:  pop    ebx
>> 0x300a8b9a:  add    ebx
>> 0x300a8ba0:  mov    DWORD PTR [ebp-684],eax
>> 0x300a8ba6:  xor    edx,edx
>> 0x300a8ba8:  lea    eax,[ebp-528]
>> 0x300a8bae:  mov    esi,esi
>> 0x300a8bb0:  inc    edx
>> 0x300a8bb1:  mov    DWORD PTR [eax]
>> 0x300a8bb7:  add    eax
>> 0x300a8bba:  cmp    edx
>> 0x300a8bbd:  jbe    0x300a8bb0
>>
>> If we analyze the x86 instructions and keep track of resources first,
>> instead of generating the micro operations directly, we would come up
>> with a table containing resource information related to each x86
>> instruction. This table contains data about required resources and
>> resources that will be set by each instruction.
>>
>> The table could also quite easily be extended to contain flags that mark
>> if resources are constant or not which leads to further optimization
>> possibilities later.
>>
>> instruction        | resources required | resources set
>>
>> pop ebx            | ESP                |    EBX
>> add ebx,0x11927    | EBX                |    EBX OF SF ZF AF PF CF
>> mov ..ebp-684],eax | EBP EAX            | IO
>> xor edx,edx        | EDX                |    EDX OF SF ZF AF PF CF
>> lea eax,[ebp-528]  | EBP                |    EAX
>> mov esi,esi        | ESI                |    ESI
>>
>> inc edx            | EDX                |    EDX OF SF ZF AF PF
>> mov ..[eax], 0     | EAX                | IO
>> add eax, 4         | EAX                |    EAX OF SF ZF AF PF CF
>> cmp edx, 0x4a      | EDX                |        OF SF ZF AF PF CF
>> jbe ..             | EIP CF ZF          |    EIP
>>
>> Then we perform a optimization step. This step removes resources marked
>> as set that are redundant. Maybe the code for this step could be shared
>> by many target processors, think of it as some kind of generic resource
>> optimizer.
>>
>> After optimization:
>>
>> instruction        | resources required | resources set
>>
>> pop ebx            | ESP                |    EBX
>> add ebx,0x11927    | EBX                |    EBX
>> mov ..ebp-684],eax | EBP EAX            | IO
>> xor edx,edx        | EDX                |    EDX
>> lea eax,[ebp-528]  | EBP                |    EAX
>> mov esi,esi        | ESI                |    ESI
>>
>> inc edx            | EDX                |    EDX
>> mov ..[eax], 0     | EAX                | IO
>> add eax, 4         | EAX                |    EAX
>> cmp edx, 0x4a      | EDX                |        OF SF ZF AF PF CF
>> jbe ..             | EIP CF ZF          |    EIP
>>
>> Several flag-related resources have been removed above. No other
>> registers have been removed, but that would also be possible. The
>> information left in the table is fed into the code that translates the
>> x86 opcodes into micro operations and it is up to that code to generate
>> as few micro operations as possible.
>>
>> I guess what I am trying to say is that it would be cool to add a
>> generic optimization step before the opcode to micro operations
>> translation. But would it be useful? Or just slow?
>>
>> Any thoughts? Maybe the flag handling code is fast enough today?
>>
>> / magnus
>>
>>
>>
>>
>> _______________________________________________
>> Qemu-devel mailing list
>> Qemu-devel@nongnu.org
>> http://lists.nongnu.org/mailman/listinfo/qemu-devel
>>
> 
> 
> 
> _______________________________________________
> Qemu-devel mailing list
> Qemu-devel@nongnu.org
> http://lists.nongnu.org/mailman/listinfo/qemu-devel

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Qemu-devel] instruction optimization thoughts
  2004-08-24 11:45   ` Elefterios Stamatogiannakis
@ 2004-08-24 15:36     ` Piotr Krysik
  0 siblings, 0 replies; 4+ messages in thread
From: Piotr Krysik @ 2004-08-24 15:36 UTC (permalink / raw)
  To: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 3389 bytes --]

--- Elefterios Stamatogiannakis
<estama@dblab.ece.ntua.gr> wrote:

>   I don't think the database solution would work. 
> First of all the database would have to be very 
> big in order to be effective. That means that in 
> order to lookup the streams of instructions in 
> the database you would thrash the cache (lots 
> and lots of memory reads to various places).

Hi!

The database lookup can be very fast -- use a good 
hash function and a memory-mapped file. For best 
performance, frequently used code fragments should 
be laid out in sequence (a single page fault and 
disk read would then bring a few useful code 
fragments into RAM). For this reason the database 
should be optimized for the specific programs 
executed by a particular user.
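A minimal POSIX-flavored sketch of such a lookup path, with an invented
file name, bucket layout, and function names -- the point is only that
the database is mapped once, read-only, and pages fault in on demand:

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical on-disk layout: fixed-size buckets, so a lookup
 * touches one page.  Fragments used together should be stored
 * together, so one fault/disk read brings in several useful ones. */
struct frag_bucket {
    uint64_t key;        /* hash of the guest code bytes */
    uint32_t offset;     /* file offset of the translated fragment */
    uint32_t len;
};

/* Map the whole database read-only; the kernel then pages it in
 * lazily, which is what makes the lookup itself cheap. */
static void *map_frag_db(const char *path, size_t *size_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    void *base = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);               /* the mapping survives the close */
    if (base == MAP_FAILED)
        return NULL;
    *size_out = (size_t)st.st_size;
    return base;
}
```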

But once you have a nice optimizer for building 
the database, why not integrate it into Qemu to 
optimize the most frequently executed blocks 
(as HotSpot does)? If we find the optimization 
to be CPU-expensive, we could add a small 
persistent database.


[...]
>   teris.
> 
> ps All these code optimizing ideas pale in
> effectiveness with what a mmu optimization work 
> would produce. There is a reason why there is a 
> qemu-fast and a qemu-soft. If somehow these too
> could be consolidated then the performance gain 
> would be considerable....
> I think.

The MMU optimization improves performance a lot, 
but most of qemu-fast's speed comes from the 
code-copy optimization (to compare, run benchmarks 
with qemu-fast -no-code-copy). The MMU optimization 
is important for code-copy as it allows running 
blocks that read/write memory.

For the MMU optimization to work it is necessary 
to have as big a contiguous region of virtual 
address space dedicated to the guest as possible. 
For this reason qemu-fast uses a special memory 
layout, and that causes some problems (it requires 
static compilation and hacking of libc). Qemu-fast 
will never be as portable as softmmu is. Except, 
maybe, when running a 32-bit guest on a 64-bit host.

I did some experiments to check whether optimizing 
softmmu with the techniques of the MMU optimization 
is feasible. For this I tried to remove the memory 
layout constraint of qemu-fast by using a memory 
"mapping table".

In these experiments I redirect the memory accesses 
of guest code via a mapping table to an area of 
virtual address space where guest pages are mapped 
(using mmap). This area can be much smaller than 
the guest address space, so I reduce the problems 
of qemu-fast and improve its portability. In the 
future this approach could help the MMU optimization 
enter qemu-softmmu.
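The attached patch implements this access path with inline assembly; in
plain C the mapping-table idea amounts to something like the following.
Block size, struct name, and helper names here are illustrative, not
taken from the patch:

```c
#include <stdint.h>

#define MAP_BLOCK_BITS 22                 /* e.g. 4 MB blocks (assumption) */
#define MAP_ENTRIES (1u << (32 - MAP_BLOCK_BITS))

/* One delta per guest block: host = guest + map[guest >> BITS].
 * An unmapped block's delta can point into an inaccessible region,
 * so a stray access faults and is handled by the SIGSEGV handler. */
struct mapped_cpu {
    long map[MAP_ENTRIES];
};

/* Guest byte load through the mapping table. */
static inline uint8_t ldub_mapped(struct mapped_cpu *env, uint32_t guest)
{
    long delta = env->map[guest >> MAP_BLOCK_BITS];
    return *(uint8_t *)(uintptr_t)((long)guest + delta);
}

/* Install a host backing for one guest block:
 * delta = host_base - guest_block_base. */
static inline void map_block(struct mapped_cpu *env, uint32_t guest_base,
                             void *host_base)
{
    env->map[guest_base >> MAP_BLOCK_BITS] =
        (long)(uintptr_t)host_base - (long)guest_base;
}
```

Compared with softmmu's TLB-plus-helper-call path, a load here is one
table index, one add, and one dereference, which is why the benchmarks
come out between softmmu and qemu-fast.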

As this memory access path is much simpler than 
softmmu, the benchmarks give better results (but 
it is still 20% slower than qemu-fast with 
-no-code-copy). When running a real OS, Linux seems 
faster than under softmmu, but Windows 98 is much 
slower. The problem with Windows is that it "likes" 
to modify pages where code is executing, and this 
causes lots of page faults.

I'm attaching a patch, so other developers can see 
what I'm doing. Before using the patch, please make 
sure you can build a working qemu-fast with your 
setup. To see it working, run qemu-fast with the 
-no-code-copy option.

Fabrice, does it make sense to modify code-copy 
to be compatible with this patch (I know it's a lot 
of work)?


Regards,

Piotrek



	
		

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: fast-map-env-2.patch --]
[-- Type: text/x-patch; name="fast-map-env-2.patch", Size: 23069 bytes --]

diff -ru qemu-snapshot-2004-08-04_23/cpu-all.h qemu-snapshot-2004-08-04_23-fast-map/cpu-all.h
--- qemu-snapshot-2004-08-04_23/cpu-all.h	2004-07-05 23:25:09.000000000 +0200
+++ qemu-snapshot-2004-08-04_23-fast-map/cpu-all.h	2004-08-19 00:28:30.000000000 +0200
@@ -24,6 +24,17 @@
 #define WORDS_ALIGNED
 #endif
 
+/* keep in sync with exec-all.h
+ */
+#ifndef offsetof
+#define offsetof(type, field) ((size_t) &((type *)0)->field)
+#endif
+
+/* XXX: assume sizeof(long) >= sizeof(void*)
+ */
+#define map_target2host(env, ptr) \
+    (env->map[((unsigned long) (ptr)) >> MAP_BLOCK_BITS] + ((unsigned long) (ptr)))
+
 /* some important defines: 
  * 
  * WORDS_ALIGNED : if defined, the host cpu can only make word aligned
@@ -181,6 +192,67 @@
     *(uint8_t *)ptr = v;
 }
 
+static inline int ldub_map(void *ptr)
+{
+#if defined(__i386__)
+    int val;
+    asm volatile (
+        "mov    %3, %%eax\n"
+        "shr    %2, %%eax\n"
+        "mov    %1(%%ebp,%%eax,4), %%eax\n"
+        "movzbl (%3,%%eax,1), %0\n"
+        : "=r" (val)
+        : "m" (*(uint8_t *)offsetof(CPUX86State, map[0])),
+          "I" (MAP_BLOCK_BITS),
+          "r" (ptr)
+        : "%eax");
+    return (val);
+#else
+#error unsupported target CPU
+#endif
+}
+
+static inline int ldsb_map(void *ptr)
+{
+#if defined(__i386__)
+    int val;
+    asm volatile (
+        "mov    %3, %%eax\n"
+        "shr    %2, %%eax\n"
+        "mov    %1(%%ebp,%%eax,4), %%eax\n"
+        "movsbl (%3,%%eax,1), %0\n"
+        : "=r" (val)
+        : "m" (*(uint8_t *)offsetof(CPUX86State, map[0])),
+          "I" (MAP_BLOCK_BITS),
+          "r" (ptr)
+        : "%eax");
+    return (val);
+#else
+#error unsupported target CPU
+#endif
+}
+
+static inline void stb_map(void *ptr, int v)
+{
+#if defined(__i386__)
+    asm volatile (
+        "mov    %2, %%eax\n"
+        "shr    %1, %%eax\n"
+        "mov    %0(%%ebp,%%eax,4), %%eax\n"
+        "movb   %b3, (%2,%%eax)\n"
+        :
+        : "m" (*(uint8_t *)offsetof(CPUX86State, map[0])),
+          "I" (MAP_BLOCK_BITS),
+          "r" (ptr),
+          "r" (v)
+        : "%eax");
+#else
+#error unsupported target CPU
+#endif
+}
+
+
+
 /* NOTE: on arm, putting 2 in /proc/sys/debug/alignment so that the
    kernel handles unaligned load/stores may give better results, but
    it is a system wide setting : bad */
@@ -467,6 +539,105 @@
     *(uint64_t *)ptr = v;
 }
 
+static inline int lduw_map(void *ptr)
+{
+#if defined(__i386__)
+    int val;
+    asm volatile (
+        "mov    %3, %%eax\n"
+        "shr    %2, %%eax\n"
+        "mov    %1(%%ebp,%%eax,4), %%eax\n"
+        "movzwl (%3,%%eax,1), %0\n"
+        : "=r" (val)
+        : "m" (*(uint8_t *)offsetof(CPUX86State, map[0])),
+          "I" (MAP_BLOCK_BITS),
+          "r" (ptr)
+        : "%eax");
+    return (val);
+#else
+#error unsupported target CPU
+#endif
+}
+
+static inline int ldsw_map(void *ptr)
+{
+#if defined(__i386__)
+    int val;
+    asm volatile (
+        "mov    %3, %%eax\n"
+        "shr    %2, %%eax\n"
+        "mov    %1(%%ebp,%%eax,4), %%eax\n"
+        "movswl (%3,%%eax,1), %0\n"
+        : "=r" (val)
+        : "m" (*(uint8_t *)offsetof(CPUX86State, map[0])),
+          "I" (MAP_BLOCK_BITS),
+          "r" (ptr)
+        : "%eax");
+    return (val);
+#else
+#error unsupported target CPU
+#endif
+}
+
+static inline int ldl_map(void *ptr)
+{
+#if defined(__i386__)
+    int val;
+    asm volatile (
+        "mov    %3, %%eax\n"
+        "shr    %2, %%eax\n"
+        "mov    %1(%%ebp,%%eax,4), %%eax\n"
+        "movl   (%3,%%eax,1), %0\n"
+        : "=r" (val)
+        : "m" (*(uint8_t *)offsetof(CPUX86State, map[0])),
+          "I" (MAP_BLOCK_BITS),
+          "r" (ptr)
+        : "%eax");
+    return (val);
+#else
+#error unsupported target CPU
+#endif
+}
+
+static inline void stw_map(void *ptr, int v)
+{
+#if defined(__i386__)
+    asm volatile (
+        "mov    %2, %%eax\n"
+        "shr    %1, %%eax\n"
+        "mov    %0(%%ebp,%%eax,4), %%eax\n"
+        "movw   %w3, (%2,%%eax)\n"
+        :
+        : "m" (*(uint8_t *)offsetof(CPUX86State, map[0])),
+          "I" (MAP_BLOCK_BITS),
+          "r" (ptr),
+          "r" (v)
+        : "%eax");
+    /* XXX PK: clobber memory? */
+#else
+#error unsupported target CPU
+#endif
+}
+
+static inline void stl_map(void *ptr, int v)
+{
+#if defined(__i386__)
+    asm volatile (
+        "mov    %2, %%eax\n"
+        "shr    %1, %%eax\n"
+        "mov    %0(%%ebp,%%eax,4), %%eax\n"
+        "movl   %3, (%2,%%eax)\n"
+        :
+        : "m" (*(uint8_t *)offsetof(CPUX86State, map[0])),
+          "I" (MAP_BLOCK_BITS),
+          "r" (ptr),
+          "r" (v)
+        : "%eax");
+#else
+#error unsupported target CPU
+#endif
+}
+
 /* float access */
 
 static inline float ldfl_raw(void *ptr)
diff -ru qemu-snapshot-2004-08-04_23/cpu-exec.c qemu-snapshot-2004-08-04_23-fast-map/cpu-exec.c
--- qemu-snapshot-2004-08-04_23/cpu-exec.c	2004-07-14 19:20:55.000000000 +0200
+++ qemu-snapshot-2004-08-04_23-fast-map/cpu-exec.c	2004-08-19 00:05:30.000000000 +0200
@@ -814,10 +814,13 @@
 {
     struct ucontext *uc = puc;
     unsigned long pc;
+    unsigned long addr;
     int trapno;
+    int res;
 
 #ifndef REG_EIP
 /* for glibc 2.1 */
+#define REG_EAX    EAX
 #define REG_EIP    EIP
 #define REG_ERR    ERR
 #define REG_TRAPNO TRAPNO
@@ -831,10 +834,20 @@
         return 1;
     } else
 #endif
-        return handle_cpu_signal(pc, (unsigned long)info->si_addr, 
-                                 trapno == 0xe ? 
-                                 (uc->uc_mcontext.gregs[REG_ERR] >> 1) & 1 : 0,
-                                 &uc->uc_sigmask, puc);
+           {
+        /* EAX == env->map[addr >> MAP_BLOCK_BITS]
+         * see *_map functions in cpu-all.h
+         */
+        /* XXX: check opcode at pc to detect possible inconsistency?
+         */
+        addr = (unsigned long)info->si_addr - uc->uc_mcontext.gregs[REG_EAX];
+        res = handle_cpu_signal(pc, addr, 
+                                trapno == 0xe ? 
+                                (uc->uc_mcontext.gregs[REG_ERR] >> 1) & 1 : 0,
+                                &uc->uc_sigmask, puc);
+        uc->uc_mcontext.gregs[REG_EAX] = (uint32_t)cpu_single_env->map[addr >> MAP_BLOCK_BITS];
+        return (res);
+    }
 }
 
 #elif defined(__x86_64__)
diff -ru qemu-snapshot-2004-08-04_23/exec.c qemu-snapshot-2004-08-04_23-fast-map/exec.c
--- qemu-snapshot-2004-08-04_23/exec.c	2004-07-05 23:25:10.000000000 +0200
+++ qemu-snapshot-2004-08-04_23-fast-map/exec.c	2004-08-19 00:22:03.000000000 +0200
@@ -45,7 +45,7 @@
 
 #define SMC_BITMAP_USE_THRESHOLD 10
 
-#define MMAP_AREA_START        0x00000000
+#define MMAP_AREA_START        (MAP_PAGE_SIZE + MAP_BLOCK_SIZE + MAP_PAGE_SIZE)
 #define MMAP_AREA_END          0xa8000000
 
 TranslationBlock tbs[CODE_GEN_MAX_BLOCKS];
@@ -125,6 +125,73 @@
 FILE *logfile;
 int loglevel;
 
+/* XXX: NOT TESTED
+ */
+static void *map_mmap(CPUState *env, target_ulong begin, target_ulong length, int prot, int flags,
+                      int fd, off_t offset)
+{
+    char *addr;
+    void *res;
+
+    addr = map_target2host(env, begin);
+    if ((addr < (char *)MMAP_AREA_START) || ((char *)MMAP_AREA_END <= addr))
+        abort();
+    res = mmap((void *)addr, length, prot, flags, fd, offset);
+    if (res == MAP_FAILED)
+        return (res);
+    if (!(begin & (MAP_BLOCK_SIZE - 1)) && ((char *)MMAP_AREA_START <= map_target2host(env, begin - MAP_PAGE_SIZE))) {
+        addr = map_target2host(env, begin - MAP_PAGE_SIZE) + MAP_PAGE_SIZE;
+        if ((addr < (char *)MMAP_AREA_START) || ((char *)MMAP_AREA_END <= addr))
+            abort();
+        res = mmap((void *)addr, length, prot, flags, fd, offset);
+    }
+    return (res);
+}
+
+/* XXX: NOT TESTED
+ */
+static int map_munmap(CPUState *env, target_ulong begin, target_ulong length)
+{
+    char *addr;
+    int res;
+
+    addr = map_target2host(env, begin);
+    if ((addr < (char *)MMAP_AREA_START) || ((char *)MMAP_AREA_END <= addr))
+        abort();
+    res = munmap((void *)addr, length);
+    if (res == - 1)
+        return (res);
+    if (!(begin & (MAP_BLOCK_SIZE - 1)) && ((char *)MMAP_AREA_START <= map_target2host(env, begin - MAP_PAGE_SIZE))) {
+        addr = map_target2host(env, begin - MAP_PAGE_SIZE) + MAP_PAGE_SIZE;
+        if ((addr < (char *)MMAP_AREA_START) || ((char *)MMAP_AREA_END <= addr))
+            abort();
+        res = munmap((void *)addr, length);
+    }
+    return (res);
+}
+
+/* XXX: NOT TESTED
+ */
+static int map_mprotect(CPUState *env, target_ulong begin, target_ulong length, int prot)
+{
+    char *addr;
+    int res;
+
+    addr = map_target2host(env, begin);
+    if ((addr < (char *)MMAP_AREA_START) || ((char *)MMAP_AREA_END <= addr))
+        abort();
+    res = mprotect((void *)addr, length, prot);
+    if (res == - 1)
+        return (res);
+    if (!(begin & (MAP_BLOCK_SIZE - 1)) && ((char *)MMAP_AREA_START <= map_target2host(env, begin - MAP_PAGE_SIZE))) {
+        addr = map_target2host(env, begin - MAP_PAGE_SIZE) + MAP_PAGE_SIZE;
+        if ((addr < (char *)MMAP_AREA_START) || ((char *)MMAP_AREA_END <= addr))
+            abort();
+        res = mprotect((void *)addr, length, prot);
+    }
+    return (res);
+}
+
 static void page_init(void)
 {
     /* NOTE: we can always suppose that qemu_host_page_size >=
@@ -836,8 +903,8 @@
         prot = 0;
         for(addr = host_start; addr < host_end; addr += TARGET_PAGE_SIZE)
             prot |= page_get_flags(addr);
-        mprotect((void *)host_start, qemu_host_page_size, 
-                 (prot & PAGE_BITS) & ~PAGE_WRITE);
+        map_mprotect(cpu_single_env, host_start, qemu_host_page_size, 
+                     (prot & PAGE_BITS) & ~PAGE_WRITE);
 #ifdef DEBUG_TB_INVALIDATE
         printf("protecting code page: 0x%08lx\n", 
                host_start);
@@ -1313,8 +1380,9 @@
     }
 
 #if !defined(CONFIG_SOFTMMU)
-    if (addr < MMAP_AREA_END)
-        munmap((void *)addr, TARGET_PAGE_SIZE);
+    if (((char *)MMAP_AREA_START <= map_target2host(env, addr))
+            && (map_target2host(env, addr) < (char *)MMAP_AREA_END))
+        map_munmap(env, addr, TARGET_PAGE_SIZE);
 #endif
 }
 
@@ -1341,8 +1409,9 @@
 #if !defined(CONFIG_SOFTMMU)
     /* NOTE: as we generated the code for this page, it is already at
        least readable */
-    if (addr < MMAP_AREA_END)
-        mprotect((void *)addr, TARGET_PAGE_SIZE, PROT_READ);
+    if (((char *)MMAP_AREA_START <= map_target2host(env, addr))
+            && (map_target2host(env, addr) < (char *)MMAP_AREA_END))
+        map_mprotect(env, addr, TARGET_PAGE_SIZE, PROT_READ);
 #endif
 }
 
@@ -1418,9 +1487,10 @@
                     if (p->valid_tag == virt_valid_tag &&
                         p->phys_addr >= start && p->phys_addr < end &&
                         (p->prot & PROT_WRITE)) {
-                        if (addr < MMAP_AREA_END) {
-                            mprotect((void *)addr, TARGET_PAGE_SIZE, 
-                                     p->prot & ~PROT_WRITE);
+                        if (((char *)MMAP_AREA_START <= map_target2host(env, addr))
+                                && (map_target2host(env, addr) < (char *)MMAP_AREA_END)) {
+                            map_mprotect(env, addr, TARGET_PAGE_SIZE, 
+                                         p->prot & ~PROT_WRITE);
                         }
                     }
                     addr += TARGET_PAGE_SIZE;
@@ -1556,34 +1626,58 @@
         } else {
             void *map_addr;
 
-            if (vaddr >= MMAP_AREA_END) {
-                ret = 2;
-            } else {
-                if (prot & PROT_WRITE) {
-                    if ((pd & ~TARGET_PAGE_MASK) == IO_MEM_ROM || 
+            if (prot & PROT_WRITE) {
+                if ((pd & ~TARGET_PAGE_MASK) == IO_MEM_ROM || 
 #if defined(TARGET_HAS_SMC) || 1
-                        first_tb ||
+                    first_tb ||
 #endif
-                        ((pd & ~TARGET_PAGE_MASK) == IO_MEM_RAM && 
-                         !cpu_physical_memory_is_dirty(pd))) {
-                        /* ROM: we do as if code was inside */
-                        /* if code is present, we only map as read only and save the
-                           original mapping */
-                        VirtPageDesc *vp;
-                        
-                        vp = virt_page_find_alloc(vaddr >> TARGET_PAGE_BITS);
-                        vp->phys_addr = pd;
-                        vp->prot = prot;
-                        vp->valid_tag = virt_valid_tag;
-                        prot &= ~PAGE_WRITE;
-                    }
+                    ((pd & ~TARGET_PAGE_MASK) == IO_MEM_RAM && 
+                     !cpu_physical_memory_is_dirty(pd))) {
+                    /* ROM: we do as if code was inside */
+                    /* if code is present, we only map as read only and save the
+                       original mapping */
+                    VirtPageDesc *vp;
+                    
+                    vp = virt_page_find_alloc(vaddr >> TARGET_PAGE_BITS);
+                    vp->phys_addr = pd;
+                    vp->prot = prot;
+                    vp->valid_tag = virt_valid_tag;
+                    prot &= ~PAGE_WRITE;
                 }
-                map_addr = mmap((void *)vaddr, TARGET_PAGE_SIZE, prot, 
-                                MAP_SHARED | MAP_FIXED, phys_ram_fd, (pd & TARGET_PAGE_MASK));
-                if (map_addr == MAP_FAILED) {
-                    cpu_abort(env, "mmap failed when mapped physical address 0x%08x to virtual address 0x%08x\n",
-                              paddr, vaddr);
+            }
+            /* if vaddr still translates into the null block [MAP_PAGE_SIZE ... MAP_PAGE_SIZE + MAP_BLOCK_SIZE + MAP_PAGE_SIZE), allocate a new block
+             */
+            /* XXX: handle unaligned access on block boundary (need to allocate a block for address vaddr - MAP_PAGE_SIZE)
+             */
+            if ((map_target2host(env, vaddr) < (char *)MMAP_AREA_START)
+                    || ((char *)MMAP_AREA_END <= map_target2host(env, vaddr))) {
+                static uint32_t block_next = MMAP_AREA_START;
+                uint32_t block;
+                int i;
+
+                block = block_next;
+                block_next = block + MAP_BLOCK_SIZE + MAP_PAGE_SIZE;
+                if (block_next > MMAP_AREA_END) {
+                    block = MMAP_AREA_START;
+                    block_next = block + MAP_BLOCK_SIZE + MAP_PAGE_SIZE;
                 }
+                /* invalidate pointers to chosen block
+                 */
+                /* XXX: NOT TESTED
+                 */
+                for (i = 0; i < (1L << (MAP_ADDR_BITS - MAP_BLOCK_BITS)); ++ i)
+                    if (env->map[i] == (char *)(block - i * MAP_BLOCK_SIZE)) {
+                        env->map[i] = (char *)(MAP_PAGE_SIZE - i * MAP_BLOCK_SIZE);
+                        munmap((void *)block, MAP_BLOCK_SIZE + MAP_PAGE_SIZE);
+                    }
+                i = vaddr >> MAP_BLOCK_BITS;
+                env->map[i] = (char *)(block - (i << MAP_BLOCK_BITS));
+            }
+            map_addr = map_mmap(env, vaddr, TARGET_PAGE_SIZE, prot, 
+                                MAP_SHARED | MAP_FIXED, phys_ram_fd, (pd & TARGET_PAGE_MASK));
+            if (map_addr == MAP_FAILED) {
+                cpu_abort(env, "mmap failed when mapped physical address 0x%08x to virtual address 0x%08x\n",
+                          paddr, vaddr);
             }
         }
     }
@@ -1604,7 +1698,8 @@
     addr &= TARGET_PAGE_MASK;
 
     /* if it is not mapped, no need to worry here */
-    if (addr >= MMAP_AREA_END)
+    if ((map_target2host(cpu_single_env, addr) < (char *)MMAP_AREA_START)
+          || (map_target2host(cpu_single_env, addr) >= (char *)MMAP_AREA_END))
         return 0;
     vp = virt_page_find(addr >> TARGET_PAGE_BITS);
     if (!vp)
@@ -1619,7 +1714,7 @@
     printf("page_unprotect: addr=0x%08x phys_addr=0x%08x prot=%x\n", 
            addr, vp->phys_addr, vp->prot);
 #endif
-    if (mprotect((void *)addr, TARGET_PAGE_SIZE, vp->prot) < 0)
+    if (map_mprotect(cpu_single_env, addr, TARGET_PAGE_SIZE, vp->prot) < 0)
         cpu_abort(cpu_single_env, "error mprotect addr=0x%lx prot=%d\n",
                   (unsigned long)addr, vp->prot);
     /* set the dirty bit */
@@ -1754,8 +1849,8 @@
     if (prot & PAGE_WRITE_ORG) {
         pindex = (address - host_start) >> TARGET_PAGE_BITS;
         if (!(p1[pindex].flags & PAGE_WRITE)) {
-            mprotect((void *)host_start, qemu_host_page_size, 
-                     (prot & PAGE_BITS) | PAGE_WRITE);
+            map_mprotect(cpu_single_env, host_start, qemu_host_page_size, 
+                         (prot & PAGE_BITS) | PAGE_WRITE);
             p1[pindex].flags |= PAGE_WRITE;
             /* and since the content will be modified, we must invalidate
                the corresponding translated code. */
diff -ru qemu-snapshot-2004-08-04_23/target-i386/cpu.h qemu-snapshot-2004-08-04_23-fast-map/target-i386/cpu.h
--- qemu-snapshot-2004-08-04_23/target-i386/cpu.h	2004-07-12 22:33:47.000000000 +0200
+++ qemu-snapshot-2004-08-04_23-fast-map/target-i386/cpu.h	2004-08-19 00:28:38.000000000 +0200
@@ -20,6 +20,12 @@
 #ifndef CPU_I386_H
 #define CPU_I386_H
 
+#define MAP_PAGE_BITS 12
+#define MAP_BLOCK_BITS 24
+#define MAP_ADDR_BITS 32
+#define MAP_PAGE_SIZE (1L << MAP_PAGE_BITS)
+#define MAP_BLOCK_SIZE (1L << MAP_BLOCK_BITS)
+
 #define TARGET_LONG_BITS 32
 
 /* target supports implicit self modifying code */
@@ -291,6 +297,9 @@
     int32_t df; /* D flag : 1 if D = 0, -1 if D = 1 */
     uint32_t hflags; /* hidden flags, see HF_xxx constants */
 
+    /* offset <= 127 to enable assembly optimization */
+    void *map[1L << (MAP_ADDR_BITS - MAP_BLOCK_BITS)];
+
     /* FPU state */
     unsigned int fpstt; /* top of stack index */
     unsigned int fpus;
diff -ru qemu-snapshot-2004-08-04_23/target-i386/op.c qemu-snapshot-2004-08-04_23-fast-map/target-i386/op.c
--- qemu-snapshot-2004-08-04_23/target-i386/op.c	2004-08-03 23:37:41.000000000 +0200
+++ qemu-snapshot-2004-08-04_23-fast-map/target-i386/op.c	2004-08-19 00:04:06.000000000 +0200
@@ -390,7 +390,7 @@
 
 /* memory access */
 
-#define MEMSUFFIX _raw
+#define MEMSUFFIX _map
 #include "ops_mem.h"
 
 #if !defined(CONFIG_USER_ONLY)
diff -ru qemu-snapshot-2004-08-04_23/target-i386/ops_template_mem.h qemu-snapshot-2004-08-04_23-fast-map/target-i386/ops_template_mem.h
--- qemu-snapshot-2004-08-04_23/target-i386/ops_template_mem.h	2004-01-18 22:44:40.000000000 +0100
+++ qemu-snapshot-2004-08-04_23-fast-map/target-i386/ops_template_mem.h	2004-08-19 00:04:06.000000000 +0200
@@ -23,11 +23,11 @@
 #if MEM_WRITE == 0
 
 #if DATA_BITS == 8
-#define MEM_SUFFIX b_raw
+#define MEM_SUFFIX b_map
 #elif DATA_BITS == 16
-#define MEM_SUFFIX w_raw
+#define MEM_SUFFIX w_map
 #elif DATA_BITS == 32
-#define MEM_SUFFIX l_raw
+#define MEM_SUFFIX l_map
 #endif
 
 #elif MEM_WRITE == 1
diff -ru qemu-snapshot-2004-08-04_23/target-i386/translate.c qemu-snapshot-2004-08-04_23-fast-map/target-i386/translate.c
--- qemu-snapshot-2004-08-04_23/target-i386/translate.c	2004-06-13 15:26:14.000000000 +0200
+++ qemu-snapshot-2004-08-04_23-fast-map/target-i386/translate.c	2004-08-19 00:04:06.000000000 +0200
@@ -394,7 +394,7 @@
 };
 
 static GenOpFunc *gen_op_arithc_mem_T0_T1_cc[9][2] = {
-    DEF_ARITHC(_raw)
+    DEF_ARITHC(_map)
 #ifndef CONFIG_USER_ONLY
     DEF_ARITHC(_kernel)
     DEF_ARITHC(_user)
@@ -423,7 +423,7 @@
 };
 
 static GenOpFunc *gen_op_cmpxchg_mem_T0_T1_EAX_cc[9] = {
-    DEF_CMPXCHG(_raw)
+    DEF_CMPXCHG(_map)
 #ifndef CONFIG_USER_ONLY
     DEF_CMPXCHG(_kernel)
     DEF_CMPXCHG(_user)
@@ -467,7 +467,7 @@
 };
 
 static GenOpFunc *gen_op_shift_mem_T0_T1_cc[9][8] = {
-    DEF_SHIFT(_raw)
+    DEF_SHIFT(_map)
 #ifndef CONFIG_USER_ONLY
     DEF_SHIFT(_kernel)
     DEF_SHIFT(_user)
@@ -498,7 +498,7 @@
 };
 
 static GenOpFunc1 *gen_op_shiftd_mem_T0_T1_im_cc[9][2] = {
-    DEF_SHIFTD(_raw, im)
+    DEF_SHIFTD(_map, im)
 #ifndef CONFIG_USER_ONLY
     DEF_SHIFTD(_kernel, im)
     DEF_SHIFTD(_user, im)
@@ -506,7 +506,7 @@
 };
 
 static GenOpFunc *gen_op_shiftd_mem_T0_T1_ECX_cc[9][2] = {
-    DEF_SHIFTD(_raw, ECX)
+    DEF_SHIFTD(_map, ECX)
 #ifndef CONFIG_USER_ONLY
     DEF_SHIFTD(_kernel, ECX)
     DEF_SHIFTD(_user, ECX)
@@ -540,8 +540,8 @@
 };
 
 static GenOpFunc *gen_op_lds_T0_A0[3 * 3] = {
-    gen_op_ldsb_raw_T0_A0,
-    gen_op_ldsw_raw_T0_A0,
+    gen_op_ldsb_map_T0_A0,
+    gen_op_ldsw_map_T0_A0,
     NULL,
 #ifndef CONFIG_USER_ONLY
     gen_op_ldsb_kernel_T0_A0,
@@ -555,8 +555,8 @@
 };
 
 static GenOpFunc *gen_op_ldu_T0_A0[3 * 3] = {
-    gen_op_ldub_raw_T0_A0,
-    gen_op_lduw_raw_T0_A0,
+    gen_op_ldub_map_T0_A0,
+    gen_op_lduw_map_T0_A0,
     NULL,
 
 #ifndef CONFIG_USER_ONLY
@@ -572,9 +572,9 @@
 
 /* sign does not matter, except for lidt/lgdt call (TODO: fix it) */
 static GenOpFunc *gen_op_ld_T0_A0[3 * 3] = {
-    gen_op_ldub_raw_T0_A0,
-    gen_op_lduw_raw_T0_A0,
-    gen_op_ldl_raw_T0_A0,
+    gen_op_ldub_map_T0_A0,
+    gen_op_lduw_map_T0_A0,
+    gen_op_ldl_map_T0_A0,
 
 #ifndef CONFIG_USER_ONLY
     gen_op_ldub_kernel_T0_A0,
@@ -588,9 +588,9 @@
 };
 
 static GenOpFunc *gen_op_ld_T1_A0[3 * 3] = {
-    gen_op_ldub_raw_T1_A0,
-    gen_op_lduw_raw_T1_A0,
-    gen_op_ldl_raw_T1_A0,
+    gen_op_ldub_map_T1_A0,
+    gen_op_lduw_map_T1_A0,
+    gen_op_ldl_map_T1_A0,
 
 #ifndef CONFIG_USER_ONLY
     gen_op_ldub_kernel_T1_A0,
@@ -604,9 +604,9 @@
 };
 
 static GenOpFunc *gen_op_st_T0_A0[3 * 3] = {
-    gen_op_stb_raw_T0_A0,
-    gen_op_stw_raw_T0_A0,
-    gen_op_stl_raw_T0_A0,
+    gen_op_stb_map_T0_A0,
+    gen_op_stw_map_T0_A0,
+    gen_op_stl_map_T0_A0,
 
 #ifndef CONFIG_USER_ONLY
     gen_op_stb_kernel_T0_A0,
@@ -621,8 +621,8 @@
 
 static GenOpFunc *gen_op_st_T1_A0[3 * 3] = {
     NULL,
-    gen_op_stw_raw_T1_A0,
-    gen_op_stl_raw_T1_A0,
+    gen_op_stw_map_T1_A0,
+    gen_op_stl_map_T1_A0,
 
 #ifndef CONFIG_USER_ONLY
     NULL,
@@ -4321,7 +4321,7 @@
 
 
     DEF_READF( )
-    DEF_READF(_raw)
+    DEF_READF(_map)
 #ifndef CONFIG_USER_ONLY
     DEF_READF(_kernel)
     DEF_READF(_user)
@@ -4440,7 +4440,7 @@
 
 
     DEF_WRITEF( )
-    DEF_WRITEF(_raw)
+    DEF_WRITEF(_map)
 #ifndef CONFIG_USER_ONLY
     DEF_WRITEF(_kernel)
     DEF_WRITEF(_user)
@@ -4479,7 +4479,7 @@
     [INDEX_op_rorl ## SUFFIX ## _T0_T1_cc] = INDEX_op_rorl ## SUFFIX ## _T0_T1,
 
     DEF_SIMPLER( )
-    DEF_SIMPLER(_raw)
+    DEF_SIMPLER(_map)
 #ifndef CONFIG_USER_ONLY
     DEF_SIMPLER(_kernel)
     DEF_SIMPLER(_user)
diff -ru qemu-snapshot-2004-08-04_23/vl.c qemu-snapshot-2004-08-04_23-fast-map/vl.c
--- qemu-snapshot-2004-08-04_23/vl.c	2004-08-04 00:09:30.000000000 +0200
+++ qemu-snapshot-2004-08-04_23-fast-map/vl.c	2004-08-19 00:35:26.000000000 +0200
@@ -3035,6 +3035,8 @@
 
     /* init CPU state */
     env = cpu_init();
+    for (i = 0; i < (1L << (MAP_ADDR_BITS - MAP_BLOCK_BITS)); ++ i)
+        env->map[i] = (char *)(MAP_PAGE_SIZE - i * MAP_BLOCK_SIZE);
     global_env = env;
     cpu_single_env = env;
 



Thread overview: 4+ messages
2004-08-22 15:01 [Qemu-devel] instruction optimization thoughts Magnus Damm
2004-08-24  1:20 ` dguinan
2004-08-24 11:45   ` Elefterios Stamatogiannakis
2004-08-24 15:36     ` Piotr Krysik
