* [Qemu-devel] simulated memory instead of host memory
From: Johan Rydberg @ 2003-06-09 18:31 UTC
To: qemu-devel

First question of the day.

First of all, I would like to say that I really like the concept of QEMU:
let GCC do most of the work and just glue it all together. Brilliant.
One downside, though, is all the tampering with GCC flags.

To the question: how hard would it be to make QEMU a full-system
simulator? Or, more concretely: how hard would it be to use simulated
memory (on a per-page basis) for the simulated application, instead of
using the host memory?

-- 
Johan Rydberg, Free Software Developer, Sweden
http://rtmk.sf.net | http://www.nongnu.org/guss/

Listening to Tricky - Where I'm from
* Re: [Qemu-devel] simulated memory instead of host memory
From: Fabrice Bellard @ 2003-06-09 19:09 UTC
To: qemu-devel

Johan Rydberg wrote:
> First question of the day;
>
> First of all I would like to say that I really like the concept of QEMU.
> Let GCC do most of the work and just glue it all together. Brilliant.
> One downside of it though is all the tampering with flags to GCC.

Yes, every new gcc version may bring problems... A solution may be to
distribute binary-only versions of some of QEMU's object files. I hope
that someday someone will write a proper code generator, but it is
really a lot of work!

> To the question. How hard would it be to make QEMU a full-system
> simulator? Or, more concrete: How hard would it be to instead of using
> the host memory for the simulated app, use simulated memory (on per-page
> basis)?

It would be possible. I have spent a lot of time thinking about it, but
I did not do it for lack of time and motivation. I see three solutions:

1) The very slow (but simplest) solution is to just modify the memory
access inline functions in 'cpu-i386.h' to emulate the x86 MMU.

2) A faster solution is to use 4MB tables containing the addresses of
each CPU page: one table for reads and one for writes. The tables can be
seen as big TLBs. Unmapped pages would have a NULL entry in the tables,
so that an access to them generates a fault, which fills in the table.

3) An even faster solution is to use Linux memory mappings to emulate
the MMU. The Linux MM state of the process would be considered as a TLB
of the virtual x86 MMU state. It works only if the host page size is
<= 4KB and if the guest OS doesn't do any mapping in memory >= 0xc0000000.
With Linux as the guest it would work, as you can easily change the base
address of the kernel. The restriction about mappings >= 0xc0000000
could be lifted with a small (but tricky) kernel patch allowing mmap()
at addresses >= 0xc0000000.

I wanted to implement solution (3) to be able to simulate an unpatched
Linux kernel (and call the project 'qplex86'!).

To run any OS you would also need precise emulation of segment limits
and access rights, at least for non-user code.

Fabrice.
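Solutions (1) and (2) fit together naturally: the page-table walk of (1) can serve as the fill routine behind the tables of (2). The C sketch below assumes a 32-bit x86 guest with non-PAE two-level paging and 4 KB pages; every name in it (`read_table`, `mmu_fill`, `store32`, `guest_phys`, ...) is hypothetical and not QEMU code. The walk is simplified: no user/supervisor check, no accessed/dirty bit updates, no 4 MB pages, and a page counts as writable only if both the PDE and PTE R/W bits are set. The table size follows from 2^32 / 2^12 = 2^20 entries: with 4-byte host pointers that is 4 MB per table (8 MB on a 64-bit host).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BITS 12
#define PAGE_SIZE (1u << PAGE_BITS)
#define NUM_PAGES (1u << (32 - PAGE_BITS)) /* 2^20 pages in 4 GB */
#define PTE_P 0x1u                         /* x86 "present" bit */
#define PTE_W 0x2u                         /* x86 "read/write" bit */

/* Hypothetical guest: 1 MB of physical RAM and a CR3 register. */
static uint8_t guest_phys[1u << 20];
static uint32_t guest_cr3;

/* Solution (2): one "big TLB" for reads and one for writes, indexed by
   virtual page number. An entry is the host address of the page, or
   NULL if the page has not been translated yet. */
static uint8_t *read_table[NUM_PAGES];
static uint8_t *write_table[NUM_PAGES];
static unsigned slow_walks; /* how many times the slow path ran */

/* Guest physical accesses; out-of-range reads return 0 ("not present"). */
static uint32_t phys_ld32(uint32_t paddr)
{
    uint32_t v = 0;
    if (paddr <= sizeof guest_phys - sizeof v)
        memcpy(&v, guest_phys + paddr, sizeof v);
    return v;
}

static void phys_st32(uint32_t paddr, uint32_t v)
{
    if (paddr <= sizeof guest_phys - sizeof v)
        memcpy(guest_phys + paddr, &v, sizeof v);
}

/* Solution (1) as the slow path: walk the guest's two-level page table,
   then cache the page in the matching table. NULL = guest page fault. */
static uint8_t *mmu_fill(uint32_t vaddr, int is_write)
{
    slow_walks++;
    uint32_t pde = phys_ld32((guest_cr3 & ~0xfffu) + ((vaddr >> 22) << 2));
    if (!(pde & PTE_P))
        return NULL;
    uint32_t pte = phys_ld32((pde & ~0xfffu) + (((vaddr >> 12) & 0x3ffu) << 2));
    uint32_t frame = pte & ~0xfffu;
    if (!(pte & PTE_P) || frame >= sizeof guest_phys)
        return NULL;
    if (is_write && !(pde & pte & PTE_W))
        return NULL;
    uint8_t *page = guest_phys + frame;
    if (is_write)
        write_table[vaddr >> PAGE_BITS] = page;
    else
        read_table[vaddr >> PAGE_BITS] = page;
    return page;
}

/* Fast path: a single table load; a NULL entry falls back to the walk.
   Accesses must be aligned so that they never cross a page. */
static int store32(uint32_t vaddr, uint32_t val)
{
    assert((vaddr & 3u) == 0);
    uint8_t *page = write_table[vaddr >> PAGE_BITS];
    if (page == NULL && (page = mmu_fill(vaddr, 1)) == NULL)
        return -1;
    memcpy(page + (vaddr & (PAGE_SIZE - 1)), &val, sizeof val);
    return 0;
}

static int load32(uint32_t vaddr, uint32_t *out)
{
    assert((vaddr & 3u) == 0);
    uint8_t *page = read_table[vaddr >> PAGE_BITS];
    if (page == NULL && (page = mmu_fill(vaddr, 0)) == NULL)
        return -1;
    memcpy(out, page + (vaddr & (PAGE_SIZE - 1)), sizeof *out);
    return 0;
}

/* Must run whenever the guest reloads CR3 or invalidates a page. */
static void tlb_flush(void)
{
    memset(read_table, 0, sizeof read_table);
    memset(write_table, 0, sizeof write_table);
}
```

In an emulator built this way, `tlb_flush` (or a single-entry variant) would be triggered by guest CR3 reloads and `invlpg`; that is what makes the tables behave like a TLB rather than a permanent cache.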
* Re: [Qemu-devel] simulated memory instead of host memory
From: Johan Rydberg @ 2003-06-09 19:37 UTC
To: qemu-devel

On Mon, 09 Jun 2003 21:09:37 +0200
Fabrice Bellard <fabrice.bellard@free.fr> wrote:

: It would be possible. I spent a lot of time thinking about it, but I did
: not make it because of lack of time and motivation. I see three solutions:
: [...]
: 2) A faster solution is to use 4MB tables containing the addresses of
: each CPU page. One 4MB table would be used for read, one table for
: write. The tables can be seen as big TLBs. Unmapped pages would have a
: NULL entry in the tables so that a fault is generated on access to fill
: the table.

In the current version of GUSS I use a similar technique. I call them
mtcaches, short for memory translation caches. They can be seen as a
direct-mapped cache indexed by the virtual page number. The tag is
constructed from the virtual address with the page offset masked off.
The cache contains <tag, diff> tuples, where diff is the difference
between the host memory address and the virtual address. On an mtcache
hit, all that has to be done to get the host memory address is to add
the diff value to the virtual address. On an mtcache miss, the full MMU
emulation code is called, and it is up to it to add entries to the
mtcache (there are separate mtcaches for reads and writes, and for user
and supervisor mode).

Some early testing (booting the Linux kernel on a simulated MIPS32 4Kc)
shows that you can get a hit rate of 95% or more.

On SPARC and other RISC architectures, which have bit-extraction insns
and register+register addressing, the mtcache check can be done in 6-8
insns. On IA-32 it is a bit more complex (12-14 insns), mainly due to
the limited number of general-purpose registers.
This is what my code generator emits for a memory store. The value to be
stored is in %ebx and the virtual address in %eax. %ecx must be pushed
on the stack to free a register.

    40017160: 0000005b: push %ecx
    40017161: 0000005c: mov  0x805cce4,%ebp          pointer to mtcache
    40017167: 00000062: mov  %eax,%ecx
    40017169: 00000064: shr  $0xc,%ecx
    4001716c: 00000067: and  $0xff,%ecx              256 entries
    40017172: 0000006d: lea  0x0(%ebp,%ecx,8),%esi   mtcache entry at %esi
    40017176: 00000071: mov  %eax,%ecx
    40017178: 00000073: and  $0xfffff000,%ecx        make tag
    4001717e: 00000079: cmp  %ecx,0x0(%esi)          and compare
    40017181: 0000007c: jne  0x00000439              miss -> slow way
    40017187: 00000082: mov  0x4(%esi),%esi
    4001718a: 00000085: add  %eax,%esi
    4001718c: 00000087: mov  %ebx,0x0(%esi)          do the store
    4001718f: 0000008a: pop  %ecx

Can you think of a faster way to do it? Note that I generate the code
by hand (not using GCC).

: 3) An even faster solution is to use Linux memory mappings to emulate
: the MMU. The Linux MM state of the process would be considered as a TLB
: of the virtual x86 MMU state. It works only if the host has <= 4KB page
: size and if the guest OS doesn't do any mapping in memory >= 0xc0000000.
: With Linux as guest it would work as you can easily change the base
: address of the kernel. The restriction about mappings >= 0xc0000000
: could be suppressed with a small (but tricky) kernel patch which would
: allow to mmap() at addresses >= 0xc0000000.

Since it isn't very portable, I don't think it is an option.

: I wanted to implement solution (3) to be able to simulate an unpatched
: Linux kernel (and call the project 'qplex86' !).
:
: To run any OS you would also need precise segment limits and rights
: emulation, at least for non user code.

Of course. Everything has to be simulated. That is the challenge :)

-- 
Johan Rydberg, Free Software Developer, Sweden
http://rtmk.sf.net | http://www.nongnu.org/guss/

Listening to Her Majesty - F.U.N.E.R.A.L.
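The mtcache lookup Johan describes, and the store sequence in his listing, can be written out in C. This is a minimal sketch, not GUSS code, assuming a 32-bit guest, 4 KB pages, 256 entries (the `and $0xff` in the listing) and aligned accesses that never cross a page. `mmu_slow`, `RAM_BASE` and the flat guest-RAM window are hypothetical stand-ins for the full MMU emulation. Storing diff as host address minus virtual address is what turns a hit into a single addition.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_MASK  0xfffff000u
#define MT_ENTRIES 256u   /* "and $0xff" in the listing */
#define MT_INVALID 0x1u   /* real tags have their low 12 bits clear */

/* One entry: tag = vaddr & PAGE_MASK, diff = host address - vaddr. */
struct mt_entry {
    uint32_t  tag;
    uintptr_t diff;
};

/* Separate caches for user/supervisor and read/write: [supervisor][write]. */
static struct mt_entry mtc[2][2][MT_ENTRIES];
/* Generated code goes through these pointers, so a mode switch only has
   to repoint them (the "mov 0x805cce4,%ebp" load in the listing). */
static struct mt_entry *cur_read, *cur_write;
static unsigned misses;

/* Hypothetical guest RAM, linearly mapped at virtual RAM_BASE. */
#define RAM_BASE 0x80000000u
static uint8_t guest_ram[16 * 4096];

static void mt_flush_all(void)
{
    for (int s = 0; s < 2; s++)
        for (int w = 0; w < 2; w++)
            for (unsigned i = 0; i < MT_ENTRIES; i++)
                mtc[s][w][i].tag = MT_INVALID;
}

static void set_mode(int supervisor)
{
    cur_read  = mtc[supervisor][0];
    cur_write = mtc[supervisor][1];
}

/* Stand-in for the full MMU emulation: translate, then refill the entry. */
static uint8_t *mmu_slow(struct mt_entry *cache, uint32_t vaddr)
{
    misses++;
    if (vaddr - RAM_BASE >= sizeof guest_ram) /* also catches vaddr < RAM_BASE */
        return NULL;                          /* would be a guest fault */
    uint32_t vpage = vaddr & PAGE_MASK;
    struct mt_entry *e = &cache[(vaddr >> 12) & (MT_ENTRIES - 1)];
    e->tag  = vpage;
    e->diff = (uintptr_t)(guest_ram + (vpage - RAM_BASE)) - vpage;
    return (uint8_t *)(vaddr + e->diff);
}

/* Index by page number, compare the tag, add diff on a hit. */
static uint8_t *mt_translate(struct mt_entry *cache, uint32_t vaddr)
{
    struct mt_entry *e = &cache[(vaddr >> 12) & (MT_ENTRIES - 1)];
    if (e->tag == (vaddr & PAGE_MASK))
        return (uint8_t *)(vaddr + e->diff); /* hit */
    return mmu_slow(cache, vaddr);           /* miss -> slow way */
}

static int mt_store32(uint32_t vaddr, uint32_t val)
{
    uint8_t *host = mt_translate(cur_write, vaddr);
    if (host == NULL)
        return -1;
    memcpy(host, &val, sizeof val);
    return 0;
}

static int mt_load32(uint32_t vaddr, uint32_t *out)
{
    uint8_t *host = mt_translate(cur_read, vaddr);
    if (host == NULL)
        return -1;
    memcpy(out, host, sizeof *out);
    return 0;
}
```

Switching between user and supervisor mode costs two pointer assignments here; the price is that every access first loads the current cache pointer.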
* Re: [Qemu-devel] simulated memory instead of host memory
From: Fabrice Bellard @ 2003-06-09 20:18 UTC
To: qemu-devel

Johan Rydberg wrote:
> This is what my code generator emits for a memory store. The value that
> should be stored is located in %ebx. The virtual address in %eax.
> %ecx must be pushed on the stack to free a register.
>
>  40017160: 0000005b: push %ecx
>  40017161: 0000005c: mov  0x805cce4,%ebp          pointer to mtcache
>  40017167: 00000062: mov  %eax,%ecx
>  40017169: 00000064: shr  $0xc,%ecx
>  4001716c: 00000067: and  $0xff,%ecx              256 entries
>  40017172: 0000006d: lea  0x0(%ebp,%ecx,8),%esi   mtcache entry at %esi
>  40017176: 00000071: mov  %eax,%ecx
>  40017178: 00000073: and  $0xfffff000,%ecx        make tag
>  4001717e: 00000079: cmp  %ecx,0x0(%esi)          and compare
>  40017181: 0000007c: jne  0x00000439              miss -> slow way
>  40017187: 00000082: mov  0x4(%esi),%esi
>  4001718a: 00000085: add  %eax,%esi
>  4001718c: 00000087: mov  %ebx,0x0(%esi)          do the store
>  4001718f: 0000008a: pop  %ecx
>
> Can you think of a faster way to do it? Note that I generate
> the code by hand (not using GCC).

Using a cache as you do is a good idea. You can save some insns, and
more if you use different bits of the address (masking with 0x7f8),
but you would get fewer cache hits.

    40017160: 0000005b: push %ecx
    40017167: 00000062: mov  %eax,%esi
    40017169: 00000064: shr  $0xc,%esi
                        mov  %esi,%ecx
    4001716c: 00000067: and  $0xff,%esi                256 entries
    4001717e: 00000079: cmp  %ecx,0x805cee4(,%esi,8)   compare
    40017181: 0000007c: jne  0x00000439                miss -> slow way
    40017187: 00000082: add  0x805cee8(,%esi,8),%eax
    4001718c: 00000087: mov  %ebx,0x0(%eax)            do the store
    4001718f: 0000008a: pop  %ecx

I guess GCC should give nearly optimal code.

> : 3) An even faster solution is to use Linux memory mappings to emulate
> : the MMU. The Linux MM state of the process would be considered as a TLB
> : of the virtual x86 MMU state. It works only if the host has <= 4KB page
> : size and if the guest OS doesn't do any mapping in memory >= 0xc0000000.
> : With Linux as guest it would work as you can easily change the base
> : address of the kernel. The restriction about mappings >= 0xc0000000
> : could be suppressed with a small (but tricky) kernel patch which would
> : allow to mmap() at addresses >= 0xc0000000.
>
> Since it isn't very portable I don't think it is an option.

Well, if you generate code it is already not portable :-)

Fabrice.
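Fabrice's listing shifts the address once and uses the resulting page number both as the index (after `and $0xff`) and as the tag, so the stored tag is `vaddr >> 12` rather than `vaddr & 0xfffff000`; it also addresses the table at a fixed location instead of through a pointer. A minimal C sketch of that layout, assuming a 32-bit guest, 4 KB pages and 256 entries; `mt_active`, `mt_saved`, `mt_fill` and `mt_switch_mode` are hypothetical names. Because the table sits at a fixed address, a user/supervisor switch here copies entries in and out, which is the centralized-mtcache alternative raised in the reply that follows.

```c
#include <stdint.h>
#include <string.h>

#define MT_ENTRIES 256u
#define MT_INVALID 0xffffffffu /* no 32-bit address has this page number */

/* Entry layout for this variant: the tag is the page number itself. */
struct mt_entry {
    uint32_t  tag;  /* vaddr >> 12 */
    uintptr_t diff; /* host address minus virtual address */
};

/* The single cache that generated code addresses at a fixed location. */
static struct mt_entry mt_active[MT_ENTRIES];
/* Hypothetical per-mode backing copies: [0] user, [1] supervisor. */
static struct mt_entry mt_saved[2][MT_ENTRIES];
static int cur_mode;

static void mt_init(void)
{
    for (int m = 0; m < 2; m++)
        for (unsigned i = 0; i < MT_ENTRIES; i++)
            mt_saved[m][i].tag = MT_INVALID;
    memcpy(mt_active, mt_saved[0], sizeof mt_active);
    cur_mode = 0;
}

/* Mode switch: save the active entries, load the other mode's entries. */
static void mt_switch_mode(int mode)
{
    memcpy(mt_saved[cur_mode], mt_active, sizeof mt_active);
    memcpy(mt_active, mt_saved[mode], sizeof mt_active);
    cur_mode = mode;
}

/* Called by the slow path once it knows where the page lives on the host. */
static void mt_fill(uint32_t vaddr, uint8_t *host_page)
{
    uint32_t vpn = vaddr >> 12;
    struct mt_entry *e = &mt_active[vpn & (MT_ENTRIES - 1)];
    e->tag  = vpn;
    e->diff = (uintptr_t)host_page - (vaddr & 0xfffff000u);
}

/* Fast path, step by step as in the listing; NULL means "miss". */
static uint8_t *mt_lookup(uint32_t vaddr)
{
    uint32_t vpn = vaddr >> 12;                              /* shr $0xc  */
    struct mt_entry *e = &mt_active[vpn & (MT_ENTRIES - 1)]; /* and $0xff */
    if (e->tag != vpn)                                       /* cmp, jne  */
        return NULL;
    return (uint8_t *)(vaddr + e->diff);                     /* add       */
}
```

The trade-off is a cheaper hit path (no pointer load) against a more expensive mode switch: with the 8-byte entries of the listing (the `,8` scale), each of the two copies moves 2 KB.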
* Re: [Qemu-devel] simulated memory instead of host memory
From: Johan Rydberg @ 2003-06-09 20:43 UTC
To: qemu-devel

Fabrice Bellard <fabrice.bellard@free.fr> wrote:

: Using a cache as you do is a good idea. You can save some insns, and
: more if you use different bits of the address (do a mask with 0x7f8),
: but you would have less cache hits.

Since taking the "slow way" is really slow, you should try to maximize
the hit rate. Ideally you would get something like a 95-99% hit rate for
normal pages, and only have to escape into the slow path on accesses to
memory-mapped I/O devices. Well, one can always dream, I guess.

:  40017160: 0000005b: push %ecx
:  40017167: 00000062: mov  %eax,%esi
:  40017169: 00000064: shr  $0xc,%esi
:                      mov  %esi,%ecx
:  4001716c: 00000067: and  $0xff,%esi                256 entries
:  4001717e: 00000079: cmp  %ecx,0x805cee4(,%esi,8)   compare
:  40017181: 0000007c: jne  0x00000439                miss -> slow way
:  40017187: 00000082: add  0x805cee8(,%esi,8),%eax
:  4001718c: 00000087: mov  %ebx,0x0(%eax)            do the store
:  4001718f: 0000008a: pop  %ecx

Does this really work? 0x805cee4 is the address of _a pointer_ that
holds the address of the mtcache. The reason for having a pointer to the
real mtcache is that it is much faster to just change the pointer when
switching between user and supervisor mode (and back). Maybe it would be
better to have a centralized mtcache, and copy the contents of the
per-CPU and per-state mtcaches into it when the state changes.

The reason for masking the virtual address as I did, and using it as the
tag in the cache, is that you can also check for unaligned memory
accesses. This is not an issue when simulating IA-32, but you must
detect them when simulating machines that cannot cope with unaligned
accesses.

: I guess GCC should give nearly optimal code.

Most probably. I will put something together and see what it generates.

: [...]
: Well, if you generate code it is already not portable :-)

I meant between systems such as GNU/Linux and BSD.

-- 
Johan Rydberg, Free Software Developer, Sweden
http://rtmk.sf.net | http://www.nongnu.org/guss/

Listening to Her Majesty - Rules to follow
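One way the masked-address tag could also catch unaligned accesses (an assumption about the details; the store listing above masks with 0xfffff000 only): for an access of N bytes, keep the low log2(N) address bits in the computed tag as well. Stored tags always have their low 12 bits clear, so a misaligned address can never match and falls through to the slow path, where the alignment exception can be raised. A minimal C sketch; `access_tag` and `mt_hit` are hypothetical names.

```c
#include <stdint.h>

#define PAGE_MASK 0xfffff000u

/* Tag for an access of 'size' bytes (1, 2, 4 or 8): the page bits plus
   the low alignment bits, which are zero for an aligned access. */
static uint32_t access_tag(uint32_t vaddr, uint32_t size)
{
    return vaddr & (PAGE_MASK | (size - 1u));
}

/* stored_tag comes from the cache and always has its low 12 bits clear,
   so only an aligned access to the same page can match. */
static int mt_hit(uint32_t stored_tag, uint32_t vaddr, uint32_t size)
{
    return access_tag(vaddr, size) == stored_tag;
}
```

The hit path costs nothing extra: only the immediate of the `and` that builds the tag changes with the access size.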