[Qemu-devel] 4G address space remapping on 64-bit host

* [Qemu-devel] 4G address space remapping on 64-bit host
@ 2007-06-29  9:41 Blue Swirl
  2007-06-29 10:15 ` Fabrice Bellard
  2007-06-29 13:00 ` Paul Brook
  0 siblings, 2 replies; 8+ messages in thread
From: Blue Swirl @ 2007-06-29  9:41 UTC (permalink / raw)
  To: qemu-devel

Hi,

I had an idea of mapping the full 32-bit target virtual address space
to a 4GB area on 64-bit hosts. Then the loads and stores to normal RAM
(except page tables, code_mem_write etc) could be made much faster,
falling back to softmmu for other pages. The idea has come up before,
for example in this message from Fabrice:
http://article.gmane.org/gmane.comp.emulators.qemu/685

But I'm not sure whether this would be worth the effort: the speedup
would depend on the frequency of loads and stores, and also on
translation time vs. translated code execution time. Does anyone have
good statistics on those?
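
To make the proposal concrete, here is a minimal sketch of the mapping on a
64-bit POSIX host. It is not QEMU code: the names guest_base,
guest_space_reserve(), guest_page_enable_ram(), guest_ldl() and guest_stl()
are hypothetical, and the sketch assumes aligned accesses and matching
guest/host endianness. A real implementation would also have to alias each
guest virtual page onto the guest RAM backing store; here the page is simply
made accessible with mprotect().

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for MAP_ANONYMOUS / MAP_NORESERVE on glibc */
#endif
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define GUEST_SPACE_SIZE (1ULL << 32)   /* whole 32-bit guest virtual space */

/* Hypothetical name: host address at which guest virtual address 0 lives. */
static uint8_t *guest_base;

/* Reserve 4GB of host address space with no access rights.
 * Nothing is committed; every access faults until a page is enabled. */
static int guest_space_reserve(void)
{
    void *p;

    assert(sizeof(void *) >= 8);        /* only makes sense on a 64-bit host */
    p = mmap(NULL, GUEST_SPACE_SIZE, PROT_NONE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    guest_base = p;
    return 0;
}

/* Mark the host page containing guest_vaddr as ordinary RAM.
 * Pages never enabled (MMIO, page tables, ...) keep faulting and would
 * be routed to the softmmu slow path. */
static int guest_page_enable_ram(uint32_t guest_vaddr)
{
    uintptr_t mask = (uintptr_t)sysconf(_SC_PAGESIZE) - 1;
    uintptr_t addr = ((uintptr_t)guest_base + guest_vaddr) & ~mask;

    return mprotect((void *)addr, mask + 1, PROT_READ | PROT_WRITE);
}

/* Fast path: a guest load or store becomes a single host access. */
static inline uint32_t guest_ldl(uint32_t guest_vaddr)
{
    return *(volatile uint32_t *)(guest_base + guest_vaddr);
}

static inline void guest_stl(uint32_t guest_vaddr, uint32_t val)
{
    *(volatile uint32_t *)(guest_base + guest_vaddr) = val;
}
```

Because the guest address is a uint32_t added to a 64-bit base, no bounds
check is needed: every possible guest address lands inside the reserved
window.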

* Re: [Qemu-devel] 4G address space remapping on 64-bit host
  2007-06-29  9:41 [Qemu-devel] 4G address space remapping on 64-bit host Blue Swirl
@ 2007-06-29 10:15 ` Fabrice Bellard
  2007-06-29 16:48   ` Blue Swirl
  2007-06-29 13:00 ` Paul Brook
  1 sibling, 1 reply; 8+ messages in thread
From: Fabrice Bellard @ 2007-06-29 10:15 UTC (permalink / raw)
  To: qemu-devel

Hi,

In fact, running in 64-bit mode is not necessary: it is simpler and more
efficient to use kqemu (or KVM) to handle the address space remapping.
The trick is to run the translator in the upper or lower part of the
32-bit address space and to protect it with segments.

Even in 64-bit mode, using kqemu would be more efficient because it
could handle scattered address spaces more efficiently than the host OS.

Fabrice.

Blue Swirl wrote:
> Hi,
> 
> I had an idea of mapping the full 32-bit target virtual address space
> to a 4GB area on 64-bit hosts. Then the loads and stores to normal RAM
> (except page tables, code_mem_write etc) could be made much faster,
> falling back to softmmu for other pages. The idea has come up before,
> for example in this Fabrice's message:
> http://article.gmane.org/gmane.comp.emulators.qemu/685
> 
> But I'm not sure if this would be worth the effort, the speedup would
> depend on the frequency of the loads/stores and also translation time
> vs. translated code execution times. Does anyone have good statistics
> on those?

* Re: [Qemu-devel] 4G address space remapping on 64-bit host
  2007-06-29 10:15 ` Fabrice Bellard
@ 2007-06-29 16:48   ` Blue Swirl
  2007-06-29 20:48     ` Fabrice Bellard
  0 siblings, 1 reply; 8+ messages in thread
From: Blue Swirl @ 2007-06-29 16:48 UTC (permalink / raw)
  To: qemu-devel

On 6/29/07, Fabrice Bellard <fabrice@bellard.org> wrote:
> In fact, running in 64 bit is not necessary : It is simpler and more
> efficient to use kqemu (or KVM) to handle the address space remapping.
> The trick is to run the translator in the upper part or lower part of
> the 32 bit address space and to protect it with segments.

Would that be hard to implement for the kqemu case? What is your
guesstimate of the performance benefit that Sparc32 emulation would
get from that?

* Re: [Qemu-devel] 4G address space remapping on 64-bit host
  2007-06-29 16:48   ` Blue Swirl
@ 2007-06-29 20:48     ` Fabrice Bellard
  2007-07-03  7:48       ` Blue Swirl
  0 siblings, 1 reply; 8+ messages in thread
From: Fabrice Bellard @ 2007-06-29 20:48 UTC (permalink / raw)
  To: qemu-devel

Blue Swirl wrote:
> On 6/29/07, Fabrice Bellard <fabrice@bellard.org> wrote:
>> In fact, running in 64 bit is not necessary : It is simpler and more
>> efficient to use kqemu (or KVM) to handle the address space remapping.
>> The trick is to run the translator in the upper part or lower part of
>> the 32 bit address space and to protect it with segments.
> 
> Would that be hard to implement for the kqemu case? What is your
> guesstimate on what kind of performance benefit would Sparc32
> emulation get from that?

The kqemu part could be quite simple. A new execution mode could be 
added so that:

- shadow page table faults generate a specific signal in the user guest 
code.
- A kqemu "syscall" callable from the user guest code could be used to 
do the equivalent of tlb_set_page(), tlb_flush_page() and tlb_flush().

Note that I don't think it is worth using Xen for that. Modifying kqemu 
(or even KVM) should be more flexible. With kqemu it could also work on 
FreeBSD, Solaris, Windows and Linux.

The more complicated part is to split QEMU into two parts: one part
containing the translator (and maybe some devices) would be executed as
guest user code in kqemu. The other part would be executed as a regular
process to handle what is left (graphics, disk access, etc).

If the TBs (translation blocks) in which MMIO accesses are done are
compiled specially, I think it can be quite efficient.

For the specific sparc32 case, I think that better register window
handling and faster soft mmu code (using 4MB TLBs, as was proposed
in a patch long ago) should already give a significant speed boost (say
a factor of 1.5 to 2). The kqemu optimisation should give at least as
much performance gain, depending on the ratio of instructions which do a
memory access and on the number of TLB faults.

Regards,

Fabrice.
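
The dependence on the memory access ratio and on the number of faults can be
written down as a back-of-the-envelope model. This is only an illustrative
sketch: the structure AccessCostModel, its fields and estimated_speedup() are
made up for this note, the costs are in arbitrary but consistent units (for
instance host cycles per guest instruction), and translation time is ignored
entirely.

```c
#include <assert.h>

typedef struct {
    double mem_ratio;     /* fraction of guest instructions that load or store */
    double other_cost;    /* average cost of the non-memory work per instruction */
    double softmmu_cost;  /* cost of one access through the software TLB path */
    double direct_cost;   /* cost of one access through a host MMU mapping */
    double fault_rate;    /* host faults per memory access with the mapping */
    double fault_cost;    /* cost of handling one such fault */
} AccessCostModel;

/* Ratio of the per-instruction cost with softmmu to the per-instruction
 * cost with direct mapping; values above 1.0 mean the mapping wins. */
static double estimated_speedup(const AccessCostModel *m)
{
    double before = m->other_cost + m->mem_ratio * m->softmmu_cost;
    double after  = m->other_cost +
                    m->mem_ratio * (m->direct_cost +
                                    m->fault_rate * m->fault_cost);

    assert(after > 0.0);
    return before / after;
}
```

With the made-up inputs mem_ratio = 0.5, other_cost = 4, softmmu_cost = 8,
direct_cost = 2 and no faults, the model gives 8 / 5 = 1.6. Keeping those
inputs but charging 5000 units for a fault on 1% of accesses makes the direct
mapping slower than softmmu, which is why the fault rate matters so much.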

* Re: [Qemu-devel] 4G address space remapping on 64-bit host
  2007-06-29 20:48     ` Fabrice Bellard
@ 2007-07-03  7:48       ` Blue Swirl
  0 siblings, 0 replies; 8+ messages in thread
From: Blue Swirl @ 2007-07-03  7:48 UTC (permalink / raw)
  To: qemu-devel

On 6/29/07, Fabrice Bellard <fabrice@bellard.org> wrote:
> The kqemu part could be quite simple. A new execution mode could be
> added so that:
>
> - shadow page table faults generate a specific signal in the user guest
> code.
> - A kqemu "syscall" callable from the user guest code could be used to
> do the equivalent of tlb_set_page(), tlb_flush_page() and tlb_flush().
>
> Note that I don't think it is worth using Xen for that. Modifying kqemu
> (or even KVM) should be more flexible. With kqemu it could also work on
> FreeBSD, Solaris, Windows and Linux.
>
> The more complicated part is to split QEMU in two parts : one part
> containing the translator (and maybe some devices) would be executed as
> guest user code in kqemu. The other part would be executed as a regular
> process to handle what is left (graphic, disk access, etc).

The first step could be to execute TB code from kqemu; memory
accesses could then bypass the TLB.

> If TB where MMIO accesses are done are compiled specifically, I think it
> can be quite efficient.
>
> For the specific sparc32 case, I think that a better register window
> handling and a faster soft mmu code (using 4MB TLBs as it was proposed
> in a patch long ago) should already give an important speed boost (say a
> factor 1.5 to 2). The kqemu optimisation should give at least as much
> performance gain, depending on the ratio of instructions which do a
> memory access and on the number of TLB faults.

About register windows, it's strange that enabling REG_REGWPTR
mysteriously does not work.
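
For readers unfamiliar with what "bypassing the TLB" would save, here is a
simplified sketch of a software TLB lookup of the kind a soft MMU performs on
every guest memory access. It is not QEMU's actual code or data layout:
SoftTLBEntry, soft_tlb_fill(), soft_ldub() and the tiny identity-mapped
guest_ram are assumptions made for illustration, and soft_tlb_flush() must be
called once before the first access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BITS   12
#define PAGE_SIZE   (1u << PAGE_BITS)
#define PAGE_MASK   (~(PAGE_SIZE - 1))
#define TLB_SIZE    256                 /* direct-mapped, power of two */
#define TLB_INVALID 0xffffffffu         /* not page aligned, never matches */
#define RAM_PAGES   16

typedef struct {
    uint32_t tag;        /* guest virtual page address, or TLB_INVALID */
    uint8_t *host_page;  /* host address backing that guest page */
} SoftTLBEntry;

static SoftTLBEntry soft_tlb[TLB_SIZE];
static uint8_t guest_ram[RAM_PAGES * PAGE_SIZE];
static unsigned long slow_path_calls;

static void soft_tlb_flush(void)
{
    for (size_t i = 0; i < TLB_SIZE; i++)
        soft_tlb[i].tag = TLB_INVALID;
}

/* Slow path stand-in: a real MMU model would walk the guest page tables.
 * Here guest virtual addresses map 1:1 onto a small RAM array. */
static uint8_t *soft_tlb_fill(uint32_t vaddr)
{
    uint32_t page = vaddr & PAGE_MASK;
    SoftTLBEntry *e = &soft_tlb[(vaddr >> PAGE_BITS) & (TLB_SIZE - 1)];

    slow_path_calls++;
    if (page >= sizeof(guest_ram))
        return NULL;                    /* would raise a guest fault */
    e->tag = page;
    e->host_page = guest_ram + page;
    return e->host_page;
}

/* Byte load: index, tag compare, then the actual host access. This
 * compare-and-branch is the per-access overhead a host mapping removes. */
static int soft_ldub(uint32_t vaddr, uint8_t *out)
{
    SoftTLBEntry *e = &soft_tlb[(vaddr >> PAGE_BITS) & (TLB_SIZE - 1)];
    uint8_t *host_page;

    if (e->tag == (vaddr & PAGE_MASK)) {
        host_page = e->host_page;       /* TLB hit */
    } else {
        host_page = soft_tlb_fill(vaddr);
        if (host_page == NULL)
            return -1;
    }
    *out = host_page[vaddr & (PAGE_SIZE - 1)];
    return 0;
}
```

Even on a hit, every guest access pays for the index computation, the tag
comparison and a branch before the load itself.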

* Re: [Qemu-devel] 4G address space remapping on 64-bit host
  2007-06-29  9:41 [Qemu-devel] 4G address space remapping on 64-bit host Blue Swirl
  2007-06-29 10:15 ` Fabrice Bellard
@ 2007-06-29 13:00 ` Paul Brook
  2007-06-29 17:14   ` Gwenole Beauchesne
  1 sibling, 1 reply; 8+ messages in thread
From: Paul Brook @ 2007-06-29 13:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: Blue Swirl

> I had an idea of mapping the full 32-bit target virtual address space
> to a 4GB area on 64-bit hosts. Then the loads and stores to normal RAM
> (except page tables, code_mem_write etc) could be made much faster,
> falling back to softmmu for other pages. The idea has come up before,
> for example in this Fabrice's message:
> http://article.gmane.org/gmane.comp.emulators.qemu/685
>
> But I'm not sure if this would be worth the effort, the speedup would
> depend on the frequency of the loads/stores and also translation time
> vs. translated code execution times. Does anyone have good statistics
> on those?

I'd expect the overhead of SIGSEGV+mmap to be prohibitive. I don't have
numbers to back this up, but experience with MIPS system emulation shows that
TLB miss cost can have a significant effect on overall performance.

Like Fabrice, I think this would be most useful in combination with some sort 
of hypervisor.  Somewhere on my TODO list is porting qemu to run directly as 
a paravirtual Xen DomU.  This means you can insert the guest pagetable walk 
directly into the host mmu fault handler, and do clever things with shadow 
pagetables.

I should probably get the cycle counting patches polished and applied. These 
include a mechanism for distinguishing RAM and MMIO accesses.

Paul
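
The SIGSEGV-driven scheme whose overhead is in question can be sketched as
follows. This is an assumption-laden illustration for Linux rather than
anything from QEMU: region, on_segv(), lazy_region_init() and
touch_all_pages() are hypothetical names, the handler uses mprotect() where
the thread talks about mmap, and it relies on mprotect() being callable from
a signal handler, which works on Linux but is not promised by POSIX.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE                     /* for MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static uint8_t *region;                 /* lazily populated window */
static size_t region_size;
static size_t host_page_size;
static volatile sig_atomic_t fault_count;

/* On a fault inside the window, make the page accessible and return;
 * the kernel then restarts the faulting instruction. */
static void on_segv(int sig, siginfo_t *si, void *ctx)
{
    uintptr_t addr = (uintptr_t)si->si_addr;
    uintptr_t start = (uintptr_t)region;
    uintptr_t page = addr & ~(uintptr_t)(host_page_size - 1);

    (void)sig;
    (void)ctx;
    if (addr < start || addr >= start + region_size)
        _exit(1);                       /* a genuine crash, not ours */
    if (mprotect((void *)page, host_page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(1);
    fault_count++;
}

/* Reserve npages of inaccessible memory and install the handler. */
static int lazy_region_init(size_t npages)
{
    struct sigaction sa;

    host_page_size = (size_t)sysconf(_SC_PAGESIZE);
    region_size = npages * host_page_size;
    region = mmap(NULL, region_size, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Write one byte per page and report how many faults that caused. */
static size_t touch_all_pages(void)
{
    fault_count = 0;
    for (size_t off = 0; off < region_size; off += host_page_size)
        ((volatile uint8_t *)region)[off] = 1;
    return (size_t)fault_count;
}
```

The first pass over the window takes one signal delivery plus one mprotect()
call per page, and a second pass takes none. Timing the first pass would give
a rough per-fault cost for a particular host.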

* Re: [Qemu-devel] 4G address space remapping on 64-bit host
  2007-06-29 13:00 ` Paul Brook
@ 2007-06-29 17:14   ` Gwenole Beauchesne
  2007-06-29 21:03     ` Paul Brook
  0 siblings, 1 reply; 8+ messages in thread
From: Gwenole Beauchesne @ 2007-06-29 17:14 UTC (permalink / raw)
  To: qemu-devel

Hi,

2007/6/29, Paul Brook <paul@codesourcery.com>:
> I'd expect the overhead of SIGSEGV+mmap to be prohibitive. I don't have
> numbers to back this up, but experience with MIPS system emulation shows that
> TLB miss cost can have significant effect on overall performance.

I'd say this can't be worse than on MacOS X, where Mach exception
handling is terribly slow: typically 100 usec per caught fault plus
mprotect, where Linux requires less than 5 usec to do the same.

> Like Fabrice, I think this would be most useful in combination with some sort
> of hypervisor.  Somewhere on my TODO list is porting qemu to run directly as
> a paravirtual Xen DomU.  This means you can insert the guest pagetable walk
> directly into the host mmu fault handler, and do clever things with shadow
> pagetables.

This would be great. As Fabrice mentioned, the tricky part would be to
run the translator in the upper part or lower part of the 32-bit
address space. Would fixing compilation with -pie help this (with some
provisions for the dyngen ops) or do you see another means to achieve
this?
-- 
Gwenolé.

* Re: [Qemu-devel] 4G address space remapping on 64-bit host
  2007-06-29 17:14   ` Gwenole Beauchesne
@ 2007-06-29 21:03     ` Paul Brook
  0 siblings, 0 replies; 8+ messages in thread
From: Paul Brook @ 2007-06-29 21:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: Gwenole Beauchesne

> > I'd expect the overhead of SIGSEGV+mmap to be prohibitive. I don't have
> > numbers to back this up, but experience with MIPS system emulation shows
> > that TLB miss cost can have significant effect on overall performance.
>
> I'd say this can't be worse than on MacOS X where Mach exception
> handling is terribly slow. Typically 100 usec per fault
> caught+mprotect where Linux requires less than 5 usec to do the same.

Maybe. I'll agree OSX memory management can be horribly slow[1].

> > Like Fabrice, I think this would be most useful in combination with some
> > sort of hypervisor.  Somewhere on my TODO list is porting qemu to run
> > directly as a paravirtual Xen DomU.  This means you can insert the guest
> > pagetable walk directly into the host mmu fault handler, and do clever
> > things with shadow pagetables.
>
> This would be great. As Fabrice mentioned, the tricky part would be to
> run the translator in the upper part or lower part of the 32-bit
> address space. Would fixing compilation with -pie help this (with some
> provisions for the dyngen ops) or do you see another means to achieve
> this?

My initial plan was to punt, and only worry about 64-bit hosts :-)

Using segmentation to chop a lump out of the address space is probably the
simplest approach, and it is efficient as long as your OS doesn't try to
access that area. The hardest bit is emulating accesses that trap, but since
we also control the (emulated) guest code we know which instructions we need
to decode.

Paul

[1] This is from experience trying to make gcc go fast on that platform, not 
just random apple-bashing :-)
