[Qemu-devel] KQEMU code organization

qemu-devel.nongnu.org archive mirror

* [Qemu-devel] KQEMU code organization
@ 2008-05-27 16:56 Jan Kiszka
  2008-05-27 17:20 ` Ben Taylor
                   ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Jan Kiszka @ 2008-05-27 16:56 UTC (permalink / raw)
  To: qemu-devel

Hi,

is there a technical reason why the kqemu kernel module is built out of
a binary blob (monitor-image.bin->monitor-image.h)? Does this simply
date back to the time when wrapper and core were distributed under
different licenses?

I'm currently trying to hunt down a (probable) bug in kqemu, and the
monitor is unfortunately a blind spot for the source-level debugger.
So far I have only managed to make the rest visible.

BTW, am I missing an official code repository of kqemu? Why is there no
subfolder, e.g., in the qemu svn repos? So patches should be provided
against 1.3.0pre11, right?

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] KQEMU code organization
  2008-05-27 16:56 [Qemu-devel] KQEMU code organization Jan Kiszka
@ 2008-05-27 17:20 ` Ben Taylor
  2008-05-27 18:25   ` [Qemu-devel] " Jan Kiszka
  2008-05-27 20:58 ` [Qemu-devel] " Fabrice Bellard
  2008-05-27 22:11 ` [Qemu-devel] " Fabrice Bellard
  2 siblings, 1 reply; 31+ messages in thread
From: Ben Taylor @ 2008-05-27 17:20 UTC (permalink / raw)
  To: qemu-devel

On Tue, May 27, 2008 at 12:56 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> Hi,
>
> is there a technical reason why the kqemu kernel module is built out of
> a binary blob (monitor-image.bin->monitor-image.h)? Does this simply
> date back to the time when wrapper and core were distributed under
> different licenses?
>
> I'm currently trying to hunt down a (probable) bug in kqemu, and the
> monitor is now unfortunately a white spot for the source-level debugger.
> So far I only managed to make the rest visible.
>
> BTW, am I missing an official code repository of kqemu? Why is there no
> subfolder, e.g., in the qemu svn repos? So patches should be provided
> against 1.3.0pre11, right?

I maintain a version of the repository at

http://svn9.cvsdude.com/kdesolaris/kqemu/trunk/1.0.3pre11/kqemu

that includes the Solaris changes and other patches posted to the
list that I've been able to test and integrate.

Despite kqemu being under kdesolaris (my friend owns this tree),
this is kqemu the kernel module, not kqemu the KDE qemu front end. :-)

ben


* [Qemu-devel] Re: KQEMU code organization
  2008-05-27 17:20 ` Ben Taylor
@ 2008-05-27 18:25   ` Jan Kiszka
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Kiszka @ 2008-05-27 18:25 UTC (permalink / raw)
  To: qemu-devel


Ben Taylor wrote:
> On Tue, May 27, 2008 at 12:56 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> Hi,
>>
>> is there a technical reason why the kqemu kernel module is built out of
>> a binary blob (monitor-image.bin->monitor-image.h)? Does this simply
>> date back to the time when wrapper and core were distributed under
>> different licenses?
>>
>> I'm currently trying to hunt down a (probable) bug in kqemu, and the
>> monitor is now unfortunately a white spot for the source-level debugger.
>> So far I only managed to make the rest visible.
>>
>> BTW, am I missing an official code repository of kqemu? Why is there no
>> subfolder, e.g., in the qemu svn repos? So patches should be provided
>> against 1.3.0pre11, right?
> 
> I maintain a version of the repository at
> 
> http://svn9.cvsdude.com/kdesolaris/kqemu/trunk/1.0.3pre11/kqemu
> 
> that includes the Solaris changes and other patches posted to the
> list that I've been able to test and integrate.

So this is the de-facto official development version?  Quite a few
changes in that tree, also to core stuff. Hmm. But nothing that fixes my
spurious CPL degeneration. What a pity.

However, you could merge another (minor) patch:

Index: Makefile
===================================================================
--- Makefile	(revision 17)
+++ Makefile	(working copy)
@@ -47,7 +47,7 @@ endif # !CONFIG_WIN32
 
 clean:
 	$(MAKE) -C common clean
-	rm -f kqemu.ko *.o *~
+	rm -rf kqemu.ko *.o *~ .kqemu* Module.* modules.order kqemu.mod.c .tmp_versions
 
 endif # !CONFIG_SOLARIS
 

Actually, more needs to be cleaned up w.r.t. Linux module building. But
I'm reluctant to touch common/Makefile until the (current) reason for
this code organization is known.

Jan




* Re: [Qemu-devel] KQEMU code organization
  2008-05-27 16:56 [Qemu-devel] KQEMU code organization Jan Kiszka
  2008-05-27 17:20 ` Ben Taylor
@ 2008-05-27 20:58 ` Fabrice Bellard
  2008-05-27 21:40   ` [Qemu-devel] " Jan Kiszka
  2008-05-27 22:11 ` [Qemu-devel] " Fabrice Bellard
  2 siblings, 1 reply; 31+ messages in thread
From: Fabrice Bellard @ 2008-05-27 20:58 UTC (permalink / raw)
  To: qemu-devel; +Cc: jan.kiszka

Hi,

Regarding kqemu, I am still hesitating whether to commit it in the QEMU
subversion repository. Moreover, I may change its license to another
open source one so I would prefer that the patches are assigned to my
copyright, especially if they are just small bugfixes.

For your information, I will commit some incompatible API changes in
kqemu in the next few days, so a new version will be needed anyway.

Regards,

Fabrice.

Jan Kiszka wrote:
> Hi,
> 
> is there a technical reason why the kqemu kernel module is built out of
> a binary blob (monitor-image.bin->monitor-image.h)? Does this simply
> date back to the time when wrapper and core were distributed under
> different licenses?
> 
> I'm currently trying to hunt down a (probable) bug in kqemu, and the
> monitor is now unfortunately a white spot for the source-level debugger.
> So far I only managed to make the rest visible.
> 
> BTW, am I missing an official code repository of kqemu? Why is there no
> subfolder, e.g., in the qemu svn repos? So patches should be provided
> against 1.3.0pre11, right?
> 
> Thanks,
> Jan
> 


* [Qemu-devel] Re: KQEMU code organization
  2008-05-27 20:58 ` [Qemu-devel] " Fabrice Bellard
@ 2008-05-27 21:40   ` Jan Kiszka
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Kiszka @ 2008-05-27 21:40 UTC (permalink / raw)
  To: qemu-devel


Hi Fabrice,

Fabrice Bellard wrote:
> Hi,
> 
> Regarding kqemu, I am still hesitating whether to commit it in the QEMU
> subversion repository. Moreover, I may change its license to another
> open source one so I would prefer that the patches are assigned to my
> copyright, especially if they are just small bugfixes.

Hmm, that leaves an uncomfortable feeling on my side. If the licenses
of the officially supported version did not include a GPL-compatible
one, we would have to stick with what we have at the moment for Linux.
Or will we see a dual licensed kqemu?

> 
> For your information, I will commit some incompatible API changes in
> kqemu in the next few days, so a new version will be needed anyway.

What is the roadmap of kqemu then? Are there functional enhancements
planned, or further performance tunings? What are those?

BTW, I think I understood my problem with kqemu in the meantime: lcall
from ring 0 => fails on lret as the real CS (with "wrong" RPL) is pushed
onto the guest stack. Am I right? How to fix this best, by emulating
lcall at kernel level?
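
The RPL problem described above follows from the layout of an x86 segment selector: bits 0-1 hold the RPL, bit 2 the table indicator, bits 3-15 the descriptor index. The following is only a minimal sketch of that layout, not kqemu code. It assumes, purely for illustration, that the guest kernel believes it runs at ring 0 while the CPU really executes it at ring 3; the selector values and helper names are hypothetical.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int  # descriptor table index (bits 3..15)
    ti: int     # table indicator: 0 = GDT, 1 = LDT (bit 2)
    rpl: int    # requested privilege level (bits 0..1)


def decode_selector(value: int) -> Selector:
    """Split a 16-bit x86 segment selector into its architectural fields."""
    return Selector(index=(value >> 3) & 0x1FFF, ti=(value >> 2) & 1, rpl=value & 3)


def with_rpl(value: int, rpl: int) -> int:
    """Return the selector with its RPL bits replaced."""
    return (value & ~3 & 0xFFFF) | (rpl & 3)


# Hypothetical values: the guest loaded CS=0x10 and expects RPL 0,
# but the code really runs at ring 3 (assumption made for this sketch).
guest_cs = 0x0010
real_cs = with_rpl(guest_cs, 3)  # what a natively executed lcall would push

expected = decode_selector(guest_cs)
pushed = decode_selector(real_cs)
print(f"guest expects RPL {expected.rpl}, stack holds RPL {pushed.rpl}")
```

If the guest later compares or reloads the pushed value, the mismatch in the low two bits is what it would see. An emulated lcall could push the guest-visible selector instead.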

Thanks,
Jan



* Re: [Qemu-devel] KQEMU code organization
  2008-05-27 16:56 [Qemu-devel] KQEMU code organization Jan Kiszka
  2008-05-27 17:20 ` Ben Taylor
  2008-05-27 20:58 ` [Qemu-devel] " Fabrice Bellard
@ 2008-05-27 22:11 ` Fabrice Bellard
  2008-05-28 16:02   ` [Qemu-devel] " Jan Kiszka
  2 siblings, 1 reply; 31+ messages in thread
From: Fabrice Bellard @ 2008-05-27 22:11 UTC (permalink / raw)
  To: qemu-devel

Jan Kiszka wrote:
> Hi,
> 
> is there a technical reason why the kqemu kernel module is built out of
> a binary blob (monitor-image.bin->monitor-image.h)? Does this simply
> date back to the time when wrapper and core were distributed under
> different licenses?

This is a technical reason: the "blob" is run in an address space
different from the host kernel.
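
For readers unfamiliar with the monitor-image.bin -> monitor-image.h step: code that has to run outside the host kernel's address space cannot simply be linked into the module, so its raw bytes are carried as data and copied to the right place at run time. The sketch below shows one way such a header could be generated. The thread does not show what the real generator emits, so the array name, the layout, and the function name are assumptions.

```python
def blob_to_c_header(blob: bytes, symbol: str = "monitor_image", per_line: int = 8) -> str:
    """Render raw bytes as a C array definition suitable for #include."""
    lines = [f"static const unsigned char {symbol}[{len(blob)}] = {{"]
    for start in range(0, len(blob), per_line):
        chunk = blob[start:start + per_line]
        lines.append("    " + " ".join(f"0x{b:02x}," for b in chunk))
    lines.append("};")
    return "\n".join(lines) + "\n"


# Four placeholder bytes stand in for the real monitor image.
print(blob_to_c_header(bytes([0x55, 0x89, 0xE5, 0xC3]), per_line=4))
```

The cost of this packaging is the one raised at the start of the thread: the bytes carry no debug information, so the debugger needs the separately built image with symbols.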

Fabrice.


* [Qemu-devel] Re: KQEMU code organization
  2008-05-27 22:11 ` [Qemu-devel] " Fabrice Bellard
@ 2008-05-28 16:02   ` Jan Kiszka
  2008-05-28 16:37     ` Fabrice Bellard
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Kiszka @ 2008-05-28 16:02 UTC (permalink / raw)
  To: qemu-devel

Fabrice Bellard wrote:
> Jan Kiszka wrote:
>> Hi,
>>
>> is there a technical reason why the kqemu kernel module is built out of
>> a binary blob (monitor-image.bin->monitor-image.h)? Does this simply
>> date back to the time when wrapper and core were distributed under
>> different licenses?
> 
> This is a technical reason: the "blob" is run in an address space
> different from the host kernel.

Well, easy to claim, I know, but I don't think this is a hard reason.
However, as overcoming genmon and genoffset may require quite some
refactoring, I'm not sure if it's worth it.

For debugging purposes I have meanwhile created my own build system anyway.
gdb fortunately accepts a monitor-image.out built with -g, so
source-level debugging of the monitor is possible as well.

/me now needs to understand how this thing works...

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux


* Re: [Qemu-devel] Re: KQEMU code organization
  2008-05-28 16:02   ` [Qemu-devel] " Jan Kiszka
@ 2008-05-28 16:37     ` Fabrice Bellard
  2008-05-28 16:55       ` Jan Kiszka
  0 siblings, 1 reply; 31+ messages in thread
From: Fabrice Bellard @ 2008-05-28 16:37 UTC (permalink / raw)
  To: qemu-devel

Jan Kiszka wrote:
> Fabrice Bellard wrote:
>> Jan Kiszka wrote:
>>> Hi,
>>>
>>> is there a technical reason why the kqemu kernel module is built out of
>>> a binary blob (monitor-image.bin->monitor-image.h)? Does this simply
>>> date back to the time when wrapper and core were distributed under
>>> different licenses?
>> This is a technical reason: the "blob" is run in an address space
>> different from the host kernel.
> 
> Well, easy to claim, I know, but I don't think this is a hard reason.
> However, as overcoming genmon and genoffset may require quite some
> refactoring, I'm not sure if it's worth it.

I may change the monitor blob format to ELF to allow relocation, but the 
idea stays the same, and I don't think you can do it another way...

> For debugging purposes I meanwhile created my own build system anyway.
> gdb fortunately accepts a monitor-image.out built with -g so that
> source level debugging of the monitor is possible as well.

Right. This is what I do.

Fabrice.


* [Qemu-devel] Re: KQEMU code organization
  2008-05-28 16:37     ` Fabrice Bellard
@ 2008-05-28 16:55       ` Jan Kiszka
  2008-05-28 18:34         ` Jan Kiszka
  2008-05-29 12:29         ` Fabrice Bellard
  0 siblings, 2 replies; 31+ messages in thread
From: Jan Kiszka @ 2008-05-28 16:55 UTC (permalink / raw)
  To: qemu-devel

Fabrice Bellard wrote:
> Jan Kiszka wrote:
>> Fabrice Bellard wrote:
>>> Jan Kiszka wrote:
>>>> Hi,
>>>>
>>>> is there a technical reason why the kqemu kernel module is built out of
>>>> a binary blob (monitor-image.bin->monitor-image.h)? Does this simply
>>>> date back to the time when wrapper and core were distributed under
>>>> different licenses?
>>> This is a technical reason: the "blob" is run in an address space
>>> different from the host kernel.
>>
>> Well, easy to claim, I know, but I don't think this is a hard reason.
>> However, as overcoming genmon and genoffset may require quite some
>> refactoring, I'm not sure if it's worth it.
> 
> I may change the monitor blob format to ELF to allow relocation, but the
> idea stays the same, and I don't think you can do it another way...

I agree (from my current knowledge of the problem) that the monitor
remains "foreign" code to the kernel module. But at least the
repackaging into a C structure should be unnecessary.

The offset generation can be skipped if the assembly files are converted
into inline assembly. Might be tricky in some cases, but I see no
show-stopper yet.
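
As background on the offset generation: standalone assembly files cannot see C structure definitions, so a helper usually turns field offsets into constants the assembler can use, whereas inline assembly can let the compiler supply them as operands. The sketch below shows the general idea only. The structure, its field names, and the MON_STATE prefix are hypothetical, and what genoffset actually emits is not shown in this thread.

```python
import ctypes


class MonitorState(ctypes.Structure):
    """Hypothetical layout; not the real kqemu state structure."""
    _fields_ = [
        ("cr3", ctypes.c_uint64),
        ("cpl", ctypes.c_uint32),
        ("flags", ctypes.c_uint32),
    ]


def emit_offsets(struct: type, prefix: str) -> str:
    """Emit one #define per field so assembly can address it by name."""
    lines = []
    for name, _ctype in struct._fields_:
        offset = getattr(struct, name).offset
        lines.append(f"#define {prefix}_{name.upper()} {offset}")
    return "\n".join(lines)


print(emit_offsets(MonitorState, "MON_STATE"))
```

Any change to the structure then requires regenerating the constants before the assembly is rebuilt, which is the extra build step the inline-assembly proposal would remove.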

To give it a tiny start, I will see whether I can unify the build process
for all "true" kernel components. That is what currently breaks the
debuggability of the driver frame (up to kernel2monitor), and it also
causes a kbuild warning. Likely harmless at the moment, but fragile in
the long term.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux


* [Qemu-devel] Re: KQEMU code organization
  2008-05-28 16:55       ` Jan Kiszka
@ 2008-05-28 18:34         ` Jan Kiszka
  2008-05-29 12:29         ` Fabrice Bellard
  1 sibling, 0 replies; 31+ messages in thread
From: Jan Kiszka @ 2008-05-28 18:34 UTC (permalink / raw)
  To: qemu-devel


Jan Kiszka wrote:
> Fabrice Bellard wrote:
>> Jan Kiszka wrote:
>>> Fabrice Bellard wrote:
>>>> Jan Kiszka wrote:
>>>>> Hi,
>>>>>
>>>>> is there a technical reason why the kqemu kernel module is built out of
>>>>> a binary blob (monitor-image.bin->monitor-image.h)? Does this simply
>>>>> date back to the time when wrapper and core were distributed under
>>>>> different licenses?
>>>> This is a technical reason: the "blob" is run in an address space
>>>> different from the host kernel.
>>> Well, easy to claim, I know, but I don't think this is a hard reason.
>>> However, as overcoming genmon and genoffset may require quite some
>>> refactoring, I'm not sure if it's worth it.
>> I may change the monitor blob format to ELF to allow relocation, but the
>> idea stays the same, and I don't think you can do it another way...
> 
> I agree (from my current knowledge of the problem) that the monitor
> remains "foreign" code to the kernel module. But at least the
> repackaging into a c-structure should be unnecessary.
> 
> The offset generation can be skipped if the assembly files are converted
> into inline assembly. Might be tricky in some cases, but I see no
> show-stopper yet.
> 
> To give it a tiny start, I will see whether I can unify the build process
> for all "true" kernel components. That is what currently breaks the
> debuggability of the driver frame (up to kernel2monitor), and it also
> causes a kbuild warning. Likely harmless at the moment, but fragile in
> the long term.

Here we go. Still not nice (I would put all monitor code in its own
directory, moving those few host kernel bits into the top-level dir),
but at least much cleaner from kbuild's POV.

Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
---
 Makefile |   15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

Index: b/Makefile
===================================================================
--- a/Makefile
+++ b/Makefile
@@ -17,7 +17,7 @@ ifdef CONFIG_KBUILD26
 all: kqemu.ko
 
 kqemu.ko:
-	make -C common all
+	make -C common monitor-image.h
 	make -C $(KERNEL_PATH) M=`pwd` modules
 
 else
@@ -38,7 +38,8 @@ endif # !CONFIG_WIN32
 
 clean:
 	$(MAKE) -C common clean
-	rm -f kqemu.ko *.o *~
+	rm -rf kqemu.ko *.o *~ .kqemu* Module.* modules.order kqemu.mod.c .tmp_versions \
+               common/.kernel* common/*/.kernel*
 
 FILES=configure Makefile README Changelog LICENSE COPYING \
       install.sh kqemu-linux.c kqemu.h \
@@ -89,10 +90,10 @@ kqemu.o: $(kqemu-objs)
 else
 # called from 2.6 kernel kbuild
 
-obj-m:= kqemu.o
-kqemu-objs:= kqemu-linux.o kqemu-mod.o
+EXTRA_AFLAGS=-I $(PWD)/common
+EXTRA_CFLAGS=-I $(PWD)
 
-$(obj)/kqemu-mod.o: $(src)/kqemu-mod-$(ARCH).o
-	cp $< $@
+obj-m:= kqemu.o
+kqemu-objs:= kqemu-linux.o common/kernel.o common/$(ARCH)/kernel_asm.o
 endif
 endif # PATCHLEVEL


BTW, there is more trouble ahead for kqemu. This is what I get when booting
an x86-64 openSUSE 10.3 image on a 64-bit platform:

RAX=ffff810001008220 RBX=ffff81002f88a160 RCX=0000000000000036 RDX=0000000000000000
RSI=ffffe20000065aa0 RDI=ffff81002f88a164 RBP=ffff81002df99e68 RSP=ffff81002df99e68
R8 =0000000000000000 R9 =0000000000000000 R10=ffff81002df99db8 R11=0000000000010246
R12=ffff81002f88a164 R13=0000000000000004 R14=ffff81002f4a6b10 R15=ffff81002df99f58
RIP=ffffffff80447515 RFL=00010246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 00000000 00000000
CS =0010 0000000000000000 ffffffff 00a09b00
SS =0000 0000000000000000 ffffffff 00c09300
DS =0000 0000000000000000 00000000 00000000
FS =0000 0000000000000000 00000000 00000000
GS =0000 ffffffff8059b000 00000000 00000000
LDT=0000 0000000000000000 00000000 00008000
TR =0040 ffff81000101c280 00002087 00008900
GDT=     ffffffff8061e000 00000080
IDT=     ffffffff8067f000 00000fff
CR0=8005003b CR2=00007fff4183bf70 CR3=000000002e8a7000 CR4=000006a0
Unsupported return value: 0xffffffff

Kernel log says

  kqemu: aborting: Unexpected exception 0x0d in monitor space
  err=0000 CS:EIP=f180:00000000f0001f6f SS:SP=0000:00000000f00c6e20

with the official kqemu and, interestingly,

  kqemu: aborting: mon_get_ptel_l3() failed

with Ben's repos.
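
To make the dump and the kernel log easier to read: the control-register values can be decoded with the architectural bit assignments, and vector 0x0d is the general protection fault (#GP). The sketch below covers only that decoding; the helper and table names are made up for illustration and say nothing about where in the monitor the fault originates.

```python
from typing import Dict, List

CR0_BITS: Dict[int, str] = {0: "PE", 1: "MP", 2: "EM", 3: "TS", 4: "ET", 5: "NE",
                            16: "WP", 18: "AM", 29: "NW", 30: "CD", 31: "PG"}
CR4_BITS: Dict[int, str] = {0: "VME", 1: "PVI", 2: "TSD", 3: "DE", 4: "PSE",
                            5: "PAE", 6: "MCE", 7: "PGE", 8: "PCE",
                            9: "OSFXSR", 10: "OSXMMEXCPT"}
EXCEPTIONS: Dict[int, str] = {0x06: "#UD invalid opcode",
                              0x08: "#DF double fault",
                              0x0D: "#GP general protection",
                              0x0E: "#PF page fault"}


def set_flags(value: int, names: Dict[int, str]) -> List[str]:
    """List the named flags whose bits are set in value, lowest bit first."""
    return [name for bit, name in sorted(names.items()) if value & (1 << bit)]


# Values copied from the register dump and kernel log above.
print("CR0:", set_flags(0x8005003B, CR0_BITS))
print("CR4:", set_flags(0x000006A0, CR4_BITS))
print("exception 0x0d:", EXCEPTIONS[0x0D])
```

Paging (PG) together with PAE is what one would expect from a 64-bit guest that has already switched to long mode.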

Jan




* Re: [Qemu-devel] Re: KQEMU code organization
  2008-05-28 16:55       ` Jan Kiszka
  2008-05-28 18:34         ` Jan Kiszka
@ 2008-05-29 12:29         ` Fabrice Bellard
  2008-05-29 13:16           ` Jan Kiszka
  2008-05-29 16:13           ` Jamie Lokier
  1 sibling, 2 replies; 31+ messages in thread
From: Fabrice Bellard @ 2008-05-29 12:29 UTC (permalink / raw)
  To: qemu-devel

Jan Kiszka wrote:
> Fabrice Bellard wrote:
>> Jan Kiszka wrote:
>>> Fabrice Bellard wrote:
>>>> Jan Kiszka wrote:
>>>>> Hi,
>>>>>
>>>>> is there a technical reason why the kqemu kernel module is built out of
>>>>> a binary blob (monitor-image.bin->monitor-image.h)? Does this simply
>>>>> date back to the time when wrapper and core were distributed under
>>>>> different licenses?
>>>> This is a technical reason: the "blob" is run in an address space
>>>> different from the host kernel.
>>> Well, easy to claim, I know, but I don't think this is a hard reason.
>>> However, as overcoming genmon and genoffset may require quite some
>>> refactoring, I'm not sure if it's worth it.
>> I may change the monitor blob format to ELF to allow relocation, but the
>> idea stays the same, and I don't think you can do it another way...
> 
> I agree (from my current knowledge of the problem) that the monitor
> remains "foreign" code to the kernel module. But at least the
> repackaging into a c-structure should be unnecessary.
> 
> The offset generation can be skipped if the assembly files are converted
> into inline assembly. Might be tricky in some cases, but I see no
> show-stopper yet.

This is purely cosmetic and I am generally against such changes.

> To give it a tiny start, I will see whether I can unify the build process
> for all "true" kernel components. That is what currently breaks the
> debuggability of the driver frame (up to kernel2monitor), and it also
> causes a kbuild warning. Likely harmless at the moment, but fragile in
> the long term.

For true kernel components I agree it is useful.

Regarding the kqemu evolution, I am doing small API changes to make it 
more independent from the QEMU internal data structures and to allow 
usage from a 32 bit user QEMU application with a 64 bit host. There is 
also another small change I did some time ago but never published to 
allow paravirtualization of the Linux kernel.

Fabrice.


* [Qemu-devel] Re: KQEMU code organization
  2008-05-29 12:29         ` Fabrice Bellard
@ 2008-05-29 13:16           ` Jan Kiszka
  2008-05-29 16:13           ` Jamie Lokier
  1 sibling, 0 replies; 31+ messages in thread
From: Jan Kiszka @ 2008-05-29 13:16 UTC (permalink / raw)
  To: qemu-devel

Fabrice Bellard wrote:
> Jan Kiszka wrote:
>> Fabrice Bellard wrote:
>>> Jan Kiszka wrote:
>>>> Fabrice Bellard wrote:
>>>>> Jan Kiszka wrote:
>>>>>> Hi,
>>>>>>
>>>>>> is there a technical reason why the kqemu kernel module is built
>>>>>> out of
>>>>>> a binary blob (monitor-image.bin->monitor-image.h)? Does this simply
>>>>>> date back to the time when wrapper and core were distributed under
>>>>>> different licenses?
>>>>> This is a technical reason: the "blob" is run in an address space
>>>>> different from the host kernel.
>>>> Well, easy to claim, I know, but I don't think this is a hard reason.
>>>> However, as overcoming genmon and genoffset may require quite some
>>>> refactoring, I'm not sure if it's worth it.
>>> I may change the monitor blob format to ELF to allow relocation, but the
>>> idea stays the same, and I don't think you can do it another way...
>>
>> I agree (from my current knowledge of the problem) that the monitor
>> remains "foreign" code to the kernel module. But at least the
>> repackaging into a c-structure should be unnecessary.
>>
>> The offset generation can be skipped if the assembly files are converted
>> into inline assembly. Might be tricky in some cases, but I see no
>> show-stopper yet.
> 
> This is purely cosmetic and I am generally against such changes.

See, the current code structure is not optimal w.r.t. understandability.
KQEMU is a complex topic, no question. But this doesn't mean the
structure needs to be that complex as well. Everything that makes
things more straightforward and quicker to survey can also help third
parties to analyze KQEMU, debug potential issues, or even enhance its
feature set.

>> To give it a tiny start, I will see whether I can unify the build process
>> for all "true" kernel components. That is what currently breaks the
>> debuggability of the driver frame (up to kernel2monitor), and it also
>> causes a kbuild warning. Likely harmless at the moment, but fragile in
>> the long term.
> 
> For true kernel components I agree it is useful.
> 
> Regarding the kqemu evolution, I am doing small API changes to make it
> more independent from the QEMU internal data structures and to allow
> usage from a 32 bit user QEMU application with a 64 bit host. There is
> also another small change I did some time ago but never published to
> allow paravirtualization of the Linux kernel.

OK, thanks for the info. Just leaves me with the open questions about
the planned license(s) and how/where KQEMU is going to be maintained in
the future. I would really like to see it being driven as actively (and
broadly) as the QEMU core - specifically as long as HW-virtualization is
still not the rule on existing platforms :-/.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux


* Re: [Qemu-devel] Re: KQEMU code organization
  2008-05-29 12:29         ` Fabrice Bellard
  2008-05-29 13:16           ` Jan Kiszka
@ 2008-05-29 16:13           ` Jamie Lokier
  2008-05-29 16:26             ` Paul Brook
                               ` (2 more replies)
  1 sibling, 3 replies; 31+ messages in thread
From: Jamie Lokier @ 2008-05-29 16:13 UTC (permalink / raw)
  To: qemu-devel

Fabrice Bellard wrote:
> Regarding the kqemu evolution, I am doing small API changes to make it 
> more independent from the QEMU internal data structures and to allow 
> usage from a 32 bit user QEMU application with a 64 bit host. There is 
> also another small change I did some time ago but never published to 
> allow paravirtualization of the Linux kernel.

Do you see integrating it with KVM at some point, developing a merged
API which supports either hardware-assisted (kvm) or software-assisted
(kqemu) operation depending on the host's CPU?

Right now, although it's come from a different background, from a
user's perspective kvm seems to do essentially the same as kqemu,
except kvm is faster and kqemu runs on more x86 CPUs.
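
The "depending on the host's CPU" part can be sketched from user space: on Linux, /proc/cpuinfo lists the vmx flag for Intel VT and the svm flag for AMD SVM. The following is a minimal sketch of such a selection, not an existing QEMU or KVM interface; the function names and the returned strings are hypothetical.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def pick_accelerator(cpuinfo_text: str) -> str:
    """Prefer hardware assistance (Intel VT = vmx, AMD SVM = svm)."""
    if cpu_flags(cpuinfo_text) & {"vmx", "svm"}:
        return "kvm"
    return "kqemu"


# Made-up sample without virtualization extensions.
sample = "processor : 0\nflags : fpu vme de pse tsc msr pae\n"
print(pick_accelerator(sample))
```

On a real host the text would come from reading /proc/cpuinfo. The flag alone does not guarantee that the extension is enabled in the firmware or that the kvm module is loaded, so a real implementation would still need a fallback path.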

I.e. kvm has two sub-modules for Intel VT and AMD SVM extensions (I
think that's their names).  It would be great if it had a third KQEMU
sub-module (which would of course be the most complicated ;-) to make
running VMs even more independent of the host CPU.

That would require adding kqemu's software translation/scanning
callbacks to kvm's API, or vice versa.  But it would have the bonus of
adding kvm's in-kernel fast APIC emulation to kqemu, possibly the
paravirt and virtio stuff too, and further unifying kvm-using and
kqemu-using systems, and combining developer attention from these
different projects, which all seem to be in the same direction.

As someone interested in emulator development I understand the
different histories of kqemu and kvm.  As a user, however, it seems
logical at this point to begin seeing them as different ways of
achieving the same thing, depending on the host CPU capabilities, and
those things which should not depend on the host CPU - such as virtio,
APIC emulation etc. - ought to share the same kernel code.

-- Jamie


* Re: [Qemu-devel] Re: KQEMU code organization
  2008-05-29 16:13           ` Jamie Lokier
@ 2008-05-29 16:26             ` Paul Brook
  2008-05-29 16:35               ` Jamie Lokier
  2008-05-29 16:26             ` Anthony Liguori
  2008-05-29 16:48             ` Jan Kiszka
  2 siblings, 1 reply; 31+ messages in thread
From: Paul Brook @ 2008-05-29 16:26 UTC (permalink / raw)
  To: qemu-devel

> I.e. kvm has two sub-modules for Intel VT and AMD SVM extensions (I
> think that's their names).  It would be great if it had a third KQEMU
> sub-module (which would of course be the most complicated ;-)

I believe this is also a prerequisite for getting kqemu merged into mainstream
kernels, which IMHO is the only sane goal to have. Out of tree kernel modules 
simply aren't worth the effort.

Paul


* Re: [Qemu-devel] Re: KQEMU code organization
  2008-05-29 16:26             ` Paul Brook
@ 2008-05-29 16:35               ` Jamie Lokier
  2008-05-29 17:43                 ` Anthony Liguori
  0 siblings, 1 reply; 31+ messages in thread
From: Jamie Lokier @ 2008-05-29 16:35 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel

Paul Brook wrote:
> > I.e. kvm has two sub-modules for Intel VT and AMD SVM extensions (I
> > think that's their names).  It would be great if it had a third KQEMU
> > sub-module (which would of course be the most complicated ;-)
> 
> I believe this is also a prerequisite for getting kqemu merged into
> mainstream kernels, which IMHO is the only sane goal to have. Out of
> tree kernel modules simply aren't worth the effort.

I think there's utility in crossover between both of them too.

Sometimes it would be nice to have the speed and directness of kvm,
with the code scanning and replacement abilities of kqemu to block
particular instructions, pretend to be a specific CPU model, or
replace some hardware-accessing instruction sequences instead of
trapping and emulating them - without the guest seeing the replacement.

-- Jamie


* Re: [Qemu-devel] Re: KQEMU code organization
  2008-05-29 16:35               ` Jamie Lokier
@ 2008-05-29 17:43                 ` Anthony Liguori
  2008-05-29 21:46                   ` Fabrice Bellard
  0 siblings, 1 reply; 31+ messages in thread
From: Anthony Liguori @ 2008-05-29 17:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paul Brook

Jamie Lokier wrote:
> Paul Brook wrote:
>   
>>> I.e. kvm has two sub-modules for Intel VT and AMD SVM extensions (I
>>> think that's their names).  It would be great if it had a third KQEMU
>>> sub-module (which would of course be the most complicated ;-)
>>>       
>> I believe this is also a prerequisite for getting kqemu merged into
>> mainstream kernels, which IMHO is the only sane goal to have. Out of
>> tree kernel modules simply aren't worth the effort.
>>     
>
> I think there's utility in crossover between both of them too.
>   

There are some architectural incompatibilities.  For instance, KVM
supports guest SMP but the code TCG generates does not ensure atomic
operations are truly atomic.  In general, it may not be possible to do 
this across architectures without employing the use of a big lock.
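
The "big lock" fallback mentioned above can be sketched in a few lines: every guest atomic read-modify-write is serialized through one global lock, which is correct on any host but makes all virtual CPUs contend on it. This is a toy model under that assumption, not how TCG or KVM is implemented; the class and function names are hypothetical.

```python
import threading


class GuestMemory:
    """Toy guest RAM in which every locked operation takes one global lock."""

    def __init__(self, size: int) -> None:
        self._cells = [0] * size
        self._big_lock = threading.Lock()

    def atomic_add(self, addr: int, delta: int) -> int:
        """Emulate a guest atomic read-modify-write; returns the old value."""
        with self._big_lock:
            old = self._cells[addr]
            self._cells[addr] = old + delta
            return old

    def load(self, addr: int) -> int:
        return self._cells[addr]


def run_vcpus(mem: GuestMemory, vcpus: int, increments: int) -> int:
    """Run one host thread per virtual CPU, all incrementing the same cell."""
    def vcpu() -> None:
        for _ in range(increments):
            mem.atomic_add(0, 1)

    threads = [threading.Thread(target=vcpu) for _ in range(vcpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mem.load(0)


print(run_vcpus(GuestMemory(1), vcpus=4, increments=10_000))
```

Because the lock covers the whole read-modify-write, no increment is lost. The price is that unrelated atomic operations on different addresses also wait for each other.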

Also, when you mix dynamic translation in userspace with direct 
execution, it implies you have to completely flush the shadow page table 
cache.  This is going to severely impact performance so I don't know 
that there are a lot of circumstances where using TCG would improve 
performance.

KVM already does some instruction patching FWIW.  For instance, TPR 
accesses are modified in Windows guests to prevent a vmexit from 
occurring since Windows accesses the TPR so frequently.

Regards,

Anthony Liguori

> Sometimes it would be nice to have the speed and directness of kvm,
> with the code scanning and replacement abilities of kqemu to block
> particular instructions, pretend to be a specific CPU model, or
> replace some hardware-accessing instruction sequences instead of
> trapping and emulating them - without the guest seeing the replacement.
>
> -- Jamie
>
>
>   


* Re: [Qemu-devel] Re: KQEMU code organization
  2008-05-29 17:43                 ` Anthony Liguori
@ 2008-05-29 21:46                   ` Fabrice Bellard
  2008-05-30  3:32                     ` Mulyadi Santosa
  0 siblings, 1 reply; 31+ messages in thread
From: Fabrice Bellard @ 2008-05-29 21:46 UTC (permalink / raw)
  To: qemu-devel

Anthony Liguori wrote:
> Jamie Lokier wrote:
>> Paul Brook wrote:
>>  
>>>> I.e. kvm has two sub-modules for Intel VT and AMD SVM extensions (I
>>>> think that's their names).  It would be great if it had a third KQEMU
>>>> sub-module (which would of course be the most complicated ;-)
>>>>       
>>> I believe this is also a prerequisite for getting kqemu merged into
>>> mainstream kernels, which IMHO is the only sane goal to have. Out of
>>> tree kernel modules simply aren't worth the effort.
>>>     
>>
>> I think there's utility in crossover between both of them too.
>>   
> 
> There are some architectural incompatibilities.  For instance, KVM
> supports guest SMP but the code TCG generates does not ensure atomic
> operations are truly atomic.  In general, it may not be possible to do
> this across architectures without employing the use of a big lock.

But for the x86 on x86 case, it seems possible to make QEMU/TCG SMP safe
(it would consist in using x86 lock instructions on the host when the
guest uses them).
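
The first step of that approach is recognizing that a guest instruction carries the LOCK prefix (byte 0xF0) so the translator knows to emit a locked host instruction. The sketch below only scans the legacy prefix bytes of a single instruction. It is an illustration, not QEMU's decoder, and the function name is hypothetical.

```python
LOCK_PREFIX = 0xF0
# Legacy prefixes that may precede the opcode in any order.
LEGACY_PREFIXES = {
    0xF0, 0xF2, 0xF3,                    # lock, repne, rep
    0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65,  # segment overrides
    0x66, 0x67,                          # operand-size, address-size
}


def has_lock_prefix(insn: bytes) -> bool:
    """Report whether the leading prefix bytes of one instruction include LOCK."""
    for byte in insn:
        if byte == LOCK_PREFIX:
            return True
        if byte not in LEGACY_PREFIXES:
            break  # reached a REX prefix or the opcode itself
    return False


# f0 0f b1 0b encodes "lock cmpxchg %ecx,(%ebx)".
print(has_lock_prefix(bytes.fromhex("f00fb10b")))
# 89 f0 encodes "mov %esi,%eax"; here 0xf0 is a ModRM byte, not a prefix.
print(has_lock_prefix(bytes.fromhex("89f0")))
```

Stopping at the first non-prefix byte matters: 0xF0 also occurs as an ordinary operand byte, as the second example shows.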

> Also, when you mix dynamic translation in userspace with direct
> execution, it implies you have to completely flush the shadow page table
> cache.  This is going to severely impact performance so I don't know
> that there are a lot of circumstances where using TCG would improve
> performance.
>
> KVM already does some instruction patching FWIW.  For instance, TPR
> accesses are modified in Windows guests to prevent a vmexit from
> occurring since Windows accesses the TPR so frequently.

Code patching seems interesting. Although I did not look in detail, it
seems that VirtualBox uses it extensively and gets very good performance
without using hardware virtualization. The "beauty" of it is that the
code patching hacks can stay outside the kernel module. I wonder what
their plans are for their kernel module!

Anyway, I don't think it is worth trying to get kqemu into the Linux
kernel. Moreover, I have no plans to change the kqemu interface to match
that of KVM. It seems simpler just to have a wrapper for both inside
the user space QEMU. However, my upcoming changes for kqemu and QEMU
will bring the interfaces closer because kqemu will no longer peek into
the QEMU physical-to-RAM translation table.

Fabrice.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] Re: KQEMU code organization
  2008-05-29 21:46                   ` Fabrice Bellard
@ 2008-05-30  3:32                     ` Mulyadi Santosa
  2008-05-30  8:14                       ` Andreas Färber
  0 siblings, 1 reply; 31+ messages in thread
From: Mulyadi Santosa @ 2008-05-30  3:32 UTC (permalink / raw)
  To: qemu-devel

Hi..

On Fri, May 30, 2008 at 4:46 AM, Fabrice Bellard <fabrice@bellard.org> wrote:
>
> Code patching seems interesting. Although I did not look in detail, it
> seems that VirtualBox uses it extensively and gets very good performance
> without using hardware virtualization.

I second that. Besides being a QEMU user, I am now also a loyal user of
VirtualBox. I guess that VBox can identify hot spots (repeating
instructions or TBs) and tries harder and harder to optimize them. It
could be related to what I call "smart flush of translation cache":
not flushing the cached TBs entirely, but selectively.

However, I also guess that VBox is tightly coupled to its kernel
module, so without it, it might be slower than QEMU/TCG, but I
have no hard data to support that.

Now, I wonder how Transitive does SPARC-to-x86 translation while still
maintaining speed. Does it do what linux-user does?

regards,

Mulyadi

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] Re: KQEMU code organization
  2008-05-30  3:32                     ` Mulyadi Santosa
@ 2008-05-30  8:14                       ` Andreas Färber
  0 siblings, 0 replies; 31+ messages in thread
From: Andreas Färber @ 2008-05-30  8:14 UTC (permalink / raw)
  To: qemu-devel

Hi,

Am 30.05.2008 um 05:32 schrieb Mulyadi Santosa:

> On Fri, May 30, 2008 at 4:46 AM, Fabrice Bellard
> <fabrice@bellard.org> wrote:
>>
>> Code patching seems interesting. Although I did not look in detail, it
>> seems that VirtualBox uses it extensively and gets very good
>> performance without using hardware virtualization.
>
> I second that. Beside being Qemu users, I am also now a loyal user of
> VirtualBox. I guess that VBox can identify hot spot (repeating
> instructions or TB) and tries harder and harder to optimize it. It
> could be related to what I call "smart flush of translation cache"...
> not entirely flushing cached TB but selectively doing so.
>
> However, I also guess that VBox is tightly related to its kernel
> module, thus without it ...it might be slower than Qemu/TCG..but I
> have no hard data to support it.

I've tried VirtualBox on a Core Duo Mac and despite its kernel module,  
in my perception, it is significantly slower than Q, which does not  
have any hypervisor. The responsiveness to moving the mouse around is  
better though in VirtualBox.

Andreas

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] Re: KQEMU code organization
  2008-05-29 16:13           ` Jamie Lokier
  2008-05-29 16:26             ` Paul Brook
@ 2008-05-29 16:26             ` Anthony Liguori
  2008-05-29 16:53               ` Jan Kiszka
  2008-05-29 21:52               ` Fabrice Bellard
  2008-05-29 16:48             ` Jan Kiszka
  2 siblings, 2 replies; 31+ messages in thread
From: Anthony Liguori @ 2008-05-29 16:26 UTC (permalink / raw)
  To: qemu-devel

Jamie Lokier wrote:
> Fabrice Bellard wrote:
>   
>> Regarding the kqemu evolution, I am doing small API changes to make it 
>> more independent from the QEMU internal data structures and to allow 
>> usage from a 32 bit user QEMU application with a 64 bit host. There is 
>> also another small change I did some time ago but never published to 
>> allow paravirtualization of the Linux kernel.
>>     
>
> Do you see integrating it with KVM at some point, developing a merged
> API which supports both hardware-assisted (kvm) or software-assisted
> (kqemu) depending on the host's CPU?
>
> Right now, although it's come from a different background, from a
> user's perspective kvm seems to do essentially the same as kqemu,
> except kvm is faster and kqemu runs on more x86 CPUs.
>
> I.e. kvm has two sub-modules for Intel VT and AMD SVM extensions (I
> think that's their names).  It would be great if it had a third KQEMU
> sub-module (which would of course be the most complicated ;-) to make
> running VMs even more independent of the host CPU.
>   

It wouldn't be too bad if you focused on kqemu-user and limited yourself 
to UP guests.  The first step would be getting the existing KVM support 
code to function with TCG.  For instance, use TCG to run 16-bit code, 
and then KVM to run 32/64-bit code.  Once that was all worked out, the 
rest would be pretty straight-forward porting and code cleanup.

> That would require adding kqemu's software translation/scanning
> callbacks to kvm's API, or vice versa.  But it would have the bonus of
> adding kvm's in-kernel fast APIC emulation to kqemu, possibly the
> paravirt and virtio stuff too, and further unifying kvm-using and
> kqemu-using systems, and combining developer attention from these
> different projects, which all seem to be in the same direction.
>   

There's nothing stopping virtio from being used by QEMU + kqemu except 
for my slowness in improving the code such that it performs well and is 
acceptable to QEMU.

FWIW, the l1_phys_map table is a current hurdle in getting performance.  
When we use proper accessors to access the virtio_ring, we end up taking 
a significant performance hit (around 20% on iperf).  I have some simple 
patches that implement a page_desc cache that caches the RAM regions in a 
linear array.  That helps get most of it back.

I'd really like to remove the l1_phys_map entirely and replace it with a 
sorted list of regions.  I think this would have an overall performance 
improvement since it's much more cache friendly.  One thing keeping this 
from happening is the fact that the data structure is passed up to the 
kernel for kqemu.  Eliminating that dependency would be a very good thing!
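
To make the "sorted list of regions" idea concrete, here is a minimal
sketch. It is not the page_desc cache patch mentioned above and not QEMU's
actual data structure: PhysRegion and phys_region_find() are hypothetical
names, and the regions are assumed to be sorted by start address and
non-overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region descriptor; not QEMU's actual data structure. */
typedef struct {
    uint64_t start;   /* first guest-physical address of the region */
    uint64_t size;    /* length in bytes, nonzero */
    uint64_t offset;  /* offset of the region in the host RAM backing */
} PhysRegion;

/* Binary search over regions that are sorted by start address and do
 * not overlap. Returns the region containing addr, or NULL if addr is
 * not covered by any region. */
const PhysRegion *phys_region_find(const PhysRegion *regions, size_t count,
                                   uint64_t addr)
{
    size_t lo = 0, hi = count;

    assert(regions != NULL || count == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const PhysRegion *r = &regions[mid];

        if (addr < r->start) {
            hi = mid;                    /* addr lies below this region */
        } else if (addr - r->start >= r->size) {
            lo = mid + 1;                /* addr lies above this region */
        } else {
            return r;                    /* start <= addr < start + size */
        }
    }
    return NULL;
}
```

With only a handful of regions the whole array fits in one or two cache
lines, which is the cache-friendliness argument made above.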

Regards,

Anthony Liguori

> As someone interested in emulator development I understand the
> different histories of kqemu and kvm.  As a user, however, it seems
> logical at this point to begin seeing them as different ways of
> achieving the same thing, depending on the host CPU capabilities, and
> those things which should not depend on the host CPU - such as virtio,
> APIC emulation etc. - ought to share the same kernel code.
>
> -- Jamie
>
>
>   

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Qemu-devel] Re: KQEMU code organization
  2008-05-29 16:26             ` Anthony Liguori
@ 2008-05-29 16:53               ` Jan Kiszka
  2008-05-29 17:48                 ` Anthony Liguori
  2008-05-31 10:18                 ` Avi Kivity
  2008-05-29 21:52               ` Fabrice Bellard
  1 sibling, 2 replies; 31+ messages in thread
From: Jan Kiszka @ 2008-05-29 16:53 UTC (permalink / raw)
  To: qemu-devel

Anthony Liguori wrote:
> Jamie Lokier wrote:
>> Fabrice Bellard wrote:
>>  
>>> Regarding the kqemu evolution, I am doing small API changes to make
>>> it more independent from the QEMU internal data structures and to
>>> allow usage from a 32 bit user QEMU application with a 64 bit host.
>>> There is also another small change I did some time ago but never
>>> published to allow paravirtualization of the Linux kernel.
>>>     
>>
>> Do you see integrating it with KVM at some point, developing a merged
>> API which supports both hardware-assisted (kvm) or software-assisted
>> (kqemu) depending on the host's CPU?
>>
>> Right now, although it's come from a different background, from a
>> user's perspective kvm seems to do essentially the same as kqemu,
>> except kvm is faster and kqemu runs on more x86 CPUs.
>>
>> I.e. kvm has two sub-modules for Intel VT and AMD SVM extensions (I
>> think that's their names).  It would be great if it had a third KQEMU
>> sub-module (which would of course be the most complicated ;-) to make
>> running VMs even more independent of the host CPU.
>>   
> 
> It wouldn't be too bad if you focused on kqemu-user and limited yourself
> to UP guests.  The first step would be getting the existing KVM support
> code to function with TCG.  For instance, use TCG to run 16-bit code,
> and then KVM to run 32/64-bit code.  Once that was all worked out, the
> rest would be pretty straight-forward porting and code cleanup.

I guess you mean real-mode code with 16-bit here. /me always wondered
why it takes an in-kernel code interpreter for kvm to achieve this - at
least as long as it runs via qemu.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] Re: KQEMU code organization
  2008-05-29 16:53               ` Jan Kiszka
@ 2008-05-29 17:48                 ` Anthony Liguori
  2008-05-31 10:18                 ` Avi Kivity
  1 sibling, 0 replies; 31+ messages in thread
From: Anthony Liguori @ 2008-05-29 17:48 UTC (permalink / raw)
  To: qemu-devel

Jan Kiszka wrote:
> Anthony Liguori wrote:
>   
>>
>> It wouldn't be too bad if you focused on kqemu-user and limited yourself
>> to UP guests.  The first step would be getting the existing KVM support
>> code to function with TCG.  For instance, use TCG to run 16-bit code,
>> and then KVM to run 32/64-bit code.  Once that was all worked out, the
>> rest would be pretty straight-forward porting and code cleanup.
>>     
>
> I guess you mean real-mode code with 16-bit here. /me always wondered
> why it takes an in-kernel code interpreter for kvm to achieve this - at
> least as long as it runs via qemu.
>   

We don't use an in-kernel interpreter, we use vm86 mode for 16-bit 
code.  There is an in-kernel interpreter (x86_emulate) but that is used 
mostly for handling shadow page table faults.

Regards,

Anthony Liguori

> Jan
>
>   

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] Re: KQEMU code organization
  2008-05-29 16:53               ` Jan Kiszka
  2008-05-29 17:48                 ` Anthony Liguori
@ 2008-05-31 10:18                 ` Avi Kivity
  2008-06-02 16:34                   ` Jamie Lokier
  1 sibling, 1 reply; 31+ messages in thread
From: Avi Kivity @ 2008-05-31 10:18 UTC (permalink / raw)
  To: qemu-devel

Jan Kiszka wrote:

  

>> It wouldn't be too bad if you focused on kqemu-user and limited yourself
>> to UP guests.  The first step would be getting the existing KVM support
>> code to function with TCG.  For instance, use TCG to run 16-bit code,
>> and then KVM to run 32/64-bit code.  Once that was all worked out, the
>> rest would be pretty straight-forward porting and code cleanup.
>>     
>
> I guess you mean real-mode code with 16-bit here. /me always wondered
> why it takes an in-kernel code interpreter for kvm to achieve this - at
> least as long as it runs via qemu.
>   

kvm started out with qemu emulating 16-bit code (and before that, even 
32-bit code; kvm only did 64-bit).

The reason I don't like this approach is that it makes the interface 
complex and hard to understand, and makes kvm heavily tied into qemu.

Some problems that arise from having qemu emulate code:
- difficult to do smp properly
- qemu needs to be able to inject mmio for in-kernel emulated devices
- in-kernel devices (lapic, etc.) need to interact with guest code 
executing in userspace



-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] Re: KQEMU code organization
  2008-05-31 10:18                 ` Avi Kivity
@ 2008-06-02 16:34                   ` Jamie Lokier
  0 siblings, 0 replies; 31+ messages in thread
From: Jamie Lokier @ 2008-06-02 16:34 UTC (permalink / raw)
  To: qemu-devel

Avi Kivity wrote:
> kvm started out with qemu emulating 16-bit code (and before that, even 
> 32-bit code; kvm only did 64-bit).
> 
> The reason I don't like this approach is that it makes the interface 
> complex and hard to understand, and makes kvm heavily tied into qemu.
> 
> Some problems that arise from having qemu emulate code:
> - difficult to do smp properly

Now that atomic ops will be translated to atomic ops, and futex is
translated to host futex, I think this is solved.

> - qemu needs to be able to inject mmio for in-kernel emulated devices
> - in-kernel devices (lapic, etc.) need to interact with guest code 
> executing in userspace

These two seem to apply equally if kqemu is made to work with
in-kernel emulated devices, which seems useful for exactly the same
reasons as in kvm.

-- Jamie

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] Re: KQEMU code organization
  2008-05-29 16:26             ` Anthony Liguori
  2008-05-29 16:53               ` Jan Kiszka
@ 2008-05-29 21:52               ` Fabrice Bellard
  2008-05-31 10:06                 ` Avi Kivity
  2008-06-01 22:58                 ` Anthony Liguori
  1 sibling, 2 replies; 31+ messages in thread
From: Fabrice Bellard @ 2008-05-29 21:52 UTC (permalink / raw)
  To: qemu-devel

Anthony Liguori wrote:
> [...]
> FWIW, the l1_phys_map table is a current hurdle in getting performance. 
> When we use proper accessors to access the virtio_ring, we end up taking
> a significant performance hit (around 20% on iperf).  I have some simple
> patches that implement a page_desc cache that caches the RAM regions in a
> linear array.  That helps get most of it back.
> 
> I'd really like to remove the l1_phys_map entirely and replace it with a
> sorted list of regions.  I think this would have an overall performance
> improvement since it's much more cache friendly.  One thing keeping this
> from happening is the fact that the data structure is passed up to the
> kernel for kqemu.  Eliminating that dependency would be a very good thing!

If the l1_phys_map is a performance bottleneck it means that the
internals of QEMU are not properly used. In QEMU/kqemu, it is not
accessed to do I/Os : a cache is used thru tlb_table[]. I don't see why
KVM cannot use a similar system.
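
For illustration only, a TLB-style cache in front of a slow lookup could
look like the sketch below. It does not reproduce the real tlb_table[]
layout: PhysCache, phys_cache_translate() and demo_slow_lookup() are
hypothetical names, the 4 KiB page size and 256-entry cache size are
arbitrary assumptions, and the slow path is a stand-in for a full
l1_phys_map walk.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BITS   12                    /* assumed 4 KiB pages */
#define CACHE_BITS  8                     /* assumed 256 cache entries */
#define CACHE_SIZE  (1u << CACHE_BITS)

typedef struct {
    uint64_t tag;        /* page number + 1, so that 0 means "empty" */
    uint64_t host_off;   /* cached translation of the page base */
} PhysCacheEntry;

typedef struct {
    PhysCacheEntry e[CACHE_SIZE];
    unsigned long misses;                 /* number of slow-path lookups */
} PhysCache;

/* Slow path: maps a guest-physical page number to a host offset. */
typedef uint64_t (*SlowLookupFn)(uint64_t page);

/* Stand-in slow path for the example: RAM backed at offset 0x100000. */
uint64_t demo_slow_lookup(uint64_t page)
{
    return (page << PAGE_BITS) + 0x100000;
}

/* Direct-mapped lookup: the low bits of the page number select the
 * slot, and the tag decides whether the slot holds the wanted page. */
uint64_t phys_cache_translate(PhysCache *c, SlowLookupFn slow, uint64_t addr)
{
    uint64_t page = addr >> PAGE_BITS;
    PhysCacheEntry *e = &c->e[page & (CACHE_SIZE - 1)];

    if (e->tag != page + 1) {             /* miss: refill from slow path */
        e->host_off = slow(page);
        e->tag = page + 1;
        c->misses++;
    }
    return e->host_off + (addr & ((1u << PAGE_BITS) - 1));
}
```

A real cache would also need invalidation whenever the underlying mapping
changes, which this sketch leaves out.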

Fabrice.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] Re: KQEMU code organization
  2008-05-29 21:52               ` Fabrice Bellard
@ 2008-05-31 10:06                 ` Avi Kivity
  2008-06-01 22:58                 ` Anthony Liguori
  1 sibling, 0 replies; 31+ messages in thread
From: Avi Kivity @ 2008-05-31 10:06 UTC (permalink / raw)
  To: qemu-devel, Fabrice Bellard

Fabrice Bellard wrote:
> Anthony Liguori wrote:
>   
>> [...]
>> FWIW, the l1_phys_map table is a current hurdle in getting performance. 
>> When we use proper accessors to access the virtio_ring, we end up taking
>> a significant performance hit (around 20% on iperf).  I have some simple
>> patches that implement a page_desc cache that caches the RAM regions in a
>> linear array.  That helps get most of it back.
>>
>> I'd really like to remove the l1_phys_map entirely and replace it with a
>> sorted list of regions.  I think this would have an overall performance
>> improvement since it's much more cache friendly.  One thing keeping this
>> from happening is the fact that the data structure is passed up to the
>> kernel for kqemu.  Eliminating that dependency would be a very good thing!
>>     
>
> If the l1_phys_map is a performance bottleneck it means that the
> internals of QEMU are not properly used. In QEMU/kqemu, it is not
> accessed to do I/Os : a cache is used thru tlb_table[]. I don't see why
> KVM cannot use a similar system.
>
>   

In that case, replacing l1_phys_map by a region list is a good thing.  
l1_phys_map consumes a large amount of memory.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] Re: KQEMU code organization
  2008-05-29 21:52               ` Fabrice Bellard
  2008-05-31 10:06                 ` Avi Kivity
@ 2008-06-01 22:58                 ` Anthony Liguori
  2008-06-02  9:02                   ` Fabrice Bellard
  1 sibling, 1 reply; 31+ messages in thread
From: Anthony Liguori @ 2008-06-01 22:58 UTC (permalink / raw)
  To: qemu-devel

Fabrice Bellard wrote:
> Anthony Liguori wrote:
>   
>> [...]
>> FWIW, the l1_phys_map table is a current hurdle in getting performance. 
>> When we use proper accessors to access the virtio_ring, we end up taking
>> a significant performance hit (around 20% on iperf).  I have some simple
>> patches that implement a page_desc cache that caches the RAM regions in a
>> linear array.  That helps get most of it back.
>>
>> I'd really like to remove the l1_phys_map entirely and replace it with a
>> sorted list of regions.  I think this would have an overall performance
>> improvement since it's much more cache friendly.  One thing keeping this
>> from happening is the fact that the data structure is passed up to the
>> kernel for kqemu.  Eliminating that dependency would be a very good thing!
>>     
>
> If the l1_phys_map is a performance bottleneck it means that the
> internals of QEMU are not properly used. In QEMU/kqemu, it is not
> accessed to do I/Os : a cache is used thru tlb_table[]. I don't see why
> KVM cannot use a similar system.
>   

This is for device emulation.  KVM doesn't use l1_phys_map() for things 
like shadow page table accesses.

In the device emulation, we're currently using stl_phys() and friends.  
This goes through a full lookup in l1_phys_map.

Looking at other devices, some use phys_ram_base + PA and stl_raw() 
which is broken but faster.  A few places call 
cpu_get_physical_page_desc(), then use phys_ram_base and stl_raw().  
This is okay but it still requires at least one l1_phys_map lookup per 
operation in the device (packet receive, io notification, etc.).  I 
don't think that's going to help much because in our fast paths, we're 
only doing 2 or 3 stl_phys() operations.

At least on x86, there are very few regions of RAM.  That makes it very 
easy to cache.  A TLB style cache seems wrong to me because there are so 
few RAM regions.  I don't see a better way to do this with the existing 
APIs.

Regards,

Anthony Liguori

> Fabrice.
>
>
>
>   

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] Re: KQEMU code organization
  2008-06-01 22:58                 ` Anthony Liguori
@ 2008-06-02  9:02                   ` Fabrice Bellard
  2008-06-02 13:25                     ` Anthony Liguori
  0 siblings, 1 reply; 31+ messages in thread
From: Fabrice Bellard @ 2008-06-02  9:02 UTC (permalink / raw)
  To: qemu-devel

Anthony Liguori wrote:
> Fabrice Bellard wrote:
>> Anthony Liguori wrote:
>>  
>>> [...]
>>> FWIW, the l1_phys_map table is a current hurdle in getting 
>>> performance. When we use proper accessors to access the virtio_ring, 
>>> we end up taking
>>> a significant performance hit (around 20% on iperf).  I have some simple
>>> patches that implement a page_desc cache that caches the RAM regions in a
>>> linear array.  That helps get most of it back.
>>>
>>> I'd really like to remove the l1_phys_map entirely and replace it with a
>>> sorted list of regions.  I think this would have an overall performance
>>> improvement since it's much more cache friendly.  One thing keeping this
>>> from happening is the fact that the data structure is passed up to the
>>> kernel for kqemu.  Eliminating that dependency would be a very good 
>>> thing!
>>>     
>>
>> If the l1_phys_map is a performance bottleneck it means that the
>> internals of QEMU are not properly used. In QEMU/kqemu, it is not
>> accessed to do I/Os : a cache is used thru tlb_table[]. I don't see why
>> KVM cannot use a similar system.
>>   
> 
> This is for device emulation.  KVM doesn't use l1_phys_map() for things 
> like shadow page table accesses.
> 
> In the device emulation, we're currently using stl_phys() and friends.  
> This goes through a full lookup in l1_phys_map.
> 
> Looking at other devices, some use phys_ram_base + PA and stl_raw() 
> which is broken but faster.  A few places call 
> cpu_get_physical_page_desc(), then use phys_ram_base and stl_raw().  
> This is okay but it still requires at least one l1_phys_map lookup per 
> operation in the device (packet receive, io notification, etc.).  I 
> don't think that's going to help much because in our fast paths, we're 
> only doing 2 or 3 stl_phys() operations.
> 
> At least on x86, there are very few regions of RAM.  That makes it very 
> easy to cache.  A TLB style cache seems wrong to me because there are so 
> few RAM regions.  I don't see a better way to do this with the existing 
> APIs.

I see your point. st/ldx_phys() were never optimized in fact.

A first solution would be to use a cache similar to the TLBs. It has the
advantage of being quite generic and fast. Another solution would be to
compute a few intervals which are tested before the generic case. These
intervals would correspond to the main RAM area and would be updated
each time a new device region is registered.

Does your remark imply that KVM switches back to the QEMU process for
each I/O? If so, the l1_phys_map access time should be negligible
compared to the SVM-VMX/kernel/user context switch!

Fabrice.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] Re: KQEMU code organization
  2008-06-02  9:02                   ` Fabrice Bellard
@ 2008-06-02 13:25                     ` Anthony Liguori
  0 siblings, 0 replies; 31+ messages in thread
From: Anthony Liguori @ 2008-06-02 13:25 UTC (permalink / raw)
  To: qemu-devel

Fabrice Bellard wrote:
>> This is for device emulation.  KVM doesn't use l1_phys_map() for 
>> things like shadow page table accesses.
>>
>> In the device emulation, we're currently using stl_phys() and 
>> friends.  This goes through a full lookup in l1_phys_map.
>>
>> Looking at other devices, some use phys_ram_base + PA and stl_raw() 
>> which is broken but faster.  A few places call 
>> cpu_get_physical_page_desc(), then use phys_ram_base and stl_raw().  
>> This is okay but it still requires at least one l1_phys_map lookup 
>> per operation in the device (packet receive, io notification, etc.).  
>> I don't think that's going to help much because in our fast paths, 
>> we're only doing 2 or 3 stl_phys() operations.
>>
>> At least on x86, there are very few regions of RAM.  That makes it 
>> very easy to cache.  A TLB style cache seems wrong to me because 
>> there are so few RAM regions.  I don't see a better way to do this 
>> with the existing APIs.
>
> I see your point. st/ldx_phys() were never optimized in fact.
>
> A first solution would be to use a cache similar to the TLBs. It has
> the advantage of being quite generic and fast. Another solution would
> be to compute a few intervals which are tested before the generic case.
> These intervals would correspond to the main RAM area and would be
> updated each time a new device region is registered.

I currently have a patch that takes the latter approach.

> Does your remark imply that KVM switches back to the QEMU process
> for each I/O? If so, the l1_phys_map access time should be negligible
> compared to the SVM-VMX/kernel/user context switch!

Most MMIO/PIO accesses cause an exit to QEMU.  We run the main loop in a
dedicated thread though, so packet delivery is handled without forcing a
VCPU to exit.

Regards,

Anthony Liguori

> Fabrice.
>
>
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Qemu-devel] Re: KQEMU code organization
  2008-05-29 16:13           ` Jamie Lokier
  2008-05-29 16:26             ` Paul Brook
  2008-05-29 16:26             ` Anthony Liguori
@ 2008-05-29 16:48             ` Jan Kiszka
  2008-05-29 17:47               ` Anthony Liguori
  2 siblings, 1 reply; 31+ messages in thread
From: Jan Kiszka @ 2008-05-29 16:48 UTC (permalink / raw)
  To: qemu-devel

Jamie Lokier wrote:
> Fabrice Bellard wrote:
>> Regarding the kqemu evolution, I am doing small API changes to make it 
>> more independent from the QEMU internal data structures and to allow 
>> usage from a 32 bit user QEMU application with a 64 bit host. There is 
>> also another small change I did some time ago but never published to 
>> allow paravirtualization of the Linux kernel.
> 
> Do you see integrating it with KVM at some point, developing a merged
> API which supports both hardware-assisted (kvm) or software-assisted
> (kqemu) depending on the host's CPU?

I had the same idea while initially looking closer at kqemu, but I
didn't feel familiar enough with the code and its design requirements to
suggest this. :)

> 
> Right now, although it's come from a different background, from a
> user's perspective kvm seems to do essentially the same as kqemu,
> except kvm is faster and kqemu runs on more x86 CPUs.
> 
> I.e. kvm has two sub-modules for Intel VT and AMD SVM extensions (I
> think that's their names).  It would be great if it had a third KQEMU
> sub-module (which would of course be the most complicated ;-) to make
> running VMs even more independent of the host CPU.

Well, even just the same driver interface to userspace would be great. :->

> 
> That would require adding kqemu's software translation/scanning
> callbacks to kvm's API, or vice versa.  But it would have the bonus of
> adding kvm's in-kernel fast APIC emulation to kqemu, possibly the
> paravirt and virtio stuff too, and further unifying kvm-using and
> kqemu-using systems, and combining developer attention from these
> different projects, which all seem to be in the same direction.

The most important thing, IMHO: this /could/ open the door for mainline
integration of a software-based QEMU accelerator - surely the ultimate
goal wrt maintainability and distribution (on Linux).

> 
> As someone interested in emulator development I understand the
> different histories of kqemu and kvm.  As a user, however, it seems
> logical at this point to begin seeing them as different ways of
> achieving the same thing, depending on the host CPU capabilities, and
> those things which should not depend on the host CPU - such as virtio,
> APIC emulation etc. - ought to share the same kernel code.

Virtio on x86 requires no special host-kernel support, IIRC. But, yeah,
in-kernel irqchip (including APIC) is a further incentive to motivate
such a step.

But - this is all nice on the drawing board. It just requires a
reasonable balance between required effort (wouldn't be small, I guess)
and future relevance. For x86, by AMD and Intel at least (not sure about
VIA right now), you see hardware virtualization in every new processor.
So kqemu becomes less and less relevant over time. The thrilling
question is: Is that period long enough to justify a kqemu / soft-kvm,
and to push the result mainline?

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] Re: KQEMU code organization
  2008-05-29 16:48             ` Jan Kiszka
@ 2008-05-29 17:47               ` Anthony Liguori
  0 siblings, 0 replies; 31+ messages in thread
From: Anthony Liguori @ 2008-05-29 17:47 UTC (permalink / raw)
  To: qemu-devel

Jan Kiszka wrote:
> Jamie Lokier wrote:
>   
> Virtio on x86 requires no special host-kernel support, IIRC. But, yeah,
> in-kernel irqchip (including APIC) is a further incentive to motivate
> such step.
>
> But - this is all nice on the drawing board. It just requires a
> reasonable balance between required effort (wouldn't be small, I guess)
> and future relevance. For x86, by AMD and Intel at least (not sure about
> VIA right now),

As of the recently announced Isaiah microarchitecture, VIA includes 
support for VT.

Regards,

Anthony Liguori

>  you see hardware virtualization in every new processor.
> So kqemu becomes less and less relevant over the time. The thrilling
> question is: Is that period long enough to justify a kqemu / soft-kvm,
> and to push the result mainline?
>
> Jan
>
>   

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2008-06-02 16:35 UTC | newest]

Thread overview: 31+ messages
2008-05-27 16:56 [Qemu-devel] KQEMU code organization Jan Kiszka
2008-05-27 17:20 ` Ben Taylor
2008-05-27 18:25   ` [Qemu-devel] " Jan Kiszka
2008-05-27 20:58 ` [Qemu-devel] " Fabrice Bellard
2008-05-27 21:40   ` [Qemu-devel] " Jan Kiszka
2008-05-27 22:11 ` [Qemu-devel] " Fabrice Bellard
2008-05-28 16:02   ` [Qemu-devel] " Jan Kiszka
2008-05-28 16:37     ` Fabrice Bellard
2008-05-28 16:55       ` Jan Kiszka
2008-05-28 18:34         ` Jan Kiszka
2008-05-29 12:29         ` Fabrice Bellard
2008-05-29 13:16           ` Jan Kiszka
2008-05-29 16:13           ` Jamie Lokier
2008-05-29 16:26             ` Paul Brook
2008-05-29 16:35               ` Jamie Lokier
2008-05-29 17:43                 ` Anthony Liguori
2008-05-29 21:46                   ` Fabrice Bellard
2008-05-30  3:32                     ` Mulyadi Santosa
2008-05-30  8:14                       ` Andreas Färber
2008-05-29 16:26             ` Anthony Liguori
2008-05-29 16:53               ` Jan Kiszka
2008-05-29 17:48                 ` Anthony Liguori
2008-05-31 10:18                 ` Avi Kivity
2008-06-02 16:34                   ` Jamie Lokier
2008-05-29 21:52               ` Fabrice Bellard
2008-05-31 10:06                 ` Avi Kivity
2008-06-01 22:58                 ` Anthony Liguori
2008-06-02  9:02                   ` Fabrice Bellard
2008-06-02 13:25                     ` Anthony Liguori
2008-05-29 16:48             ` Jan Kiszka
2008-05-29 17:47               ` Anthony Liguori
