* [Qemu-devel] KQEMU code organization @ 2008-05-27 16:56 Jan Kiszka 2008-05-27 17:20 ` Ben Taylor ` (2 more replies) 0 siblings, 3 replies; 31+ messages in thread From: Jan Kiszka @ 2008-05-27 16:56 UTC (permalink / raw) To: qemu-devel Hi, is there a technical reason why the kqemu kernel module is built out of a binary blob (monitor-image.bin->monitor-image.h)? Does this simply date back to the time when wrapper and core were distributed under different licenses? I'm currently trying to hunt down a (probable) bug in kqemu, and the monitor is now unfortunately a white spot for the source-level debugger. So far I only managed to make the rest visible. BTW, am I missing an official code repository of kqemu? Why is there no subfolder, e.g., in the qemu svn repos? So patches should be provided against 1.3.0pre11, right? Thanks, Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] KQEMU code organization 2008-05-27 16:56 [Qemu-devel] KQEMU code organization Jan Kiszka @ 2008-05-27 17:20 ` Ben Taylor 2008-05-27 18:25 ` [Qemu-devel] " Jan Kiszka 2008-05-27 20:58 ` [Qemu-devel] " Fabrice Bellard 2008-05-27 22:11 ` [Qemu-devel] " Fabrice Bellard 2 siblings, 1 reply; 31+ messages in thread From: Ben Taylor @ 2008-05-27 17:20 UTC (permalink / raw) To: qemu-devel On Tue, May 27, 2008 at 12:56 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote: > Hi, > > is there a technical reason why the kqemu kernel module is built out of > a binary blob (monitor-image.bin->monitor-image.h)? Does this simply > date back to the time when wrapper and core were distributed under > different licenses? > > I'm currently trying to hunt down a (probable) bug in kqemu, and the > monitor is now unfortunately a white spot for the source-level debugger. > So far I only managed to make the rest visible. > > BTW, am I missing an official code repository of kqemu? Why is there no > subfolder, e.g., in the qemu svn repos? So patches should be provided > against 1.3.0pre11, right? I maintain a version of the repository at http://svn9.cvsdude.com/kdesolaris/kqemu/trunk/1.0.3pre11/kqemu that includes the Solaris changes and other patches posted to the list that I've been able to test and integrate. despite the kqemu being under kdesolaris (my friend owns this tree), this is kqemu the kernel module, not kqemu the KDE qemu front end. :-) ben ^ permalink raw reply [flat|nested] 31+ messages in thread
* [Qemu-devel] Re: KQEMU code organization
  2008-05-27 17:20 ` Ben Taylor
@ 2008-05-27 18:25   ` Jan Kiszka
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Kiszka @ 2008-05-27 18:25 UTC
  To: qemu-devel

Ben Taylor wrote:
> On Tue, May 27, 2008 at 12:56 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> Hi,
>>
>> is there a technical reason why the kqemu kernel module is built out of
>> a binary blob (monitor-image.bin->monitor-image.h)? Does this simply
>> date back to the time when wrapper and core were distributed under
>> different licenses?
>>
>> I'm currently trying to hunt down a (probable) bug in kqemu, and the
>> monitor is now unfortunately a white spot for the source-level debugger.
>> So far I only managed to make the rest visible.
>>
>> BTW, am I missing an official code repository of kqemu? Why is there no
>> subfolder, e.g., in the qemu svn repos? So patches should be provided
>> against 1.3.0pre11, right?
>
> I maintain a version of the repository at
>
> http://svn9.cvsdude.com/kdesolaris/kqemu/trunk/1.0.3pre11/kqemu
>
> that includes the Solaris changes and other patches posted to the
> list that I've been able to test and integrate.

So this is the de-facto official development version?

Quite a few changes in that tree, also to core stuff. Hmm. But nothing
that fixes my spurious CPL degeneration. What a pity.

However, you could merge another (minor) patch:

Index: Makefile
===================================================================
--- Makefile	(Revision 17)
+++ Makefile	(Arbeitskopie)
@@ -47,7 +47,7 @@ endif # !CONFIG_WIN32
 
 clean:
 	$(MAKE) -C common clean
-	rm -f kqemu.ko *.o *~
+	rm -rf kqemu.ko *.o *~ .kqemu* Module.* modules.order kqemu.mod.c .tmp_versions
 
 endif # !CONFIG_SOLARIS

Actually, more needs to be cleaned up /wrt Linux module building. But
I'm reluctant to touch common/Makefile until the (current) reason for
this code organization is known.

Jan
* Re: [Qemu-devel] KQEMU code organization 2008-05-27 16:56 [Qemu-devel] KQEMU code organization Jan Kiszka 2008-05-27 17:20 ` Ben Taylor @ 2008-05-27 20:58 ` Fabrice Bellard 2008-05-27 21:40 ` [Qemu-devel] " Jan Kiszka 2008-05-27 22:11 ` [Qemu-devel] " Fabrice Bellard 2 siblings, 1 reply; 31+ messages in thread From: Fabrice Bellard @ 2008-05-27 20:58 UTC (permalink / raw) To: qemu-devel; +Cc: jan.kiszka Hi, Regarding kqemu, I am still hesitating whether to commit it in the QEMU subversion repository. Moreover, I may change its license to another open source one so I would prefer that the patches are assigned to my copyright, especially if they are just small bugfixes. For your information, I will commit some incompatible API changes in kqemu in the next few days, so a new version will be needed anyway. Regards, Fabrice. Jan Kiszka wrote: > Hi, > > is there a technical reason why the kqemu kernel module is built out of > a binary blob (monitor-image.bin->monitor-image.h)? Does this simply > date back to the time when wrapper and core were distributed under > different licenses? > > I'm currently trying to hunt down a (probable) bug in kqemu, and the > monitor is now unfortunately a white spot for the source-level debugger. > So far I only managed to make the rest visible. > > BTW, am I missing an official code repository of kqemu? Why is there no > subfolder, e.g., in the qemu svn repos? So patches should be provided > against 1.3.0pre11, right? > > Thanks, > Jan > ^ permalink raw reply [flat|nested] 31+ messages in thread
* [Qemu-devel] Re: KQEMU code organization 2008-05-27 20:58 ` [Qemu-devel] " Fabrice Bellard @ 2008-05-27 21:40 ` Jan Kiszka 0 siblings, 0 replies; 31+ messages in thread From: Jan Kiszka @ 2008-05-27 21:40 UTC (permalink / raw) To: qemu-devel [-- Attachment #1: Type: text/plain, Size: 1123 bytes --] Hi Fabrice, Fabrice Bellard wrote: > Hi, > > Regarding kqemu, I am still hesitating whether to commit it in the QEMU > subversion repository. Moreover, I may change its license to another > open source one so I would prefer that the patches are assigned to my > copyright, especially if they are just small bugfixes. Hmm, that leaves an uncomfortable feeling on my side. If the licenses of the officially supported version did not include a GPL-compatible one, we would have to stick with what we have at the moment for Linux. Or will we see a dual licensed kqemu? > > For your information, I will commit some incompatible API changes in > kqemu in the next few days, so a new version will be needed anyway. What is the roadmap of kqemu then? Are there functional enhancements planned, or further performance tunings? What are those? BTW, I think I understood my problem with kqemu in the meantime: lcall from ring 0 => fails on lret as the real CS (with "wrong" RPL) is pushed onto the guest stack. Am I right? How to fix this best, by emulating lcall at kernel level? Thanks, Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 254 bytes --] ^ permalink raw reply [flat|nested] 31+ messages in thread
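To illustrate the lcall/lret question raised above: on x86 the low two bits of a segment selector hold the requested privilege level (RPL), and a far call pushes the current CS selector onto the stack. If Jan's hypothesis holds, guest ring-0 code that actually runs deprivileged would push a CS whose RPL is the real one rather than the one the guest expects. The sketch below is not kqemu code. `selector_rpl` and `virtualize_selector` are hypothetical helpers, and the assumption that the guest-visible selector differs from the real one only in its RPL bits is made purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Low two bits of an x86 segment selector: requested privilege level. */
#define SELECTOR_RPL_MASK 0x3u

/* Extract the RPL from a selector value. */
static unsigned selector_rpl(uint16_t sel)
{
    return sel & SELECTOR_RPL_MASK;
}

/*
 * Hypothetical helper (not part of kqemu): given the selector the CPU really
 * uses while running deprivileged guest code, return the value the guest
 * would expect to find on its stack, i.e. the same index/TI bits but with
 * the guest's virtual CPL in the RPL field.
 */
static uint16_t virtualize_selector(uint16_t real_sel, unsigned guest_cpl)
{
    assert(guest_cpl <= 3);
    return (uint16_t)((real_sel & ~SELECTOR_RPL_MASK) | guest_cpl);
}
```

An emulated lcall would presumably push something like `virtualize_selector(real_cs, 0)` for guest ring-0 code, so that a later lret sees the RPL the guest kernel assumes.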
* Re: [Qemu-devel] KQEMU code organization 2008-05-27 16:56 [Qemu-devel] KQEMU code organization Jan Kiszka 2008-05-27 17:20 ` Ben Taylor 2008-05-27 20:58 ` [Qemu-devel] " Fabrice Bellard @ 2008-05-27 22:11 ` Fabrice Bellard 2008-05-28 16:02 ` [Qemu-devel] " Jan Kiszka 2 siblings, 1 reply; 31+ messages in thread From: Fabrice Bellard @ 2008-05-27 22:11 UTC (permalink / raw) To: qemu-devel Jan Kiszka wrote: > Hi, > > is there a technical reason why the kqemu kernel module is built out of > a binary blob (monitor-image.bin->monitor-image.h)? Does this simply > date back to the time when wrapper and core were distributed under > different licenses? This is a technical reason: the "blob" is run in an address space different from the host kernel. Fabrice. ^ permalink raw reply [flat|nested] 31+ messages in thread
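A rough sketch of the packaging discussed here: a build step turns the monitor binary into a C array (monitor-image.bin -> monitor-image.h), and the kernel module copies that array into memory that is later mapped into the monitor's own address space. Everything below is illustrative only. The array contents, `monitor_image`, and `load_monitor_image` are made-up names, and the real kqemu loader may well work differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Stand-in for a generated header such as monitor-image.h: the raw bytes of
 * a separately linked binary, emitted as a C array. The bytes here are
 * placeholders, not real monitor code.
 */
static const unsigned char monitor_image[] = { 0x90, 0x90, 0xf4, 0xc3 };

/*
 * Hypothetical loader: copy the image into a buffer that the caller would
 * later map into the monitor address space. Returns the number of bytes
 * copied, or 0 if the buffer is too small.
 */
static size_t load_monitor_image(unsigned char *dst, size_t dst_size)
{
    assert(dst != NULL);
    if (dst_size < sizeof(monitor_image))
        return 0;
    memcpy(dst, monitor_image, sizeof(monitor_image));
    return sizeof(monitor_image);
}
```

Because the image is linked for its own address space, the host kernel build only ever sees it as data, which fits the observation that the source-level debugger has no view into the monitor.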
* [Qemu-devel] Re: KQEMU code organization
  2008-05-27 22:11 ` [Qemu-devel] " Fabrice Bellard
@ 2008-05-28 16:02   ` Jan Kiszka
  2008-05-28 16:37     ` Fabrice Bellard
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Kiszka @ 2008-05-28 16:02 UTC
  To: qemu-devel

Fabrice Bellard wrote:
> Jan Kiszka wrote:
>> Hi,
>>
>> is there a technical reason why the kqemu kernel module is built out of
>> a binary blob (monitor-image.bin->monitor-image.h)? Does this simply
>> date back to the time when wrapper and core were distributed under
>> different licenses?
>
> This is a technical reason: the "blob" is run in an address space
> different from the host kernel.

Well, easy to claim, I know, but I don't think this is a hard reason.
However, as overcoming genmon and genoffset may require quite some
refactoring, I'm not sure if it's worth it.

For debugging purposes I meanwhile created my own build system anyway.
gdb fortunately accepts a monitor-image.out built with -g, so that
source-level debugging of the monitor is possible as well.

/me now needs to understand how this thing works...

Jan

--
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] Re: KQEMU code organization 2008-05-28 16:02 ` [Qemu-devel] " Jan Kiszka @ 2008-05-28 16:37 ` Fabrice Bellard 2008-05-28 16:55 ` Jan Kiszka 0 siblings, 1 reply; 31+ messages in thread From: Fabrice Bellard @ 2008-05-28 16:37 UTC (permalink / raw) To: qemu-devel Jan Kiszka wrote: > Fabrice Bellard wrote: >> Jan Kiszka wrote: >>> Hi, >>> >>> is there a technical reason why the kqemu kernel module is built out of >>> a binary blob (monitor-image.bin->monitor-image.h)? Does this simply >>> date back to the time when wrapper and core were distributed under >>> different licenses? >> This is a technical reason: the "blob" is run in an address space >> different from the host kernel. > > Well, easy to claim, I know, but I don't think this is a hard reason. > However, as overcoming genmon and genoffset may require quite some > refactoring, I'm not sure if it's worth it. I may change the monitor blob format to ELF to allow relocation, but the idea stays the same, and I don't think you can do it another way... > For debugging purposes I meanwhile created my own build system anyway. > gdb fortunately accepts an monitor-image.out built with -g so that > source level debugging of the monitor is possible as well. Right. This is what I do. Fabrice. ^ permalink raw reply [flat|nested] 31+ messages in thread
* [Qemu-devel] Re: KQEMU code organization
  2008-05-28 16:37 ` Fabrice Bellard
@ 2008-05-28 16:55   ` Jan Kiszka
  2008-05-28 18:34     ` Jan Kiszka
  2008-05-29 12:29     ` Fabrice Bellard
  0 siblings, 2 replies; 31+ messages in thread
From: Jan Kiszka @ 2008-05-28 16:55 UTC
  To: qemu-devel

Fabrice Bellard wrote:
> Jan Kiszka wrote:
>> Fabrice Bellard wrote:
>>> Jan Kiszka wrote:
>>>> Hi,
>>>>
>>>> is there a technical reason why the kqemu kernel module is built out of
>>>> a binary blob (monitor-image.bin->monitor-image.h)? Does this simply
>>>> date back to the time when wrapper and core were distributed under
>>>> different licenses?
>>> This is a technical reason: the "blob" is run in an address space
>>> different from the host kernel.
>>
>> Well, easy to claim, I know, but I don't think this is a hard reason.
>> However, as overcoming genmon and genoffset may require quite some
>> refactoring, I'm not sure if it's worth it.
>
> I may change the monitor blob format to ELF to allow relocation, but the
> idea stays the same, and I don't think you can do it another way...

I agree (from my current knowledge of the problem) that the monitor
remains "foreign" code to the kernel module. But at least the
repackaging into a C structure should be unnecessary.

The offset generation can be skipped if the assembly files are converted
into inline assembly. Might be tricky in some cases, but I see no
show-stopper yet.

To give it a tiny start, I will look if I can unify the build process
for all "true" kernel components. That is what currently breaks the
debuggability of the driver frame (up to kernel2monitor), and it also
causes a kbuild warning. Likely harmless ATM, but it is fragile in the
long term.

Jan

--
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
* [Qemu-devel] Re: KQEMU code organization
  2008-05-28 16:55 ` Jan Kiszka
@ 2008-05-28 18:34   ` Jan Kiszka
  2008-05-29 12:29   ` Fabrice Bellard
  1 sibling, 0 replies; 31+ messages in thread
From: Jan Kiszka @ 2008-05-28 18:34 UTC
  To: qemu-devel

Jan Kiszka wrote:
> Fabrice Bellard wrote:
>> Jan Kiszka wrote:
>>> Fabrice Bellard wrote:
>>>> Jan Kiszka wrote:
>>>>> Hi,
>>>>>
>>>>> is there a technical reason why the kqemu kernel module is built out of
>>>>> a binary blob (monitor-image.bin->monitor-image.h)? Does this simply
>>>>> date back to the time when wrapper and core were distributed under
>>>>> different licenses?
>>>> This is a technical reason: the "blob" is run in an address space
>>>> different from the host kernel.
>>> Well, easy to claim, I know, but I don't think this is a hard reason.
>>> However, as overcoming genmon and genoffset may require quite some
>>> refactoring, I'm not sure if it's worth it.
>> I may change the monitor blob format to ELF to allow relocation, but the
>> idea stays the same, and I don't think you can do it another way...
>
> I agree (from my current knowledge of the problem) that the monitor
> remains "foreign" code to the kernel module. But at least the
> repackaging into a c-structure should be unnecessary.
>
> The offset generation can be skipped if the assembly files are converted
> into inline assembly. Might be tricky in some cases, but I see no
> show-stopper yet.
>
> To give it a tiny start, I will look if I can unify the build process
> for all "true" kernel components. That is what currently breaks the
> debugability of the driver frame (up to kernel2monitor), and which also
> causes a kbuild warning. Likely harmless ATM, but it is fragile on
> long-term.

Here we go. Still not nice (I would put all monitor code in its own
directory, moving those few host kernel bits into the top-level dir),
but at least much cleaner from kbuild's POV.

Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
---
 Makefile |   15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

Index: b/Makefile
===================================================================
--- a/Makefile
+++ b/Makefile
@@ -17,7 +17,7 @@ ifdef CONFIG_KBUILD26
 all: kqemu.ko
 
 kqemu.ko:
-	make -C common all
+	make -C common monitor-image.h
 	make -C $(KERNEL_PATH) M=`pwd` modules
 
 else
@@ -38,7 +38,8 @@ endif # !CONFIG_WIN32
 
 clean:
 	$(MAKE) -C common clean
-	rm -f kqemu.ko *.o *~
+	rm -rf kqemu.ko *.o *~ .kqemu* Module.* modules.order kqemu.mod.c .tmp_versions \
+		common/.kernel* common/*/.kernel*
 
 FILES=configure Makefile README Changelog LICENSE COPYING \
       install.sh kqemu-linux.c kqemu.h \
@@ -89,10 +90,10 @@ kqemu.o: $(kqemu-objs)
 else
 # called from 2.6 kernel kbuild
 
-obj-m:= kqemu.o
-kqemu-objs:= kqemu-linux.o kqemu-mod.o
+EXTRA_AFLAGS=-I $(PWD)/common
+EXTRA_CFLAGS=-I $(PWD)
 
-$(obj)/kqemu-mod.o: $(src)/kqemu-mod-$(ARCH).o
-	cp $< $@
+obj-m:= kqemu.o
+kqemu-objs:= kqemu-linux.o common/kernel.o common/$(ARCH)/kernel_asm.o
 endif
 endif # PATCHLEVEL

BTW, there is more trouble ahead for kqemu. This is what I get booting
an x86-64 OpenSuse 10.3 image on a 64-bit platform:

RAX=ffff810001008220 RBX=ffff81002f88a160 RCX=0000000000000036 RDX=0000000000000000
RSI=ffffe20000065aa0 RDI=ffff81002f88a164 RBP=ffff81002df99e68 RSP=ffff81002df99e68
R8 =0000000000000000 R9 =0000000000000000 R10=ffff81002df99db8 R11=0000000000010246
R12=ffff81002f88a164 R13=0000000000000004 R14=ffff81002f4a6b10 R15=ffff81002df99f58
RIP=ffffffff80447515 RFL=00010246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 00000000 00000000
CS =0010 0000000000000000 ffffffff 00a09b00
SS =0000 0000000000000000 ffffffff 00c09300
DS =0000 0000000000000000 00000000 00000000
FS =0000 0000000000000000 00000000 00000000
GS =0000 ffffffff8059b000 00000000 00000000
LDT=0000 0000000000000000 00000000 00008000
TR =0040 ffff81000101c280 00002087 00008900
GDT=     ffffffff8061e000 00000080
IDT=     ffffffff8067f000 00000fff
CR0=8005003b CR2=00007fff4183bf70 CR3=000000002e8a7000 CR4=000006a0
Unsupported return value: 0xffffffff

Kernel log says

kqemu: aborting: Unexpected exception 0x0d in monitor space
err=0000 CS:EIP=f180:00000000f0001f6f SS:SP=0000:00000000f00c6e20

with the official kqemu and, interestingly,

kqemu: aborting: mon_get_ptel_l3() failed

with Ben's repos.

Jan
* Re: [Qemu-devel] Re: KQEMU code organization 2008-05-28 16:55 ` Jan Kiszka 2008-05-28 18:34 ` Jan Kiszka @ 2008-05-29 12:29 ` Fabrice Bellard 2008-05-29 13:16 ` Jan Kiszka 2008-05-29 16:13 ` Jamie Lokier 1 sibling, 2 replies; 31+ messages in thread From: Fabrice Bellard @ 2008-05-29 12:29 UTC (permalink / raw) To: qemu-devel Jan Kiszka wrote: > Fabrice Bellard wrote: >> Jan Kiszka wrote: >>> Fabrice Bellard wrote: >>>> Jan Kiszka wrote: >>>>> Hi, >>>>> >>>>> is there a technical reason why the kqemu kernel module is built out of >>>>> a binary blob (monitor-image.bin->monitor-image.h)? Does this simply >>>>> date back to the time when wrapper and core were distributed under >>>>> different licenses? >>>> This is a technical reason: the "blob" is run in an address space >>>> different from the host kernel. >>> Well, easy to claim, I know, but I don't think this is a hard reason. >>> However, as overcoming genmon and genoffset may require quite some >>> refactoring, I'm not sure if it's worth it. >> I may change the monitor blob format to ELF to allow relocation, but the >> idea stays the same, and I don't think you can do it another way... > > I agree (from my current knowledge of the problem) that the monitor > remains "foreign" code to the kernel module. But at least the > repackaging into a c-structure should be unnecessary. > > The offset generation can be skipped if the assembly files are converted > into inline assembly. Might be tricky in some cases, but I see no > show-stopper yet. This is purely cosmetic and I am generally against such changes. > The give it a tiny start, I will look if I can unify the build process > for all "true" kernel components. That is what currently breaks the > debugability of the driver frame (up to kernel2monitor), and which also > causes a kbuild warning. Likely harmless ATM, but it is fragile on > long-term. For true kernel components I agree it is useful. 
Regarding the kqemu evolution, I am doing small API changes to make it more independent from the QEMU internal data structures and to allow usage from a 32 bit user QEMU application with a 64 bit host. There is also another small change I did some time ago but never published to allow paravirtualization of the Linux kernel. Fabrice. ^ permalink raw reply [flat|nested] 31+ messages in thread
* [Qemu-devel] Re: KQEMU code organization 2008-05-29 12:29 ` Fabrice Bellard @ 2008-05-29 13:16 ` Jan Kiszka 2008-05-29 16:13 ` Jamie Lokier 1 sibling, 0 replies; 31+ messages in thread From: Jan Kiszka @ 2008-05-29 13:16 UTC (permalink / raw) To: qemu-devel Fabrice Bellard wrote: > Jan Kiszka wrote: >> Fabrice Bellard wrote: >>> Jan Kiszka wrote: >>>> Fabrice Bellard wrote: >>>>> Jan Kiszka wrote: >>>>>> Hi, >>>>>> >>>>>> is there a technical reason why the kqemu kernel module is built >>>>>> out of >>>>>> a binary blob (monitor-image.bin->monitor-image.h)? Does this simply >>>>>> date back to the time when wrapper and core were distributed under >>>>>> different licenses? >>>>> This is a technical reason: the "blob" is run in an address space >>>>> different from the host kernel. >>>> Well, easy to claim, I know, but I don't think this is a hard reason. >>>> However, as overcoming genmon and genoffset may require quite some >>>> refactoring, I'm not sure if it's worth it. >>> I may change the monitor blob format to ELF to allow relocation, but the >>> idea stays the same, and I don't think you can do it another way... >> >> I agree (from my current knowledge of the problem) that the monitor >> remains "foreign" code to the kernel module. But at least the >> repackaging into a c-structure should be unnecessary. >> >> The offset generation can be skipped if the assembly files are converted >> into inline assembly. Might be tricky in some cases, but I see no >> show-stopper yet. > > This is purely cosmetic and I am generally against such changes. See, the current code structure is not optimal /wrt understandability. KQEMU is a complex topic, no question. But this doesn't mean the structuring need to be that complex as well. Everything that helps to make things straighter, quicker to overview, can also help third parties to analyze KQEMU, debug potential issues, or even enhance its feature set. 
>> The give it a tiny start, I will look if I can unify the build process >> for all "true" kernel components. That is what currently breaks the >> debugability of the driver frame (up to kernel2monitor), and which also >> causes a kbuild warning. Likely harmless ATM, but it is fragile on >> long-term. > > For true kernel components I agree it is useful. > > Regarding the kqemu evolution, I am doing small API changes to make it > more independent from the QEMU internal data structures and to allow > usage from a 32 bit user QEMU application with a 64 bit host. There is > also another small change I did some time ago but never published to > allow paravirtualization of the Linux kernel. OK, thanks for the info. Just leaves me with the open questions about the planned license(s) and how/where KQEMU is going to be maintained in the future. I would really like to see it being driven as actively (and broadly) as the QEMU core - specifically as long as HW-virtualization is still not the rule on existing platforms :-/. Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] Re: KQEMU code organization
  2008-05-29 12:29 ` Fabrice Bellard
  2008-05-29 13:16   ` Jan Kiszka
@ 2008-05-29 16:13   ` Jamie Lokier
  2008-05-29 16:26     ` Paul Brook
                       ` (2 more replies)
  1 sibling, 3 replies; 31+ messages in thread
From: Jamie Lokier @ 2008-05-29 16:13 UTC
  To: qemu-devel

Fabrice Bellard wrote:
> Regarding the kqemu evolution, I am doing small API changes to make it
> more independent from the QEMU internal data structures and to allow
> usage from a 32 bit user QEMU application with a 64 bit host. There is
> also another small change I did some time ago but never published to
> allow paravirtualization of the Linux kernel.

Do you see integrating it with KVM at some point, developing a merged
API which supports both hardware-assisted (kvm) and software-assisted
(kqemu) operation, depending on the host's CPU?

Right now, although it's come from a different background, from a
user's perspective kvm seems to do essentially the same as kqemu,
except kvm is faster and kqemu runs on more x86 CPUs.

I.e. kvm has two sub-modules for Intel VT and AMD SVM extensions (I
think that's their names). It would be great if it had a third KQEMU
sub-module (which would of course be the most complicated ;-) to make
running VMs even more independent of the host CPU.

That would require adding kqemu's software translation/scanning
callbacks to kvm's API, or vice versa. But it would have the bonus of
adding kvm's in-kernel fast APIC emulation to kqemu, possibly the
paravirt and virtio stuff too, and further unifying kvm-using and
kqemu-using systems, and combining developer attention from these
different projects, which all seem to be heading in the same direction.

As someone interested in emulator development I understand the
different histories of kqemu and kvm. As a user, however, it seems
logical at this point to begin seeing them as different ways of
achieving the same thing, depending on the host CPU capabilities, and
those things which should not depend on the host CPU - such as virtio,
APIC emulation etc. - ought to share the same kernel code.

-- Jamie
* Re: [Qemu-devel] Re: KQEMU code organization
  2008-05-29 16:13 ` Jamie Lokier
@ 2008-05-29 16:26   ` Paul Brook
  2008-05-29 16:35     ` Jamie Lokier
  2008-05-29 16:26   ` Anthony Liguori
  2008-05-29 16:48   ` Jan Kiszka
  2 siblings, 1 reply; 31+ messages in thread
From: Paul Brook @ 2008-05-29 16:26 UTC
  To: qemu-devel

> I.e. kvm has two sub-modules for Intel VT and AMD SVM extensions (I
> think that's their names). It would be great if it had a third KQEMU
> sub-module (which would of course be the most complicated ;-)

I believe this is also a prerequisite for getting kqemu merged into
mainstream kernels, which IMHO is the only sane goal to have. Out of
tree kernel modules simply aren't worth the effort.

Paul
* Re: [Qemu-devel] Re: KQEMU code organization 2008-05-29 16:26 ` Paul Brook @ 2008-05-29 16:35 ` Jamie Lokier 2008-05-29 17:43 ` Anthony Liguori 0 siblings, 1 reply; 31+ messages in thread From: Jamie Lokier @ 2008-05-29 16:35 UTC (permalink / raw) To: Paul Brook; +Cc: qemu-devel Paul Brook wrote: > > I.e. kvm has two sub-modules for Intel VT and AMD SVM extensions (I > > think that's their names). It would be great if it hard a third KQEMU > > sub-module (which would of course be the most complicated ;-) > > I believe this is also a prerequisite for getting kqemu merged into > maintream kernels, which IMHO is the only sane goal to have. Out of > tree kernel modules simply aren't worth the effort. I think there's utility in crossover between both of them too. Sometimes it would be nice to have the speed and directness of kvm, with the code scanning and replacement abilities of kqemu to block particular instructions, pretend to be a specific CPU model, or replace some hardware-accessing instruction sequences instead of trapping and emulating them - without the guest seeing the replacement. -- Jamie ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] Re: KQEMU code organization
  2008-05-29 16:35 ` Jamie Lokier
@ 2008-05-29 17:43   ` Anthony Liguori
  2008-05-29 21:46     ` Fabrice Bellard
  0 siblings, 1 reply; 31+ messages in thread
From: Anthony Liguori @ 2008-05-29 17:43 UTC
  To: qemu-devel; +Cc: Paul Brook

Jamie Lokier wrote:
> Paul Brook wrote:
>
>>> I.e. kvm has two sub-modules for Intel VT and AMD SVM extensions (I
>>> think that's their names). It would be great if it had a third KQEMU
>>> sub-module (which would of course be the most complicated ;-)
>>>
>> I believe this is also a prerequisite for getting kqemu merged into
>> mainstream kernels, which IMHO is the only sane goal to have. Out of
>> tree kernel modules simply aren't worth the effort.
>>
>
> I think there's utility in crossover between both of them too.
>

There are some architectural incompatibilities. For instance, KVM
supports guest SMP, but the code TCG generates does not ensure atomic
operations are truly atomic. In general, it may not be possible to do
this across architectures without resorting to a big lock.

Also, when you mix dynamic translation in userspace with direct
execution, it implies you have to completely flush the shadow page table
cache. This is going to severely impact performance, so I don't know
that there are a lot of circumstances where using TCG would improve
performance.

KVM already does some instruction patching FWIW. For instance, TPR
accesses are modified in Windows guests to prevent a vmexit from
occurring since Windows accesses the TPR so frequently.

Regards,

Anthony Liguori

> Sometimes it would be nice to have the speed and directness of kvm,
> with the code scanning and replacement abilities of kqemu to block
> particular instructions, pretend to be a specific CPU model, or
> replace some hardware-accessing instruction sequences instead of
> trapping and emulating them - without the guest seeing the replacement.
>
> -- Jamie
* Re: [Qemu-devel] Re: KQEMU code organization
  2008-05-29 17:43 ` Anthony Liguori
@ 2008-05-29 21:46   ` Fabrice Bellard
  2008-05-30  3:32     ` Mulyadi Santosa
  0 siblings, 1 reply; 31+ messages in thread
From: Fabrice Bellard @ 2008-05-29 21:46 UTC
  To: qemu-devel

Anthony Liguori wrote:
> Jamie Lokier wrote:
>> Paul Brook wrote:
>>
>>>> I.e. kvm has two sub-modules for Intel VT and AMD SVM extensions (I
>>>> think that's their names). It would be great if it had a third KQEMU
>>>> sub-module (which would of course be the most complicated ;-)
>>>>
>>> I believe this is also a prerequisite for getting kqemu merged into
>>> mainstream kernels, which IMHO is the only sane goal to have. Out of
>>> tree kernel modules simply aren't worth the effort.
>>>
>>
>> I think there's utility in crossover between both of them too.
>>
>
> There are some architectural incompatibilities. For instance, KVM
> supports guest SMP but the code TCG generates does not ensure atomic
> operations are truly atomic. In general, it may not be possible to do
> this across architectures without employing the use of a big lock.

But for the x86 on x86 case, it seems possible to make QEMU/TCG SMP safe
(it would consist in using x86 lock instructions on the host when the
guest uses them).

> Also, when you mix dynamic translation in userspace with direct
> execution, it implies you have to completely flush the shadow page table
> cache. This is going to severely impact performance so I don't know
> that there are a lot of circumstances where using TCG would improve
> performance.
>
> KVM already does some instruction patching FWIW. For instance, TPR
> accesses are modified in Windows guests to prevent a vmexit from
> occurring since Windows accesses the TPR so frequently.

Code patching seems interesting. Although I did not look in detail, it
seems that VirtualBox uses it extensively and gets very good performance
without using hardware virtualization. The "beauty" of it is that the
code patching hacks can stay outside the kernel module. I wonder what
their plans are for their kernel module!

Anyway, I don't think it is worth trying to get kqemu into the Linux
kernel. Moreover, I have no plans to change the kqemu interface to match
that of KVM. It seems simpler just to have a wrapper for both inside the
user space QEMU. However, my upcoming changes for kqemu and QEMU will
bring the interfaces closer, because kqemu will no longer peek into the
QEMU physical-to-RAM translation table.

Fabrice.
* Re: [Qemu-devel] Re: KQEMU code organization
  2008-05-29 21:46 ` Fabrice Bellard
@ 2008-05-30  3:32   ` Mulyadi Santosa
  2008-05-30  8:14     ` Andreas Färber
  0 siblings, 1 reply; 31+ messages in thread
From: Mulyadi Santosa @ 2008-05-30 3:32 UTC
  To: qemu-devel

Hi..

On Fri, May 30, 2008 at 4:46 AM, Fabrice Bellard <fabrice@bellard.org> wrote:
>
> Code patching seems interesting. Although I did not look in detail, it
> seems that VirtualBox use it extensively and gets very good performance
> without using hardware virtualization.

I second that. Besides being a QEMU user, I am now also a loyal user of
VirtualBox. I guess that VBox can identify hot spots (repeating
instructions or TBs) and tries harder and harder to optimize them. It
could be related to what I call "smart flush of translation cache":
not flushing all cached TBs, but doing so selectively.

However, I also guess that VBox is tightly coupled to its kernel
module, so without it, it might be slower than QEMU/TCG, but I
have no hard data to support that.

Now, I wonder how Transitive does SPARC-to-x86 translation while still
maintaining speed. Does it do what linux-user does?

regards,

Mulyadi
* Re: [Qemu-devel] Re: KQEMU code organization 2008-05-30 3:32 ` Mulyadi Santosa @ 2008-05-30 8:14 ` Andreas Färber 0 siblings, 0 replies; 31+ messages in thread From: Andreas Färber @ 2008-05-30 8:14 UTC (permalink / raw) To: qemu-devel Hi, On 30.05.2008 at 05:32, Mulyadi Santosa wrote: > On Fri, May 30, 2008 at 4:46 AM, Fabrice Bellard > <fabrice@bellard.org> wrote: >> >> Code patching seems interesting. Although I did not look in detail, >> it >> seems that VirtualBox use it extensively and gets very good >> performance >> without using hardware virtualization. > > I second that. Beside being Qemu users, I am also now a loyal user of > VirtualBox. I guess that VBox can identify hot spot (repeating > instructions or TB) and tries harder and harder to optimize it. It > could be related to what I call "smart flush of translation cache"... > not entirely flushing cached TB but selectively doing so. > > However, I also guess that VBox is tightly related to its kernel > module, thus without it ...it might be slower than Qemu/TCG..but I > have no hard data to support it. I've tried VirtualBox on a Core Duo Mac and, despite its kernel module, it is in my perception significantly slower than Q, which does not have any hypervisor. Responsiveness when moving the mouse around is better in VirtualBox, though. Andreas ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] Re: KQEMU code organization 2008-05-29 16:13 ` Jamie Lokier 2008-05-29 16:26 ` Paul Brook @ 2008-05-29 16:26 ` Anthony Liguori 2008-05-29 16:53 ` Jan Kiszka 2008-05-29 21:52 ` Fabrice Bellard 2008-05-29 16:48 ` Jan Kiszka 2 siblings, 2 replies; 31+ messages in thread From: Anthony Liguori @ 2008-05-29 16:26 UTC (permalink / raw) To: qemu-devel Jamie Lokier wrote: > Fabrice Bellard wrote: > >> Regarding the kqemu evolution, I am doing small API changes to make it >> more independent from the QEMU internal data structures and to allow >> usage from a 32 bit user QEMU application with a 64 bit host. There is >> also another small change I did some time ago but never published to >> allow paravirtualization of the Linux kernel. >> > > Do you see integrating it with KVM at some point, developing a merged > API which supports both hardware-assisted (kvm) or software-assisted > (kqemu) depending on the host's CPU? > > Right now, although it's come from a different background, from a > user's perspective kvm seems to do essentially the same as kqemu, > except kvm is faster and kqemu runs on more x86 CPUs. > > I.e. kvm has two sub-modules for Intel VT and AMD SVM extensions (I > think that's their names). It would be great if it hard a third KQEMU > sub-module (which would of course be the most complicated ;-) to make > running vMs even more independent of the host CPU. > It wouldn't be too bad if you focused on kqemu-user and limited yourself to UP guests. The first step would be getting the existing KVM support code to function with TCG. For instance, use TCG to run 16-bit code, and then KVM to run 32/64-bit code. Once that was all worked out, the rest would be pretty straight-forward porting and code cleanup. > That would require adding kqemu's software translation/scanning > callbacks to kvm's API, or vice versa. 
But it would have the bonus of > adding kvm's in-kernel fast APIC emulation to kqemu, possibly the > paravirt and virtio stuff too, and further unifying kvm-using and > kqemu-using systems, and combining developer attention from these > different projects, which all seem to be in the same direction. > There's nothing stopping virtio from being used by QEMU + kqemu except for my slowness in improving the code such that it performs well and is acceptable to QEMU. FWIW, the l1_phys_map table is a current hurdle in getting performance. When we use proper accessors to access the virtio_ring, we end up taking a significant performance hit (around 20% on iperf). I have some simple patches that implement a page_desc cache that caches the RAM regions in a linear array. That helps get most of it back. I'd really like to remove the l1_phys_map entirely and replace it with a sorted list of regions. I think this would be an overall performance improvement since it's much more cache friendly. One thing keeping this from happening is the fact that the data structure is passed up to the kernel for kqemu. Eliminating that dependency would be a very good thing! Regards, Anthony Liguori > As someone interested in emulator development I understand the > different histories of kqemu and kvm. As a user, however, it seems > logical at this point to begin seeing them as different ways of > achieving the same thing, depending on the host CPU capabilities, and > those things which should not depend on the host CPU - such as virtio, > APIC emulation etc. - ought to share the same kernel code. > > -- Jamie > > > ^ permalink raw reply [flat|nested] 31+ messages in thread
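The "sorted list of regions" proposed in the message above could look roughly like the following. This is a minimal sketch and not the patch being discussed: the RamRegion type and region_find() are hypothetical names, and it assumes the regions are sorted by start address and do not overlap.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one contiguous range of guest-physical RAM. */
typedef struct RamRegion {
    uint64_t start;       /* first guest-physical address of the region */
    uint64_t size;        /* length in bytes */
    uint64_t ram_offset;  /* offset of the region within host-side RAM */
} RamRegion;

/*
 * Binary search over an array of regions sorted by start address.
 * Returns the region containing addr, or NULL if addr is not RAM.
 */
const RamRegion *region_find(const RamRegion *regions, size_t count,
                             uint64_t addr)
{
    size_t lo = 0, hi = count;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const RamRegion *r = &regions[mid];

        if (addr < r->start) {
            hi = mid;                       /* addr lies below this region */
        } else if (addr - r->start >= r->size) {
            lo = mid + 1;                   /* addr lies above this region */
        } else {
            return r;                       /* addr falls inside it */
        }
    }
    return NULL;
}
```

With only a handful of RAM regions, as on x86, the whole array fits in one or two cache lines, which is the cache-friendliness argument made above; a multi-level table lookup touches several unrelated lines instead.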
* [Qemu-devel] Re: KQEMU code organization 2008-05-29 16:26 ` Anthony Liguori @ 2008-05-29 16:53 ` Jan Kiszka 2008-05-29 17:48 ` Anthony Liguori 2008-05-31 10:18 ` Avi Kivity 2008-05-29 21:52 ` Fabrice Bellard 1 sibling, 2 replies; 31+ messages in thread From: Jan Kiszka @ 2008-05-29 16:53 UTC (permalink / raw) To: qemu-devel Anthony Liguori wrote: > Jamie Lokier wrote: >> Fabrice Bellard wrote: >> >>> Regarding the kqemu evolution, I am doing small API changes to make >>> it more independent from the QEMU internal data structures and to >>> allow usage from a 32 bit user QEMU application with a 64 bit host. >>> There is also another small change I did some time ago but never >>> published to allow paravirtualization of the Linux kernel. >>> >> >> Do you see integrating it with KVM at some point, developing a merged >> API which supports both hardware-assisted (kvm) or software-assisted >> (kqemu) depending on the host's CPU? >> >> Right now, although it's come from a different background, from a >> user's perspective kvm seems to do essentially the same as kqemu, >> except kvm is faster and kqemu runs on more x86 CPUs. >> >> I.e. kvm has two sub-modules for Intel VT and AMD SVM extensions (I >> think that's their names). It would be great if it hard a third KQEMU >> sub-module (which would of course be the most complicated ;-) to make >> running vMs even more independent of the host CPU. >> > > It wouldn't be too bad if you focused on kqemu-user and limited yourself > to UP guests. The first step would be getting the existing KVM support > code to function with TCG. For instance, use TCG to run 16-bit code, > and then KVM to run 32/64-bit code. Once that was all worked out, the > rest would be pretty straight-forward porting and code cleanup. I guess you mean real-mode code with 16-bit here. /me always wondered why it takes an in-kernel code interpreter for kvm to achieve this - at least as long as it runs via qemu. 
Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] Re: KQEMU code organization 2008-05-29 16:53 ` Jan Kiszka @ 2008-05-29 17:48 ` Anthony Liguori 2008-05-31 10:18 ` Avi Kivity 1 sibling, 0 replies; 31+ messages in thread From: Anthony Liguori @ 2008-05-29 17:48 UTC (permalink / raw) To: qemu-devel Jan Kiszka wrote: > Anthony Liguori wrote: > >> >> It wouldn't be too bad if you focused on kqemu-user and limited yourself >> to UP guests. The first step would be getting the existing KVM support >> code to function with TCG. For instance, use TCG to run 16-bit code, >> and then KVM to run 32/64-bit code. Once that was all worked out, the >> rest would be pretty straight-forward porting and code cleanup. >> > > I guess you mean real-mode code with 16-bit here. /me always wondered > why it takes an in-kernel code interpreter for kvm to achieve this - at > least as long as it runs via qemu. > We don't use an in-kernel interpreter, we use vm86 mode for 16-bit code. There is an in-kernel interpreter (x86_emulate) but that is used mostly for handling shadow page table faults. Regards, Anthony Liguori > Jan > > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] Re: KQEMU code organization 2008-05-29 16:53 ` Jan Kiszka 2008-05-29 17:48 ` Anthony Liguori @ 2008-05-31 10:18 ` Avi Kivity 2008-06-02 16:34 ` Jamie Lokier 1 sibling, 1 reply; 31+ messages in thread From: Avi Kivity @ 2008-05-31 10:18 UTC (permalink / raw) To: qemu-devel Jan Kiszka wrote: >> It wouldn't be too bad if you focused on kqemu-user and limited yourself >> to UP guests. The first step would be getting the existing KVM support >> code to function with TCG. For instance, use TCG to run 16-bit code, >> and then KVM to run 32/64-bit code. Once that was all worked out, the >> rest would be pretty straight-forward porting and code cleanup. >> > > I guess you mean real-mode code with 16-bit here. /me always wondered > why it takes an in-kernel code interpreter for kvm to achieve this - at > least as long as it runs via qemu. > kvm started out with qemu emulating 16-bit code (and before that, even 32-bit code; kvm only did 64-bit). The reason I don't like this approach is that it makes the interface complex and hard to understand, and makes kvm heavily tied into qemu. Some problems that arise from having qemu emulate code: - difficult to do smp properly - qemu needs to be able to inject mmio for in-kernel emulated devices - in-kernel devices (lapic, etc.) need to interact with guest code executing in userspace -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] Re: KQEMU code organization 2008-05-31 10:18 ` Avi Kivity @ 2008-06-02 16:34 ` Jamie Lokier 0 siblings, 0 replies; 31+ messages in thread From: Jamie Lokier @ 2008-06-02 16:34 UTC (permalink / raw) To: qemu-devel Avi Kivity wrote: > kvm started out with qemu emulating 16-bit code (and before that, even > 32-bit code; kvm only did 64-bit). > > The reason I don't like this approach is that it makes the interface > complex and hard to understand, and makes kvm heavily tied into qemu. > > Some problems that arise from having qemu emulate code: > - difficult to do smp properly Now that atomic ops will be translated to atomic ops, and futex is translated to host futex, I think this is solved. > - qemu needs to be able to inject mmio for in-kernel emulated devices > - in-kernel devices (lapic, etc.) need to interact with guest code > executing in userspace These two seem to apply equally if kqemu is made to work with in-kernel emulated devices, which seems useful for exactly the same reasons as in kvm. -- Jamie ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] Re: KQEMU code organization 2008-05-29 16:26 ` Anthony Liguori 2008-05-29 16:53 ` Jan Kiszka @ 2008-05-29 21:52 ` Fabrice Bellard 2008-05-31 10:06 ` Avi Kivity 2008-06-01 22:58 ` Anthony Liguori 1 sibling, 2 replies; 31+ messages in thread From: Fabrice Bellard @ 2008-05-29 21:52 UTC (permalink / raw) To: qemu-devel Anthony Liguori wrote: > [...] > FWIW, the l1_phys_map table is a current hurdle in getting performance. > When we use proper accessors to access the virtio_ring, we end up taking > a significant performance hit (around 20% on iperf). I have some simple > patches that implement a page_desc cache that cache the RAM regions in a > linear array. That helps get most of it back. > > I'd really like to remove the l1_phys_map entirely and replace it with a > sorted list of regions. I think this would have an overall performance > improvement since its much more cache friendly. One thing keeping this > from happening is the fact that the data structure is passed up to the > kernel for kqemu. Eliminating that dependency would be a very good thing! If the l1_phys_map is a performance bottleneck it means that the internals of QEMU are not properly used. In QEMU/kqemu, it is not accessed to do I/Os : a cache is used thru tlb_table[]. I don't see why KVM cannot use a similar system. Fabrice. ^ permalink raw reply [flat|nested] 31+ messages in thread
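For readers unfamiliar with the kind of cache meant in the message above, here is a minimal sketch of a TLB-style, direct-mapped cache placed in front of a slow page lookup. It does not reproduce the layout of QEMU's tlb_table[]; every name, the 4 KiB page size and the 256-entry size are assumptions chosen for illustration, and demo_slow_lookup() merely stands in for a walk of a multi-level table such as l1_phys_map.

```c
#include <stdint.h>

#define PAGE_BITS   12                    /* assumed 4 KiB pages */
#define CACHE_BITS  8
#define CACHE_SIZE  (1u << CACHE_BITS)    /* 256 direct-mapped slots */

typedef struct PageCacheEntry {
    uint64_t page;   /* guest-physical page number, UINT64_MAX if empty */
    uint64_t desc;   /* cached result of the slow lookup for that page */
} PageCacheEntry;

typedef struct PageCache {
    PageCacheEntry entries[CACHE_SIZE];
    uint64_t (*slow_lookup)(uint64_t page);  /* generic, slower path */
    unsigned long misses;
} PageCache;

/* Hypothetical stand-in for the slow multi-level table walk. */
uint64_t demo_slow_lookup(uint64_t page)
{
    return page * 2 + 1;
}

void page_cache_init(PageCache *c, uint64_t (*slow_lookup)(uint64_t page))
{
    /* A shifted address can never equal UINT64_MAX, so it marks "empty". */
    for (unsigned i = 0; i < CACHE_SIZE; i++)
        c->entries[i].page = UINT64_MAX;
    c->slow_lookup = slow_lookup;
    c->misses = 0;
}

uint64_t page_cache_lookup(PageCache *c, uint64_t addr)
{
    uint64_t page = addr >> PAGE_BITS;
    PageCacheEntry *e = &c->entries[page & (CACHE_SIZE - 1)];

    if (e->page != page) {                /* miss: refill from the slow path */
        e->desc = c->slow_lookup(page);
        e->page = page;
        c->misses++;
    }
    return e->desc;
}
```

A real cache of this kind would also have to be invalidated whenever the physical memory map changes; that part is left out of the sketch.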
* Re: [Qemu-devel] Re: KQEMU code organization 2008-05-29 21:52 ` Fabrice Bellard @ 2008-05-31 10:06 ` Avi Kivity 2008-06-01 22:58 ` Anthony Liguori 1 sibling, 0 replies; 31+ messages in thread From: Avi Kivity @ 2008-05-31 10:06 UTC (permalink / raw) To: qemu-devel, Fabrice Bellard Fabrice Bellard wrote: > Anthony Liguori wrote: > >> [...] >> FWIW, the l1_phys_map table is a current hurdle in getting performance. >> When we use proper accessors to access the virtio_ring, we end up taking >> a significant performance hit (around 20% on iperf). I have some simple >> patches that implement a page_desc cache that cache the RAM regions in a >> linear array. That helps get most of it back. >> >> I'd really like to remove the l1_phys_map entirely and replace it with a >> sorted list of regions. I think this would have an overall performance >> improvement since its much more cache friendly. One thing keeping this >> from happening is the fact that the data structure is passed up to the >> kernel for kqemu. Eliminating that dependency would be a very good thing! >> > > If the l1_phys_map is a performance bottleneck it means that the > internals of QEMU are not properly used. In QEMU/kqemu, it is not > accessed to do I/Os : a cache is used thru tlb_table[]. I don't see why > KVM cannot use a similar system. > > In that case, replacing l1_phys_map by a region list is a good thing. l1_phys_map consumes a large amount of memory. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] Re: KQEMU code organization 2008-05-29 21:52 ` Fabrice Bellard 2008-05-31 10:06 ` Avi Kivity @ 2008-06-01 22:58 ` Anthony Liguori 2008-06-02 9:02 ` Fabrice Bellard 1 sibling, 1 reply; 31+ messages in thread From: Anthony Liguori @ 2008-06-01 22:58 UTC (permalink / raw) To: qemu-devel Fabrice Bellard wrote: > Anthony Liguori wrote: > >> [...] >> FWIW, the l1_phys_map table is a current hurdle in getting performance. >> When we use proper accessors to access the virtio_ring, we end up taking >> a significant performance hit (around 20% on iperf). I have some simple >> patches that implement a page_desc cache that cache the RAM regions in a >> linear array. That helps get most of it back. >> >> I'd really like to remove the l1_phys_map entirely and replace it with a >> sorted list of regions. I think this would have an overall performance >> improvement since its much more cache friendly. One thing keeping this >> from happening is the fact that the data structure is passed up to the >> kernel for kqemu. Eliminating that dependency would be a very good thing! >> > > If the l1_phys_map is a performance bottleneck it means that the > internals of QEMU are not properly used. In QEMU/kqemu, it is not > accessed to do I/Os : a cache is used thru tlb_table[]. I don't see why > KVM cannot use a similar system. > This is for device emulation. KVM doesn't use l1_phys_map() for things like shadow page table accesses. In the device emulation, we're currently using stl_phys() and friends. This goes through a full lookup in l1_phys_map. Looking at other devices, some use phys_ram_base + PA and stl_raw() which is broken but faster. A few places call cpu_get_physical_page_desc(), then use phys_ram_base and stl_raw(). This is okay but it still requires at least one l1_phys_map lookup per operation in the device (packet receive, io notification, etc.). I don't think that's going to help much because in our fast paths, we're only doing 2 or 3 stl_phys() operations. 
At least on x86, there are very few regions of RAM. That makes it very easy to cache. A TLB style cache seems wrong to me because there are so few RAM regions. I don't see a better way to do this with the existing APIs. Regards, Anthony Liguori > Fabrice. > > > > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] Re: KQEMU code organization 2008-06-01 22:58 ` Anthony Liguori @ 2008-06-02 9:02 ` Fabrice Bellard 2008-06-02 13:25 ` Anthony Liguori 0 siblings, 1 reply; 31+ messages in thread From: Fabrice Bellard @ 2008-06-02 9:02 UTC (permalink / raw) To: qemu-devel Anthony Liguori wrote: > Fabrice Bellard wrote: >> Anthony Liguori wrote: >> >>> [...] >>> FWIW, the l1_phys_map table is a current hurdle in getting >>> performance. When we use proper accessors to access the virtio_ring, >>> we end up taking >>> a significant performance hit (around 20% on iperf). I have some simple >>> patches that implement a page_desc cache that cache the RAM regions in a >>> linear array. That helps get most of it back. >>> >>> I'd really like to remove the l1_phys_map entirely and replace it with a >>> sorted list of regions. I think this would have an overall performance >>> improvement since its much more cache friendly. One thing keeping this >>> from happening is the fact that the data structure is passed up to the >>> kernel for kqemu. Eliminating that dependency would be a very good >>> thing! >>> >> >> If the l1_phys_map is a performance bottleneck it means that the >> internals of QEMU are not properly used. In QEMU/kqemu, it is not >> accessed to do I/Os : a cache is used thru tlb_table[]. I don't see why >> KVM cannot use a similar system. >> > > This is for device emulation. KVM doesn't use l1_phys_map() for things > like shadow page table accesses. > > In the device emulation, we're currently using stl_phys() and friends. > This goes through a full lookup in l1_phys_map. > > Looking at other devices, some use phys_ram_base + PA and stl_raw() > which is broken but faster. A few places call > cpu_get_physical_page_desc(), then use phys_ram_base and stl_raw(). > This is okay but it still requires at least one l1_phys_map lookup per > operation in the device (packet receive, io notification, etc.). 
I > don't think that's going to help much because in our fast paths, we're > only doing 2 or 3 stl_phys() operations. > > At least on x86, there are very few regions of RAM. That makes it very > easy to cache. A TLB style cache seems wrong to me because there are so > few RAM regions. I don't see a better way to do this with the existing > APIs. I see your point. st/ldx_phys() were in fact never optimized. A first solution would be to use a cache similar to the TLBs. It has the advantage of being quite generic and fast. Another solution would be to compute a few intervals which are tested before the generic case. These intervals would correspond to the main RAM area and would be updated each time a new device region is registered. Does your remark imply that KVM switches back to the QEMU process for each I/O? If so, the l1_phys_map access time should be negligible compared to the SVM-VMX/kernel/user context switch! Fabrice. ^ permalink raw reply [flat|nested] 31+ messages in thread
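The second solution in the message above (a few intervals tested before the generic case, updated when a device region is registered) can be sketched as follows. This is an illustration under stated assumptions, not code from any patch in this thread: FastMap, fast_map_translate() and fast_map_exclude() are hypothetical names, and the caller is assumed to fall back to the generic lookup whenever the fast path reports a miss.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_FAST_INTERVALS 4

typedef struct FastInterval {
    uint64_t start, end;   /* guest-physical range [start, end) */
    uint64_t ram_offset;   /* RAM offset corresponding to start */
} FastInterval;

typedef struct FastMap {
    FastInterval iv[MAX_FAST_INTERVALS];
    int count;
} FastMap;

/*
 * Fast path: returns true and sets *ram_offset if addr is in a known RAM
 * interval. On false, the caller must use the generic (slow) lookup.
 */
bool fast_map_translate(const FastMap *m, uint64_t addr, uint64_t *ram_offset)
{
    for (int i = 0; i < m->count; i++) {
        const FastInterval *iv = &m->iv[i];
        if (addr >= iv->start && addr < iv->end) {
            *ram_offset = iv->ram_offset + (addr - iv->start);
            return true;
        }
    }
    return false;
}

/*
 * Called when a device region [s, e) is registered: trim or split every
 * interval that overlaps it. If there is no room for a split piece it is
 * dropped, which is safe because the slow path still handles it.
 */
void fast_map_exclude(FastMap *m, uint64_t s, uint64_t e)
{
    FastMap out = { .count = 0 };

    for (int i = 0; i < m->count; i++) {
        FastInterval iv = m->iv[i];

        if (e <= iv.start || s >= iv.end) {          /* no overlap: keep */
            if (out.count < MAX_FAST_INTERVALS)
                out.iv[out.count++] = iv;
            continue;
        }
        if (s > iv.start && out.count < MAX_FAST_INTERVALS) {
            FastInterval left = { iv.start, s, iv.ram_offset };
            out.iv[out.count++] = left;              /* part below the hole */
        }
        if (e < iv.end && out.count < MAX_FAST_INTERVALS) {
            FastInterval right = { e, iv.end,
                                   iv.ram_offset + (e - iv.start) };
            out.iv[out.count++] = right;             /* part above the hole */
        }
    }
    *m = out;
}
```

Because a miss only means "use the generic path", the interval list never has to be complete, only never wrong; that keeps the update logic simple.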
* Re: [Qemu-devel] Re: KQEMU code organization 2008-06-02 9:02 ` Fabrice Bellard @ 2008-06-02 13:25 ` Anthony Liguori 0 siblings, 0 replies; 31+ messages in thread From: Anthony Liguori @ 2008-06-02 13:25 UTC (permalink / raw) To: qemu-devel Fabrice Bellard wrote: >> This is for device emulation. KVM doesn't use l1_phys_map() for >> things like shadow page table accesses. >> >> In the device emulation, we're currently using stl_phys() and >> friends. This goes through a full lookup in l1_phys_map. >> >> Looking at other devices, some use phys_ram_base + PA and stl_raw() >> which is broken but faster. A few places call >> cpu_get_physical_page_desc(), then use phys_ram_base and stl_raw(). >> This is okay but it still requires at least one l1_phys_map lookup >> per operation in the device (packet receive, io notification, etc.). >> I don't think that's going to help much because in our fast paths, >> we're only doing 2 or 3 stl_phys() operations. >> >> At least on x86, there are very few regions of RAM. That makes it >> very easy to cache. A TLB style cache seems wrong to me because >> there are so few RAM regions. I don't see a better way to do this >> with the existing APIs. > > I see your point. st/ldx_phys() were never optimized in fact. > > A first solution would be to use a cache similar to the TLBs. It has > the advantage of being quite generic and fast. Another solution would > be to compute a few intervals which are tested before the generic case. > These intervals would correspond to the main RAM area and would be > updated each time a new device region is registered. I currently have a patch that takes the latter approach. > Does your remark imply that KVM switches back to the QEMU process > for each I/O ? If so, the l1_phys_map access time should be negligible > compared to the SVM-VMX/kernel/user context switch ! Most MMIO/PIO accesses cause an exit to QEMU.
We run the main loop in a dedicated thread, though, so packet delivery is handled without forcing a VCPU to exit. Regards, Anthony Liguori > Fabrice. > > > ^ permalink raw reply [flat|nested] 31+ messages in thread
* [Qemu-devel] Re: KQEMU code organization 2008-05-29 16:13 ` Jamie Lokier 2008-05-29 16:26 ` Paul Brook 2008-05-29 16:26 ` Anthony Liguori @ 2008-05-29 16:48 ` Jan Kiszka 2008-05-29 17:47 ` Anthony Liguori 2 siblings, 1 reply; 31+ messages in thread From: Jan Kiszka @ 2008-05-29 16:48 UTC (permalink / raw) To: qemu-devel Jamie Lokier wrote: > Fabrice Bellard wrote: >> Regarding the kqemu evolution, I am doing small API changes to make it >> more independent from the QEMU internal data structures and to allow >> usage from a 32 bit user QEMU application with a 64 bit host. There is >> also another small change I did some time ago but never published to >> allow paravirtualization of the Linux kernel. > > Do you see integrating it with KVM at some point, developing a merged > API which supports both hardware-assisted (kvm) or software-assisted > (kqemu) depending on the host's CPU? I had the same idea while initially looking closer at kqemu, but I didn't feel familiar enough with the code and its design requirements to suggest this. :) > > Right now, although it's come from a different background, from a > user's perspective kvm seems to do essentially the same as kqemu, > except kvm is faster and kqemu runs on more x86 CPUs. > > I.e. kvm has two sub-modules for Intel VT and AMD SVM extensions (I > think that's their names). It would be great if it had a third KQEMU > sub-module (which would of course be the most complicated ;-) to make > running VMs even more independent of the host CPU. Well, even just the same driver interface to userspace would be great. :-> > > That would require adding kqemu's software translation/scanning > callbacks to kvm's API, or vice versa. But it would have the bonus of > adding kvm's in-kernel fast APIC emulation to kqemu, possibly the > paravirt and virtio stuff too, and further unifying kvm-using and > kqemu-using systems, and combining developer attention from these > different projects, which all seem to be in the same direction. 
The most important thing, IMHO: this /could/ open the door for mainline integration of a software-based QEMU accelerator - surely the ultimate goal wrt maintainability and distribution (on Linux). > > As someone interested in emulator development I understand the > different histories of kqemu and kvm. As a user, however, it seems > logical at this point to begin seeing them as different ways of > achieving the same thing, depending on the host CPU capabilities, and > those things which should not depend on the host CPU - such as virtio, > APIC emulation etc. - ought to share the same kernel code. Virtio on x86 requires no special host-kernel support, IIRC. But, yeah, in-kernel irqchip (including APIC) is a further incentive for such a step. But - this is all nice on the drawing board. It just requires a reasonable balance between required effort (wouldn't be small, I guess) and future relevance. For x86, by AMD and Intel at least (not sure about VIA right now), you see hardware virtualization in every new processor. So kqemu becomes less and less relevant over time. The thrilling question is: Is that period long enough to justify a kqemu / soft-kvm, and to push the result mainline? Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] Re: KQEMU code organization 2008-05-29 16:48 ` Jan Kiszka @ 2008-05-29 17:47 ` Anthony Liguori 0 siblings, 0 replies; 31+ messages in thread From: Anthony Liguori @ 2008-05-29 17:47 UTC (permalink / raw) To: qemu-devel Jan Kiszka wrote: > Jamie Lokier wrote: > > Virtio on x86 requires no special host-kernel support, IIRC. But, yeah, > in-kernel irqchip (including APIC) is a further incentive to motivate > such step. > > But - this is all nice on the drawing board. It just requires a > reasonable balance between required effort (wouldn't be small, I guess) > and future relevance. For x86, by AMD and Intel at least (not sure about > VIA right now), As of the recently announced Isaiah microarchitecture, VIA includes support for VT. Regards, Anthony Liguori > you see hardware virtualization in every new processor. > So kqemu becomes less and less relevant over the time. The thrilling > question is: Is that period long enough to justify a kqemu / soft-kvm, > and to push the result mainline? > > Jan > > ^ permalink raw reply [flat|nested] 31+ messages in thread