From mboxrd@z Thu Jan 1 00:00:00 1970
From: Balachandar
Subject: Where is the entry of hypercalls in kvm?
Date: Fri, 25 Jun 2010 21:06:43 -0400
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
To: kernelnewbies@nl.linux.org, kvm@vger.kernel.org
Sender: kernelnewbies-bounce@nl.linux.org
List-Id: kvm.vger.kernel.org

Hello,

I am trying to understand the virtio mechanism in Linux. I read that the
kick function will notify the host side about the newly published
buffers. I am looking especially at virtio_net. Once a packet is ready
for transmission the kick function is called here. From here I traced
the call and I think it goes to this. From here, where does it go? Which
code contains the backend driver of virtio? Where is the code in the
hypervisor which this kick will go to? Thank you...

Thanks,
Bala

--
To unsubscribe from this list: send an email with
"unsubscribe kernelnewbies" to ecartis@nl.linux.org
Please read the FAQ at http://kernelnewbies.org/FAQ

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Peter Teoh
Subject: Re: Where is the entry of hypercalls in kvm?
Date: Wed, 30 Jun 2010 16:17:17 +0800
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Cc: kernelnewbies@nl.linux.org, kvm@vger.kernel.org
To: Balachandar
Sender: kvm-owner@vger.kernel.org

Your question is answered here:

http://www.spinics.net/lists/kvm/msg37526.html

And check this paper out:

http://ozlabs.org/~rusty/virtio-spec/virtio-paper.pdf

The general concept to remember is that QEMU and KVM just execute the
input as a binary stream....it does not know what "functions" it is
executing...so the binary stream can be any OS (Windows / Linux
etc)....QEMU just sets up the basic block (called basic block
translation) mechanism, and then executes it block by block. Each
block by definition is demarcated by a branch/jump etc. Within the
block, if there is any privileged instruction (e.g. write MSR registers,
load LDT registers etc), then a transition will be made from the guest
in QEMU into KVM to update the VMCB/VMCS information (these terms are
from the Intel/AMD manuals).

I have not seen any IOCTL calls in QEMU, but I suspect ultimately it
should drop to a VMRUN (for AMD; Intel calls it VMLAUNCH or VMRESUME)
call inside KVM, which can be found here:

arch/x86/kvm/

And the AMD-specific virtualization is done in svm.c, whereas vmx.c
is for Intel.

Copying the remark in vmx.c:

/*
 * The exit handlers return 1 if the exit was handled fully and guest execution
 * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
 * to be done to userspace and return 0.
 */
static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
        [EXIT_REASON_EXCEPTION_

And after reading the Intel manual, you will understand that "exit" here
actually refers to the special set of privileged Intel instructions
which, upon being executed by the guest OS, will immediately cause a
VMEXIT condition, and these are handled by the above handlers in
kvm.ko.

To know the entry point INTO the guest OS (i.e. when the guest code
will first be run), you first must understand that all these VMX
operations are a state machine (3: VMLAUNCH, VMRESUME and VMEXIT). Once
inside the VMRESUME state, there is no way for the guest to access any
of the host's resources; they are only accessible after VMEXIT is
triggered.

All key APIs are defined here (for Intel; this is KVM-specific, Xen
has another mechanism):

static struct kvm_x86_ops vmx_x86_ops = {
        .cpu_has_kvm_support = cpu_has_kvm_support,
        .disabled_by_bios = vmx_disabled_by_bios,
        .hardware_setup = hardware_setup,
        .hardware_unsetup = hardware_unsetup,
...
        .run = vmx_vcpu_run,
        .handle_exit = vmx_handle_exit,
        .skip_emulated_instruction = skip_emulated_instruction,
        .set_interrupt_shadow = vmx_set_interrupt_shadow,

and vmx_vcpu_run() is the answer to your question.....I suppose?

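As a reading aid for the handler-table remark copied from vmx.c above, here is a minimal user-space sketch of the same pattern in C: an array of function pointers indexed by exit reason, where a handler returns 1 if the guest may resume and 0 if user space has to finish the job. This is not kernel code. Every name starting with toy_ or TOY_ is hypothetical, and the two exit reasons are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical exit reasons; the real table is indexed by EXIT_REASON_*. */
enum toy_exit_reason {
    TOY_EXIT_CPUID = 0,
    TOY_EXIT_IO    = 1,
    TOY_EXIT_MAX
};

/* Hypothetical stand-in for struct kvm_vcpu. */
struct toy_vcpu {
    int exit_reason;      /* why the guest stopped running */
    int needs_userspace;  /* set when the exit must be finished in user space */
};

/* Handled entirely in the "kernel" side: return 1 so the guest can resume. */
static int toy_handle_cpuid(struct toy_vcpu *vcpu)
{
    (void)vcpu;
    return 1;
}

/* Needs device emulation: record that and return 0 to go to user space. */
static int toy_handle_io(struct toy_vcpu *vcpu)
{
    vcpu->needs_userspace = 1;
    return 0;
}

/* Table indexed by exit reason, same shape as kvm_vmx_exit_handlers[]. */
static int (*toy_exit_handlers[TOY_EXIT_MAX])(struct toy_vcpu *vcpu) = {
    [TOY_EXIT_CPUID] = toy_handle_cpuid,
    [TOY_EXIT_IO]    = toy_handle_io,
};

/* Returns 1 to resume the guest, 0 to return to user space, -1 if unknown. */
static int toy_handle_exit(struct toy_vcpu *vcpu)
{
    int reason = vcpu->exit_reason;

    if (reason < 0 || reason >= TOY_EXIT_MAX || !toy_exit_handlers[reason])
        return -1;
    return toy_exit_handlers[reason](vcpu);
}
```

Calling toy_handle_exit() with exit_reason set to TOY_EXIT_IO returns 0 and sets needs_userspace, which is the case that matters for device emulation in user space.
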
Perhaps another summary resource:

http://download.microsoft.com/download/9/8/f/98f3fe47-dfc3-4e74-92a3-088782200fe7/TWAR05015_WinHEC05.ppt

As for virtio_net.....it is implemented in
drivers/net/virtio_net.c......not sure what your question is?

On Sat, Jun 26, 2010 at 9:06 AM, Balachandar wrote:
> Hello,
> I am trying to understand the virtio mechanism in Linux. I read that the
> kick function will notify the host side about the newly published
> buffers. I am looking especially at virtio_net. Once a packet is ready
> for transmission the kick function is called here. From here I traced
> the call and I think it goes to this. From here, where does it go? Which
> code contains the backend driver of virtio? Where is the code in the
> hypervisor which this kick will go to? Thank you...
>
> Thanks,
> Bala
>

--
Regards,
Peter Teoh

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alexander Graf
Subject: Re: Where is the entry of hypercalls in kvm?
Date: Wed, 30 Jun 2010 10:56:53 +0200
Message-ID: <69EC205E-B74A-44CB-B532-FA6CF7589B3B@suse.de>
Mime-Version: 1.0 (Apple Message framework v1078)
Content-Type: text/plain; charset=us-ascii
Cc: Balachandar, kernelnewbies@nl.linux.org, kvm@vger.kernel.org
To: Peter Teoh
Sender: kvm-owner@vger.kernel.org

On 30.06.2010, at 10:17, Peter Teoh wrote:

> Your question is answered here:
>
> http://www.spinics.net/lists/kvm/msg37526.html
>
> And check this paper out:
>
> http://ozlabs.org/~rusty/virtio-spec/virtio-paper.pdf
>
> The general concept to remember is that QEMU and KVM just execute the
> input as a binary stream....it does not know what "functions" it is
> executing...so the binary stream can be any OS (Windows / Linux
> etc)....QEMU just sets up the basic block (called basic block
> translation) mechanism, and then executes it block by block. Each
> block by definition is demarcated by a branch/jump etc. Within the
> block, if there is any privileged instruction (e.g. write MSR registers,
> load LDT registers etc), then a transition will be made from the guest
> in QEMU into KVM to update the VMCB/VMCS information (these terms are
> from the Intel/AMD manuals).

Eh, no.

There are two modes of operation:

1) TCG
2) KVM

In mode 1, qemu goes through target-xxx/translate.c and converts the
basic blocks you were talking about above to native machine code on the
host system using tcg (see the tcg directory). No KVM is involved,
everything happens in user mode.

In mode 2, qemu executes _everything_ by calling KVM. There is no guest
code interpreted, looked at or whatever in qemu. Whenever the guest CPU
runs, it runs because qemu called ioctl(KVM_RUN) on its kvm vcpu fd.

>
> I have not seen any IOCTL calls in QEMU,

See kvm*.c and target-xxx/kvm.c

> but I suspect ultimately it
> should drop to a VMRUN (for AMD; Intel calls it VMLAUNCH or VMRESUME)
> call inside KVM, which can be found here:
>
> arch/x86/kvm/
>
> And the AMD-specific virtualization is done in svm.c, whereas vmx.c
> is for Intel.
>
> Copying the remark in vmx.c:
>
> /*
>  * The exit handlers return 1 if the exit was handled fully and guest execution
>  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
>  * to be done to userspace and return 0.
>  */
> static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
>         [EXIT_REASON_EXCEPTION_
>
> And after reading the Intel manual, you will understand that "exit" here
> actually refers to the special set of privileged Intel instructions
> which, upon being executed by the guest OS, will immediately cause a
> VMEXIT condition, and these are handled by the above handlers in
> kvm.ko.

in kvm-xxx.ko for x86.

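To make mode 2 a little more concrete, here is a toy model in C of the user-space side: a loop that keeps re-entering the guest and, whenever the guest stops on a port write, emulates that write before going back in. It is only a sketch under stated assumptions. toy_vcpu_run() stands in for the real ioctl on the vcpu fd, the "guest" is just a scripted list of exits, and every toy_/TOY_ name, including the port number, is hypothetical and not taken from QEMU or KVM.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical exit reasons, loosely modelled on an I/O exit and a halt. */
enum toy_exit { TOY_EXIT_IO_OUT, TOY_EXIT_HLT };

/* Hypothetical stand-in for the run structure shared with the kernel. */
struct toy_run {
    enum toy_exit exit_reason;
    uint16_t port;   /* I/O port the guest wrote to */
    uint32_t value;  /* value the guest wrote */
};

/* A scripted "guest": each element is the next exit it will cause. */
struct toy_guest {
    const struct toy_run *script;
    size_t len;
    size_t pos;
};

/*
 * Stand-in for the run ioctl: "run the guest" until the next exit that
 * user space has to handle, and describe that exit in *run.
 */
static void toy_vcpu_run(struct toy_guest *g, struct toy_run *run)
{
    if (g->pos < g->len) {
        *run = g->script[g->pos++];
    } else {
        run->exit_reason = TOY_EXIT_HLT;
        run->port = 0;
        run->value = 0;
    }
}

/* Hypothetical port number for a device "notify" register. */
#define TOY_NOTIFY_PORT 0xc010

/*
 * User-space loop: re-enter the guest until it halts, emulating port
 * writes in between. Returns how many notify writes were seen and stores
 * the last value written to the notify port in *last_value.
 */
static int toy_main_loop(struct toy_guest *g, uint32_t *last_value)
{
    struct toy_run run;
    int notifies = 0;

    for (;;) {
        toy_vcpu_run(g, &run);
        switch (run.exit_reason) {
        case TOY_EXIT_IO_OUT:
            if (run.port == TOY_NOTIFY_PORT) {
                notifies++;
                *last_value = run.value;
            }
            break;              /* emulation done, enter the guest again */
        case TOY_EXIT_HLT:
            return notifies;    /* guest stopped */
        }
    }
}
```

The point of the model is the control flow: user space never looks at guest instructions, it only sees a description of why the guest stopped and decides which device emulation to call.
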
Also, please don't top post :)

Alex

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Balachandar
Subject: Re: Where is the entry of hypercalls in kvm?
Date: Wed, 30 Jun 2010 10:59:44 -0400
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Cc: kernelnewbies@nl.linux.org, kvm@vger.kernel.org
To: Peter Teoh
Sender: kvm-owner@vger.kernel.org

On Wed, Jun 30, 2010 at 4:17 AM, Peter Teoh wrote:
> Your question is answered here:
>
> http://www.spinics.net/lists/kvm/msg37526.html
>
> And check this paper out:
>
> http://ozlabs.org/~rusty/virtio-spec/virtio-paper.pdf
>
> The general concept to remember is that QEMU and KVM just execute the
> input as a binary stream....it does not know what "functions" it is
> executing...so the binary stream can be any OS (Windows / Linux
> etc)....QEMU just sets up the basic block (called basic block
> translation) mechanism, and then executes it block by block. Each
> block by definition is demarcated by a branch/jump etc. Within the
> block, if there is any privileged instruction (e.g. write MSR registers,
> load LDT registers etc), then a transition will be made from the guest
> in QEMU into KVM to update the VMCB/VMCS information (these terms are
> from the Intel/AMD manuals).
>
> I have not seen any IOCTL calls in QEMU, but I suspect ultimately it
> should drop to a VMRUN (for AMD; Intel calls it VMLAUNCH or VMRESUME)
> call inside KVM, which can be found here:
>
> arch/x86/kvm/
>
> And the AMD-specific virtualization is done in svm.c, whereas vmx.c
> is for Intel.
> [...]
>

Thank you for your elaborate answer. My question is: what is the code
in qemu-kvm that is called when the kick function is called in virtio_net?
The kick function does some ioport write and this will be trapped by
the hypervisor into kvm. Then kvm will call some function in qemu-kvm
userspace for I/O emulation. So for this particular case, virtio_net,
what is the function in qemu-kvm that will be called when kick is
encountered in the guest?

Thanks,
Bala

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Balachandar
Subject: Re: Where is the entry of hypercalls in kvm?
Date: Wed, 30 Jun 2010 11:02:29 -0400
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Cc: kernelnewbies@nl.linux.org, kvm@vger.kernel.org
To: Peter Teoh
Sender: kvm-owner@vger.kernel.org

On Wed, Jun 30, 2010 at 10:59 AM, Balachandar wrote:
> On Wed, Jun 30, 2010 at 4:17 AM, Peter Teoh wrote:
>> Your question is answered here:
>>
>> http://www.spinics.net/lists/kvm/msg37526.html
>>
>> And check this paper out:
>>
>> http://ozlabs.org/~rusty/virtio-spec/virtio-paper.pdf
>>
>> The general concept to remember is that QEMU and KVM just execute the
>> input as a binary stream....it does not know what "functions" it is
>> executing...so the binary stream can be any OS (Windows / Linux
>> etc)....QEMU just sets up the basic block (called basic block
>> translation) mechanism, and then executes it block by block. Each
>> block by definition is demarcated by a branch/jump etc.
>> [...]
>>
>> All key APIs are defined here (for Intel; this is KVM-specific, Xen
>> has another mechanism):
>>
>> static struct kvm_x86_ops vmx_x86_ops = {
>>         .cpu_has_kvm_support = cpu_has_kvm_support,
>>         .disabled_by_bios = vmx_disabled_by_bios,
>>         .hardware_setup = hardware_setup,
>>         .hardware_unsetup = hardware_unsetup,
>> ...
>>         .run = vmx_vcpu_run,
>>         .handle_exit = vmx_handle_exit,
>>         .skip_emulated_instruction = skip_emulated_instruction,
>>         .set_interrupt_shadow = vmx_set_interrupt_shadow,
>>
>> and vmx_vcpu_run() is the answer to your question.....I suppose?
>>
>> Perhaps another summary resource:
>>
>> http://download.microsoft.com/download/9/8/f/98f3fe47-dfc3-4e74-92a3-088782200fe7/TWAR05015_WinHEC05.ppt
>>
>> As for virtio_net.....it is implemented in
>> drivers/net/virtio_net.c......not sure what your question is?
>>
> Thank you for your elaborate answer. My question is: what is the code
> in qemu-kvm that is called when the kick function is called in virtio_net?
> The kick function does some ioport write and this will be trapped by
> the hypervisor into kvm. Then kvm will call some function in qemu-kvm
> userspace for I/O emulation. So for this particular case, virtio_net,
> what is the function in qemu-kvm that will be called when kick is
> encountered in the guest?
>

I already got the answer from Alexander. If anyone is looking, the
function is virtio_net_write in hw/virtio_pci.c

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Anthony Liguori
Subject: Re: Where is the entry of hypercalls in kvm?
Date: Wed, 30 Jun 2010 10:10:32 -0500
Message-ID: <4C2B5E68.9060102@codemonkey.ws>
References: <69EC205E-B74A-44CB-B532-FA6CF7589B3B@suse.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Cc: Peter Teoh, Balachandar, kernelnewbies@nl.linux.org, kvm@vger.kernel.org
To: Alexander Graf
In-Reply-To: <69EC205E-B74A-44CB-B532-FA6CF7589B3B@suse.de>
Sender: kvm-owner@vger.kernel.org

On 06/30/2010 03:56 AM, Alexander Graf wrote:
> On 30.06.2010, at 10:17, Peter Teoh wrote:
>
>> Your question is answered here:
>>
>> http://www.spinics.net/lists/kvm/msg37526.html
>>
>> And check this paper out:
>>
>> http://ozlabs.org/~rusty/virtio-spec/virtio-paper.pdf
>>
>> The general concept to remember is that QEMU and KVM just execute the
>> input as a binary stream....it does not know what "functions" it is
>> executing...so the binary stream can be any OS (Windows / Linux
>> etc)....QEMU just sets up the basic block (called basic block
>> translation) mechanism, and then executes it block by block. Each
>> block by definition is demarcated by a branch/jump etc. Within the
>> block, if there is any privileged instruction (e.g. write MSR registers,
>> load LDT registers etc), then a transition will be made from the guest
>> in QEMU into KVM to update the VMCB/VMCS information (these terms are
>> from the Intel/AMD manuals).
>>
> Eh, no.
>
> There are two modes of operation:
>
> 1) TCG
> 2) KVM
>
> In mode 1, qemu goes through target-xxx/translate.c and converts the
> basic blocks you were talking about above to native machine code on the
> host system using tcg (see the tcg directory).
> No KVM is involved, everything happens in user mode.
>
> In mode 2, qemu executes _everything_ by calling KVM. There is no guest
> code interpreted, looked at or whatever in qemu.

Only because there is a mini-x86 interpreter in the kernel. That lets
KVM expose an idealized interface to qemu that requires no instruction
interpretation.

More to the point of the original question, virtio is typically
implemented on top of an emulated PCI device. The kick operation is
implemented as a write to a PCI IO region that's mapped to PIO. If you
look at hw/virtio-pci.c, you'll see the entry points.

Regards,

Anthony Liguori

> Whenever the guest CPU runs, it runs because qemu called
> ioctl(KVM_RUN) on its kvm vcpu fd.
>
>> I have not seen any IOCTL calls in QEMU,
>>
> See kvm*.c and target-xxx/kvm.c
>
>> but I suspect ultimately it
>> should drop to a VMRUN (for AMD; Intel calls it VMLAUNCH or VMRESUME)
>> call inside KVM, which can be found here:
>>
>> arch/x86/kvm/
>>
>> And the AMD-specific virtualization is done in svm.c, whereas vmx.c
>> is for Intel.
>>
>> Copying the remark in vmx.c:
>>
>> /*
>>  * The exit handlers return 1 if the exit was handled fully and guest execution
>>  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
>>  * to be done to userspace and return 0.
>>  */
>> static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
>>         [EXIT_REASON_EXCEPTION_
>>
>> And after reading the Intel manual, you will understand that "exit" here
>> actually refers to the special set of privileged Intel instructions
>> which, upon being executed by the guest OS, will immediately cause a
>> VMEXIT condition, and these are handled by the above handlers in
>> kvm.ko.
>>
> in kvm-xxx.ko for x86.
>
> Also, please don't top post :)
>
>
> Alex
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Anthony Liguori
Subject: Re: Where is the entry of hypercalls in kvm?
Date: Wed, 30 Jun 2010 10:12:46 -0500
Message-ID: <4C2B5EEE.5090300@codemonkey.ws>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Cc: Peter Teoh, kernelnewbies@nl.linux.org, kvm@vger.kernel.org
To: Balachandar
Sender: kvm-owner@vger.kernel.org

On 06/30/2010 10:02 AM, Balachandar wrote:
> On Wed, Jun 30, 2010 at 10:59 AM, Balachandar wrote:
>
>> On Wed, Jun 30, 2010 at 4:17 AM, Peter Teoh wrote:
>>
>>> Your question is answered here:
>>>
>>> http://www.spinics.net/lists/kvm/msg37526.html
>>>
>>> And check this paper out:
>>>
>>> http://ozlabs.org/~rusty/virtio-spec/virtio-paper.pdf
>>>
>>> The general concept to remember is that QEMU and KVM just execute the
>>> input as a binary stream....it does not know what "functions" it is
>>> executing...so the binary stream can be any OS (Windows / Linux
>>> etc)....QEMU just sets up the basic block (called basic block
>>> translation) mechanism, and then executes it block by block. Each
>>> block by definition is demarcated by a branch/jump etc. Within the
>>> block, if there is any privileged instruction (e.g. write MSR registers,
>>> load LDT registers etc), then a transition will be made from the guest
>>> in QEMU into KVM to update the VMCB/VMCS information.
>>> [...]
>>>
>>> Perhaps another summary resource:
>>>
>>> http://download.microsoft.com/download/9/8/f/98f3fe47-dfc3-4e74-92a3-088782200fe7/TWAR05015_WinHEC05.ppt
>>>
>>> As for virtio_net.....it is implemented in
>>> drivers/net/virtio_net.c......not sure what your question is?
>>>
>>>
>> Thank you for your elaborate answer. My question is: what is the code
>> in qemu-kvm that is called when the kick function is called in virtio_net?
>> The kick function does some ioport write and this will be trapped by
>> the hypervisor into kvm. Then kvm will call some function in qemu-kvm
>> userspace for I/O emulation. So for this particular case, virtio_net,
>> what is the function in qemu-kvm that will be called when kick is
>> encountered in the guest?
>>
>>
> I already got the answer from Alexander. If anyone is looking, the
> function is virtio_net_write in hw/virtio_pci.c
>

virtio_ioport_write() in hw/virtio-pci.c. It eventually goes to
virtio_net_handle_tx, virtio_net_handle_rx, or virtio_net_handle_ctrl
depending on which queue is being notified.

Regards,

Anthony Liguori

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Peter Teoh
Subject: Re: Where is the entry of hypercalls in kvm?
Date: Thu, 1 Jul 2010 00:28:51 +0800
References: <69EC205E-B74A-44CB-B532-FA6CF7589B3B@suse.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Cc: Balachandar, kernelnewbies@nl.linux.org, kvm@vger.kernel.org
To: Alexander Graf
In-Reply-To: <69EC205E-B74A-44CB-B532-FA6CF7589B3B@suse.de>
Sender: kvm-owner@vger.kernel.org

Thank you Alex for the reply, very glad to know you!!!

On Wed, Jun 30, 2010 at 4:56 PM, Alexander Graf wrote:
>
> On 30.06.2010, at 10:17, Peter Teoh wrote:
>
>> Your question is answered here:
>>
>> http://www.spinics.net/lists/kvm/msg37526.html
>>
>> And check this paper out:
>>
>> http://ozlabs.org/~rusty/virtio-spec/virtio-paper.pdf
>>
>> The general concept to remember is that QEMU and KVM just execute the
>> input as a binary stream....it does not know what "functions" it is
>> executing...so the binary stream can be any OS (Windows / Linux
>> etc)....QEMU just sets up the basic block (called basic block
>> translation) mechanism, and then executes it block by block. Each
>> block by definition is demarcated by a branch/jump etc. Within the
>> block, if there is any privileged instruction (e.g. write MSR registers,
>> load LDT registers etc), then a transition will be made from the guest
>> in QEMU into KVM to update the VMCB/VMCS information (these terms are
>> from the Intel/AMD manuals).
>
> Eh, no.
>
> There are two modes of operation:
>
> 1) TCG
> 2) KVM
>

Now I am clear, it is translate-all.c vs kvm-all.c as the two main
files in QEMU. Thanks for that!

> In mode 1, qemu goes through target-xxx/translate.c and converts the
> basic blocks you were talking about above to native machine code on the
> host system using tcg (see the tcg directory). No KVM is involved,
> everything happens in user mode.
>
> In mode 2, qemu executes _everything_ by calling KVM. There is no guest
> code interpreted, looked at or whatever in qemu. Whenever the guest CPU
> runs, it runs because qemu called ioctl(KVM_RUN) on its kvm vcpu fd.
>

Now I don't understand.....guest code usually has two parts --> one
running in ring3, and another in ring0, so if we were running
everything in KVM, won't it pose a security risk? As far as I
know, VMware uses ring1 to run ALL the guest code, and transitions to
ring0 whenever a privileged instruction is encountered. So what is the
equivalent mechanism in qemu?

The key issue I am facing here is basically "privileged insn" ----->
only these should be executing in the kvm module, which is running in
ring0, and the rest is best run at a lower privilege level?

>>
>> I have not seen any IOCTL calls in QEMU,
>
> See kvm*.c and target-xxx/kvm.c
>
>> but I suspect ultimately it
>> should drop to a VMRUN (for AMD; Intel calls it VMLAUNCH or VMRESUME)
>> call inside KVM, which can be found here:
>>
>> arch/x86/kvm/
>>
>> And the AMD-specific virtualization is done in svm.c, whereas vmx.c
>> is for Intel.
>>
>> Copying the remark in vmx.c:
>>
>> /*
>>  * The exit handlers return 1 if the exit was handled fully and guest execution
>>  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
>>  * to be done to userspace and return 0.
>>  */
>> static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
>>         [EXIT_REASON_EXCEPTION_
>>
>> And after reading the Intel manual, you will understand that "exit" here
>> actually refers to the special set of privileged Intel instructions
>> which, upon being executed by the guest OS, will immediately cause a
>> VMEXIT condition, and these are handled by the above handlers in
>> kvm.ko.
>
> in kvm-xxx.ko for x86.
>
> Also, please don't top post :)
>
>
> Alex
>
>

Thanks again.

--
Regards,
Peter Teoh

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alexander Graf
Subject: Re: Where is the entry of hypercalls in kvm?
Date: Wed, 30 Jun 2010 18:32:46 +0200
Message-ID: <5BE1AF91-943C-41F7-A242-8FE0AAE33B62@suse.de>
References: <69EC205E-B74A-44CB-B532-FA6CF7589B3B@suse.de>
Mime-Version: 1.0 (Apple Message framework v1078)
Content-Type: text/plain; charset=us-ascii
Cc: Balachandar, kernelnewbies@nl.linux.org, kvm@vger.kernel.org
To: Peter Teoh
Sender: kvm-owner@vger.kernel.org

On 30.06.2010, at 18:28, Peter Teoh wrote:

> Thank you Alex for the reply, very glad to know you!!!
>
> On Wed, Jun 30, 2010 at 4:56 PM, Alexander Graf wrote:
>>
>> On 30.06.2010, at 10:17, Peter Teoh wrote:
>>
>>> Your questioned is answered here:
>>>
>>> http://www.spinics.net/lists/kvm/msg37526.html
>>>
>>> And check this paper out:
>>>
>>> http://ozlabs.org/~rusty/virtio-spec/virtio-paper.pdf
>>>
>>> The general concept to remember is that QEMU and KVM just execute the
>>> input as binary stream....it does not know what "functions" it is
>>> executing...so the binary stream can be any OS (windows / Linux
>>> etc)....QEMU just setup the basic block (call basic blocks
>>> translation) mechanism, and then execute it block by block. Each
>>> block by definition is demarcated by a branch/jump etc. Within the
>>> block if there is any privilege instruction, (eg, write MSR registers,
>>> load LDT registers etc), then a transition will be made from guest in
>>> QEMU into KVM to update the VMCB/VMCS information. (these terms are
>>> from Intel/AMD manual).
>>
>> Eh, no.
>>
>> There are two modes of operation:
>>
>> 1) TCG
>> 2) KVM
>>
>
> Now I am clear, it is translate-all.c vs kvm-all.c as the two main
> file in QEMU. Thanks for that!
>
>> In mode 1, qemu goes through target-xxx/translate.c and converts the basic blocks you were talking about above to native machine code on the host system using tcg (see the tcg directory). No KVM is involved, everything happens in user mode.
>>
>> In mode 2, qemu executes _everything_ by calling KVM. There is no guest code interpreted, looked at or whatever in qemu. Whenever the guest CPU runs, it runs because qemu called ioctrl(VCPU_RUN) on its kvm vcpu fd.
>>
>
> Now I don't understand.....guest codes usually have two parts --> one
> running in ring3, and another in ring0, so if we were running
> everything in KVM, won't it posed a security risks? as far as I
> know, VMware use ring1 to run ALL the guest codes, and transition to
> ring0 whenever privilege instructions is encountered. so what is the
> equivalent mechanism in qemu?
> Key issue I am facing with here is
> basically "privilege insn", -----> only these should be executing in
> kvm module, which is running in ring0, and the rest is best to be at
> lower level?

Modern x86 CPUs give you a fake ring 0 mode where privileged instructions can either be trapped or act on shadow CPU state that gets swapped with the host state. See the description of the Secure Virtual Machine (AMD) or VT-x (Intel) frameworks in their respective CPU architecture manuals.

Alex

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Anthony Liguori
Subject: Re: Where is the entry of hypercalls in kvm?
Date: Wed, 30 Jun 2010 11:34:19 -0500
Message-ID: <4C2B720B.6040801@codemonkey.ws>
References: <69EC205E-B74A-44CB-B532-FA6CF7589B3B@suse.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Alexander Graf, Balachandar, kernelnewbies@nl.linux.org, kvm@vger.kernel.org
To: Peter Teoh
Return-path:
Received: from mail-gw0-f46.google.com ([74.125.83.46]:56679 "EHLO mail-gw0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753592Ab0F3QeI (ORCPT ); Wed, 30 Jun 2010 12:34:08 -0400
Received: by gwb15 with SMTP id 15so518840gwb.19 for ; Wed, 30 Jun 2010 09:34:06 -0700 (PDT)
In-Reply-To:
Sender: kvm-owner@vger.kernel.org
List-ID:

On 06/30/2010 11:28 AM, Peter Teoh wrote:
> Thank you Alex for the reply, very glad to know you!!!
>
> On Wed, Jun 30, 2010 at 4:56 PM, Alexander Graf wrote:
>
>> On 30.06.2010, at 10:17, Peter Teoh wrote:
>>
>>
>>> Your questioned is answered here:
>>>
>>> http://www.spinics.net/lists/kvm/msg37526.html
>>>
>>> And check this paper out:
>>>
>>> http://ozlabs.org/~rusty/virtio-spec/virtio-paper.pdf
>>>
>>> The general concept to remember is that QEMU and KVM just execute the
>>> input as binary stream....it does not know what "functions" it is
>>> executing...so the binary stream can be any OS (windows / Linux
>>> etc)....QEMU just setup the basic block (call basic blocks
>>> translation) mechanism, and then execute it block by block. Each
>>> block by definition is demarcated by a branch/jump etc. Within the
>>> block if there is any privilege instruction, (eg, write MSR registers,
>>> load LDT registers etc), then a transition will be made from guest in
>>> QEMU into KVM to update the VMCB/VMCS information. (these terms are
>>> from Intel/AMD manual).
>>>
>> Eh, no.
>>
>> There are two modes of operation:
>>
>> 1) TCG
>> 2) KVM
>>
>>
> Now I am clear, it is translate-all.c vs kvm-all.c as the two main
> file in QEMU. Thanks for that!
>
>
>> In mode 1, qemu goes through target-xxx/translate.c and converts the basic blocks you were talking about above to native machine code on the host system using tcg (see the tcg directory). No KVM is involved, everything happens in user mode.
>>
>> In mode 2, qemu executes _everything_ by calling KVM. There is no guest code interpreted, looked at or whatever in qemu. Whenever the guest CPU runs, it runs because qemu called ioctrl(VCPU_RUN) on its kvm vcpu fd.
>>
>>
> Now I don't understand.....guest codes usually have two parts --> one
> running in ring3, and another in ring0, so if we were running
> everything in KVM, won't it posed a security risks? as far as I
> know, VMware use ring1 to run ALL the guest codes, and transition to
> ring0 whenever privilege instructions is encountered.

This is not quite accurate anymore.
VT and SVM introduce what's often called compressed ring 0 mode. In this
new mode, you can execute code in ring 0, 1, 2, or 3 but trap any
operations that are potentially sensitive (like IO operations). The act
of trapping these events results in a transition from compressed ring 0
to normal ring 0. This transition is called a vmexit.

KVM enables compressed ring 0 mode and runs all guest code in that mode.
It directly handles all vmexits and decides to pass a subset of those
exits down to qemu for further handling. Typically, this subset includes
anything that requires device emulation.

Regards,

Anthony Liguori

> so what is the
> equivalent mechanism in qemu? Key issue I am facing with here is
> basically "privilege insn", -----> only these should be executing in
> kvm module, which is running in ring0, and the rest is best to be at
> lower level?
>
>
>>> I have not seen any IOCTL calls in QEMU,
>>>
>> See kvm*.c and target-xxx/kvm.c
>>
>>
>>> but I suspect ultimately it
>>> should drop to a VMRUN (for AMD, Intel called it VMLAUNCH or VMRESUME)
>>> calls inside KVM, which can be found here:
>>>
>>> arch/x86/kvm/
>>>
>>> And the AMD specific virtualization is done in svm.c whereas that of
>>> vmx.c is for Intel.
>>>
>>> Copying the remark in vmx.c:
>>>
>>> /*
>>> * The exit handlers return 1 if the exit was handled fully and guest execution
>>> * may resume. Otherwise they set the kvm_run parameter to indicate what needs
>>> * to be done to userspace and return 0.
>>> */
>>> static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
>>> [EXIT_REASON_EXCEPTION_
>>>
>>> And after reading the Intel manual, u will understand that "exit" here
>>> actually refers to the special set of privilege intel instructions,
>>> which upon being executed by the guest OS, will immediately caused and
>>> VMEXIT condition, and these are handled by the above handler in
>>> kvm.ko.
>>>
>> in kvm-xxx.ko for x86.
>>
>> Also, please don't top post :)
>>
>>
>> Alex
>>
>>
>>
> Thanks again.
>
>

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Peter Teoh
Subject: Re: Where is the entry of hypercalls in kvm?
Date: Thu, 1 Jul 2010 00:36:25 +0800
Message-ID:
References: <69EC205E-B74A-44CB-B532-FA6CF7589B3B@suse.de> <4C2B5E68.9060102@codemonkey.ws>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: Alexander Graf, Balachandar, kernelnewbies@nl.linux.org, kvm@vger.kernel.org
To: Anthony Liguori
Return-path:
Received: from mail-qw0-f46.google.com ([209.85.216.46]:57375 "EHLO mail-qw0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751582Ab0F3Qg1 convert rfc822-to-8bit (ORCPT ); Wed, 30 Jun 2010 12:36:27 -0400
Received: by qwi4 with SMTP id 4so330914qwi.19 for ; Wed, 30 Jun 2010 09:36:26 -0700 (PDT)
In-Reply-To: <4C2B5E68.9060102@codemonkey.ws>
Sender: kvm-owner@vger.kernel.org
List-ID:

just to clarify further.

On Wed, Jun 30, 2010 at 11:10 PM, Anthony Liguori wrote:
> On 06/30/2010 03:56 AM, Alexander Graf wrote:
>>
>> On 30.06.2010, at 10:17, Peter Teoh wrote:
>>
>>
>>>
>>> Your questioned is answered here:
>>>
>>> http://www.spinics.net/lists/kvm/msg37526.html
>>>
>>> And check this paper out:
>>>
>>> http://ozlabs.org/~rusty/virtio-spec/virtio-paper.pdf
>>>
>>> The general concept to remember is that QEMU and KVM just execute the
>>> input as binary stream....it does not know what "functions" it is
>>> executing...so the binary stream can be any OS (windows / Linux
>>> etc)....QEMU just setup the basic block (call basic blocks
>>> translation) mechanism, and then execute it block by block. Each
>>> block by definition is demarcated by a branch/jump etc. Within the
>>> block if there is any privilege instruction, (eg, write MSR registers,
>>> load LDT registers etc), then a transition will be made from guest in
>>> QEMU into KVM to update the VMCB/VMCS information.
>>> (these terms are
>>> from Intel/AMD manual).
>>>
>>
>> Eh, no.
>>
>> There are two modes of operation:
>>
>> 1) TCG
>> 2) KVM
>>
>> In mode 1, qemu goes through target-xxx/translate.c and converts the basic
>> blocks you were talking about above to native machine code on the host
>> system using tcg (see the tcg directory). No KVM is involved, everything
>> happens in user mode.
>>
>> In mode 2, qemu executes _everything_ by calling KVM. There is no guest
>> code interpreted, looked at or whatever in qemu.
>
> Only because there is a mini-x86 interpreter in the kernel. That lets KVM
> expose an idealized interface to qemu that requires no instruction
> interpretation.
>

From the ioctl call in QEMU, the following will be called:

int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
{
        int r;
        sigset_t sigsaved;

and then subsequently:

                r = emulate_instruction(vcpu, vcpu->arch.mmio_fault_cr2, 0,
                                        EMULTYPE_NO_DECODE);

which will interpret the x86 machine code, correct? So does it
distinguish between ring 0 and ring 3 instructions?

> More to the point of the original question, virtio is typically implemented
> on top of an emulated PCI device. The kick operation is implemented as a
> write to a PCI IO region that's mapped to PIO. If you look at
> hw/virtio-pci.c, you'll see the entry points.
>
> Regards,
>
> Anthony Liguori
>
>> Whenever the guest CPU runs, it runs because qemu called ioctrl(VCPU_RUN)
>> on its kvm vcpu fd.
>>
>>
>>>
>>> I have not seen any IOCTL calls in QEMU,
>>>
>>
>> See kvm*.c and target-xxx/kvm.c
>>
>>
>>>
>>> but I suspect ultimately it
>>> should drop to a VMRUN (for AMD, Intel called it VMLAUNCH or VMRESUME)
>>> calls inside KVM, which can be found here:
>>>
>>> arch/x86/kvm/
>>>
>>> And the AMD specific virtualization is done in svm.c whereas that of
>>> vmx.c is for Intel.
>>>
>>> Copying the remark in vmx.c:
>>>
>>> /*
>>> * The exit handlers return 1 if the exit was handled fully and guest execution
>>> * may resume. Otherwise they set the kvm_run parameter to indicate what needs
>>> * to be done to userspace and return 0.
>>> */
>>> static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
>>>        [EXIT_REASON_EXCEPTION_
>>>
>>> And after reading the Intel manual, u will understand that "exit" here
>>> actually refers to the special set of privilege intel instructions,
>>> which upon being executed by the guest OS, will immediately caused and
>>> VMEXIT condition, and these are handled by the above handler in
>>> kvm.ko.
>>>
>>
>> in kvm-xxx.ko for x86.
>>
>> Also, please don't top post :)
>>
>>
>> Alex
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
>

-- 
Regards,
Peter Teoh
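
To tie the pieces of this thread together (guest code traps with a vmexit, the in-kernel exit handlers return 1 when the exit is fully handled or 0 when userspace has to act, and a virtio kick is a PIO write that ends up in qemu), here is a rough toy model in plain C. It is only a sketch and not KVM or qemu code: every `toy_*` / `TOY_*` name is made up for illustration, the I/O base `0xc000` is an arbitrary assumption, and each call dispatches exactly one trapped event. The one value taken from real headers is the queue-notify offset 16 of the legacy virtio PCI register layout. In the real interface the userspace side calls `ioctl(vcpu_fd, KVM_RUN, 0)` and then checks `exit_reason` in the shared `struct kvm_run` (for example `KVM_EXIT_IO`).

```c
#include <assert.h>
#include <stdint.h>

/* Toy exit reasons; the real ones are defined by the VMX/SVM hardware. */
enum toy_exit_reason { TOY_EXIT_CPUID, TOY_EXIT_IO, TOY_EXIT_COUNT };

/* What the kernel side reports to userspace, loosely modelled on
 * the exit_reason field of struct kvm_run. */
enum toy_user_exit { TOY_USER_NONE, TOY_USER_IO };

struct toy_run {
    enum toy_user_exit exit_reason;
    uint16_t port;   /* I/O port the guest wrote to */
    uint16_t value;  /* value the guest wrote */
};

struct toy_vcpu {
    enum toy_exit_reason pending;  /* why the guest just trapped */
    uint16_t io_port;
    uint16_t io_value;
    struct toy_run run;            /* area shared with "userspace" */
    unsigned handled_in_kernel;    /* exits that never left the "kernel" */
};

/* Same convention as the vmx.c comment quoted in the thread:
 * return 1 if the exit was fully handled and the guest may resume,
 * otherwise fill in the run area and return 0. */
static int toy_handle_cpuid(struct toy_vcpu *vcpu)
{
    vcpu->handled_in_kernel++;
    return 1;
}

static int toy_handle_io(struct toy_vcpu *vcpu)
{
    /* Device emulation lives in userspace, so describe the access. */
    vcpu->run.exit_reason = TOY_USER_IO;
    vcpu->run.port = vcpu->io_port;
    vcpu->run.value = vcpu->io_value;
    return 0;
}

static int (*toy_exit_handlers[TOY_EXIT_COUNT])(struct toy_vcpu *vcpu) = {
    [TOY_EXIT_CPUID] = toy_handle_cpuid,
    [TOY_EXIT_IO]    = toy_handle_io,
};

/* Stand-in for the kernel side of the run ioctl: dispatch one exit. */
int toy_vcpu_run(struct toy_vcpu *vcpu)
{
    assert(vcpu->pending < TOY_EXIT_COUNT);
    vcpu->run.exit_reason = TOY_USER_NONE;
    return toy_exit_handlers[vcpu->pending](vcpu);
}

#define TOY_DEVICE_IO_BASE 0xc000u  /* hypothetical base of the device's I/O BAR */
#define TOY_QUEUE_NOTIFY   16u      /* queue-notify offset, legacy virtio PCI */

/* Stand-in for the userspace (qemu) side: returns the index of the
 * virtqueue that was kicked, or -1 if the exit was not a kick. */
int toy_userspace_handle(const struct toy_run *run)
{
    if (run->exit_reason == TOY_USER_IO &&
        run->port == TOY_DEVICE_IO_BASE + TOY_QUEUE_NOTIFY)
        return (int)run->value;
    return -1;
}
```

A caller would set `pending` (plus `io_port` and `io_value` for an I/O exit), call `toy_vcpu_run()`, and hand `run` to `toy_userspace_handle()` only when the return value is 0. That mirrors the split described above: most exits stay in the kernel module, and only the ones that need device emulation, such as the virtio kick, reach qemu.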