Where is the entry of hypercalls in kvm?

public inbox for kvm@vger.kernel.org

* Where is the entry of hypercalls in kvm?
@ 2010-06-26  1:06 Balachandar
  2010-06-30  8:17 ` Peter Teoh
  0 siblings, 1 reply; 11+ messages in thread
From: Balachandar @ 2010-06-26  1:06 UTC (permalink / raw)
  To: kernelnewbies, kvm

Hello,
I am trying to understand the virtio mechanism in Linux. I read that the
kick function notifies the host side about newly published buffers. I am
looking at virtio_net in particular. Once a packet is ready for
transmission, the kick function is called here:
<http://lxr.linux.no/#linux+v2.6.34/drivers/net/virtio_net.c#L588>
From there I traced the call, and I think it goes to this:
<http://lxr.linux.no/#linux+v2.6.34/drivers/virtio/virtio_pci.c#L185>
Where does it go from there? Which code contains the backend driver of
virtio? Where is the code in the hypervisor that this kick goes to?
Thank you.

Thanks,
Bala


* Re: Where is the entry of hypercalls in kvm?
  2010-06-26  1:06 Where is the entry of hypercalls in kvm? Balachandar
@ 2010-06-30  8:17 ` Peter Teoh
  2010-06-30  8:56   ` Alexander Graf
  2010-06-30 14:59   ` Balachandar
  0 siblings, 2 replies; 11+ messages in thread
From: Peter Teoh @ 2010-06-30  8:17 UTC (permalink / raw)
  To: Balachandar; +Cc: kernelnewbies, kvm

Your question is answered here:

http://www.spinics.net/lists/kvm/msg37526.html

And check this paper out:

http://ozlabs.org/~rusty/virtio-spec/virtio-paper.pdf

The general concept to remember is that QEMU and KVM just execute the
input as a binary stream. They do not know what "functions" they are
executing, so the binary stream can be any OS (Windows, Linux, etc.).
QEMU just sets up the basic-block mechanism (called basic block
translation) and then executes the code block by block. Each block is,
by definition, demarcated by a branch, jump, etc. If a block contains
a privileged instruction (e.g., a write to an MSR or a load of the
LDT register), a transition is made from the guest in QEMU into KVM to
update the VMCB/VMCS information (these terms are from the AMD/Intel
manuals).

I have not seen any ioctl calls in QEMU, but I suspect it ultimately
drops to a VMRUN call (on AMD; Intel calls it VMLAUNCH or VMRESUME)
inside KVM, which can be found here:

arch/x86/kvm/

The AMD-specific virtualization is done in svm.c, and the
Intel-specific virtualization in vmx.c.

Copying the remark in vmx.c:

/*
 * The exit handlers return 1 if the exit was handled fully and guest execution
 * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
 * to be done to userspace and return 0.
 */
static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
        [EXIT_REASON_EXCEPTION_

After reading the Intel manual, you will understand that "exit" here
refers to the VMEXIT condition that is raised immediately when the
guest OS executes one of a special set of privileged Intel
instructions. These exits are handled by the handlers above, in
kvm.ko.
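
As a rough illustration of the return convention described in that
vmx.c comment, here is a toy dispatch table in plain C. It is only a
sketch: the toy_* names, the three exit reasons and struct toy_vcpu are
made up for this example and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical exit reasons; the real EXIT_REASON_* constants differ. */
enum { TOY_EXIT_CPUID, TOY_EXIT_MSR_WRITE, TOY_EXIT_IO, TOY_EXIT_MAX };

/* Hypothetical stand-in for struct kvm_vcpu and its kvm_run area. */
struct toy_vcpu {
    unsigned int exit_reason;   /* why the guest stopped running */
    int userspace_request;      /* what userspace is asked to do */
    int handled_in_kernel;      /* exits resolved without userspace */
};

/* Handlers: 1 = handled fully, resume the guest; 0 = go to userspace. */
static int toy_handle_cpuid(struct toy_vcpu *vcpu)
{
    vcpu->handled_in_kernel++;
    return 1;
}

static int toy_handle_msr_write(struct toy_vcpu *vcpu)
{
    vcpu->handled_in_kernel++;
    return 1;
}

static int toy_handle_io(struct toy_vcpu *vcpu)
{
    vcpu->userspace_request = TOY_EXIT_IO;  /* device emulation needed */
    return 0;
}

/* Table indexed by exit reason, same shape as kvm_vmx_exit_handlers[]. */
static int (*toy_exit_handlers[TOY_EXIT_MAX])(struct toy_vcpu *vcpu) = {
    [TOY_EXIT_CPUID]     = toy_handle_cpuid,
    [TOY_EXIT_MSR_WRITE] = toy_handle_msr_write,
    [TOY_EXIT_IO]        = toy_handle_io,
};

/* Dispatch one exit; unknown reasons are reported to userspace. */
static int toy_handle_exit(struct toy_vcpu *vcpu)
{
    assert(vcpu != NULL);
    if (vcpu->exit_reason < TOY_EXIT_MAX &&
        toy_exit_handlers[vcpu->exit_reason] != NULL)
        return toy_exit_handlers[vcpu->exit_reason](vcpu);

    vcpu->userspace_request = -1;
    return 0;
}
```

A return value of 1 means the kernel can re-enter the guest right
away; 0 means the run call returns to userspace with the request
filled in.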

To find the entry point INTO the guest OS (i.e., where the guest code
is first run), you must first understand that VMX operation is a
state machine (three elements: VMLAUNCH, VMRESUME and VMEXIT). Once
the CPU has entered the guest via VMLAUNCH or VMRESUME, the guest
cannot access any of the host's resources; those are accessible only
after a VMEXIT is triggered.

All the key APIs are defined here (for Intel; this is KVM-specific,
Xen has another mechanism):

static struct kvm_x86_ops vmx_x86_ops = {
        .cpu_has_kvm_support = cpu_has_kvm_support,
        .disabled_by_bios = vmx_disabled_by_bios,
        .hardware_setup = hardware_setup,
        .hardware_unsetup = hardware_unsetup,
...
        .run = vmx_vcpu_run,
        .handle_exit = vmx_handle_exit,
        .skip_emulated_instruction = skip_emulated_instruction,
        .set_interrupt_shadow = vmx_set_interrupt_shadow,

and vmx_vcpu_run() is the answer to your question, I suppose?

Perhaps another summary resource:

http://download.microsoft.com/download/9/8/f/98f3fe47-dfc3-4e74-92a3-088782200fe7/TWAR05015_WinHEC05.ppt

As for virtio_net, it is implemented in drivers/net/virtio_net.c.
I am not sure what your question is?

On Sat, Jun 26, 2010 at 9:06 AM, Balachandar <bala1486@gmail.com> wrote:
> Hello,
>  I am trying to understand the virtio mechanism in linux. I read that the
> kick function will notify the host side about the newly published buffers. I
> am looking especially at virtio_net.Once a packet is ready for transmission
> the kick function is called here. From here i traced the call and i think it
> goes to this. From here where does it go? Which code contains the backend
> driver of virtio. Where is the code in the hypervisor which this kick will
> go to? Thank you...
>
> Thanks,
> Bala
>
>

-- 
Regards,
Peter Teoh


* Re: Where is the entry of hypercalls in kvm?
  2010-06-30  8:17 ` Peter Teoh
@ 2010-06-30  8:56   ` Alexander Graf
  2010-06-30 15:10     ` Anthony Liguori
  2010-06-30 16:28     ` Peter Teoh
  2010-06-30 14:59   ` Balachandar
  1 sibling, 2 replies; 11+ messages in thread
From: Alexander Graf @ 2010-06-30  8:56 UTC (permalink / raw)
  To: Peter Teoh; +Cc: Balachandar, kernelnewbies, kvm


On 30.06.2010, at 10:17, Peter Teoh wrote:

> Your questioned is answered here:
> 
> http://www.spinics.net/lists/kvm/msg37526.html
> 
> And check this paper out:
> 
> http://ozlabs.org/~rusty/virtio-spec/virtio-paper.pdf
> 
> The general concept to remember is that QEMU and KVM just execute the
> input as binary stream....it does not know what "functions" it is
> executing...so the binary stream can be any OS (windows / Linux
> etc)....QEMU just setup the basic block (call basic blocks
> translation) mechanism, and then execute it block by block.   Each
> block by definition is demarcated by a branch/jump etc.   Within the
> block if there is any privilege instruction, (eg, write MSR registers,
> load LDT registers etc), then a transition will be made from guest in
> QEMU into KVM to update the VMCB/VMCS information.   (these terms are
> from Intel/AMD manual).

Eh, no.

There are two modes of operation:

1) TCG
2) KVM

In mode 1, qemu goes through target-xxx/translate.c and converts the basic blocks you were talking about above to native machine code on the host system using tcg (see the tcg directory). No KVM is involved, everything happens in user mode.

In mode 2, qemu executes _everything_ by calling KVM. There is no guest code interpreted, looked at or whatever in qemu. Whenever the guest CPU runs, it runs because qemu called ioctl(KVM_RUN) on its kvm vcpu fd.
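
To make the mode 2 flow a bit more concrete: the userspace side is
essentially a loop around the run ioctl on the vcpu fd (KVM_RUN in
<linux/kvm.h>). Below is a minimal sketch, assuming an x86 guest and
assuming the vcpu fd and its mmap'ed struct kvm_run were set up
elsewhere. struct pio_event, decode_pio_exit() and run_vcpu() are names
invented for this example; they are not QEMU functions.

```c
#include <assert.h>
#include <linux/kvm.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Hypothetical record of one port-I/O exit, for illustration only. */
struct pio_event {
    uint16_t port;      /* I/O port the guest touched */
    uint8_t  size;      /* access width in bytes */
    int      is_write;  /* 1 for OUT, 0 for IN */
    uint32_t value;     /* value written (first item, writes only) */
};

/* Returns 1 and fills *ev if the last exit was port I/O, else 0. */
static int decode_pio_exit(const struct kvm_run *run, struct pio_event *ev)
{
    assert(run != NULL && ev != NULL);
    if (run->exit_reason != KVM_EXIT_IO)
        return 0;

    ev->port = run->io.port;
    ev->size = run->io.size;
    ev->is_write = (run->io.direction == KVM_EXIT_IO_OUT);
    ev->value = 0;

    if (ev->is_write && run->io.size <= sizeof(ev->value)) {
        /* The data sits inside the kvm_run mapping at data_offset. */
        const uint8_t *data = (const uint8_t *)run + run->io.data_offset;
        for (unsigned int i = 0; i < run->io.size; i++)
            ev->value |= (uint32_t)data[i] << (8 * i); /* x86: little endian */
    }
    return 1;
}

/*
 * Run the guest until an exit that this sketch does not understand.
 * Returns -1 if the ioctl fails, otherwise the unhandled exit reason.
 */
static int run_vcpu(int vcpu_fd, struct kvm_run *run,
                    void (*on_pio)(const struct pio_event *ev))
{
    struct pio_event ev;

    for (;;) {
        /* The guest executes inside this call until the next exit
         * that the kernel decides to hand to userspace. */
        if (ioctl(vcpu_fd, KVM_RUN, 0) < 0)
            return -1;

        if (decode_pio_exit(run, &ev))
            on_pio(&ev);            /* device emulation goes here */
        else
            return (int)run->exit_reason;
    }
}
```

Note that userspace never looks at guest instructions here. It only
sees the already-decoded exit (port, size, direction, data) that the
kernel left in the shared kvm_run area.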

> 
> I have not seen any IOCTL calls in QEMU,

See kvm*.c and target-xxx/kvm.c

> but I suspect ultimately it
> should drop to a VMRUN (for AMD, Intel called it VMLAUNCH or VMRESUME)
> calls inside KVM, which can be found here:
> 
> arch/x86/kvm/
> 
> And the AMD specific virtualization is done in svm.c whereas that of
> vmx.c is for Intel.
> 
> Copying the remark in vmx.c:
> 
> /*
> * The exit handlers return 1 if the exit was handled fully and guest execution
> * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
> * to be done to userspace and return 0.
> */
> static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
>        [EXIT_REASON_EXCEPTION_
> 
> And after reading the Intel manual, u will understand that "exit" here
> actually refers to the special set of privilege intel instructions,
> which upon being executed by the guest OS, will immediately caused and
> VMEXIT condition, and these are handled by the above handler in
> kvm.ko.

in kvm-xxx.ko for x86.

Also, please don't top post :)


Alex



* Re: Where is the entry of hypercalls in kvm?
  2010-06-30  8:56   ` Alexander Graf
@ 2010-06-30 15:10     ` Anthony Liguori
  2010-06-30 16:36       ` Peter Teoh
  2010-06-30 16:28     ` Peter Teoh
  1 sibling, 1 reply; 11+ messages in thread
From: Anthony Liguori @ 2010-06-30 15:10 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Peter Teoh, Balachandar, kernelnewbies, kvm

On 06/30/2010 03:56 AM, Alexander Graf wrote:
> On 30.06.2010, at 10:17, Peter Teoh wrote:
>
>    
>> Your questioned is answered here:
>>
>> http://www.spinics.net/lists/kvm/msg37526.html
>>
>> And check this paper out:
>>
>> http://ozlabs.org/~rusty/virtio-spec/virtio-paper.pdf
>>
>> The general concept to remember is that QEMU and KVM just execute the
>> input as binary stream....it does not know what "functions" it is
>> executing...so the binary stream can be any OS (windows / Linux
>> etc)....QEMU just setup the basic block (call basic blocks
>> translation) mechanism, and then execute it block by block.   Each
>> block by definition is demarcated by a branch/jump etc.   Within the
>> block if there is any privilege instruction, (eg, write MSR registers,
>> load LDT registers etc), then a transition will be made from guest in
>> QEMU into KVM to update the VMCB/VMCS information.   (these terms are
>> from Intel/AMD manual).
>>      
> Eh, no.
>
> There are two modes of operation:
>
> 1) TCG
> 2) KVM
>
> In mode 1, qemu goes through target-xxx/translate.c and converts the basic blocks you were talking about above to native machine code on the host system using tcg (see the tcg directory). No KVM is involved, everything happens in user mode.
>
> In mode 2, qemu executes _everything_ by calling KVM. There is no guest code interpreted, looked at or whatever in qemu.

Only because there is a mini-x86 interpreter in the kernel.  That lets 
KVM expose an idealized interface to qemu that requires no instruction 
interpretation.

More to the point of the original question, virtio is typically 
implemented on top of an emulated PCI device.  The kick operation is 
implemented as a write to a PCI IO region that's mapped to PIO.  If you 
look at hw/virtio-pci.c, you'll see the entry points.
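
A toy model of that path, in case it helps. This is only a sketch and
not the QEMU code: the toy_* names, the queue count and the I/O base
are made up. The one value taken from the real headers is the queue
notify register offset (16, VIRTIO_PCI_QUEUE_NOTIFY in
linux/virtio_pci.h for the legacy virtio PCI layout). In the guest
driver the kick ends in a 16-bit port write of the queue index to that
register; here the write is delivered straight to the device model
instead of going through a vmexit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors VIRTIO_PCI_QUEUE_NOTIFY (legacy virtio PCI register offset). */
#define TOY_VIRTIO_PCI_QUEUE_NOTIFY 16
#define TOY_NUM_QUEUES 3            /* hypothetical queue count */

/* Hypothetical device model state. */
struct toy_virtio_dev {
    uint16_t io_base;                   /* where the I/O BAR is mapped */
    unsigned int kicks[TOY_NUM_QUEUES]; /* notifications seen per queue */
};

/* Host side: called when the guest writes to a port in the device's BAR. */
static int toy_ioport_write(struct toy_virtio_dev *dev,
                            uint16_t port, uint16_t val)
{
    assert(dev != NULL);
    uint16_t offset = (uint16_t)(port - dev->io_base);

    switch (offset) {
    case TOY_VIRTIO_PCI_QUEUE_NOTIFY:
        if (val >= TOY_NUM_QUEUES)
            return -1;              /* no such queue */
        dev->kicks[val]++;          /* backend would now process queue val */
        return 0;
    default:
        return -1;                  /* other registers are not modelled */
    }
}

/* Guest side: the kick is a write of the queue index to the notify port. */
static int toy_guest_kick(struct toy_virtio_dev *dev, uint16_t queue_index)
{
    uint16_t port = (uint16_t)(dev->io_base + TOY_VIRTIO_PCI_QUEUE_NOTIFY);
    return toy_ioport_write(dev, port, queue_index);
}
```

So the value written tells the device which queue has new buffers; the
buffers themselves are found through the ring in guest memory, not
through the port write.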

Regards,

Anthony Liguori

>   Whenever the guest CPU runs, it runs because qemu called ioctrl(VCPU_RUN) on its kvm vcpu fd.
>
>    
>> I have not seen any IOCTL calls in QEMU,
>>      
> See kvm*.c and target-xxx/kvm.c
>
>    
>> but I suspect ultimately it
>> should drop to a VMRUN (for AMD, Intel called it VMLAUNCH or VMRESUME)
>> calls inside KVM, which can be found here:
>>
>> arch/x86/kvm/
>>
>> And the AMD specific virtualization is done in svm.c whereas that of
>> vmx.c is for Intel.
>>
>> Copying the remark in vmx.c:
>>
>> /*
>> * The exit handlers return 1 if the exit was handled fully and guest execution
>> * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
>> * to be done to userspace and return 0.
>> */
>> static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
>>         [EXIT_REASON_EXCEPTION_
>>
>> And after reading the Intel manual, u will understand that "exit" here
>> actually refers to the special set of privilege intel instructions,
>> which upon being executed by the guest OS, will immediately caused and
>> VMEXIT condition, and these are handled by the above handler in
>> kvm.ko.
>>      
> in kvm-xxx.ko for x86.
>
> Also, please don't top post :)
>
>
> Alex
>



* Re: Where is the entry of hypercalls in kvm?
  2010-06-30 15:10     ` Anthony Liguori
@ 2010-06-30 16:36       ` Peter Teoh
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Teoh @ 2010-06-30 16:36 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Alexander Graf, Balachandar, kernelnewbies, kvm

Just to clarify further:

On Wed, Jun 30, 2010 at 11:10 PM, Anthony Liguori <anthony@codemonkey.ws> wrote:
> On 06/30/2010 03:56 AM, Alexander Graf wrote:
>>
>> On 30.06.2010, at 10:17, Peter Teoh wrote:
>>
>>
>>>
>>> Your questioned is answered here:
>>>
>>> http://www.spinics.net/lists/kvm/msg37526.html
>>>
>>> And check this paper out:
>>>
>>> http://ozlabs.org/~rusty/virtio-spec/virtio-paper.pdf
>>>
>>> The general concept to remember is that QEMU and KVM just execute the
>>> input as binary stream....it does not know what "functions" it is
>>> executing...so the binary stream can be any OS (windows / Linux
>>> etc)....QEMU just setup the basic block (call basic blocks
>>> translation) mechanism, and then execute it block by block.   Each
>>> block by definition is demarcated by a branch/jump etc.   Within the
>>> block if there is any privilege instruction, (eg, write MSR registers,
>>> load LDT registers etc), then a transition will be made from guest in
>>> QEMU into KVM to update the VMCB/VMCS information.   (these terms are
>>> from Intel/AMD manual).
>>>
>>
>> Eh, no.
>>
>> There are two modes of operation:
>>
>> 1) TCG
>> 2) KVM
>>
>> In mode 1, qemu goes through target-xxx/translate.c and converts the basic
>> blocks you were talking about above to native machine code on the host
>> system using tcg (see the tcg directory). No KVM is involved, everything
>> happens in user mode.
>>
>> In mode 2, qemu executes _everything_ by calling KVM. There is no guest
>> code interpreted, looked at or whatever in qemu.
>
> Only because there is a mini-x86 interpreter in the kernel.  That lets KVM
> expose an idealized interface to qemu that requires no instruction
> interpretation.
>

From the ioctl call in QEMU, the following will be called:

int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
{
        int r;
        sigset_t sigsaved;

and then subsequently:

                r = emulate_instruction(vcpu, vcpu->arch.mmio_fault_cr2, 0,
                                        EMULTYPE_NO_DECODE);

which interprets the x86 machine code, correct? Does it distinguish
between ring 0 and ring 3 instructions?

> More to the point of the original question, virtio is typically implemented
> on top of an emulated PCI device.  The kick operation is implemented as a
> write to a PCI IO region that's mapped to PIO.  If you look at
> hw/virtio-pci.c, you'll see the entry points.
>
> Regards,
>
> Anthony Liguori
>
>>  Whenever the guest CPU runs, it runs because qemu called ioctrl(VCPU_RUN)
>> on its kvm vcpu fd.
>>
>>
>>>
>>> I have not seen any IOCTL calls in QEMU,
>>>
>>
>> See kvm*.c and target-xxx/kvm.c
>>
>>
>>>
>>> but I suspect ultimately it
>>> should drop to a VMRUN (for AMD, Intel called it VMLAUNCH or VMRESUME)
>>> calls inside KVM, which can be found here:
>>>
>>> arch/x86/kvm/
>>>
>>> And the AMD specific virtualization is done in svm.c whereas that of
>>> vmx.c is for Intel.
>>>
>>> Copying the remark in vmx.c:
>>>
>>> /*
>>> * The exit handlers return 1 if the exit was handled fully and guest
>>> execution
>>> * may resume.  Otherwise they set the kvm_run parameter to indicate what
>>> needs
>>> * to be done to userspace and return 0.
>>> */
>>> static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
>>>        [EXIT_REASON_EXCEPTION_
>>>
>>> And after reading the Intel manual, u will understand that "exit" here
>>> actually refers to the special set of privilege intel instructions,
>>> which upon being executed by the guest OS, will immediately caused and
>>> VMEXIT condition, and these are handled by the above handler in
>>> kvm.ko.
>>>
>>
>> in kvm-xxx.ko for x86.
>>
>> Also, please don't top post :)
>>
>>
>> Alex
>>
>
>



-- 
Regards,
Peter Teoh


* Re: Where is the entry of hypercalls in kvm?
  2010-06-30  8:56   ` Alexander Graf
  2010-06-30 15:10     ` Anthony Liguori
@ 2010-06-30 16:28     ` Peter Teoh
  2010-06-30 16:32       ` Alexander Graf
  2010-06-30 16:34       ` Anthony Liguori
  1 sibling, 2 replies; 11+ messages in thread
From: Peter Teoh @ 2010-06-30 16:28 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Balachandar, kernelnewbies, kvm

Thank you Alex for the reply, very glad to know you!!!

On Wed, Jun 30, 2010 at 4:56 PM, Alexander Graf <agraf@suse.de> wrote:
>
> On 30.06.2010, at 10:17, Peter Teoh wrote:
>
>> Your questioned is answered here:
>>
>> http://www.spinics.net/lists/kvm/msg37526.html
>>
>> And check this paper out:
>>
>> http://ozlabs.org/~rusty/virtio-spec/virtio-paper.pdf
>>
>> The general concept to remember is that QEMU and KVM just execute the
>> input as binary stream....it does not know what "functions" it is
>> executing...so the binary stream can be any OS (windows / Linux
>> etc)....QEMU just setup the basic block (call basic blocks
>> translation) mechanism, and then execute it block by block.   Each
>> block by definition is demarcated by a branch/jump etc.   Within the
>> block if there is any privilege instruction, (eg, write MSR registers,
>> load LDT registers etc), then a transition will be made from guest in
>> QEMU into KVM to update the VMCB/VMCS information.   (these terms are
>> from Intel/AMD manual).
>
> Eh, no.
>
> There are two modes of operation:
>
> 1) TCG
> 2) KVM
>

Now I am clear: translate-all.c and kvm-all.c are the two main files
in QEMU for these two modes. Thanks for that!

> In mode 1, qemu goes through target-xxx/translate.c and converts the basic blocks you were talking about above to native machine code on the host system using tcg (see the tcg directory). No KVM is involved, everything happens in user mode.
>
> In mode 2, qemu executes _everything_ by calling KVM. There is no guest code interpreted, looked at or whatever in qemu. Whenever the guest CPU runs, it runs because qemu called ioctrl(VCPU_RUN) on its kvm vcpu fd.
>

Now I don't understand. Guest code usually has two parts: one running
in ring 3 and another in ring 0. If we run everything in KVM, won't
that pose a security risk? As far as I know, VMware uses ring 1 to run
ALL the guest code, and transitions to ring 0 whenever a privileged
instruction is encountered. So what is the equivalent mechanism in
qemu? The key issue I am facing is the "privileged insn": shouldn't
only these execute in the kvm module, which runs in ring 0, with the
rest running at a lower privilege level?

>>
>> I have not seen any IOCTL calls in QEMU,
>
> See kvm*.c and target-xxx/kvm.c
>
>> but I suspect ultimately it
>> should drop to a VMRUN (for AMD, Intel called it VMLAUNCH or VMRESUME)
>> calls inside KVM, which can be found here:
>>
>> arch/x86/kvm/
>>
>> And the AMD specific virtualization is done in svm.c whereas that of
>> vmx.c is for Intel.
>>
>> Copying the remark in vmx.c:
>>
>> /*
>> * The exit handlers return 1 if the exit was handled fully and guest execution
>> * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
>> * to be done to userspace and return 0.
>> */
>> static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
>>        [EXIT_REASON_EXCEPTION_
>>
>> And after reading the Intel manual, u will understand that "exit" here
>> actually refers to the special set of privilege intel instructions,
>> which upon being executed by the guest OS, will immediately caused and
>> VMEXIT condition, and these are handled by the above handler in
>> kvm.ko.
>
> in kvm-xxx.ko for x86.
>
> Also, please don't top post :)
>
>
> Alex
>
>

Thanks again.

-- 
Regards,
Peter Teoh


* Re: Where is the entry of hypercalls in kvm?
  2010-06-30 16:28     ` Peter Teoh
@ 2010-06-30 16:32       ` Alexander Graf
  2010-06-30 16:34       ` Anthony Liguori
  1 sibling, 0 replies; 11+ messages in thread
From: Alexander Graf @ 2010-06-30 16:32 UTC (permalink / raw)
  To: Peter Teoh; +Cc: Balachandar, kernelnewbies, kvm


On 30.06.2010, at 18:28, Peter Teoh wrote:

> Thank you Alex for the reply, very glad to know you!!!
> 
> On Wed, Jun 30, 2010 at 4:56 PM, Alexander Graf <agraf@suse.de> wrote:
>> 
>> On 30.06.2010, at 10:17, Peter Teoh wrote:
>> 
>>> Your questioned is answered here:
>>> 
>>> http://www.spinics.net/lists/kvm/msg37526.html
>>> 
>>> And check this paper out:
>>> 
>>> http://ozlabs.org/~rusty/virtio-spec/virtio-paper.pdf
>>> 
>>> The general concept to remember is that QEMU and KVM just execute the
>>> input as binary stream....it does not know what "functions" it is
>>> executing...so the binary stream can be any OS (windows / Linux
>>> etc)....QEMU just setup the basic block (call basic blocks
>>> translation) mechanism, and then execute it block by block.   Each
>>> block by definition is demarcated by a branch/jump etc.   Within the
>>> block if there is any privilege instruction, (eg, write MSR registers,
>>> load LDT registers etc), then a transition will be made from guest in
>>> QEMU into KVM to update the VMCB/VMCS information.   (these terms are
>>> from Intel/AMD manual).
>> 
>> Eh, no.
>> 
>> There are two modes of operation:
>> 
>> 1) TCG
>> 2) KVM
>> 
> 
> Now I am clear, it is translate-all.c vs kvm-all.c as the two main
> file in QEMU.   Thanks for that!
> 
>> In mode 1, qemu goes through target-xxx/translate.c and converts the basic blocks you were talking about above to native machine code on the host system using tcg (see the tcg directory). No KVM is involved, everything happens in user mode.
>> 
>> In mode 2, qemu executes _everything_ by calling KVM. There is no guest code interpreted, looked at or whatever in qemu. Whenever the guest CPU runs, it runs because qemu called ioctrl(VCPU_RUN) on its kvm vcpu fd.
>> 
> 
> Now I don't understand.....guest codes usually have two parts --> one
> running in ring3, and another in ring0, so if we were running
> everything in KVM, won't it posed a security risks?   as far as I
> know, VMware use ring1 to run ALL the guest codes, and transition to
> ring0 whenever privilege instructions is encountered.   so what is the
> equivalent mechanism in qemu?   Key issue I am facing with here is
> basically "privilege insn", -----> only these should be executing in
> kvm module, which is running in ring0, and the rest is best to be at
> lower level?

Modern x86 CPUs give you a fake ring0 mode where privileged instructions can either be trapped or act on shadow CPU state that gets swapped with the host state.

See the description of the Secure Virtual Machine (AMD) or VT-x (Intel) frameworks in their respective CPU architecture manuals.


Alex



* Re: Where is the entry of hypercalls in kvm?
  2010-06-30 16:28     ` Peter Teoh
  2010-06-30 16:32       ` Alexander Graf
@ 2010-06-30 16:34       ` Anthony Liguori
  1 sibling, 0 replies; 11+ messages in thread
From: Anthony Liguori @ 2010-06-30 16:34 UTC (permalink / raw)
  To: Peter Teoh; +Cc: Alexander Graf, Balachandar, kernelnewbies, kvm

On 06/30/2010 11:28 AM, Peter Teoh wrote:
> Thank you Alex for the reply, very glad to know you!!!
>
> On Wed, Jun 30, 2010 at 4:56 PM, Alexander Graf<agraf@suse.de>  wrote:
>    
>> On 30.06.2010, at 10:17, Peter Teoh wrote:
>>
>>      
>>> Your questioned is answered here:
>>>
>>> http://www.spinics.net/lists/kvm/msg37526.html
>>>
>>> And check this paper out:
>>>
>>> http://ozlabs.org/~rusty/virtio-spec/virtio-paper.pdf
>>>
>>> The general concept to remember is that QEMU and KVM just execute the
>>> input as binary stream....it does not know what "functions" it is
>>> executing...so the binary stream can be any OS (windows / Linux
>>> etc)....QEMU just setup the basic block (call basic blocks
>>> translation) mechanism, and then execute it block by block.   Each
>>> block by definition is demarcated by a branch/jump etc.   Within the
>>> block if there is any privilege instruction, (eg, write MSR registers,
>>> load LDT registers etc), then a transition will be made from guest in
>>> QEMU into KVM to update the VMCB/VMCS information.   (these terms are
>>> from Intel/AMD manual).
>>>        
>> Eh, no.
>>
>> There are two modes of operation:
>>
>> 1) TCG
>> 2) KVM
>>
>>      
> Now I am clear, it is translate-all.c vs kvm-all.c as the two main
> file in QEMU.   Thanks for that!
>
>    
>> In mode 1, qemu goes through target-xxx/translate.c and converts the basic blocks you were talking about above to native machine code on the host system using tcg (see the tcg directory). No KVM is involved, everything happens in user mode.
>>
>> In mode 2, qemu executes _everything_ by calling KVM. There is no guest code interpreted, looked at or whatever in qemu. Whenever the guest CPU runs, it runs because qemu called ioctrl(VCPU_RUN) on its kvm vcpu fd.
>>
>>      
> Now I don't understand.....guest codes usually have two parts -->  one
> running in ring3, and another in ring0, so if we were running
> everything in KVM, won't it posed a security risks?   as far as I
> know, VMware use ring1 to run ALL the guest codes, and transition to
> ring0 whenever privilege instructions is encountered.

This is not quite accurate anymore.

VT and SVM introduce what's often called compressed ring 0 mode.  In 
this new mode, you can execute code in ring 0, 1, 2, or 3 but trap any 
operations that are potentially sensitive (like IO operations).  The act 
of trapping these events results in a transition from compressed ring 0 
to normal ring 0.  This transition is called a vmexit.

KVM enables compressed ring 0 mode and runs all guest code in that 
mode.  It directly handles all vmexits and decides to pass a subset of 
those exits down to qemu for further handling.  Typically, this subset 
includes anything that requires device emulation.

Regards,

Anthony Liguori

>     so what is the
> equivalent mechanism in qemu?   Key issue I am facing with here is
> basically "privilege insn", ----->  only these should be executing in
> kvm module, which is running in ring0, and the rest is best to be at
> lower level?
>
>    
>>> I have not seen any IOCTL calls in QEMU,
>>>        
>> See kvm*.c and target-xxx/kvm.c
>>
>>      
>>> but I suspect ultimately it
>>> should drop to a VMRUN (for AMD, Intel called it VMLAUNCH or VMRESUME)
>>> calls inside KVM, which can be found here:
>>>
>>> arch/x86/kvm/
>>>
>>> And the AMD specific virtualization is done in svm.c whereas that of
>>> vmx.c is for Intel.
>>>
>>> Copying the remark in vmx.c:
>>>
>>> /*
>>> * The exit handlers return 1 if the exit was handled fully and guest execution
>>> * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
>>> * to be done to userspace and return 0.
>>> */
>>> static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
>>>         [EXIT_REASON_EXCEPTION_
>>>
>>> And after reading the Intel manual, u will understand that "exit" here
>>> actually refers to the special set of privilege intel instructions,
>>> which upon being executed by the guest OS, will immediately caused and
>>> VMEXIT condition, and these are handled by the above handler in
>>> kvm.ko.
>>>        
>> in kvm-xxx.ko for x86.
>>
>> Also, please don't top post :)
>>
>>
>> Alex
>>
>>
>>      
> Thanks again.
>
>    



* Re: Where is the entry of hypercalls in kvm?
  2010-06-30  8:17 ` Peter Teoh
  2010-06-30  8:56   ` Alexander Graf
@ 2010-06-30 14:59   ` Balachandar
  2010-06-30 15:02     ` Balachandar
  1 sibling, 1 reply; 11+ messages in thread
From: Balachandar @ 2010-06-30 14:59 UTC (permalink / raw)
  To: Peter Teoh; +Cc: kernelnewbies, kvm

On Wed, Jun 30, 2010 at 4:17 AM, Peter Teoh <htmldeveloper@gmail.com> wrote:
> Your questioned is answered here:
>
> http://www.spinics.net/lists/kvm/msg37526.html
>
> And check this paper out:
>
> http://ozlabs.org/~rusty/virtio-spec/virtio-paper.pdf
>
> The general concept to remember is that QEMU and KVM just execute the
> input as binary stream....it does not know what "functions" it is
> executing...so the binary stream can be any OS (windows / Linux
> etc)....QEMU just setup the basic block (call basic blocks
> translation) mechanism, and then execute it block by block.   Each
> block by definition is demarcated by a branch/jump etc.   Within the
> block if there is any privilege instruction, (eg, write MSR registers,
> load LDT registers etc), then a transition will be made from guest in
> QEMU into KVM to update the VMCB/VMCS information.   (these terms are
> from Intel/AMD manual).
>
> I have not seen any IOCTL calls in QEMU, but I suspect ultimately it
> should drop to a VMRUN (for AMD, Intel called it VMLAUNCH or VMRESUME)
> calls inside KVM, which can be found here:
>
> arch/x86/kvm/
>
> And the AMD specific virtualization is done in svm.c whereas that of
> vmx.c is for Intel.
>
> Copying the remark in vmx.c:
>
> /*
>  * The exit handlers return 1 if the exit was handled fully and guest execution
>  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
>  * to be done to userspace and return 0.
>  */
> static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
>        [EXIT_REASON_EXCEPTION_
>
> And after reading the Intel manual, u will understand that "exit" here
> actually refers to the special set of privilege intel instructions,
> which upon being executed by the guest OS, will immediately caused and
> VMEXIT condition, and these are handled by the above handler in
> kvm.ko.
>
> To know the entry point INTO the guest OS (ie, when the guest code
> will first be run) first must understand that all these VMX operation
> are a state machine (3, VMLAUNCH, VMRESUME and VMEXIT).   Once inside
> the VMRESUME state, there is no way for it to access any of the hosts
> resources, only accessible after VMEXIT is triggered.
>
> All key APIs are defined here (for Intel) (this is KVM specific, Xen
> has another mechanism, :
>
> static struct kvm_x86_ops vmx_x86_ops = {
>        .cpu_has_kvm_support = cpu_has_kvm_support,
>        .disabled_by_bios = vmx_disabled_by_bios,
>        .hardware_setup = hardware_setup,
>        .hardware_unsetup = hardware_unsetup,
> ...
>        .run = vmx_vcpu_run,
>        .handle_exit = vmx_handle_exit,
>        .skip_emulated_instruction = skip_emulated_instruction,
>        .set_interrupt_shadow = vmx_set_interrupt_shadow,
>
> and vmx_vcpu_run() is the the answer to your question.....i supposed?
>
> Perhaps another summary resource:
>
> http://download.microsoft.com/download/9/8/f/98f3fe47-dfc3-4e74-92a3-088782200fe7/TWAR05015_WinHEC05.ppt
>
> As for virtio_net.....it is implemented in
> drivers/net/virtio_net.c......not sure what is your question?
>
Thank you for your elaborate answer. My question is: what code in
qemu-kvm is called when the kick function is called in virtio_net?
The kick function does an ioport write, which the hypervisor traps
into kvm. kvm then calls some function in qemu-kvm userspace for I/O
emulation. So, for the particular case of virtio_net, what is the
function in qemu-kvm that is called when a kick is encountered in the
guest?

Thanks,
Bala


* Re: Where is the entry of hypercalls in kvm?
  2010-06-30 14:59   ` Balachandar
@ 2010-06-30 15:02     ` Balachandar
  2010-06-30 15:12       ` Anthony Liguori
  0 siblings, 1 reply; 11+ messages in thread
From: Balachandar @ 2010-06-30 15:02 UTC (permalink / raw)
  To: Peter Teoh; +Cc: kernelnewbies, kvm

On Wed, Jun 30, 2010 at 10:59 AM, Balachandar <bala1486@gmail.com> wrote:
> Thank you for your elaborate answer. My question is: which code in
> qemu-kvm runs when the kick function is called in virtio_net?
> The kick function does an ioport write, which the hypervisor traps
> into kvm. kvm then calls some function in qemu-kvm userspace for I/O
> emulation. So, for virtio_net in particular, which function in
> qemu-kvm is called when the guest issues a kick?
>
I already got the answer from Alexander. In case anyone else is looking,
the function is virtio_net_write in hw/virtio_pci.c


* Re: Where is the entry of hypercalls in kvm?
  2010-06-30 15:02     ` Balachandar
@ 2010-06-30 15:12       ` Anthony Liguori
  0 siblings, 0 replies; 11+ messages in thread
From: Anthony Liguori @ 2010-06-30 15:12 UTC (permalink / raw)
  To: Balachandar; +Cc: Peter Teoh, kernelnewbies, kvm

On 06/30/2010 10:02 AM, Balachandar wrote:
> On Wed, Jun 30, 2010 at 10:59 AM, Balachandar<bala1486@gmail.com>  wrote:
>    
>> Thank you for your elaborate answer. My question is: which code in
>> qemu-kvm runs when the kick function is called in virtio_net?
>> The kick function does an ioport write, which the hypervisor traps
>> into kvm. kvm then calls some function in qemu-kvm userspace for I/O
>> emulation. So, for virtio_net in particular, which function in
>> qemu-kvm is called when the guest issues a kick?
>>
>>      
> I already got the answer from Alexander. In case anyone else is looking,
> the function is virtio_net_write in hw/virtio_pci.c
>    

virtio_ioport_write() in hw/virtio_pci.c.  It eventually goes to 
virtio_net_handle_tx, virtio_net_handle_rx, or virtio_net_handle_ctrl 
depending on which queue is being notified.
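
As a rough illustration of that dispatch (a toy model, not the actual
QEMU source): the guest's kick writes the queue index to the legacy
virtio-pci Queue Notify register at offset 16, and the device model
looks up that queue and calls the handler registered for it. All toy_*
names and the struct layouts below are invented for this sketch; the
queue order 0 = rx, 1 = tx, 2 = ctrl follows the usual virtio-net
layout.

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_PCI_QUEUE_NOTIFY 16  /* legacy virtio-pci register offset */
#define TOY_NET_NUM_QUEUES      3   /* assumed: 0 = rx, 1 = tx, 2 = ctrl */

enum toy_handler_id { TOY_NONE = 0, TOY_RX, TOY_TX, TOY_CTRL };

struct toy_virtqueue;
typedef void (*toy_vq_handler)(struct toy_virtqueue *vq);

/* Toy virtqueue: only what is needed to show the dispatch. */
struct toy_virtqueue {
    toy_vq_handler      handle_output;  /* called when the guest kicks */
    unsigned            notifications;  /* number of kicks seen */
    enum toy_handler_id handled_by;     /* which handler ran last */
};

struct toy_net_dev {
    struct toy_virtqueue vq[TOY_NET_NUM_QUEUES];
};

/* Stand-ins for the per-queue handlers; real ones would process buffers. */
static void toy_handle_rx(struct toy_virtqueue *vq)
{
    vq->notifications++;
    vq->handled_by = TOY_RX;
}

static void toy_handle_tx(struct toy_virtqueue *vq)
{
    vq->notifications++;
    vq->handled_by = TOY_TX;
}

static void toy_handle_ctrl(struct toy_virtqueue *vq)
{
    vq->notifications++;
    vq->handled_by = TOY_CTRL;
}

/* Attach one handler per queue, in virtio-net queue order. */
void toy_net_init(struct toy_net_dev *d)
{
    static const toy_vq_handler handlers[TOY_NET_NUM_QUEUES] = {
        toy_handle_rx, toy_handle_tx, toy_handle_ctrl
    };
    for (int i = 0; i < TOY_NET_NUM_QUEUES; i++) {
        d->vq[i].handle_output = handlers[i];
        d->vq[i].notifications = 0;
        d->vq[i].handled_by = TOY_NONE;
    }
}

/* Port-write entry point: offset is relative to the device's I/O BAR.
 * Returns 0 if a queue was notified, -1 otherwise. Only the notify
 * register is modelled; every other register is rejected here. */
int toy_ioport_write(struct toy_net_dev *d, uint32_t offset, uint32_t val)
{
    switch (offset) {
    case VIRTIO_PCI_QUEUE_NOTIFY:
        if (val >= TOY_NET_NUM_QUEUES)
            return -1;              /* guest named a queue we do not have */
        d->vq[val].handle_output(&d->vq[val]);
        return 0;
    default:
        return -1;
    }
}
```

So after toy_net_init(), a call like toy_ioport_write(&dev,
VIRTIO_PCI_QUEUE_NOTIFY, 1) ends up in the tx handler, which is the same
shape as a guest kick on the transmit queue.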

Regards,

Anthony Liguori




Thread overview: 11+ messages
2010-06-26  1:06 Where is the entry of hypercalls in kvm? Balachandar
2010-06-30  8:17 ` Peter Teoh
2010-06-30  8:56   ` Alexander Graf
2010-06-30 15:10     ` Anthony Liguori
2010-06-30 16:36       ` Peter Teoh
2010-06-30 16:28     ` Peter Teoh
2010-06-30 16:32       ` Alexander Graf
2010-06-30 16:34       ` Anthony Liguori
2010-06-30 14:59   ` Balachandar
2010-06-30 15:02     ` Balachandar
2010-06-30 15:12       ` Anthony Liguori
