* tlb flush after each vm_exit, also virtual interrupts injection

From: charls chap @ 2016-07-27 10:19 UTC
To: kvm

Hello List,

1) I've seen some slides, from back in 2008, describing how the use of VPID
solves the problem of a TLB flush after each VM_EXIT. But I see from the
code that it actually does a flush after a VM_EXIT.

Obviously I am wrong, so I need some help: where should I look, which lines
of code, to figure out what happens with TLB flushes and VM_EXITs?

2) System call from ring 0 (non-root) to ring 0 (root): could a guest OS
make a system call to the host OS?

3) What is the mechanism of virtual interrupt injection? What mechanism is
used to inject a virtual interrupt under full virtualization?

The host injects an interrupt into the guest, HOW? E.g. a hardware
interrupt? At which point in the guest? The guest's complete_bh?

4) I've seen in the literature that KVM operates in protection ring -1.
What does that mean? Is there hardware support for that ring?

Why not ring 0?

Looking forward to your help
* Fwd: tlb flush after each vm_exit, also virtual interrupts injection

From: Charls D. Chap @ 2016-07-28  8:20 UTC
To: kvm; +Cc: pbonzini

---------- Forwarded message ----------
From: charls chap <chapcharls@gmail.com>
Date: Wed, Jul 27, 2016 at 1:19 PM
Subject: tlb flush after each vm_exit, also virtual interrupts injection
To: kvm@vger.kernel.org

[...]
* Re: tlb flush after each vm_exit, also virtual interrupts injection

From: Paolo Bonzini @ 2016-08-02 17:33 UTC
To: Charls D. Chap; +Cc: kvm

> 1) I've seen some slides, from back in 2008, describing how the use of
> VPID solves the problem of a TLB flush after each VM_EXIT. But I see from
> the code that it actually does a flush after a VM_EXIT.
>
> Obviously I am wrong, so I need some help: where should I look, which
> lines of code, to figure out what happens with TLB flushes and VM_EXITs?

You are saying that you "see from the code that it actually does a flush
after a VM_EXIT".  Where is this?

> 2) System call from ring 0 (non-root) to ring 0 (root): could a guest OS
> make a system call to the host OS?

No.  You'd need a program running on the host, and a channel between this
program and a guest (such as a socket or a serial port).

> 3) What is the mechanism of virtual interrupt injection? What mechanism is
> used to inject a virtual interrupt under full virtualization?
>
> The host injects an interrupt into the guest, HOW? E.g. a hardware
> interrupt? At which point in the guest? The guest's complete_bh?

Interrupt injection happens through ioctls on the KVM file descriptors (the
vCPU file descriptor for KVM_INTERRUPT, the VM file descriptor for the
others).

When the LAPIC is emulated by userspace (not the common case) this is done
with the KVM_INTERRUPT ioctl.  When the LAPIC is emulated in the kernel,
there are various mechanisms:

    ioctl                 when?                interrupt kind
    ------------------------------------------------------------------------
    KVM_INTERRUPT         i8259 in userspace   EXTINT
    KVM_SET_GSI_ROUTING   (always)             IOAPIC
    KVM_SIGNAL_MSI        (always)             MSI
    KVM_SET_GSI_ROUTING   (always)             MSI
    KVM_IRQFD             any that can use KVM_SET_GSI_ROUTING

After KVM_SET_GSI_ROUTING, the host invokes another ioctl on the VM file
descriptor (either KVM_IRQ_LINE or KVM_IRQ_LINE_STATUS) in order to trigger
the interrupt.  In QEMU this corresponds to qemu_irq_raise, pci_set_irq or
msi_notify.

After KVM_IRQFD, the host writes to an eventfd in order to trigger the
interrupt.  In QEMU this corresponds to event_notifier_set.

(For MSI, KVM_SIGNAL_MSI is preferred to KVM_IRQ_LINE/KVM_IRQ_LINE_STATUS
because it is faster, but they provide the same functionality.)

> 4) I've seen in the literature that KVM operates in protection ring -1.
> What does that mean? Is there hardware support for that ring?
>
> Why not ring 0?

Ring -1 is not a particularly good name.  A better description is that KVM
operates in ring 0 of VMX root mode, while the guest operates in VMX
non-root mode (which can be any of rings 0-3, depending on the current
privilege level of the guest).

Paolo
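To make the table above concrete, here is a minimal userspace sketch of the
KVM_IRQ_LINE path: once a routing entry exists (set up with
KVM_SET_GSI_ROUTING), the VMM raises and then lowers a GSI with an ioctl on
the VM file descriptor.  The function name, the vmfd argument and the choice
of GSI are illustrative assumptions, not code taken from this thread or from
QEMU.

/*
 * Sketch: pulse a guest interrupt line through the in-kernel irqchip.
 * Assumes "vmfd" came from KVM_CREATE_VM followed by KVM_CREATE_IRQCHIP,
 * and that "gsi" has a routing entry.
 */
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int pulse_irq(int vmfd, unsigned int gsi)
{
    struct kvm_irq_level irq;

    memset(&irq, 0, sizeof(irq));
    irq.irq = gsi;

    irq.level = 1;                      /* raise the line */
    if (ioctl(vmfd, KVM_IRQ_LINE, &irq) < 0)
        return -1;

    irq.level = 0;                      /* and lower it again */
    return ioctl(vmfd, KVM_IRQ_LINE, &irq);
}

An edge-triggered source would do exactly this raise/lower pair; a
level-triggered source would leave the line asserted until the emulated
device deasserts it.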
* Re: tlb flush after each vm_exit, also virtual interrupts injection

From: Charls D. Chap @ 2016-08-03 14:43 UTC
To: Paolo Bonzini; +Cc: kvm

On Tue, Aug 2, 2016 at 8:33 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> 3) What is the mechanism of virtual interrupt injection?
> [...]
> After KVM_SET_GSI_ROUTING, the host invokes another ioctl on the VM file
> descriptor (either KVM_IRQ_LINE or KVM_IRQ_LINE_STATUS) in order to
> trigger the interrupt.  In QEMU this corresponds to qemu_irq_raise,
> pci_set_irq or msi_notify.

What do you mean by "this corresponds"?  Is there a kvm_vcpu_ioctl from the
host kernel to the guest, or a kvm_vcpu_ioctl from the host kernel to host
userspace (QEMU) and then to the guest?

Why not call vcpu_enter_guest(struct kvm_vcpu *vcpu) directly, avoiding the
switch to QEMU?

So, in the case of write I/O using virtio-blk with dataplane=off, for the
return I/O path: which qemu/host, host/qemu and qemu/guest transitions are
there?  Do the above ioctls go from the host KVM to QEMU, and then QEMU
notifies the guest?  How?

    ioctl(KVM_SET_GSI_ROUTING)
    ioctl(KVM_IRQFD)

For the return path: what happens after the real I/O completion on the
host, when the host completion bottom half is executed?  Do we go through
the iothread to the guest in order to execute the virtio-blk completion
request?

One last question about the vmentry and vmexit code: it seems to me that
vmentry and vmexit share the same asm block of code.  I understand that at
line 8719 we switch to non-root (guest) mode, and that line 8720 and below
are not executed.  Is this the vmentry?

And when a vmexit happens, are the instructions from line 8721 downwards
the vmexit part?  How did the context change?  I mean, which instruction
made the jump so that we are now at the line
"mov %0, %c[wordsize](%%" _ASM_SP ") \n\t"?

--------------------------------------

     /* Enter guest mode */
8716 "jne 1f \n\t"
8717 __ex(ASM_VMX_VMLAUNCH) "\n\t"
8718 "jmp 2f \n\t"
8719 "1: " __ex(ASM_VMX_VMRESUME) "\n\t"
8720 "2: "
8721 /* Save guest registers, load host registers, keep flags */
8722 "mov %0, %c[wordsize](%%" _ASM_SP ") \n\t"
8723 "pop %0 \n\t"
8724 "mov %%" _ASM_AX ", %c[rax](%0) \n\t"
8725 "mov %%" _ASM_BX ", %c[rbx](%0) \n\t"
8726 __ASM_SIZE(pop) " %c[rcx](%0) \n\t"
8727 "mov %%" _ASM_DX ", %c[rdx](%0) \n\t"
8728 "mov %%" _ASM_SI ", %c[rsi](%0) \n\t"
8729 "mov %%" _ASM_DI ", %c[rdi](%0) \n\t"
8730 "mov %%" _ASM_BP ", %c[rbp](%0) \n\t"

> After KVM_IRQFD, the host writes to an eventfd in order to trigger the
> interrupt.  In QEMU this corresponds to event_notifier_set.
> [...]
> Ring -1 is not a particularly good name. [...]
>
> Paolo

thanks
Charls
* Re: tlb flush after each vm_exit, also virtual interrupts injection

From: Paolo Bonzini @ 2016-08-03 15:56 UTC
To: Charls D. Chap; +Cc: kvm

On 03/08/2016 16:43, Charls D. Chap wrote:
>> After KVM_SET_GSI_ROUTING, the host invokes another ioctl on the VM file
>> descriptor (either KVM_IRQ_LINE or KVM_IRQ_LINE_STATUS) in order to
>> trigger the interrupt.  In QEMU this corresponds to qemu_irq_raise,
>> pci_set_irq or msi_notify.
>
> What do you mean by "this corresponds"?  Is there a kvm_vcpu_ioctl from
> the host kernel to the guest, or a kvm_vcpu_ioctl from the host kernel to
> host userspace (QEMU) and then to the guest?

It's kvm_vcpu_ioctl or kvm_vm_ioctl, and it goes from host userspace to
host kernel (ioctl is a syscall).  The ioctl is invoked when QEMU generates
an interrupt with qemu_irq_raise (sometimes called directly, sometimes
through pci_set_irq) or msi_notify.

> Why not call vcpu_enter_guest(struct kvm_vcpu *vcpu) directly, avoiding
> the switch to QEMU?

Two reasons.  First, it's QEMU that wants to generate the interrupt.  The
ioctl or eventfd is how KVM receives the signal.

Second, KVM wants to be self-contained.  Kernel event sources that are part
of KVM, such as i8254.c, generate the interrupt through kvm_set_irq, but
this is the exception, not the rule.  In general KVM exposes interfaces to
connect other parts of the kernel to it; irqfd is the main such interface,
and it is used by both vhost and VFIO, for example.

> So, in the case of write I/O using virtio-blk with dataplane=off
> [...] what happens after the real I/O completion on the host,
> when the host completion bottom half is executed?  Do we go through the
> iothread to the guest in order to execute the virtio-blk completion
> request?

virtio_blk_req_complete
  -> virtio_notify
  -> virtio_pci_notify
  -> either msix_notify or pci_set_irq

The paths then are different.  Assuming you are using the kernel's LAPIC
implementation (which has a QEMU "bridge" in hw/i386/kvm/apic.c), for
msix_notify it goes like this:

msix_notify
  -> msi_send_message
  -> address_space_stl_le
  -> ...
  -> kvm_apic_mem_write
  -> kvm_irqchip_send_msi
  -> kvm_vm_ioctl

while for pci_set_irq:

pci_set_irq
  -> pci_irq_handler
  -> pci_change_irq_level
  -> piix3_set_irq
  -> piix3_set_irq_level
  -> piix3_set_irq_pic
  -> qemu_set_irq(piix3->pic[pic_irq], ...)
  -> kvm_pc_gsi_handler
  -> qemu_set_irq(s->i8259_irq[n], ...)
  -> kvm_pic_set_irq
  -> kvm_set_irq
  -> kvm_vm_ioctl

> One last question about the vmentry and vmexit code: it seems to me that
> vmentry and vmexit share the same asm block of code.  I understand that
> at line 8719 we switch to non-root (guest) mode, and that line 8720 and
> below are not executed.  Is this the vmentry?

Yes, it's either line 8717 or line 8719.

> And when a vmexit happens, are the instructions from line 8721 downwards
> the vmexit part?  How did the context change?  I mean, which instruction
> made the jump so that we are now at the line
> "mov %0, %c[wordsize](%%" _ASM_SP ") \n\t"?

It can be one of many conditions, only some of which correspond to
particular instructions.  All the reasons for vmexit are listed in the SDM.
They include instructions (e.g. moves to control registers, RDMSR/WRMSR,
HLT, CPUID, etc.), exceptions injected into the guest, interrupts injected
into the host, page faults on EPT pages, conditions that the processor
cannot handle (triple fault, task switch), conditions requested previously
by the hypervisor ("interrupt window" and NMI window), etc.

You really need to read the manual. :)

Paolo
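As a companion to the irqfd discussion above, here is a minimal userspace
sketch of that path: an eventfd is bound to a GSI with KVM_IRQFD, after
which triggering the interrupt is just a write to the eventfd (this write
is essentially what QEMU's event_notifier_set boils down to).  The vmfd and
gsi arguments and the function name are placeholders assumed for
illustration; a routing entry for the GSI is assumed to have been installed
with KVM_SET_GSI_ROUTING.

/*
 * Sketch: bind an eventfd to a GSI and fire the interrupt by writing to it.
 * Any context holding the eventfd (QEMU, vhost, VFIO) can inject this way
 * without issuing a per-interrupt ioctl.
 */
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int bind_and_fire_irqfd(int vmfd, unsigned int gsi)
{
    struct kvm_irqfd irqfd;
    uint64_t one = 1;
    int efd = eventfd(0, EFD_CLOEXEC);

    if (efd < 0)
        return -1;

    memset(&irqfd, 0, sizeof(irqfd));
    irqfd.fd  = efd;
    irqfd.gsi = gsi;
    if (ioctl(vmfd, KVM_IRQFD, &irqfd) < 0)
        return -1;

    /* The write is the injection; KVM consumes the eventfd in the kernel. */
    return write(efd, &one, sizeof(one)) == sizeof(one) ? 0 : -1;
}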
* Re: tlb flush after each vm_exit, also virtual interrupts injection

From: Charls D. Chap @ 2016-08-05 11:29 UTC
To: Paolo Bonzini; +Cc: kvm

On Wed, Aug 3, 2016 at 6:56 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> So, in the case of write I/O using virtio-blk with dataplane=off
>> [...] what happens after the real I/O completion on the host,
>> when the host completion bottom half is executed?  Do we go through the
>> iothread to the guest in order to execute the virtio-blk completion
>> request?

How does control transfer to QEMU user space, and which thread is running,
the vcpu thread or a worker?

    -> virtio_blk_device_realize
    -> virtio_blk_req_complete

Was it the "real" interrupt for I/O completion from the device?

Which QEMU thread executes the code you mentioned, the vcpu thread or a
worker (iothread or main loop)?  And when does the iothread finish its
work?

> virtio_blk_req_complete
>   -> virtio_notify
>   -> virtio_pci_notify
>   -> either msix_notify or pci_set_irq
> [...]
> It can be one of many conditions, only some of which correspond to
> particular instructions.  All the reasons for vmexit are listed in the
> SDM.

I know that there are many exit reasons, but it's not clear to me HOW
exactly control is transferred from the execution of one of these
instructions to the VMEXIT point, which is

    "vmx_return: " _ASM_PTR " 2b \n\t"

Where does this switch happen, so that we jump to this label?  Is it inside
the implementation of the corresponding ioctl?

I guess the answer is "read the manual", which is fine with me, because you
already helped me a lot :)
* Re: tlb flush after each vm_exit, also virtual interrupts injection

From: Paolo Bonzini @ 2016-08-05 11:59 UTC
To: Charls D. Chap; +Cc: kvm

>>> So, in the case of write I/O using virtio-blk with dataplane=off
>>> [...] what happens after the real I/O completion on the host,
>>> when the host completion bottom half is executed?
>
> How does control transfer to QEMU user space, and which thread is
> running, the vcpu thread or a worker?
> [...]
> Which QEMU thread executes the code you mentioned, the vcpu thread or a
> worker (iothread or main loop)?  And when does the iothread finish its
> work?

There are two ways:

1) The VCPU thread starts the I/O (control is transferred to QEMU user
space by leaving KVM_RUN).  The I/O system call happens in a worker thread.
When the system call has finished, the worker thread wakes up the I/O
thread and the I/O thread executes virtio_blk_req_complete.

2) The VCPU thread (which is running KVM_RUN) writes to an eventfd, which
wakes up the I/O thread.  The I/O thread runs the I/O system call in a
worker thread, same as case 1.  Also like case 1, when the I/O is finished
the worker thread wakes up the I/O thread and the I/O thread executes
virtio_blk_req_complete.

> I know that there are many exit reasons, but it's not clear to me HOW
> exactly control is transferred from the execution of one of these
> instructions to the VMEXIT point, which is
> "vmx_return: " _ASM_PTR " 2b \n\t".
> Where does this switch happen, so that we jump to this label?  Is it
> inside the implementation of the corresponding ioctl?
>
> I guess the answer is "read the manual", which is fine with me, because
> you already helped me a lot :)

This is a more specific question, and thus easier to answer: after a vmexit
the instruction pointer is reset to the VMCS's HOST_RIP field, and KVM
writes the address of vmx_return to that field. :)

Paolo
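As a loose userspace illustration of the thread interplay described in the
two cases above (not QEMU code), the sketch below shows only the wakeup
pattern: a worker thread stands in for the blocking I/O system call and
signals an eventfd, and the event-loop thread, blocked on that eventfd,
then runs the completion work, the analogue of virtio_blk_req_complete.
All names and the dummy sleep are assumptions made for the example.

/* Sketch: worker completes "I/O", wakes the event loop via an eventfd. */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/eventfd.h>

static int done_efd;

static void *worker(void *arg)
{
    uint64_t one = 1;

    usleep(1000);                        /* stand-in for the blocking I/O syscall */
    write(done_efd, &one, sizeof(one));  /* wake the event-loop thread */
    return NULL;
}

int main(void)
{
    pthread_t tid;
    uint64_t val;

    done_efd = eventfd(0, 0);
    pthread_create(&tid, NULL, worker, NULL);

    read(done_efd, &val, sizeof(val));   /* the event loop blocks here */
    printf("completion work runs in the event-loop thread\n");

    pthread_join(tid, NULL);
    close(done_efd);
    return 0;
}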
* Re: tlb flush after each vm_exit, also virtual interrupts injection

From: Wanpeng Li @ 2016-08-25  9:12 UTC
To: Paolo Bonzini; +Cc: Charls D. Chap, kvm

2016-08-03 1:33 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
[...]
> Interrupt injection happens through ioctls on the KVM file descriptors
> (the vCPU file descriptor for KVM_INTERRUPT, the VM file descriptor for
> the others).
>
> When the LAPIC is emulated by userspace (not the common case) this is
> done with the KVM_INTERRUPT ioctl.  When the LAPIC is emulated in the
> kernel, there are various mechanisms:
>
>     ioctl                 when?                interrupt kind
>     ------------------------------------------------------------------------
>     KVM_INTERRUPT         i8259 in userspace   EXTINT
>     KVM_SET_GSI_ROUTING   (always)             IOAPIC
>     KVM_SIGNAL_MSI        (always)             MSI
>     KVM_SET_GSI_ROUTING   (always)             MSI
>     KVM_IRQFD             any that can use KVM_SET_GSI_ROUTING
>
> After KVM_SET_GSI_ROUTING, the host invokes another ioctl on the VM

MSI routing is also set by KVM_SET_GSI_ROUTING, even though MSI/MSI-X is
not associated with a GSI; is this intended to save a KVM API?  In
addition, kvm_send_userspace_msi(), which is called in the KVM_SIGNAL_MSI
path, sets the MSI routing entry again.  I saw patches that update the GSI
routing after the MSI-X configuration is changed
(https://patchwork.kernel.org/patch/6827431/), so why is the MSI routing
entry set again when sending an MSI?

Regards,
Wanpeng Li
* Re: tlb flush after each vm_exit, also virtual interrupts injection

From: Paolo Bonzini @ 2016-08-29  9:55 UTC
To: Wanpeng Li; +Cc: Charls D. Chap, kvm

On 25/08/2016 11:12, Wanpeng Li wrote:
> MSI routing is also set by KVM_SET_GSI_ROUTING, even though MSI/MSI-X is
> not associated with a GSI; is this intended to save a KVM API?  In
> addition, kvm_send_userspace_msi(), which is called in the KVM_SIGNAL_MSI
> path, sets the MSI routing entry again.

No, kvm_send_userspace_msi simply converts the struct used by userspace
(struct kvm_msi) into the one used by the kernel (struct
kvm_kernel_irq_routing_entry).

> I saw patches that update the GSI routing after the MSI-X configuration
> is changed (https://patchwork.kernel.org/patch/6827431/), so why is the
> MSI routing entry set again when sending an MSI?

KVM_SIGNAL_MSI does not need a previous KVM_SET_GSI_ROUTING, but if
userspace wants it can use both.  Note that you've linked a patch for lkvm,
not Linux or QEMU.

Paolo
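For reference, a minimal sketch of the KVM_SIGNAL_MSI path being discussed:
userspace fills a struct kvm_msi with the MSI address/data pair taken from
the device's MSI or MSI-X configuration and issues a single ioctl on the VM
file descriptor, with no routing entry required beforehand.  The function
name, the vmfd argument and the concrete address and vector values are
illustrative assumptions, not values from this thread.

/*
 * Sketch: inject one MSI directly.  The address/data encoding shown here
 * (LAPIC MSI window 0xfee00000, destination ID 0, fixed delivery, vector
 * in the low byte of data) is the usual x86 layout.
 */
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int signal_msi(int vmfd, unsigned int vector)
{
    struct kvm_msi msi;

    memset(&msi, 0, sizeof(msi));
    msi.address_lo = 0xfee00000u;   /* destination ID 0, physical mode */
    msi.address_hi = 0;
    msi.data       = vector;        /* fixed delivery, edge triggered */

    return ioctl(vmfd, KVM_SIGNAL_MSI, &msi);
}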
* Re: tlb flush after each vm_exit, also virtual interrupts injection

From: Wanpeng Li @ 2016-08-29 10:22 UTC
To: Paolo Bonzini; +Cc: Charls D. Chap, kvm

2016-08-29 17:55 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>> MSI routing is also set by KVM_SET_GSI_ROUTING, even though MSI/MSI-X is
>> not associated with a GSI; is this intended to save a KVM API?  In
>> addition, kvm_send_userspace_msi(), which is called in the
>> KVM_SIGNAL_MSI path, sets the MSI routing entry again.
>
> No, kvm_send_userspace_msi simply converts the struct used by userspace
> (struct kvm_msi) into the one used by the kernel (struct
> kvm_kernel_irq_routing_entry).

Agreed. :)

>> I saw patches that update the GSI routing after the MSI-X configuration
>> is changed (https://patchwork.kernel.org/patch/6827431/), so why is the
>> MSI routing entry set again when sending an MSI?
>
> KVM_SIGNAL_MSI does not need a previous KVM_SET_GSI_ROUTING, but if
> userspace wants it can use both.  Note that you've linked a patch for
> lkvm, not Linux or QEMU.

Do you mean that MSI/MSI-X don't necessarily have a routing table?

Regards,
Wanpeng Li
* Re: tlb flush after each vm_exit, also virtual interrupts injection

From: Paolo Bonzini @ 2016-08-29 16:39 UTC
To: Wanpeng Li; +Cc: Charls D. Chap, kvm

On 29/08/2016 12:22, Wanpeng Li wrote:
>>> I saw patches that update the GSI routing after the MSI-X configuration
>>> is changed (https://patchwork.kernel.org/patch/6827431/), so why is the
>>> MSI routing entry set again when sending an MSI?
>>
>> KVM_SIGNAL_MSI does not need a previous KVM_SET_GSI_ROUTING, but if
>> userspace wants it can use both.  Note that you've linked a patch for
>> lkvm, not Linux or QEMU.
>
> Do you mean that MSI/MSI-X don't necessarily have a routing table?

Generally they don't need one, if you don't use irqfd.

Paolo
* Re: tlb flush after each vm_exit, also virtual interrupts injection

From: Wanpeng Li @ 2016-08-30  0:39 UTC
To: Paolo Bonzini; +Cc: Charls D. Chap, kvm

2016-08-30 0:39 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>> Do you mean that MSI/MSI-X don't necessarily have a routing table?
>
> Generally they don't need one, if you don't use irqfd.

Thanks for pointing that out. :)

Regards,
Wanpeng Li
* Re: tlb flush after each vm_exit, also virtual interrupts injection

From: Radim Krčmář @ 2016-07-28 13:25 UTC
To: charls chap; +Cc: kvm

2016-07-27 13:19+0300, charls chap:
> Hello List,
>
> 1) I've seen some slides, from back in 2008, describing how the use of
> VPID solves the problem of a TLB flush after each VM_EXIT.

VPID solves the problem of excessive TLB flushing by tagging TLB entries
with a VPID.  VMX without the VPID feature flushes the TLB on transitions
(entry/exit), because the hardware cannot tell which entries are valid in
the current context.

> But I see from the code that it actually does a flush after a VM_EXIT.

Please quote the code you are seeing.

> Obviously I am wrong, so I need some help: where should I look, which
> lines of code, to figure out what happens with TLB flushes and VM_EXITs?

I don't know what you know, so I can only recommend:

  1) read the SDM for a while
  2) git grep -W -i 'tlb\|vpid' arch/x86/kvm virt/kvm
  3) goto (1)

> 2) System call from ring 0 (non-root) to ring 0 (root): could a guest OS
> make a system call to the host OS?

Somewhat; there are many ways to communicate, although it would not be a
system call in Linux terminology.  Maybe you are thinking about hypercalls?
(In any case, KVM was not designed for sharing host kernel infrastructure
with programs running in non-root ring 0.)

> 3) What is the mechanism of virtual interrupt injection? What mechanism
> is used to inject a virtual interrupt under full virtualization?

Every interrupt delivery starts by configuring data structures that depend
on the chosen method.  There are two main categories of methods:

1) Hardware can be configured while the guest is running.  These methods
   usually send a special interrupt to the physical CPU, which evaluates
   the configured data structures.  Hardware can also be the one that
   configures the data structures, so no hypervisor intervention is needed
   to send the interrupt.

2) Hardware cannot be configured while the guest is running.  The
   configured data structures are evaluated on guest entry.  (The interrupt
   might also be postponed until the guest state allows it, e.g. TPR.)

In both cases the hardware delivers the interrupt using the guest state.

> The host injects an interrupt into the guest, HOW? E.g. a hardware
> interrupt?

Same as above.

> At which point in the guest? The guest's complete_bh?

The guest shouldn't be able to tell the difference, so at any point that is
possible on the host (also a subset of them).

> 4) I've seen in the literature that KVM operates in protection ring -1.
> What does that mean? Is there hardware support for that ring?
>
> Why not ring 0?

If we are talking about VMX, ring -1 is an analogy.  The host (KVM)
operates with CPL 0 in VMX root mode, which was likely called ring -1 by
the authors.  A guest operates with CPL 0 too, but in VMX non-root mode, so
that is called ring 0.

(VMX can also operate in dual-monitor mode, so the analogy could be
extended to call VMX operating in SMM ring -2.)
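On the hypercall remark above: here is a guest-side sketch of what a KVM
hypercall looks like on Intel hardware, assuming guest kernel (CPL 0, VMX
non-root) context.  It mirrors the guest kernel's kvm_hypercall0() helper:
the hypercall number goes in RAX, VMCALL causes a VM exit to the host, and
the host's return value comes back in RAX.  Which numbers are accepted is
up to the host KVM; on AMD the equivalent instruction is VMMCALL, and
executing this outside a KVM guest simply faults.

/*
 * Sketch: zero-argument KVM hypercall (Intel VMCALL).  The "memory"
 * clobber keeps the compiler from reordering memory accesses around the
 * trap to the host.
 */
static inline long kvm_hypercall0(unsigned int nr)
{
    long ret;

    asm volatile("vmcall"
                 : "=a"(ret)   /* result returned in RAX */
                 : "a"(nr)     /* hypercall number in RAX */
                 : "memory");
    return ret;
}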