* tlb flush after each vm_exit, also virtual interrupts injection

From: charls chap @ 2016-07-27 10:19 UTC
To: kvm

Hello List,

1) I've seen some slides, from back in 2008, describing how the use of VPID
solves the problem of a TLB flush after each VM_EXIT. But I see from the
code that it actually does a flush after a VM_EXIT.

Obviously I am wrong, so I need some help: where should I look, which lines
of code, to figure out what happens with TLB flushes and VM_EXITs?

2) System call from ring 0 (non-root) to ring 0 (root): could a guest OS
make a system call to the host OS?

3) What is the mechanism of virtual interrupt injection? What mechanism is
used to inject a virtual interrupt under full virtualization?

The host injects an interrupt into the guest, HOW? E.g. a hardware
interrupt? At which point in the guest? The guest's complete_bh?

4) I've seen in the literature that KVM operates in protection ring -1.
What does that mean? Is there hardware support for that ring?

Why not ring 0?

Looking forward to your help
* Fwd: tlb flush after each vm_exit, also virtual interrupts injection

From: Charls D. Chap @ 2016-07-28  8:20 UTC
To: kvm; +Cc: pbonzini

---------- Forwarded message ----------
From: charls chap <chapcharls@gmail.com>
Date: Wed, Jul 27, 2016 at 1:19 PM
Subject: tlb flush after each vm_exit, also virtual interrupts injection
To: kvm@vger.kernel.org

[...]
* Re: tlb flush after each vm_exit, also virtual interrupts injection

From: Paolo Bonzini @ 2016-08-02 17:33 UTC
To: Charls D. Chap; +Cc: kvm

> 1) I've seen some slides, from back in 2008, describing how the use of
> VPID solves the problem of a TLB flush after each VM_EXIT. But I see from
> the code that it actually does a flush after a VM_EXIT.
>
> Obviously I am wrong, so I need some help: where should I look, which
> lines of code, to figure out what happens with TLB flushes and VM_EXITs?

You are saying that you "see from the code that it actually does a flush
after a VM_EXIT".  Where is this?

> 2) System call from ring 0 (non-root) to ring 0 (root): could a guest OS
> make a system call to the host OS?

No.  You'd need a program running on the host, and a channel between this
program and a guest (such as a socket or a serial port).

> 3) What is the mechanism of virtual interrupt injection? What mechanism is
> used to inject a virtual interrupt under full virtualization?
>
> The host injects an interrupt into the guest, HOW? E.g. a hardware
> interrupt? At which point in the guest? The guest's complete_bh?

Interrupt injection happens through ioctls on the KVM file descriptors (the
vCPU file descriptor for KVM_INTERRUPT, the VM file descriptor for the
others).

When the LAPIC is emulated by userspace (not the common case) this is done
with the KVM_INTERRUPT ioctl.  When the LAPIC is emulated in the kernel,
there are various mechanisms:

    ioctl                 when?                interrupt kind
    ------------------------------------------------------------------------
    KVM_INTERRUPT         i8259 in userspace   EXTINT
    KVM_SET_GSI_ROUTING   (always)             IOAPIC
    KVM_SIGNAL_MSI        (always)             MSI
    KVM_SET_GSI_ROUTING   (always)             MSI
    KVM_IRQFD             any that can use KVM_SET_GSI_ROUTING

After KVM_SET_GSI_ROUTING, the host invokes another ioctl on the VM file
descriptor (either KVM_IRQ_LINE or KVM_IRQ_LINE_STATUS) in order to trigger
the interrupt.  In QEMU this corresponds to qemu_irq_raise, pci_set_irq or
msi_notify.

After KVM_IRQFD, the host writes to an eventfd in order to trigger the
interrupt.  In QEMU this corresponds to event_notifier_set.

(For MSI, KVM_SIGNAL_MSI is preferred to KVM_IRQ_LINE/KVM_IRQ_LINE_STATUS
because it is faster, but they provide the same functionality.)

> 4) I've seen in the literature that KVM operates in protection ring -1.
> What does that mean? Is there hardware support for that ring?
>
> Why not ring 0?

Ring -1 is not a particularly good name.  A better description is that KVM
operates in ring 0 of VMX root mode, while the guest operates in VMX
non-root mode (which can be any of rings 0-3, depending on the current
privilege level of the guest).

Paolo
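To make the table above concrete, here is a minimal userspace sketch of the
KVM_IRQ_LINE path: once a routing entry exists (set up with
KVM_SET_GSI_ROUTING), the VMM raises and then lowers a GSI with an ioctl on
the VM file descriptor.  The function name, the vmfd argument and the choice
of GSI are illustrative assumptions, not code taken from this thread or from
QEMU.

/*
 * Sketch: pulse a guest interrupt line through the in-kernel irqchip.
 * Assumes "vmfd" came from KVM_CREATE_VM followed by KVM_CREATE_IRQCHIP,
 * and that "gsi" has a routing entry.
 */
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int pulse_irq(int vmfd, unsigned int gsi)
{
    struct kvm_irq_level irq;

    memset(&irq, 0, sizeof(irq));
    irq.irq = gsi;

    irq.level = 1;                      /* raise the line */
    if (ioctl(vmfd, KVM_IRQ_LINE, &irq) < 0)
        return -1;

    irq.level = 0;                      /* and lower it again */
    return ioctl(vmfd, KVM_IRQ_LINE, &irq);
}

An edge-triggered source would do exactly this raise/lower pair; a
level-triggered source would leave the line asserted until the emulated
device deasserts it.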
* Re: tlb flush after each vm_exit, also virtual interrupts injection

From: Charls D. Chap @ 2016-08-03 14:43 UTC
To: Paolo Bonzini; +Cc: kvm

On Tue, Aug 2, 2016 at 8:33 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> 3) What is the mechanism of virtual interrupt injection?
> [...]
> After KVM_SET_GSI_ROUTING, the host invokes another ioctl on the VM file
> descriptor (either KVM_IRQ_LINE or KVM_IRQ_LINE_STATUS) in order to
> trigger the interrupt.  In QEMU this corresponds to qemu_irq_raise,
> pci_set_irq or msi_notify.

What do you mean by "this corresponds"?  Is there a kvm_vcpu_ioctl from the
host kernel to the guest, or a kvm_vcpu_ioctl from the host kernel to host
userspace (QEMU) and then to the guest?

Why not call vcpu_enter_guest(struct kvm_vcpu *vcpu) directly, avoiding the
switch to QEMU?

So, in the case of write I/O using virtio-blk with dataplane=off, for the
return I/O path: which qemu/host, host/qemu and qemu/guest transitions are
there?  Do the above ioctls go from the host KVM to QEMU, and then QEMU
notifies the guest?  How?

    ioctl(KVM_SET_GSI_ROUTING)
    ioctl(KVM_IRQFD)

For the return path: what happens after the real I/O completion on the
host, when the host completion bottom half is executed?  Do we go through
the iothread to the guest in order to execute the virtio-blk completion
request?

One last question about the vmentry and vmexit code: it seems to me that
vmentry and vmexit share the same asm block of code.  I understand that at
line 8719 we switch to non-root (guest) mode, and that line 8720 and below
are not executed.  Is this the vmentry?

And when a vmexit happens, are the instructions from line 8721 downwards
the vmexit part?  How did the context change?  I mean, which instruction
made the jump so that we are now at the line
"mov %0, %c[wordsize](%%" _ASM_SP ") \n\t"?

--------------------------------------

     /* Enter guest mode */
8716 "jne 1f \n\t"
8717 __ex(ASM_VMX_VMLAUNCH) "\n\t"
8718 "jmp 2f \n\t"
8719 "1: " __ex(ASM_VMX_VMRESUME) "\n\t"
8720 "2: "
8721 /* Save guest registers, load host registers, keep flags */
8722 "mov %0, %c[wordsize](%%" _ASM_SP ") \n\t"
8723 "pop %0 \n\t"
8724 "mov %%" _ASM_AX ", %c[rax](%0) \n\t"
8725 "mov %%" _ASM_BX ", %c[rbx](%0) \n\t"
8726 __ASM_SIZE(pop) " %c[rcx](%0) \n\t"
8727 "mov %%" _ASM_DX ", %c[rdx](%0) \n\t"
8728 "mov %%" _ASM_SI ", %c[rsi](%0) \n\t"
8729 "mov %%" _ASM_DI ", %c[rdi](%0) \n\t"
8730 "mov %%" _ASM_BP ", %c[rbp](%0) \n\t"

> After KVM_IRQFD, the host writes to an eventfd in order to trigger the
> interrupt.  In QEMU this corresponds to event_notifier_set.
> [...]
> Ring -1 is not a particularly good name. [...]
>
> Paolo

thanks
Charls
* Re: tlb flush after each vm_exit, also virtual interrupts injection

From: Paolo Bonzini @ 2016-08-03 15:56 UTC
To: Charls D. Chap; +Cc: kvm

On 03/08/2016 16:43, Charls D. Chap wrote:
>> After KVM_SET_GSI_ROUTING, the host invokes another ioctl on the VM file
>> descriptor (either KVM_IRQ_LINE or KVM_IRQ_LINE_STATUS) in order to
>> trigger the interrupt.  In QEMU this corresponds to qemu_irq_raise,
>> pci_set_irq or msi_notify.
>
> What do you mean by "this corresponds"?  Is there a kvm_vcpu_ioctl from
> the host kernel to the guest, or a kvm_vcpu_ioctl from the host kernel to
> host userspace (QEMU) and then to the guest?

It's kvm_vcpu_ioctl or kvm_vm_ioctl, and it goes from host userspace to
host kernel (ioctl is a syscall).  The ioctl is invoked when QEMU generates
an interrupt with qemu_irq_raise (sometimes called directly, sometimes
through pci_set_irq) or msi_notify.

> Why not call vcpu_enter_guest(struct kvm_vcpu *vcpu) directly, avoiding
> the switch to QEMU?

Two reasons.  First, it's QEMU that wants to generate the interrupt.  The
ioctl or eventfd is how KVM receives the signal.

Second, KVM wants to be self-contained.  Kernel event sources that are part
of KVM, such as i8254.c, generate the interrupt through kvm_set_irq, but
this is the exception, not the rule.  In general KVM exposes interfaces to
connect other parts of the kernel to it; irqfd is the main such interface,
and it is used by both vhost and VFIO, for example.

> So, in the case of write I/O using virtio-blk with dataplane=off
> [...] what happens after the real I/O completion on the host,
> when the host completion bottom half is executed?  Do we go through the
> iothread to the guest in order to execute the virtio-blk completion
> request?

virtio_blk_req_complete
  -> virtio_notify
  -> virtio_pci_notify
  -> either msix_notify or pci_set_irq

The paths then are different.  Assuming you are using the kernel's LAPIC
implementation (which has a QEMU "bridge" in hw/i386/kvm/apic.c), for
msix_notify it goes like this:

msix_notify
  -> msi_send_message
  -> address_space_stl_le
  -> ...
  -> kvm_apic_mem_write
  -> kvm_irqchip_send_msi
  -> kvm_vm_ioctl

while for pci_set_irq:

pci_set_irq
  -> pci_irq_handler
  -> pci_change_irq_level
  -> piix3_set_irq
  -> piix3_set_irq_level
  -> piix3_set_irq_pic
  -> qemu_set_irq(piix3->pic[pic_irq], ...)
  -> kvm_pc_gsi_handler
  -> qemu_set_irq(s->i8259_irq[n], ...)
  -> kvm_pic_set_irq
  -> kvm_set_irq
  -> kvm_vm_ioctl

> One last question about the vmentry and vmexit code: it seems to me that
> vmentry and vmexit share the same asm block of code.  I understand that
> at line 8719 we switch to non-root (guest) mode, and that line 8720 and
> below are not executed.  Is this the vmentry?

Yes, it's either line 8717 or line 8719.

> And when a vmexit happens, are the instructions from line 8721 downwards
> the vmexit part?  How did the context change?  I mean, which instruction
> made the jump so that we are now at the line
> "mov %0, %c[wordsize](%%" _ASM_SP ") \n\t"?

It can be one of many conditions, only some of which correspond to
particular instructions.  All the reasons for vmexit are listed in the SDM.
They include instructions (e.g. moves to control registers, RDMSR/WRMSR,
HLT, CPUID, etc.), exceptions injected into the guest, interrupts injected
into the host, page faults on EPT pages, conditions that the processor
cannot handle (triple fault, task switch), conditions requested previously
by the hypervisor ("interrupt window" and NMI window), etc.

You really need to read the manual. :)

Paolo
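As a companion to the irqfd discussion above, here is a minimal userspace
sketch of that path: an eventfd is bound to a GSI with KVM_IRQFD, after
which triggering the interrupt is just a write to the eventfd (this write
is essentially what QEMU's event_notifier_set boils down to).  The vmfd and
gsi arguments and the function name are placeholders assumed for
illustration; a routing entry for the GSI is assumed to have been installed
with KVM_SET_GSI_ROUTING.

/*
 * Sketch: bind an eventfd to a GSI and fire the interrupt by writing to it.
 * Any context holding the eventfd (QEMU, vhost, VFIO) can inject this way
 * without issuing a per-interrupt ioctl.
 */
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int bind_and_fire_irqfd(int vmfd, unsigned int gsi)
{
    struct kvm_irqfd irqfd;
    uint64_t one = 1;
    int efd = eventfd(0, EFD_CLOEXEC);

    if (efd < 0)
        return -1;

    memset(&irqfd, 0, sizeof(irqfd));
    irqfd.fd  = efd;
    irqfd.gsi = gsi;
    if (ioctl(vmfd, KVM_IRQFD, &irqfd) < 0)
        return -1;

    /* The write is the injection; KVM consumes the eventfd in the kernel. */
    return write(efd, &one, sizeof(one)) == sizeof(one) ? 0 : -1;
}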
* Re: tlb flush after each vm_exit, also virtual interrupts injection

From: Charls D. Chap @ 2016-08-05 11:29 UTC
To: Paolo Bonzini; +Cc: kvm

On Wed, Aug 3, 2016 at 6:56 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> So, in the case of write I/O using virtio-blk with dataplane=off
>> [...] what happens after the real I/O completion on the host,
>> when the host completion bottom half is executed?  Do we go through the
>> iothread to the guest in order to execute the virtio-blk completion
>> request?

How does control transfer to QEMU user space, and which thread is running,
the vcpu thread or a worker?

    -> virtio_blk_device_realize
    -> virtio_blk_req_complete

Was it the "real" interrupt for I/O completion from the device?

Which QEMU thread executes the code you mentioned, the vcpu thread or a
worker (iothread or main loop)?  And when does the iothread finish its
work?

> virtio_blk_req_complete
>   -> virtio_notify
>   -> virtio_pci_notify
>   -> either msix_notify or pci_set_irq
> [...]
> It can be one of many conditions, only some of which correspond to
> particular instructions.  All the reasons for vmexit are listed in the
> SDM.

I know that there are many exit reasons, but it's not clear to me HOW
exactly control is transferred from the execution of one of these
instructions to the VMEXIT point, which is

    "vmx_return: " _ASM_PTR " 2b \n\t"

Where does this switch happen, so that we jump to this label?  Is it inside
the implementation of the corresponding ioctl?

I guess the answer is "read the manual", which is fine with me, because you
already helped me a lot :)
* Re: tlb flush after each vm_exit, also virtual interrupts injection

From: Paolo Bonzini @ 2016-08-05 11:59 UTC
To: Charls D. Chap; +Cc: kvm

>>> So, in the case of write I/O using virtio-blk with dataplane=off
>>> [...] what happens after the real I/O completion on the host,
>>> when the host completion bottom half is executed?
>
> How does control transfer to QEMU user space, and which thread is
> running, the vcpu thread or a worker?
> [...]
> Which QEMU thread executes the code you mentioned, the vcpu thread or a
> worker (iothread or main loop)?  And when does the iothread finish its
> work?

There are two ways:

1) The VCPU thread starts the I/O (control is transferred to QEMU user
space by leaving KVM_RUN).  The I/O system call happens in a worker thread.
When the system call has finished, the worker thread wakes up the I/O
thread and the I/O thread executes virtio_blk_req_complete.

2) The VCPU thread (which is running KVM_RUN) writes to an eventfd, which
wakes up the I/O thread.  The I/O thread runs the I/O system call in a
worker thread, same as case 1.  Also like case 1, when the I/O is finished
the worker thread wakes up the I/O thread and the I/O thread executes
virtio_blk_req_complete.

> I know that there are many exit reasons, but it's not clear to me HOW
> exactly control is transferred from the execution of one of these
> instructions to the VMEXIT point, which is
> "vmx_return: " _ASM_PTR " 2b \n\t".
> Where does this switch happen, so that we jump to this label?  Is it
> inside the implementation of the corresponding ioctl?
>
> I guess the answer is "read the manual", which is fine with me, because
> you already helped me a lot :)

This is a more specific question, and thus easier to answer: after a vmexit
the instruction pointer is reset to the VMCS's HOST_RIP field, and KVM
writes the address of vmx_return to that field. :)

Paolo
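As a loose userspace illustration of the thread interplay described in the
two cases above (not QEMU code), the sketch below shows only the wakeup
pattern: a worker thread stands in for the blocking I/O system call and
signals an eventfd, and the event-loop thread, blocked on that eventfd,
then runs the completion work, the analogue of virtio_blk_req_complete.
All names and the dummy sleep are assumptions made for the example.

/* Sketch: worker completes "I/O", wakes the event loop via an eventfd. */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/eventfd.h>

static int done_efd;

static void *worker(void *arg)
{
    uint64_t one = 1;

    usleep(1000);                        /* stand-in for the blocking I/O syscall */
    write(done_efd, &one, sizeof(one));  /* wake the event-loop thread */
    return NULL;
}

int main(void)
{
    pthread_t tid;
    uint64_t val;

    done_efd = eventfd(0, 0);
    pthread_create(&tid, NULL, worker, NULL);

    read(done_efd, &val, sizeof(val));   /* the event loop blocks here */
    printf("completion work runs in the event-loop thread\n");

    pthread_join(tid, NULL);
    close(done_efd);
    return 0;
}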
* Re: tlb flush after each vm_exit, also virtual interrupts injection

From: Wanpeng Li @ 2016-08-25  9:12 UTC
To: Paolo Bonzini; +Cc: Charls D. Chap, kvm

2016-08-03 1:33 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
[...]
> Interrupt injection happens through ioctls on the KVM file descriptors
> (the vCPU file descriptor for KVM_INTERRUPT, the VM file descriptor for
> the others).
>
> When the LAPIC is emulated by userspace (not the common case) this is
> done with the KVM_INTERRUPT ioctl.  When the LAPIC is emulated in the
> kernel, there are various mechanisms:
>
>     ioctl                 when?                interrupt kind
>     ------------------------------------------------------------------------
>     KVM_INTERRUPT         i8259 in userspace   EXTINT
>     KVM_SET_GSI_ROUTING   (always)             IOAPIC
>     KVM_SIGNAL_MSI        (always)             MSI
>     KVM_SET_GSI_ROUTING   (always)             MSI
>     KVM_IRQFD             any that can use KVM_SET_GSI_ROUTING
>
> After KVM_SET_GSI_ROUTING, the host invokes another ioctl on the VM

MSI routing is also set by KVM_SET_GSI_ROUTING, even though MSI/MSI-X is
not associated with a GSI; is this intended to save a KVM API?  In
addition, kvm_send_userspace_msi(), which is called in the KVM_SIGNAL_MSI
path, sets the MSI routing entry again.  I saw patches that update the GSI
routing after the MSI-X configuration is changed
(https://patchwork.kernel.org/patch/6827431/), so why is the MSI routing
entry set again when sending an MSI?

Regards,
Wanpeng Li
* Re: tlb flush after each vm_exit, also virtual interrupts injection

From: Paolo Bonzini @ 2016-08-29  9:55 UTC
To: Wanpeng Li; +Cc: Charls D. Chap, kvm

On 25/08/2016 11:12, Wanpeng Li wrote:
> MSI routing is also set by KVM_SET_GSI_ROUTING, even though MSI/MSI-X is
> not associated with a GSI; is this intended to save a KVM API?  In
> addition, kvm_send_userspace_msi(), which is called in the KVM_SIGNAL_MSI
> path, sets the MSI routing entry again.

No, kvm_send_userspace_msi simply converts the struct used by userspace
(struct kvm_msi) into the one used by the kernel (struct
kvm_kernel_irq_routing_entry).

> I saw patches that update the GSI routing after the MSI-X configuration
> is changed (https://patchwork.kernel.org/patch/6827431/), so why is the
> MSI routing entry set again when sending an MSI?

KVM_SIGNAL_MSI does not need a previous KVM_SET_GSI_ROUTING, but if
userspace wants it can use both.  Note that you've linked a patch for lkvm,
not Linux or QEMU.

Paolo
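For reference, a minimal sketch of the KVM_SIGNAL_MSI path being discussed:
userspace fills a struct kvm_msi with the MSI address/data pair taken from
the device's MSI or MSI-X configuration and issues a single ioctl on the VM
file descriptor, with no routing entry required beforehand.  The function
name, the vmfd argument and the concrete address and vector values are
illustrative assumptions, not values from this thread.

/*
 * Sketch: inject one MSI directly.  The address/data encoding shown here
 * (LAPIC MSI window 0xfee00000, destination ID 0, fixed delivery, vector
 * in the low byte of data) is the usual x86 layout.
 */
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int signal_msi(int vmfd, unsigned int vector)
{
    struct kvm_msi msi;

    memset(&msi, 0, sizeof(msi));
    msi.address_lo = 0xfee00000u;   /* destination ID 0, physical mode */
    msi.address_hi = 0;
    msi.data       = vector;        /* fixed delivery, edge triggered */

    return ioctl(vmfd, KVM_SIGNAL_MSI, &msi);
}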
* Re: tlb flush after each vm_exit, also virtual interrupts injection

From: Wanpeng Li @ 2016-08-29 10:22 UTC
To: Paolo Bonzini; +Cc: Charls D. Chap, kvm

2016-08-29 17:55 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>> MSI routing is also set by KVM_SET_GSI_ROUTING, even though MSI/MSI-X is
>> not associated with a GSI; is this intended to save a KVM API?  In
>> addition, kvm_send_userspace_msi(), which is called in the
>> KVM_SIGNAL_MSI path, sets the MSI routing entry again.
>
> No, kvm_send_userspace_msi simply converts the struct used by userspace
> (struct kvm_msi) into the one used by the kernel (struct
> kvm_kernel_irq_routing_entry).

Agreed. :)

>> I saw patches that update the GSI routing after the MSI-X configuration
>> is changed (https://patchwork.kernel.org/patch/6827431/), so why is the
>> MSI routing entry set again when sending an MSI?
>
> KVM_SIGNAL_MSI does not need a previous KVM_SET_GSI_ROUTING, but if
> userspace wants it can use both.  Note that you've linked a patch for
> lkvm, not Linux or QEMU.

Do you mean that MSI/MSI-X don't necessarily have a routing table?

Regards,
Wanpeng Li
* Re: tlb flush after each vm_exit, also virtual interrupts injection

From: Paolo Bonzini @ 2016-08-29 16:39 UTC
To: Wanpeng Li; +Cc: Charls D. Chap, kvm

On 29/08/2016 12:22, Wanpeng Li wrote:
>>> I saw patches that update the GSI routing after the MSI-X configuration
>>> is changed (https://patchwork.kernel.org/patch/6827431/), so why is the
>>> MSI routing entry set again when sending an MSI?
>>
>> KVM_SIGNAL_MSI does not need a previous KVM_SET_GSI_ROUTING, but if
>> userspace wants it can use both.  Note that you've linked a patch for
>> lkvm, not Linux or QEMU.
>
> Do you mean that MSI/MSI-X don't necessarily have a routing table?

Generally they don't need one, if you don't use irqfd.

Paolo
* Re: tlb flush after each vm_exit, also virtual interrupts injection

From: Wanpeng Li @ 2016-08-30  0:39 UTC
To: Paolo Bonzini; +Cc: Charls D. Chap, kvm

2016-08-30 0:39 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>> Do you mean that MSI/MSI-X don't necessarily have a routing table?
>
> Generally they don't need one, if you don't use irqfd.

Thanks for pointing that out. :)

Regards,
Wanpeng Li
* Re: tlb flush after each vm_exit, also virtual interrupts injection

From: Radim Krčmář @ 2016-07-28 13:25 UTC
To: charls chap; +Cc: kvm

2016-07-27 13:19+0300, charls chap:
> Hello List,
>
> 1) I've seen some slides, from back in 2008, describing how the use of
> VPID solves the problem of a TLB flush after each VM_EXIT.

VPID solves the problem of excessive TLB flushing by tagging TLB entries
with a VPID.  VMX without the VPID feature flushes the TLB on transitions
(entry/exit), because the hardware cannot tell which entries are valid in
the current context.

> But I see from the code that it actually does a flush after a VM_EXIT.

Please quote the code you are seeing.

> Obviously I am wrong, so I need some help: where should I look, which
> lines of code, to figure out what happens with TLB flushes and VM_EXITs?

I don't know what you know, so I can only recommend:

  1) read the SDM for a while
  2) git grep -W -i 'tlb\|vpid' arch/x86/kvm virt/kvm
  3) goto (1)

> 2) System call from ring 0 (non-root) to ring 0 (root): could a guest OS
> make a system call to the host OS?

Somewhat; there are many ways to communicate, although it would not be a
system call in Linux terminology.  Maybe you are thinking about hypercalls?
(In any case, KVM was not designed for sharing host kernel infrastructure
with programs running in non-root ring 0.)

> 3) What is the mechanism of virtual interrupt injection? What mechanism
> is used to inject a virtual interrupt under full virtualization?

Every interrupt delivery starts by configuring data structures that depend
on the chosen method.  There are two main categories of methods:

1) Hardware can be configured while the guest is running.  These methods
   usually send a special interrupt to the physical CPU, which evaluates
   the configured data structures.  Hardware can also be the one that
   configures the data structures, so no hypervisor intervention is needed
   to send the interrupt.

2) Hardware cannot be configured while the guest is running.  The
   configured data structures are evaluated on guest entry.  (The interrupt
   might also be postponed until the guest state allows it, e.g. TPR.)

In both cases the hardware delivers the interrupt using the guest state.

> The host injects an interrupt into the guest, HOW? E.g. a hardware
> interrupt?

Same as above.

> At which point in the guest? The guest's complete_bh?

The guest shouldn't be able to tell the difference, so at any point that is
possible on the host (also a subset of them).

> 4) I've seen in the literature that KVM operates in protection ring -1.
> What does that mean? Is there hardware support for that ring?
>
> Why not ring 0?

If we are talking about VMX, ring -1 is an analogy.  The host (KVM)
operates with CPL 0 in VMX root mode, which was likely called ring -1 by
the authors.  A guest operates with CPL 0 too, but in VMX non-root mode, so
that is called ring 0.

(VMX can also operate in dual-monitor mode, so the analogy could be
extended to call VMX operating in SMM ring -2.)
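On the hypercall remark above: here is a guest-side sketch of what a KVM
hypercall looks like on Intel hardware, assuming guest kernel (CPL 0, VMX
non-root) context.  It mirrors the guest kernel's kvm_hypercall0() helper:
the hypercall number goes in RAX, VMCALL causes a VM exit to the host, and
the host's return value comes back in RAX.  Which numbers are accepted is
up to the host KVM; on AMD the equivalent instruction is VMMCALL, and
executing this outside a KVM guest simply faults.

/*
 * Sketch: zero-argument KVM hypercall (Intel VMCALL).  The "memory"
 * clobber keeps the compiler from reordering memory accesses around the
 * trap to the host.
 */
static inline long kvm_hypercall0(unsigned int nr)
{
    long ret;

    asm volatile("vmcall"
                 : "=a"(ret)   /* result returned in RAX */
                 : "a"(nr)     /* hypercall number in RAX */
                 : "memory");
    return ret;
}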