From mboxrd@z Thu Jan 1 00:00:00 1970 From: Avi Kivity Subject: Re: [PATCH 3/3] KVM: Enable MSI-X for KVM assigned device Date: Wed, 18 Feb 2009 10:45:19 +0000 Message-ID: <499BE6BF.2050904@redhat.com> References: <1234950262-4142-1-git-send-email-sheng@linux.intel.com> <1234950262-4142-4-git-send-email-sheng@linux.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Marcelo Tosatti , kvm@vger.kernel.org To: Sheng Yang Return-path: Received: from mx2.redhat.com ([66.187.237.31]:58937 "EHLO mx2.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751048AbZBRKov (ORCPT ); Wed, 18 Feb 2009 05:44:51 -0500 In-Reply-To: <1234950262-4142-4-git-send-email-sheng@linux.intel.com> Sender: kvm-owner@vger.kernel.org List-ID: Sheng Yang wrote: > This patch finally enable MSI-X. > > What we need for MSI-X: > 1. Intercept one page in MMIO region of device. So that we can get guest desired > MSI-X table and set up the real one. Now this have been done by guest, and > transfer to kernel using ioctl KVM_SET_MSIX_NR and KVM_SET_MSIX_ENTRY. > > 2. Information for incoming interrupt. Now one device can have more than one > interrupt, and they are all handled by one workqueue structure. So we need to > identify them. The previous patch enable gsi_msg_pending_bitmap get this done. > > 3. Mapping from host IRQ to guest gsi as well as guest gsi to real MSI/MSI-X > message address/data. We used same entry number for the host and guest here, so > that it's easy to find the correlated guest gsi. > > What we lack for now: > 1. The PCI spec said nothing can existed with MSI-X table in the same page of > MMIO region, except pending bits. The patch ignore pending bits as the first > step (so they are always 0 - no pending). > > 2. The PCI spec allowed to change MSI-X table dynamically. That means, the OS > can enable MSI-X, then mask one MSI-X entry, modify it, and unmask it. The patch > didn't support this, and Linux also don't work in this way. > > 3. The patch didn't implement MSI-X mask all and mask single entry. I would > implement the former in driver/pci/msi.c later. And for single entry, userspace > should have reposibility to handle it. > > Signed-off-by: Sheng Yang > --- > include/linux/kvm.h | 8 ++++ > virt/kvm/kvm_main.c | 107 ++++++++++++++++++++++++++++++++++++++++++++++++--- > 2 files changed, 109 insertions(+), 6 deletions(-) > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h > index a2dfbe0..78480d0 100644 > --- a/include/linux/kvm.h > +++ b/include/linux/kvm.h > @@ -440,6 +440,9 @@ struct kvm_irq_routing { > }; > > #endif > +#if defined(CONFIG_X86) > +#define KVM_CAP_DEVICE_MSIX 26 > +#endif > We switched to a different way of depending on CONFIG_X86, see the other KVM_CAP defines. > > struct kvm_assigned_msix_nr { > __u32 assigned_dev_id; > __u16 entry_nr; > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 4010802..d3acb37 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -280,13 +280,33 @@ static void kvm_free_assigned_irq(struct kvm *kvm, > * now, the kvm state is still legal for probably we also have to wait > * interrupt_work done. > */ > - disable_irq_nosync(assigned_dev->host_irq); > - cancel_work_sync(&assigned_dev->interrupt_work); > + if (assigned_dev->irq_requested_type & KVM_ASSIGNED_DEV_MSIX) { > + int i; > + for (i = 0; i < assigned_dev->entries_nr; i++) > + disable_irq_nosync(assigned_dev-> > + host_msix_entries[i].vector); > + > + cancel_work_sync(&assigned_dev->interrupt_work); > + > + for (i = 0; i < assigned_dev->entries_nr; i++) > + free_irq(assigned_dev->host_msix_entries[i].vector, > + (void *)assigned_dev); > + > + assigned_dev->entries_nr = 0; > + kfree(assigned_dev->host_msix_entries); > + kfree(assigned_dev->guest_msix_entries); > + pci_disable_msix(assigned_dev->dev); > + } else { > + /* Deal with MSI and INTx */ > + disable_irq_nosync(assigned_dev->host_irq); > + cancel_work_sync(&assigned_dev->interrupt_work); > How about always have an array? That will also allow us to deal with INTx where x=B,C,D. Currently for MSI and INTx the array will hold just one active element. > > - free_irq(assigned_dev->host_irq, (void *)assigned_dev); > + free_irq(assigned_dev->host_irq, (void *)assigned_dev); > > - if (assigned_dev->irq_requested_type & KVM_ASSIGNED_DEV_HOST_MSI) > - pci_disable_msi(assigned_dev->dev); > + if (assigned_dev->irq_requested_type & > + KVM_ASSIGNED_DEV_HOST_MSI) > + pci_disable_msi(assigned_dev->dev); > + } > All those flags and bits are worrying me. Maybe each entry in the array can have an ops member, and disabling would work by calling ->ops->disable(). We can do that later. > > assigned_dev->irq_requested_type = 0; > } > @@ -415,6 +435,69 @@ static int assigned_device_update_msi(struct kvm *kvm, > adev->irq_requested_type |= KVM_ASSIGNED_DEV_HOST_MSI; > return 0; > } > + > +static int assigned_device_update_msix(struct kvm *kvm, > + struct kvm_assigned_dev_kernel *adev, > + struct kvm_assigned_irq *airq) > +{ > + /* TODO Deal with KVM_DEV_IRQ_ASSIGNED_MASK_MSIX */ > + int i, r; > + > + adev->ack_notifier.gsi = -1; > + > + if (irqchip_in_kernel(kvm)) { > + if (airq->flags & KVM_DEV_IRQ_ASSIGN_MASK_MSIX) { > + printk(KERN_WARNING > + "kvm: unsupported mask MSI-X, flags 0x%x!\n", > + airq->flags); > error, not printk > + return 0; > + } > + > + if (!(airq->flags & KVM_DEV_IRQ_ASSIGN_ENABLE_MSIX)) { > + /* Guest disable MSI-X */ > + kvm_free_assigned_irq(kvm, adev); > + if (msi2intx) { > + pci_enable_msi(adev->dev); > + if (adev->dev->msi_enabled) > + return assigned_device_update_msi(kvm, > + adev, airq); > + } > + return assigned_device_update_intx(kvm, adev, airq); > + } > + > + /* host_msix_entries and guest_msix_entries should have been > + * initialized */ > + > + if (adev->entries_nr == 0) { > + printk(KERN_ERR > + "kvm: MSI-X entry not initialized!\n"); > + return -EFAULT; > -EFAULT is for accesses to unmapped user memory; drop the printk. > + } > + > + kvm_free_assigned_irq(kvm, adev); > + > + r = pci_enable_msix(adev->dev, adev->host_msix_entries, > + adev->entries_nr); > + if (r) { > + printk(KERN_ERR "Fail to enable MSI-X feature!\n"); > Drop the printk. > + return r; > + } > + > + for (i = 0; i < adev->entries_nr; i++) { > + r = request_irq((adev->host_msix_entries + i)->vector, > + kvm_assigned_dev_intr, 0, > + "kvm_assigned_msix_device", > + (void *)adev); > + if (r) > + return r; > + } > + } > + > + adev->irq_requested_type |= KVM_ASSIGNED_DEV_MSIX; > + > + return 0; > +} > + > #endif > > static int kvm_vm_ioctl_assign_irq(struct kvm *kvm, > @@ -461,12 +544,24 @@ static int kvm_vm_ioctl_assign_irq(struct kvm *kvm, > } > } > > - if ((match->irq_requested_type & KVM_ASSIGNED_DEV_HOST_MSI) && > + if (match->irq_requested_type & KVM_ASSIGNED_DEV_MSIX) > + current_flags |= KVM_DEV_IRQ_ASSIGN_ENABLE_MSIX; > + else if ((match->irq_requested_type & KVM_ASSIGNED_DEV_HOST_MSI) && > (match->irq_requested_type & KVM_ASSIGNED_DEV_GUEST_MSI)) > current_flags |= KVM_DEV_IRQ_ASSIGN_ENABLE_MSI; > > changed_flags = assigned_irq->flags ^ current_flags; > > +#ifdef CONFIG_X86 > + if (changed_flags & KVM_DEV_IRQ_ASSIGN_MSIX_ACTION) { > + r = assigned_device_update_msix(kvm, match, assigned_irq); > + if (r) { > + printk(KERN_WARNING "kvm: failed to execute " > + "MSI-X action!\n"); > + goto out_release; > + } > + } else > +#endif > if ((changed_flags & KVM_DEV_IRQ_ASSIGN_MSI_ACTION) || > (msi2intx && match->dev->msi_enabled)) { > #ifdef CONFIG_X86 > -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.