public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Sheng Yang <sheng@linux.intel.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Avi Kivity <avi@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	kvm@vger.kernel.org
Subject: Re: [PATCH 1/3] KVM: Emulation MSI-X mask bits for assigned devices
Date: Tue, 12 Oct 2010 14:49:58 +0800	[thread overview]
Message-ID: <201010121449.58332.sheng@linux.intel.com> (raw)
In-Reply-To: <20101011100100.GA26294@redhat.com>

On Monday 11 October 2010 18:01:00 Michael S. Tsirkin wrote:
> On Mon, Oct 11, 2010 at 05:28:30PM +0800, Sheng Yang wrote:
> > On Sunday 03 October 2010 19:12:47 Michael S. Tsirkin wrote:
> > > On Tue, Sep 28, 2010 at 05:44:10PM +0800, Sheng Yang wrote:
> > > > This patch enable per-vector mask for assigned devices using MSI-X.
> > > > 
> > > > Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> > > 
> > > I think I see an issue here, noted below.  Some general comments:
> > > - The mask bit seems broken for injecting interrupts from
> > > 
> > >   userspace (with interrupts/ioctls).
> > >   I think we must test it on injection path.
> > 
> > I am not quite understand how it related to userspace interrupt injection
> > here... This patch only cover assigned devices for now.
> 
> Well, this is a kernel/userspace interface, if it's broken for userspace
> injection now we'll have to go through pain to fix it in a compatible
> way later when we want to use it for userspace injection.
> You might want to ask why we want the kernel to support making
> userspace-injected interrupts when userspace can just avoid injecting
> them, and the answer would be that with irqfd the injection might be
> handled in a separate process.

OK, I've understood how it related to userspace interrupt injection. But I still 
can't see why the interface is broken...
> 
> We currently handle this by destroying irqfd when irq is masked,
> an ioctl instead would be much faster.
> 
> > > - We'll need a way to support the pending bit.
> > > 
> > >   Disabling interrupts will not let us do it.
> > 
> > We may need a way to support pending bit, though I don't know which guest
> > has used it... And we can still know if there is interrupt pending by
> > check the real hardware's pending bit if it's necessary.
> 
> That's what I'm saying: since instead of masking the vector in hardware
> you disable irq in the APIC, the pending bit that we read from hardware
> will not have the correct value.

Are you sure? This disable_irq() has nothing to do with APIC. The disable callback 
in msi_chip didn't do anything but mark the IRQ status as IRQ_DISABLED, and the 
follow interrupt(if there are any) would be acked and masked, using mask callback 
in msi_chip. 

irq_to_desc() need to be exported for my initial version, in order to use mask 
callback. But later I think it would be clear and better if we use general IRQ 
function to do it. And I don't think the current solution would prevent us from 
reading hardware pending bits.

--
regards
Yang, Sheng

> 
> If we fix this, pending bit handling can be done by userspace.
> 
> > (And we haven't seen any problem by
> > leaving the bit 0 so far, and it's not in this patch's scope.)
> 
> I don't know about anyone using this, either, but the PCI spec does
> require support of polling mode where the pending bit is polled instead
> of interrupts. So yes, not a high priority to implement, but let's give
> the way we intend to support this in the future some thought.
> 
> > > > ---
> > > > 
> > > >  arch/x86/kvm/x86.c       |    1 +
> > > >  include/linux/kvm.h      |    9 ++++++++-
> > > >  include/linux/kvm_host.h |    1 +
> > > >  virt/kvm/assigned-dev.c  |   39
> > 
> > +++++++++++++++++++++++++++++++++++++++
> > 
> > > >  4 files changed, 49 insertions(+), 1 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > index 8412c91..e6933e6 100644
> > > > --- a/arch/x86/kvm/x86.c
> > > > +++ b/arch/x86/kvm/x86.c
> > > > @@ -1927,6 +1927,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> > > > 
> > > >  	case KVM_CAP_DEBUGREGS:
> > > >  	case KVM_CAP_X86_ROBUST_SINGLESTEP:
> > > > 
> > > >  	case KVM_CAP_XSAVE:
> > > > +	case KVM_CAP_DEVICE_MSIX_MASK:
> > > >  		r = 1;
> > > >  		break;
> > > >  	
> > > >  	case KVM_CAP_COALESCED_MMIO:
> > > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > > > index 919ae53..f2b7cdc 100644
> > > > --- a/include/linux/kvm.h
> > > > +++ b/include/linux/kvm.h
> > > > @@ -540,6 +540,9 @@ struct kvm_ppc_pvinfo {
> > > > 
> > > >  #endif
> > > >  #define KVM_CAP_PPC_GET_PVINFO 57
> > > >  #define KVM_CAP_PPC_IRQ_LEVEL 58
> > > > 
> > > > +#ifdef __KVM_HAVE_MSIX
> > > > +#define KVM_CAP_DEVICE_MSIX_MASK 59
> > > > +#endif
> > > > 
> > > >  #ifdef KVM_CAP_IRQ_ROUTING
> > > > 
> > > > @@ -787,11 +790,15 @@ struct kvm_assigned_msix_nr {
> > > > 
> > > >  };
> > > >  
> > > >  #define KVM_MAX_MSIX_PER_DEV		256
> > > > 
> > > > +
> > > > +#define KVM_MSIX_FLAG_MASK	1
> > > > +
> > > > 
> > > >  struct kvm_assigned_msix_entry {
> > > >  
> > > >  	__u32 assigned_dev_id;
> > > >  	__u32 gsi;
> > > >  	__u16 entry; /* The index of entry in the MSI-X table */
> > > > 
> > > > -	__u16 padding[3];
> > > > +	__u16 flags;
> > > > +	__u16 padding[2];
> > > > 
> > > >  };
> > > >  
> > > >  #endif /* __LINUX_KVM_H */
> > > > 
> > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > > index 0b89d00..a43405c 100644
> > > > --- a/include/linux/kvm_host.h
> > > > +++ b/include/linux/kvm_host.h
> > > > @@ -415,6 +415,7 @@ struct kvm_irq_ack_notifier {
> > > > 
> > > >  };
> > > >  
> > > >  #define KVM_ASSIGNED_MSIX_PENDING		0x1
> > > > 
> > > > +#define KVM_ASSIGNED_MSIX_MASK			0x2
> > > > 
> > > >  struct kvm_guest_msix_entry {
> > > >  
> > > >  	u32 vector;
> > > >  	u16 entry;
> > > > 
> > > > diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> > > > index 7c98928..15b8c32 100644
> > > > --- a/virt/kvm/assigned-dev.c
> > > > +++ b/virt/kvm/assigned-dev.c
> > > > @@ -17,6 +17,8 @@
> > > > 
> > > >  #include <linux/pci.h>
> > > >  #include <linux/interrupt.h>
> > > >  #include <linux/slab.h>
> > > > 
> > > > +#include <linux/irqnr.h>
> > > > +
> > > > 
> > > >  #include "irq.h"
> > > >  
> > > >  static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct
> > > >  list_head *head,
> > > > 
> > > > @@ -666,6 +668,30 @@ msix_nr_out:
> > > >  	return r;
> > > >  
> > > >  }
> > > > 
> > > > +static void update_msix_mask(struct kvm_assigned_dev_kernel
> > > > *assigned_dev, +			     int index)
> > > > +{
> > > > +	int irq;
> > > > +
> > > > +	if (!assigned_dev->dev->msix_enabled ||
> > > > +	    !(assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX))
> > > > +		return;
> > > > +
> > > > +	irq = assigned_dev->host_msix_entries[index].vector;
> > > > +
> > > > +	ASSERT(irq != 0);
> > > > +
> > > > +	if (assigned_dev->guest_msix_entries[index].flags &
> > > > +			KVM_ASSIGNED_MSIX_MASK)
> > > > +		disable_irq(irq);
> > > > +	else {
> > > > +		enable_irq(irq);
> > > > +		if (assigned_dev->guest_msix_entries[index].flags &
> > > > +				KVM_ASSIGNED_MSIX_PENDING)
> > > > +			schedule_work(&assigned_dev->interrupt_work);
> > > > +	}
> > > > +}
> > > > +
> > > 
> > > What happens if guest masks an entry and then we hot-unplug the
> > > device and remove it from guest? It looks like interrupt
> > > will stay disabled?
> > 
> > I don't think so. pci_disable_msix() which was called in hot-unplug path
> > would recycle all IRQs used by the device. It should be the same as VM
> > shutdown.
> > 
> > Also before the IRQ was recycled, I believe the same dynamic IRQ wouldn't
> > be used by other devices.
> > 
> > --
> > regards
> > Yang, Sheng
> > 
> > > >  static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
> > > >  
> > > >  				       struct kvm_assigned_msix_entry *entry)
> > > >  
> > > >  {
> > > > 
> > > > @@ -688,6 +714,19 @@ static int kvm_vm_ioctl_set_msix_entry(struct
> > > > kvm *kvm,
> > > > 
> > > >  			adev->guest_msix_entries[i].entry = entry->entry;
> > > >  			adev->guest_msix_entries[i].vector = entry->gsi;
> > > >  			adev->host_msix_entries[i].entry = entry->entry;
> > > > 
> > > > +			if ((entry->flags & KVM_MSIX_FLAG_MASK) &&
> > > > +					!(adev->guest_msix_entries[i].flags &
> > > > +					KVM_ASSIGNED_MSIX_MASK)) {
> > > > +				adev->guest_msix_entries[i].flags |=
> > > > +					KVM_ASSIGNED_MSIX_MASK;
> > > > +				update_msix_mask(adev, i);
> > > > +			} else if (!(entry->flags & KVM_MSIX_FLAG_MASK) &&
> > > > +					(adev->guest_msix_entries[i].flags &
> > > > +					KVM_ASSIGNED_MSIX_MASK)) {
> > > > +				adev->guest_msix_entries[i].flags &=
> > > > +					~KVM_ASSIGNED_MSIX_MASK;
> > > > +				update_msix_mask(adev, i);
> > > > +			}
> > > > 
> > > >  			break;
> > > >  		
> > > >  		}
> > > >  	
> > > >  	if (i == adev->entries_nr) {

  reply	other threads:[~2010-10-12  6:49 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-09-28  9:44 [PATCH 0/3] Emulate MSI-X mask bits for assigned devices Sheng Yang
2010-09-28  9:44 ` [PATCH 1/3] KVM: Emulation " Sheng Yang
2010-09-28 13:36   ` Michael S. Tsirkin
2010-09-29  1:17     ` Sheng Yang
2010-09-30 16:25       ` Marcelo Tosatti
2010-09-29  8:36   ` Avi Kivity
2010-09-30  5:41     ` Sheng Yang
2010-10-03 13:03       ` Avi Kivity
2010-10-11  9:32         ` Sheng Yang
2010-09-30 16:18   ` Marcelo Tosatti
2010-10-11  9:49     ` Sheng Yang
2010-10-11 17:11       ` Marcelo Tosatti
2010-10-03 11:12   ` Michael S. Tsirkin
2010-10-11  9:28     ` Sheng Yang
2010-10-11 10:01       ` Michael S. Tsirkin
2010-10-12  6:49         ` Sheng Yang [this message]
2010-10-12 18:28           ` Michael S. Tsirkin
2010-10-13  1:03             ` Sheng Yang
2010-10-13 10:35               ` Michael S. Tsirkin
2010-10-12 18:30           ` Michael S. Tsirkin
2010-10-13  0:58             ` Sheng Yang
2010-10-13 12:03               ` Michael S. Tsirkin
2010-09-28  9:44 ` [PATCH 2/3] qemu-kvm: device assignment: Some clean up for MSI-X code Sheng Yang
2010-09-28  9:44 ` [PATCH 3/3] qemu-kvm: device assignment: emulate MSI-X mask bits Sheng Yang
2010-09-30 16:44 ` [PATCH 0/3] Emulate MSI-X mask bits for assigned devices Marcelo Tosatti
2010-10-11  9:34   ` Sheng Yang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201010121449.58332.sheng@linux.intel.com \
    --to=sheng@linux.intel.com \
    --cc=avi@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=mtosatti@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox