Re: [Qemu-devel] [PATCH] pci-assign: Enable MSIX on device to match guest

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: "Michael S. Tsirkin" <mst@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: jan.kiszka@siemens.com, qemu-devel@nongnu.org, qemu-stable@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] pci-assign: Enable MSIX on device to match guest
Date: Sun, 6 Jan 2013 15:23:13 +0200	[thread overview]
Message-ID: <20130106132313.GD18612@redhat.com> (raw)
In-Reply-To: <1357141782.3224.645.camel@bling.home>

On Wed, Jan 02, 2013 at 08:49:42AM -0700, Alex Williamson wrote:
> On Fri, 2012-12-21 at 08:46 -0700, Alex Williamson wrote:
> > On Fri, 2012-12-21 at 14:17 +0200, Michael S. Tsirkin wrote:
> > > On Thu, Dec 20, 2012 at 03:15:38PM -0700, Alex Williamson wrote:
> > > > On Thu, 2012-12-20 at 18:38 +0200, Michael S. Tsirkin wrote:
> > > > > On Thu, Dec 20, 2012 at 09:05:50AM -0700, Alex Williamson wrote:
> > > > > > When a guest enables MSIX on a device we evaluate the MSIX vector
> > > > > > table, typically find no unmasked vectors and don't switch the device
> > > > > > to MSIX mode.  This generally works fine and the device will be
> > > > > > switched once the guest enables and therefore unmasks a vector.
> > > > > > Unfortunately some drivers enable MSIX, then use interfaces to send
> > > > > > commands between VF & PF or PF & firmware that act based on the host
> > > > > > state of the device.  These therefore break when MSIX is managed
> > > > > > lazily.  This change re-enables the previous test used to enable MSIX
> > > > > > (see qemu-kvm a6b402c9), which basically guesses whether a vector
> > > > > > will be used based on the data field of the vector table.
> > > > > > 
> > > > > > Cc: qemu-stable@nongnu.org
> > > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > > > 
> > > > > Same question: can't we enable and mask MSIX through config sysfs?
> > > > > In this case it can be done in userspace ...
> > > > 
> > > > In this case userspace could do this, but I think it's still incredibly
> > > > dangerous.  Kernel space drivers can also directly enable MSI-X on a
> > > > device, but you might get shot for writing one that did.
> > > 
> > > What would be the reason for the kernel driver to do this?
> > 
> > Maybe they don't know how many vectors to use until they enable MSI-X
> > and query some firmware interface.  It's a hypothetical situation, I'm
> > just trying to illustrate that if a kernel driver did want to do this,
> > they'd have to develop interfaces to allow it, not just manually poke
> > their MSI-X enable bit.
> > 
> > > >  We should
> > > > follow the rules, play be the existing kernel interfaces, and work to
> > > > eventually improve those interfaces.  Thanks,
> > > > 
> > > > Alex
> > > 
> > > I'm not against adding an interface for this long term but we have
> > > existing kernels to support too.  IMHO it would be nicer than
> > > the data hack which relies on non-documented guest behaviour
> > > that might change without warning in the future.
> > 
> > We've unwittingly used the data hack for years and only ripped it out
> > because it was undocumented.  The patch below adds documentation for it,
> > so at least we have a more clear understanding of why it was there if we
> > want to try to rip it out again.  This fully supports existing kernels
> > and as I mention below, we might be able to do better with limiting how
> > many vectors we enabled, but I think this is the right initial fix and
> > right fix for stable and we can continue to experiment from here.
> 
> Happy new year.  I'd like to close on this as we do currently have a
> regression for devices that cannot handle MSI-X being lazily enabled.
> The option here is to document and revert to the old style
> initialization behavior where we look at the data field of the vector to
> get a hint whether the guest intends to make use of the vector.  This
> gives us the same behavior as we had previously, but still allows
> vectors to be added, so we maintain the current FreeBSD support.  This
> much needs to go to stable.

By the way, could you clarify what exactly happens with FreeBSD?


> For the development tree, I think we can do better.  Using the data
> field is not 100% reliable in giving us the number of vectors the guest
> actually intends to use.  Instead we'd like to enable MSI-X with no
> vectors and add vectors as the guest unmasks them.  The host Linux MSI
> API currently doesn't allow this, so I think the next best thing is to
> enable MSI-X with a single vector in the case where MSI-X is enabled but
> no vectors are unmasked.  This conserves vectors on the host though we
> do potentially allow spurious interrupts through the enabled vector
> (though we previously enabled multiple vectors using the above data
> method without problems).

Hmm I think this can lose interrupts if they are sent before
there is a handler. The data hack only sends interrupts when
there's a handler so it's safe.

> The alternative that you're proposing to this longer term solution is to
> manually mask all vectors in the physical MSI-X vector table from
> userspace

This masking is not necessary I think: all vectors are masked
after reset IIRC.

> then manually enable MSI-X on the physical device (through
> pci-sysfs resource and config access respectively).  This puts the
> physical device is a state that better matches the guest view of the
> devices, but I'm doubtful that the risk is worth the reward.  This adds
> a new state to the qemu MSI-X model where we have entirely host kernel
> managed physical IRQ state, except for this.  It also creates a
> synchronization problem that the physical device moves to a new
> interrupt state outside of the control of the host kernel, possibly
> bypassing any quirks for the host platform.

Hmm as long as all vectors are masked, MSIX is a per device state so I
don't see why any quirks would be needed.

> Another option is to modify the host MSI API to allow the interface we
> want, splitting enabling MSI-X from vector allocation.  That of course
> has a much longer lead time. 

It's a requirement to make it really robust though, isn't it?

> We can certainly continue the discussion on this, but we need a fix for
> stable and I don't think either of these longer term methods are known
> to have the reliability or simplicity of reverting to the previous
> initialization criteria.  Please ack if you agree.  Thanks,
> 
> Alex

I agree with your approach to stable, so ACK.

For long term, I think we should start with fixing up
things in the kernel. Then for old kernels, qemu can do
the userspace hack (or some other hack if we find a better one).

> > > > > > ---
> > > > > >  hw/kvm/pci-assign.c |   17 +++++++++++++++--
> > > > > >  1 file changed, 15 insertions(+), 2 deletions(-)
> > > > > > 
> > > > > > I think we might be able to do a little better than this, but I think
> > > > > > this is the right fix for stable and we can build on it to perhaps only
> > > > > > enable a single vector.
> > > > > > 
> > > > > > diff --git a/hw/kvm/pci-assign.c b/hw/kvm/pci-assign.c
> > > > > > index e80dad0..12a219b 100644
> > > > > > --- a/hw/kvm/pci-assign.c
> > > > > > +++ b/hw/kvm/pci-assign.c
> > > > > > @@ -1025,6 +1025,19 @@ static bool assigned_dev_msix_masked(MSIXTableEntry *entry)
> > > > > >      return (entry->ctrl & cpu_to_le32(0x1)) != 0;
> > > > > >  }
> > > > > >  
> > > > > > +/*
> > > > > > + * When MSI-X is first enabled the vector table typically has all the
> > > > > > + * vectors masked, so we can't use that as the obvious test to figure out
> > > > > > + * how many vectors to initially enable.  Instead we look at the data field
> > > > > > + * because this is what worked for pci-assign for a long time.  This makes
> > > > > > + * sure the physical MSI-X state tracks the guest's view, which is important
> > > > > > + * for some VF/PF and PF/fw communication channels.
> > > > > > + */
> > > > > > +static bool assigned_dev_msix_skipped(MSIXTableEntry *entry)
> > > > > > +{
> > > > > > +    return !entry->data;
> > > > > > +}
> > > > > > +
> > > > > >  static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> > > > > >  {
> > > > > >      AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
> > > > > > @@ -1035,7 +1048,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> > > > > >  
> > > > > >      /* Get the usable entry number for allocating */
> > > > > >      for (i = 0; i < adev->msix_max; i++, entry++) {
> > > > > > -        if (assigned_dev_msix_masked(entry)) {
> > > > > > +        if (assigned_dev_msix_skipped(entry)) {
> > > > > >              continue;
> > > > > >          }
> > > > > >          entries_nr++;
> > > > > > @@ -1064,7 +1077,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> > > > > >      for (i = 0; i < adev->msix_max; i++, entry++) {
> > > > > >          adev->msi_virq[i] = -1;
> > > > > >  
> > > > > > -        if (assigned_dev_msix_masked(entry)) {
> > > > > > +        if (assigned_dev_msix_skipped(entry)) {
> > > > > >              continue;
> > > > > >          }
> > > > > >  
> > > > 
> > > > 
> > 
> > 
> 
>

next prev parent reply	other threads:[~2013-01-06 13:19 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-12-20 16:05 [Qemu-devel] [PATCH] pci-assign: Enable MSIX on device to match guest Alex Williamson
2012-12-20 16:38 ` Michael S. Tsirkin
2012-12-20 22:15   ` Alex Williamson
2012-12-21 12:17     ` Michael S. Tsirkin
2012-12-21 15:46       ` Alex Williamson
2013-01-02 15:49         ` Alex Williamson
2013-01-06 13:23           ` Michael S. Tsirkin [this message]
2013-01-06 15:57             ` Alex Williamson
2013-01-07 15:02               ` Michael S. Tsirkin
2013-01-07 15:09                 ` Alex Williamson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130106132313.GD18612@redhat.com \
    --to=mst@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=jan.kiszka@siemens.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-stable@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).