qemu-devel.nongnu.org archive mirror
From: Peter Lieven <pl@kamp.de>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] vfio-pci issues with multiple devices on the same root port
Date: Mon, 15 Dec 2014 16:22:44 +0100
Message-ID: <548EFCC4.8040104@kamp.de>
In-Reply-To: <1418656297.1095.252.camel@bling.home>

On 15.12.2014 16:11, Alex Williamson wrote:
> On Sat, 2014-12-13 at 21:43 +0100, Peter Lieven wrote:
>> On 12.12.2014 at 23:21, Alex Williamson wrote:
>>> On Fri, 2014-12-12 at 22:38 +0100, Peter Lieven wrote:
>>>> Hi,
>>>>
>>>> we have a Cisco UCS infrastructure where we have fnic Fibre-Channel Adapters that we expose to guests. The UCS
>>>> infrastructure allows creating virtual HBAs that can be exposed to a host, so it's possible to have quite a lot of them.
>>>>
>>>> We ran into a strange issue when we started having more than one vServer with a Fibre-Channel Adapter passed
>>>> through with vfio-pci.
>>>>
>>>> When a hypervisor shuts down, the kernel sees the following error:
>>>>
>>>>   pcieport 0000:00:07.0: AER: Uncorrected (Non-Fatal) error received: id=0038
>>>>   pcieport 0000:00:07.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0038(Receiver ID)
>>>>   pcieport 0000:00:07.0:   device [8086:340e] error status/mask=00200000/00100000
>>>>   pcieport 0000:00:07.0:    [21] Unknown Error Bit (First)
>>>>   pcieport 0000:00:07.0: broadcast error_detected message
>>>>   pcieport 0000:00:07.0: AER: Device recovery failed
>>>>
>>>> Bit 21 seems to be ACS Violation. And 0000:00:07.0 is the PCIE Root Port on that System.
>>>>
>>>> This wouldn't be a big problem, although I would like to find out what causes the ACS Violation.
>>>>
>>>> The real problem is that all other vfio-pci cards on that root port get notified of this error and the connected vServers are suspended
>>>> with RUN_STATE_INTERNAL_ERROR.
>>>>
>>>> Any ideas to work around this other than hacking qemu to not register an error handler or modifying vfio_err_notifier_handler
>>>> to not suspend the vServer?
>>> You could set bit 21 in the AER uncorrected error mask register to avoid
>>> the root port signaling the error.  Is bit 21 already clear in the
>>> severity register to make this non-fatal?
>>>
>>>> Is it correct that all children of a root port are notified? Should qemu distinguish between fatal and non-fatal errors when
>>>> suspending a vServer?
>>> Yes, each child is notified.  QEMU only gets an eventfd signal, which is
>>> supposed to occur only for fatal errors.  I don't quite understand why
>>> this apparently non-fatal error is getting through.  The kernel-side
>>> VFIO code is where filtering of fatal vs non-fatal should occur.
>> Had a look at vfio-pci.c from master. I can't see where there is any filtering of fatal vs. non-fatal.
> I'm under the impression that fatal vs non-fatal would be determined
> somewhere in the PCI layers and the driver would only be notified for
> uncorrected/fatal.  Are we missing that filtering?  Thanks,

As far as I understand, vfio_pci_aer_err_detected in drivers/vfio/pci/vfio_pci.c
is called to recover from potentially recoverable errors, and the driver indicates
via its return code whether the error was recoverable.
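
For reference, here is a minimal sketch that decodes the status/mask pair
from the log quoted above (00200000/00100000) and shows what setting bit 21
in the uncorrected error mask would change. The bit names follow the AER
Uncorrectable Error Status register layout as I understand it from the PCIe
specification; the helper names (decode, reported, mask_with) are made up for
illustration and are not taken from the kernel or QEMU. It only does the
arithmetic and does not touch any hardware register.

```python
# Bit positions in the AER Uncorrectable Error Status / Mask registers.
# Assumption: layout as given in the PCIe specification's AER capability.
AER_UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
    21: "ACS Violation",
}


def decode(value):
    """Return the names of all known bits set in a status or mask value."""
    return [name for bit, name in sorted(AER_UNCORRECTABLE_BITS.items())
            if value & (1 << bit)]


def reported(status, mask):
    """Errors that are set in status and not suppressed by the mask."""
    return decode(status & ~mask)


def mask_with(mask, bit):
    """Return the mask value with one additional bit set."""
    return mask | (1 << bit)


# Values from the log line "error status/mask=00200000/00100000".
status, mask = 0x00200000, 0x00100000

print("status:  ", decode(status))
print("mask:    ", decode(mask))
print("reported:", reported(status, mask))

# Effect of additionally masking bit 21, as suggested in the thread.
new_mask = mask_with(mask, 21)
print("new mask: 0x%08x" % new_mask)
print("reported with new mask:", reported(status, new_mask))
```

With these values the status decodes to ACS Violation, the existing mask
covers only Unsupported Request, and the new mask value would be 0x00300000.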

Peter
