From: "Michael S. Tsirkin" <mst@redhat.com>
To: Cao jin <caoj.fnst@cn.fujitsu.com>
Cc: qemu-devel@nongnu.org, alex.williamson@redhat.com,
	izumi.taku@jp.fujitsu.com,
	Chen Fan <chen.fan.fnst@cn.fujitsu.com>,
	Dou Liyang <douly.fnst@cn.fujitsu.com>
Subject: Re: [Qemu-devel] [PATCH RFC v11 3/4] vfio-pci: pass the aer error to guest
Date: Tue, 10 Jan 2017 00:55:00 +0200
Message-ID: <20170110005437-mutt-send-email-mst@kernel.org>
In-Reply-To: <1483175588-17006-4-git-send-email-caoj.fnst@cn.fujitsu.com>

On Sat, Dec 31, 2016 at 05:13:07PM +0800, Cao jin wrote:
> From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
> 
> When a physical device hits an uncorrectable error, the vfio_pci
> driver signals the uncorrectable error status register value to the
> corresponding QEMU vfio-pci device via the eventfd registered by that
> device; the vfio-pci error eventfd handler is then invoked in the
> event loop.
> 
> Construct the AER message and pass it to the root port. The root port
> then raises an interrupt to signal the guest, and the guest driver
> performs the recovery.
> 
> Note: only recovery from non-fatal errors is supported for now; a fatal
> error still results in a VM stop.
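
For readers following along, the fatal/non-fatal split described here can
be sketched as below. This is only an illustration under assumptions, not
the patch's code: aer_uncor_is_fatal() and aer_source_id() are hypothetical
helper names, and the sketch assumes that a bit set in the Uncorrectable
Error Severity register marks the matching status bit as fatal, and that
the source ID is the usual bus/devfn requester ID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: an uncorrectable error counts as fatal when any
 * reported status bit is also set in the severity mask.
 */
static bool aer_uncor_is_fatal(uint32_t uncor_status, uint32_t uncor_sever)
{
    return (uncor_status & uncor_sever) != 0;
}

/*
 * Hypothetical helper: build the 16-bit source ID from the bus number
 * (high byte) and devfn (low byte), as the quoted patch does inline.
 */
static uint16_t aer_source_id(uint8_t bus, uint8_t devfn)
{
    uint16_t id = (uint16_t)((uint16_t)bus << 8 | devfn);

    /* Sanity check: the low byte must round-trip to devfn. */
    assert((id & 0xff) == devfn);
    return id;
}
```

With helpers like these, only the non-fatal case would be forwarded to the
root port; everything else would take the stop path.
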
> 
> Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
> ---
>  hw/vfio/pci.c | 50 ++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 42 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 76a8ac3..9861f72 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2470,21 +2470,55 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
>  static void vfio_err_notifier_handler(void *opaque)
>  {
>      VFIOPCIDevice *vdev = opaque;
> +    PCIDevice *dev = &vdev->pdev;
> +    PCIEAERMsg msg = {
> +        .severity = 0,
> +        .source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn,
> +    };
> +    int len;
> +    uint64_t uncor_status;
> +
> +    /* Read uncorrectable error status from driver */
> +    len = read(vdev->err_notifier.rfd, &uncor_status, sizeof(uncor_status));
> +    if (len != sizeof(uncor_status)) {
> +        error_report("vfio-pci: uncor error status reading returns"
> +                     " invalid number of bytes: %d", len);
> +        return; //Or goto stop?

I would definitely suggest the goto stop here, to make sure we don't regress.
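
To make that concrete, here is a rough, untested sketch of the idea outside
of QEMU. It uses plain Linux eventfd(2) semantics; err_action and
read_status_or_stop() are names made up for this illustration and are not
QEMU APIs. The point is only that a short or failed read maps to the stop
path instead of a silent return.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/eventfd.h>

/* Hypothetical outcome of handling one error notification. */
enum err_action { ERR_ACTION_FORWARD, ERR_ACTION_STOP };

/*
 * Read the 8-byte counter from an eventfd. Anything other than a full
 * 8-byte read means the status value cannot be trusted, so the caller
 * is told to take the conservative stop path.
 */
static enum err_action read_status_or_stop(int fd, uint64_t *status)
{
    ssize_t len;

    assert(status != NULL);
    len = read(fd, status, sizeof(*status));
    if (len != (ssize_t)sizeof(*status)) {
        return ERR_ACTION_STOP;
    }
    return ERR_ACTION_FORWARD;
}
```

In the handler this would correspond to replacing the bare return with
goto stop after the error_report().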

> +    }
> +
> +    if (!(vdev->features & VFIO_FEATURE_ENABLE_AER)) {
> +        goto stop;
> +    }
> +
> +    /* Populate the aer msg and send it to root port */
> +    if (dev->exp.aer_cap) {
> +        uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
> +        bool isfatal = uncor_status &
> +                       pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
> +
> +	if (isfatal) {
> +	    goto stop;
> +	}
> +
> +        msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
> +                                 PCI_ERR_ROOT_CMD_NONFATAL_EN;
>  
> -    if (!event_notifier_test_and_clear(&vdev->err_notifier)) {
> +        error_report("vfio-pci device %d sending AER to root port. uncor"
> +                     " status = 0x%"PRIx64, dev->devfn, uncor_status);
> +        pcie_aer_msg(dev, &msg);
>          return;
>      }
>  
> +stop:
>      /*
> -     * TBD. Retrieve the error details and decide what action
> -     * needs to be taken. One of the actions could be to pass
> -     * the error to the guest and have the guest driver recover
> -     * from the error. This requires that PCIe capabilities be
> -     * exposed to the guest. For now, we just terminate the
> -     * guest to contain the error.
> +     * Terminate the guest in case of
> +     * 1. AER capability is not exposed to guest.
> +     * 2. AER capability is exposed, but error is fatal, only non-fatal
> +     * error is handled now.
>       */
>  
> -    error_report("%s(%s) Unrecoverable error detected. Please collect any data possible and then kill the guest", __func__, vdev->vbasedev.name);
> +    error_report("%s(%s) fatal error detected. Please collect any data"
> +            " possible and then kill the guest", __func__, vdev->vbasedev.name);
>  
>      vm_stop(RUN_STATE_INTERNAL_ERROR);
>  }
> -- 
> 1.8.3.1
> 
> 
