From: Cao jin <caoj.fnst@cn.fujitsu.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: qemu-devel@nongnu.org, izumi.taku@jp.fujitsu.com, mst@redhat.com,
Dou Liyang <douly.fnst@cn.fujitsu.com>
Subject: Re: [Qemu-devel] [PATCH v3 3/3] vfio-pci: process non fatal error of AER
Date: Tue, 28 Mar 2017 21:49:17 +0800 [thread overview]
Message-ID: <58DA69DD.2060203@cn.fujitsu.com> (raw)
In-Reply-To: <20170324161224.270f334e@t450s.home>
On 03/25/2017 06:12 AM, Alex Williamson wrote:
> On Thu, 23 Mar 2017 17:09:23 +0800
> Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
>
>> Make use of the non fatal error eventfd that the kernel module provide
>> to process the AER non fatal error. Fatal error still goes into the
>> legacy way which results in VM stop.
>>
>> Register the handler, wait for notification. Construct aer message and
>> pass it to root port on notification. Root port will trigger an interrupt
>> to signal guest, then guest driver will do the recovery.
>
> Can we guarantee this is the better solution in all cases or could
> there be guests without AER support where the VM stop is the better
> solution?
>
Currently, we only have VM stop on errors, that looks the same as a
sudden power down to me. With this solution, we have about
50%(non-fatal) chance to reduce the sudden power-down risk.
What if a guest doesn't support AER? It looks the same as a host
without AER support. Now I only can speculate the worst condition: guest
crash, would that be quite different from a sudden power-down?
>>
>> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
>> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
>> ---
>> hw/vfio/pci.c | 202 +++++++++++++++++++++++++++++++++++++++++++++
>> hw/vfio/pci.h | 2 +
>> linux-headers/linux/vfio.h | 2 +
>> 3 files changed, 206 insertions(+)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 3d0d005..c6786d5 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -2432,6 +2432,200 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
>> vfio_put_base_device(&vdev->vbasedev);
>> }
>>
>> +static void vfio_non_fatal_err_notifier_handler(void *opaque)
>> +{
>> + VFIOPCIDevice *vdev = opaque;
>> + PCIDevice *dev = &vdev->pdev;
>> + PCIEAERMsg msg = {
>> + .severity = PCI_ERR_ROOT_CMD_NONFATAL_EN,
>> + .source_id = pci_requester_id(dev),
>> + };
>> +
>> + if (!event_notifier_test_and_clear(&vdev->non_fatal_err_notifier)) {
>> + return;
>> + }
>> +
>> + /* Populate the aer msg and send it to root port */
>> + if (dev->exp.aer_cap) {
>
> Why would we have registered this notifier otherwise?
>
>> + uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
>> + uint32_t uncor_status;
>> + bool isfatal;
>> +
>> + uncor_status = vfio_pci_read_config(dev,
>> + dev->exp.aer_cap + PCI_ERR_UNCOR_STATUS, 4);
>> + if (!uncor_status) {
>> + return;
>> + }
>> +
>> + isfatal = uncor_status & pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
>> + if (isfatal) {
>> + goto stop;
>> + }
>
> Huh? How can we get a non-fatal error notice for a fatal error? (and
> why are we saving this to a variable rather than testing it within the
> 'if' condition?
>
Both of these are for the unsure corner cases.
Is it possible that register reading shows a fatal error?
Saving it into a variable just is personal taste: more neat.
>> +
>> + error_report("%s sending non fatal event to root port. uncor status = "
>> + "0x%"PRIx32, vdev->vbasedev.name, uncor_status);
>> + pcie_aer_msg(dev, &msg);
>> + return;
>> + }
>> +
>> +stop:
>> + /* Terminate the guest in case of fatal error */
>> + error_report("%s: Device detected a fatal error. VM stopped",
>> + vdev->vbasedev.name);
>> + vm_stop(RUN_STATE_INTERNAL_ERROR);
>
> Shouldn't we use the existing error index if we can't make use of
> correctable errors?
>
Why? If register reading shows it is actually a fatal error, is it the
same as fatal error handler is notified? what we use the existing error
index for?
>> @@ -2860,6 +3054,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>> }
>> }
>>
>> + vfio_register_passive_reset_notifier(vdev);
>> + vfio_register_non_fatal_err_notifier(vdev);
>
> I think it's wrong that we configure these unconditionally. Why do we
> care about these unless we're configuring the guest to receive AER
> events?
>
But do we have ways to know whether the guest has AER support? For now,
I don't think so.
If guest don't have AER support, for the worst condition: guest crash,
it is not worse than a sudden power-down.
--
Sincerely,
Cao jin
next prev parent reply other threads:[~2017-03-28 13:40 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-03-23 9:09 [Qemu-devel] [PATCH v3 0/3] vfio-pci: support recovery of AER non fatal error Cao jin
2017-03-23 9:09 ` [Qemu-devel] [PATCH v3 1/3] pcie aer: verify if AER functionality is available Cao jin
2017-03-24 22:12 ` Alex Williamson
2017-03-28 13:47 ` Cao jin
2017-03-28 16:12 ` Alex Williamson
2017-03-28 16:16 ` Michael S. Tsirkin
2017-03-23 9:09 ` [Qemu-devel] [PATCH v3 2/3] vfio pci: new function to init AER capability Cao jin
2017-03-24 22:12 ` Alex Williamson
2017-03-28 13:47 ` Cao jin
2017-03-23 9:09 ` [Qemu-devel] [PATCH v3 3/3] vfio-pci: process non fatal error of AER Cao jin
2017-03-24 22:12 ` Alex Williamson
2017-03-28 13:49 ` Cao jin [this message]
2017-03-28 16:12 ` Alex Williamson
2017-03-28 23:59 ` Michael S. Tsirkin
2017-03-29 2:55 ` Alex Williamson
2017-04-25 20:32 ` Michael S. Tsirkin
2017-04-26 0:06 ` Alex Williamson
2017-03-24 22:12 ` [Qemu-devel] [PATCH v3 0/3] vfio-pci: support recovery of AER non fatal error Alex Williamson
2017-03-28 13:47 ` Cao jin
2017-03-28 16:12 ` Alex Williamson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=58DA69DD.2060203@cn.fujitsu.com \
--to=caoj.fnst@cn.fujitsu.com \
--cc=alex.williamson@redhat.com \
--cc=douly.fnst@cn.fujitsu.com \
--cc=izumi.taku@jp.fujitsu.com \
--cc=mst@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).