From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
To: Alex Williamson <alex.williamson@redhat.com>,
Cao jin <caoj.fnst@cn.fujitsu.com>
Cc: izumi.taku@jp.fujitsu.com, qemu-devel@nongnu.org, mst@redhat.com
Subject: Re: [Qemu-devel] [patch v6 11/12] vfio: register aer resume notification handler for aer resume
Date: Tue, 26 Apr 2016 11:39:02 +0800
Message-ID: <571EE2D6.4000100@cn.fujitsu.com>
In-Reply-To: <570EEC42.3040300@cn.fujitsu.com>
On 04/14/2016 09:02 AM, Chen Fan wrote:
>
> On 04/12/2016 05:38 AM, Alex Williamson wrote:
>> On Tue, 5 Apr 2016 19:42:02 +0800
>> Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
>>
>>> From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
>>>
>>> To support AER recovery, the host and the guest run the same AER
>>> recovery code, which performs a secondary bus reset if the error
>>> is fatal. The AER recovery process is:
>>> 1. error_detected
>>> 2. reset_link (if fatal)
>>> 3. slot_reset/mmio_enabled
>>> 4. resume
>>>
>>> This means the host performs a secondary bus reset in step 2 to
>>> reset the physical devices under the bus, which leaves the devices
>>> in D3 state for a short time. In QEMU, however, we register an
>>> error-detected handler that is invoked when the host broadcasts
>>> the error-detected event in step 1. If the guest then performs
>>> reset_link while the host is performing reset_link, a fatal error
>>> may result. To avoid this, we introduce a resume notifier to ensure
>>> the host reset has completed before injecting the AER into the guest.
>> Why is it safe to continue running the VM between the error detected
>> notification and the resume notification? We're just pushing back the
>> point at which we inject the AER into the guest, potentially negating
>> any benefit by allowing the VM to consume bad data. Shouldn't we
>> instead be immediately notifying the VM on error detected, but stalling
>> any access to the device until resume is signaled? How do we know that
>> resume will ever be signaled? We have both the problem that we may be
>> running on an older kernel that won't support a resume notification and
>> the problem that seeing a resume notification depends on the host being
>> able to successfully complete a link reset after fatal error. We can
>> detect support for resume notification, but we still need a strategy
>> for never receiving it. Thanks,
> That makes sense, but I haven't come up with a good idea. Do you have
> any idea, Alex?
>
ping...
> Thanks,
> Chen
>
>
>>
>> Alex
>>
>>> Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
>>> ---
>>> hw/vfio/pci.c | 157 +++++++++++++++++++++++++++++++++++----------
>>> hw/vfio/pci.h | 2 +
>>> linux-headers/linux/vfio.h | 1 +
>>> 3 files changed, 126 insertions(+), 34 deletions(-)
>>>
>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>> index 691ff5e..d79fb3d 100644
>>> --- a/hw/vfio/pci.c
>>> +++ b/hw/vfio/pci.c
>>> @@ -2610,12 +2610,7 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
>>> static void vfio_err_notifier_handler(void *opaque)
>>> {
>>> VFIOPCIDevice *vdev = opaque;
>>> - PCIDevice *dev = &vdev->pdev;
>>> Error *local_err = NULL;
>>> - PCIEAERMsg msg = {
>>> - .severity = 0,
>>> - .source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn,
>>> - };
>>> if (!event_notifier_test_and_clear(&vdev->err_notifier)) {
>>> return;
>>> @@ -2636,35 +2631,7 @@ static void vfio_err_notifier_handler(void *opaque)
>>> goto stop;
>>> }
>>> - /*
>>> - * we should read the error details from the real hardware
>>> - * configuration spaces, here we only need to do is signaling
>>> - * to guest an uncorrectable error has occurred.
>>> - */
>>> - if (dev->exp.aer_cap) {
>>> - uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
>>> - uint32_t uncor_status;
>>> - bool isfatal;
>>> -
>>> - uncor_status = vfio_pci_read_config(dev,
>>> - dev->exp.aer_cap + PCI_ERR_UNCOR_STATUS, 4);
>>> -
>>> - /*
>>> - * if the error is not emitted by this device, we can
>>> - * just ignore it.
>>> - */
>>> - if (!(uncor_status & ~0UL)) {
>>> - return;
>>> - }
>>> -
>>> - isfatal = uncor_status & pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
>>> -
>>> - msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
>>> - PCI_ERR_ROOT_CMD_NONFATAL_EN;
>>> -
>>> - pcie_aer_msg(dev, &msg);
>>> - return;
>>> - }
>>> + return;
>>> stop:
>>> /*
>>> @@ -2757,6 +2724,126 @@ static void vfio_unregister_err_notifier(VFIOPCIDevice *vdev)
>>> event_notifier_cleanup(&vdev->err_notifier);
>>> }
>>> +static void vfio_resume_notifier_handler(void *opaque)
>>> +{
>>> + VFIOPCIDevice *vdev = opaque;
>>> + PCIDevice *dev = &vdev->pdev;
>>> + PCIEAERMsg msg = {
>>> + .severity = 0,
>>> + .source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn,
>>> + };
>>> +
>>> + if (!event_notifier_test_and_clear(&vdev->resume_notifier)) {
>>> + return;
>>> + }
>>> +
>>> + /*
>>> + * we should read the error details from the real hardware
>>> + * configuration spaces, here we only need to do is signaling
>>> + * to guest an uncorrectable error has occurred.
>>> + */
>>> + if (dev->exp.aer_cap) {
>>> + uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
>>> + uint32_t uncor_status;
>>> + bool isfatal;
>>> +
>>> + uncor_status = vfio_pci_read_config(dev,
>>> + dev->exp.aer_cap + PCI_ERR_UNCOR_STATUS, 4);
>>> +
>>> + /*
>>> + * if the error is not emitted by this device, we can
>>> + * just ignore it.
>>> + */
>>> + if (!(uncor_status & ~0UL)) {
>>> + return;
>>> + }
>>> +
>>> + isfatal = uncor_status & pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
>>> +
>>> + msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
>>> + PCI_ERR_ROOT_CMD_NONFATAL_EN;
>>> +
>>> + pcie_aer_msg(dev, &msg);
>>> + }
>>> +}
>>> +
>>> +static void vfio_register_aer_resume_notifier(VFIOPCIDevice *vdev)
>>> +{
>>> + int ret;
>>> + int argsz;
>>> + struct vfio_irq_set *irq_set;
>>> + int32_t *pfd;
>>> +
>>> + if (!(vdev->features & VFIO_FEATURE_ENABLE_AER)) {
>>> + return;
>>> + }
>>> +
>>> + if (event_notifier_init(&vdev->resume_notifier, 0)) {
>>> + error_report("vfio: Unable to init event notifier for"
>>> + " resume notification");
>>> + return;
>>> + }
>>> +
>>> + argsz = sizeof(*irq_set) + sizeof(*pfd);
>>> +
>>> + irq_set = g_malloc0(argsz);
>>> + irq_set->argsz = argsz;
>>> + irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
>>> + VFIO_IRQ_SET_ACTION_TRIGGER;
>>> + irq_set->index = VFIO_PCI_RESUME_IRQ_INDEX;
>>> + irq_set->start = 0;
>>> + irq_set->count = 1;
>>> + pfd = (int32_t *)&irq_set->data;
>>> +
>>> + *pfd = event_notifier_get_fd(&vdev->resume_notifier);
>>> + qemu_set_fd_handler(*pfd, vfio_resume_notifier_handler, NULL, vdev);
>>> +
>>> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>>> + if (ret) {
>>> + error_report("vfio: Failed to set up resume notification");
>>> + qemu_set_fd_handler(*pfd, NULL, NULL, vdev);
>>> + event_notifier_cleanup(&vdev->resume_notifier);
>>> + } else {
>>> + vdev->resume_enabled = true;
>>> + }
>>> + g_free(irq_set);
>>> +}
>>> +
>>> +static void vfio_unregister_aer_resume_notifier(VFIOPCIDevice *vdev)
>>> +{
>>> + int argsz;
>>> + struct vfio_irq_set *irq_set;
>>> + int32_t *pfd;
>>> + int ret;
>>> +
>>> + if (!vdev->resume_enabled) {
>>> + return;
>>> + }
>>> +
>>> + argsz = sizeof(*irq_set) + sizeof(*pfd);
>>> +
>>> + irq_set = g_malloc0(argsz);
>>> + irq_set->argsz = argsz;
>>> + irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
>>> + VFIO_IRQ_SET_ACTION_TRIGGER;
>>> + irq_set->index = VFIO_PCI_RESUME_IRQ_INDEX;
>>> + irq_set->start = 0;
>>> + irq_set->count = 1;
>>> + pfd = (int32_t *)&irq_set->data;
>>> + *pfd = -1;
>>> +
>>> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>>> + if (ret) {
>>> + error_report("vfio: Failed to de-assign error fd: %m");
>>> + }
>>> + g_free(irq_set);
>>> + qemu_set_fd_handler(event_notifier_get_fd(&vdev->resume_notifier),
>>> + NULL, NULL, vdev);
>>> + event_notifier_cleanup(&vdev->resume_notifier);
>>> +
>>> + vdev->resume_enabled = false;
>>> +}
>>> +
>>> static void vfio_req_notifier_handler(void *opaque)
>>> {
>>> VFIOPCIDevice *vdev = opaque;
>>> @@ -3062,6 +3149,7 @@ static int vfio_initfn(PCIDevice *pdev)
>>> }
>>> vfio_register_err_notifier(vdev);
>>> + vfio_register_aer_resume_notifier(vdev);
>>> vfio_register_req_notifier(vdev);
>>> vfio_setup_resetfn_quirk(vdev);
>>> @@ -3092,6 +3180,7 @@ static void vfio_exitfn(PCIDevice *pdev)
>>> VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>>> vfio_unregister_req_notifier(vdev);
>>> + vfio_unregister_aer_resume_notifier(vdev);
>>> vfio_unregister_err_notifier(vdev);
>>> pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
>>> vfio_disable_interrupts(vdev);
>>> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
>>> index 9fb0206..3ebc58f 100644
>>> --- a/hw/vfio/pci.h
>>> +++ b/hw/vfio/pci.h
>>> @@ -119,6 +119,7 @@ typedef struct VFIOPCIDevice {
>>> VFIOVGA *vga; /* 0xa0000, 0x3b0, 0x3c0 */
>>> PCIHostDeviceAddress host;
>>> EventNotifier err_notifier;
>>> + EventNotifier resume_notifier;
>>> EventNotifier req_notifier;
>>> int (*resetfn)(struct VFIOPCIDevice *);
>>> uint32_t vendor_id;
>>> @@ -144,6 +145,7 @@ typedef struct VFIOPCIDevice {
>>> bool no_kvm_msi;
>>> bool no_kvm_msix;
>>> bool single_depend_dev;
>>> + bool resume_enabled;
>>> } VFIOPCIDevice;
>>> uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
>>> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
>>> index 15e096c..6d1826d 100644
>>> --- a/linux-headers/linux/vfio.h
>>> +++ b/linux-headers/linux/vfio.h
>>> @@ -345,6 +345,7 @@ enum {
>>> VFIO_PCI_MSIX_IRQ_INDEX,
>>> VFIO_PCI_ERR_IRQ_INDEX,
>>> VFIO_PCI_REQ_IRQ_INDEX,
>>> + VFIO_PCI_RESUME_IRQ_INDEX,
>>> VFIO_PCI_NUM_IRQS
>>> };
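
The review comment above asks for two things: block access to the device
between error-detected and resume instead of letting the VM keep running,
and have a strategy for the case where resume is never signaled (older
kernel, or the host link reset fails). A minimal sketch of one way to
model that is shown below. It is not QEMU code: AerTracker, the aer_*
functions, and the 3000 ms timeout are all hypothetical names and values
chosen for illustration, and the caller is assumed to supply timestamps
and to poll aer_on_tick() from some timer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All names below are hypothetical; none of them exist in QEMU. */

#define AER_RESUME_TIMEOUT_MS 3000 /* arbitrary illustrative value */

typedef enum {
    AER_IDLE,    /* no error outstanding, device access allowed */
    AER_STALLED, /* error seen, access blocked until host resume */
    AER_FAILED   /* resume unavailable or overdue, device unusable */
} AerState;

typedef struct {
    AerState state;
    bool resume_supported; /* host kernel offers a resume notification */
    int64_t deadline_ms;   /* give up waiting for resume after this */
} AerTracker;

void aer_tracker_init(AerTracker *t, bool resume_supported)
{
    assert(t);
    t->state = AER_IDLE;
    t->resume_supported = resume_supported;
    t->deadline_ms = 0;
}

/* Host broadcast error_detected (step 1 of the recovery sequence). */
void aer_on_error_detected(AerTracker *t, int64_t now_ms)
{
    assert(t);
    if (t->state != AER_IDLE) {
        return; /* an earlier error is still being handled */
    }
    if (!t->resume_supported) {
        /* No way to learn when the host reset is done: fall back. */
        t->state = AER_FAILED;
        return;
    }
    t->state = AER_STALLED;
    t->deadline_ms = now_ms + AER_RESUME_TIMEOUT_MS;
}

/* Host signaled resume (step 4): the device may be touched again. */
void aer_on_resume(AerTracker *t)
{
    assert(t);
    if (t->state == AER_STALLED) {
        t->state = AER_IDLE;
    }
}

/* Periodic check so that a missing resume does not stall forever. */
void aer_on_tick(AerTracker *t, int64_t now_ms)
{
    assert(t);
    if (t->state == AER_STALLED && now_ms >= t->deadline_ms) {
        t->state = AER_FAILED;
    }
}

/* Gate for guest accesses to the device. */
bool aer_access_allowed(const AerTracker *t)
{
    assert(t);
    return t->state == AER_IDLE;
}
```

What AER_FAILED should translate to (stopping the VM as the existing
error path does, or something else) is left open here, as it is in the
thread.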
Thread overview: 26+ messages
2016-04-05 11:41 [Qemu-devel] [patch v6 00/12] vfio-pci: pass the aer error to guest, part2 Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 01/12] vfio: extract vfio_get_hot_reset_info as a single function Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 02/12] vfio: squeeze out vfio_pci_do_hot_reset for support bus reset Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 03/12] vfio: add pcie extended capability support Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 04/12] vfio: add aer support for vfio device Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 05/12] vfio: refine function vfio_pci_host_match Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 06/12] vfio: add check host bus reset is support or not Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 07/12] pci: add a pci_function_is_valid callback to check function if valid Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 08/12] vfio: add check aer functionality for hotplug device Cao jin
2016-04-05 11:42 ` [Qemu-devel] [patch v6 09/12] vfio: vote the function 0 to do host bus reset when aer occurred Cao jin
2016-04-05 11:42 ` [Qemu-devel] [patch v6 10/12] vfio-pci: pass the aer error to guest Cao jin
2016-04-05 11:42 ` [Qemu-devel] [patch v6 11/12] vfio: register aer resume notification handler for aer resume Cao jin
2016-04-11 21:38 ` Alex Williamson
2016-04-14 1:02 ` Chen Fan
2016-04-26 3:39 ` Chen Fan [this message]
2016-04-26 14:48 ` Alex Williamson
2016-05-06 1:38 ` Chen Fan
2016-05-06 16:39 ` Alex Williamson
2016-05-11 3:11 ` Zhou Jie
2016-05-11 20:20 ` Alex Williamson
2016-05-24 10:49 ` Michael S. Tsirkin
2016-05-25 1:08 ` Zhou Jie
2016-05-25 2:54 ` Alex Williamson
2016-05-25 8:45 ` Michael S. Tsirkin
2016-05-25 14:22 ` Alex Williamson
2016-04-05 11:42 ` [Qemu-devel] [patch v6 12/12] vfio: add 'aer' property to expose aercap Cao jin