qemu-devel.nongnu.org archive mirror
From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
To: Alex Williamson <alex.williamson@redhat.com>,
	Cao jin <caoj.fnst@cn.fujitsu.com>
Cc: izumi.taku@jp.fujitsu.com, qemu-devel@nongnu.org, mst@redhat.com
Subject: Re: [Qemu-devel] [patch v6 11/12] vfio: register aer resume notification handler for aer resume
Date: Tue, 26 Apr 2016 11:39:02 +0800
Message-ID: <571EE2D6.4000100@cn.fujitsu.com>
In-Reply-To: <570EEC42.3040300@cn.fujitsu.com>


On 04/14/2016 09:02 AM, Chen Fan wrote:
>
> On 04/12/2016 05:38 AM, Alex Williamson wrote:
>> On Tue, 5 Apr 2016 19:42:02 +0800
>> Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
>>
>>> From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
>>>
>>> To support AER recovery, the host and the guest run the same AER
>>> recovery code, which performs a secondary bus reset if the error
>>> is fatal. The AER recovery process is:
>>>    1. error_detected
>>>    2. reset_link (if fatal)
>>>    3. slot_reset/mmio_enabled
>>>    4. resume
>>>
>>> This means the host performs a secondary bus reset in step 2 to
>>> reset the physical devices under the bus, which leaves the devices
>>> in D3 state for a short time. In QEMU, however, we register an
>>> error-detected handler that is invoked when the host broadcasts
>>> the error-detected event in step 1. If the guest were to do
>>> reset_link while the host is doing reset_link at the same time, it
>>> could cause a fatal error. We therefore introduce a resume notifier
>>> to ensure the host reset has completed before injecting the AER
>>> into the guest.
>> Why is it safe to continue running the VM between the error detected
>> notification and the resume notification?  We're just pushing back the
>> point at which we inject the AER into the guest, potentially negating
>> any benefit by allowing the VM to consume bad data.  Shouldn't we
>> instead be immediately notifying the VM on error detected, but stalling
>> any access to the device until resume is signaled?  How do we know that
>> resume will ever be signaled?  We have both the problem that we may be
>> running on an older kernel that won't support a resume notification and
>> the problem that seeing a resume notification depends on the host being
>> able to successfully complete a link reset after fatal error. We can
>> detect support for resume notification, but we still need a strategy
>> for never receiving it.  Thanks,
> That makes sense, but I haven't come up with a good idea. Do you have
> any ideas, Alex?
>
ping...

> Thanks,
> Chen
>
>
>>
>> Alex
>>
>>> Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
>>> ---
>>>   hw/vfio/pci.c              | 157 +++++++++++++++++++++++++++++++++++----------
>>>   hw/vfio/pci.h              |   2 +
>>>   linux-headers/linux/vfio.h |   1 +
>>>   3 files changed, 126 insertions(+), 34 deletions(-)
>>>
>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>> index 691ff5e..d79fb3d 100644
>>> --- a/hw/vfio/pci.c
>>> +++ b/hw/vfio/pci.c
>>> @@ -2610,12 +2610,7 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
>>>   static void vfio_err_notifier_handler(void *opaque)
>>>   {
>>>       VFIOPCIDevice *vdev = opaque;
>>> -    PCIDevice *dev = &vdev->pdev;
>>>       Error *local_err = NULL;
>>> -    PCIEAERMsg msg = {
>>> -        .severity = 0,
>>> -        .source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn,
>>> -    };
>>>
>>>       if (!event_notifier_test_and_clear(&vdev->err_notifier)) {
>>>           return;
>>> @@ -2636,35 +2631,7 @@ static void vfio_err_notifier_handler(void *opaque)
>>>           goto stop;
>>>       }
>>>
>>> -    /*
>>> -     * we should read the error details from the real hardware
>>> -     * configuration spaces, here we only need to do is signaling
>>> -     * to guest an uncorrectable error has occurred.
>>> -     */
>>> -    if (dev->exp.aer_cap) {
>>> -        uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
>>> -        uint32_t uncor_status;
>>> -        bool isfatal;
>>> -
>>> -        uncor_status = vfio_pci_read_config(dev,
>>> -                           dev->exp.aer_cap + PCI_ERR_UNCOR_STATUS, 4);
>>> -
>>> -        /*
>>> -         * if the error is not emitted by this device, we can
>>> -         * just ignore it.
>>> -         */
>>> -        if (!(uncor_status & ~0UL)) {
>>> -            return;
>>> -        }
>>> -
>>> -        isfatal = uncor_status & pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
>>> -
>>> -        msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
>>> -                                 PCI_ERR_ROOT_CMD_NONFATAL_EN;
>>> -
>>> -        pcie_aer_msg(dev, &msg);
>>> -        return;
>>> -    }
>>> +    return;
>>>
>>>   stop:
>>>       /*
>>> @@ -2757,6 +2724,126 @@ static void vfio_unregister_err_notifier(VFIOPCIDevice *vdev)
>>>       event_notifier_cleanup(&vdev->err_notifier);
>>>   }
>>>
>>> +static void vfio_resume_notifier_handler(void *opaque)
>>> +{
>>> +    VFIOPCIDevice *vdev = opaque;
>>> +    PCIDevice *dev = &vdev->pdev;
>>> +    PCIEAERMsg msg = {
>>> +        .severity = 0,
>>> +        .source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn,
>>> +    };
>>> +
>>> +    if (!event_notifier_test_and_clear(&vdev->resume_notifier)) {
>>> +        return;
>>> +    }
>>> +
>>> +    /*
>>> +     * we should read the error details from the real hardware
>>> +     * configuration spaces, here we only need to do is signaling
>>> +     * to guest an uncorrectable error has occurred.
>>> +     */
>>> +    if (dev->exp.aer_cap) {
>>> +        uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
>>> +        uint32_t uncor_status;
>>> +        bool isfatal;
>>> +
>>> +        uncor_status = vfio_pci_read_config(dev,
>>> +                           dev->exp.aer_cap + PCI_ERR_UNCOR_STATUS, 4);
>>> +
>>> +        /*
>>> +         * if the error is not emitted by this device, we can
>>> +         * just ignore it.
>>> +         */
>>> +        if (!(uncor_status & ~0UL)) {
>>> +            return;
>>> +        }
>>> +
>>> +        isfatal = uncor_status & pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
>>> +
>>> +        msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
>>> +                                 PCI_ERR_ROOT_CMD_NONFATAL_EN;
>>> +
>>> +        pcie_aer_msg(dev, &msg);
>>> +    }
>>> +}
>>> +
>>> +static void vfio_register_aer_resume_notifier(VFIOPCIDevice *vdev)
>>> +{
>>> +    int ret;
>>> +    int argsz;
>>> +    struct vfio_irq_set *irq_set;
>>> +    int32_t *pfd;
>>> +
>>> +    if (!(vdev->features & VFIO_FEATURE_ENABLE_AER)) {
>>> +        return;
>>> +    }
>>> +
>>> +    if (event_notifier_init(&vdev->resume_notifier, 0)) {
>>> +        error_report("vfio: Unable to init event notifier for"
>>> +                     " resume notification");
>>> +        return;
>>> +    }
>>> +
>>> +    argsz = sizeof(*irq_set) + sizeof(*pfd);
>>> +
>>> +    irq_set = g_malloc0(argsz);
>>> +    irq_set->argsz = argsz;
>>> +    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
>>> +                     VFIO_IRQ_SET_ACTION_TRIGGER;
>>> +    irq_set->index = VFIO_PCI_RESUME_IRQ_INDEX;
>>> +    irq_set->start = 0;
>>> +    irq_set->count = 1;
>>> +    pfd = (int32_t *)&irq_set->data;
>>> +
>>> +    *pfd = event_notifier_get_fd(&vdev->resume_notifier);
>>> +    qemu_set_fd_handler(*pfd, vfio_resume_notifier_handler, NULL, vdev);
>>> +
>>> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>>> +    if (ret) {
>>> +        error_report("vfio: Failed to set up resume notification");
>>> +        qemu_set_fd_handler(*pfd, NULL, NULL, vdev);
>>> +        event_notifier_cleanup(&vdev->resume_notifier);
>>> +    } else {
>>> +        vdev->resume_enabled = true;
>>> +    }
>>> +    g_free(irq_set);
>>> +}
>>> +
>>> +static void vfio_unregister_aer_resume_notifier(VFIOPCIDevice *vdev)
>>> +{
>>> +    int argsz;
>>> +    struct vfio_irq_set *irq_set;
>>> +    int32_t *pfd;
>>> +    int ret;
>>> +
>>> +    if (!vdev->resume_enabled) {
>>> +        return;
>>> +    }
>>> +
>>> +    argsz = sizeof(*irq_set) + sizeof(*pfd);
>>> +
>>> +    irq_set = g_malloc0(argsz);
>>> +    irq_set->argsz = argsz;
>>> +    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
>>> +                     VFIO_IRQ_SET_ACTION_TRIGGER;
>>> +    irq_set->index = VFIO_PCI_RESUME_IRQ_INDEX;
>>> +    irq_set->start = 0;
>>> +    irq_set->count = 1;
>>> +    pfd = (int32_t *)&irq_set->data;
>>> +    *pfd = -1;
>>> +
>>> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>>> +    if (ret) {
>>> +        error_report("vfio: Failed to de-assign error fd: %m");
>>> +    }
>>> +    g_free(irq_set);
>>> +    qemu_set_fd_handler(event_notifier_get_fd(&vdev->resume_notifier),
>>> +                        NULL, NULL, vdev);
>>> +    event_notifier_cleanup(&vdev->resume_notifier);
>>> +
>>> +    vdev->resume_enabled = false;
>>> +}
>>> +
>>>   static void vfio_req_notifier_handler(void *opaque)
>>>   {
>>>       VFIOPCIDevice *vdev = opaque;
>>> @@ -3062,6 +3149,7 @@ static int vfio_initfn(PCIDevice *pdev)
>>>       }
>>>
>>>       vfio_register_err_notifier(vdev);
>>> +    vfio_register_aer_resume_notifier(vdev);
>>>       vfio_register_req_notifier(vdev);
>>>       vfio_setup_resetfn_quirk(vdev);
>>>
>>> @@ -3092,6 +3180,7 @@ static void vfio_exitfn(PCIDevice *pdev)
>>>       VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>>>
>>>       vfio_unregister_req_notifier(vdev);
>>> +    vfio_unregister_aer_resume_notifier(vdev);
>>>       vfio_unregister_err_notifier(vdev);
>>>       pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
>>>       vfio_disable_interrupts(vdev);
>>> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
>>> index 9fb0206..3ebc58f 100644
>>> --- a/hw/vfio/pci.h
>>> +++ b/hw/vfio/pci.h
>>> @@ -119,6 +119,7 @@ typedef struct VFIOPCIDevice {
>>>       VFIOVGA *vga; /* 0xa0000, 0x3b0, 0x3c0 */
>>>       PCIHostDeviceAddress host;
>>>       EventNotifier err_notifier;
>>> +    EventNotifier resume_notifier;
>>>       EventNotifier req_notifier;
>>>       int (*resetfn)(struct VFIOPCIDevice *);
>>>       uint32_t vendor_id;
>>> @@ -144,6 +145,7 @@ typedef struct VFIOPCIDevice {
>>>       bool no_kvm_msi;
>>>       bool no_kvm_msix;
>>>       bool single_depend_dev;
>>> +    bool resume_enabled;
>>>   } VFIOPCIDevice;
>>>
>>>   uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
>>> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
>>> index 15e096c..6d1826d 100644
>>> --- a/linux-headers/linux/vfio.h
>>> +++ b/linux-headers/linux/vfio.h
>>> @@ -345,6 +345,7 @@ enum {
>>>       VFIO_PCI_MSIX_IRQ_INDEX,
>>>       VFIO_PCI_ERR_IRQ_INDEX,
>>>       VFIO_PCI_REQ_IRQ_INDEX,
>>> +    VFIO_PCI_RESUME_IRQ_INDEX,
>>>       VFIO_PCI_NUM_IRQS
>>>   };
>

Thread overview: 26+ messages
2016-04-05 11:41 [Qemu-devel] [patch v6 00/12] vfio-pci: pass the aer error to guest, part2 Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 01/12] vfio: extract vfio_get_hot_reset_info as a single function Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 02/12] vfio: squeeze out vfio_pci_do_hot_reset for support bus reset Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 03/12] vfio: add pcie extended capability support Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 04/12] vfio: add aer support for vfio device Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 05/12] vfio: refine function vfio_pci_host_match Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 06/12] vfio: add check host bus reset is support or not Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 07/12] pci: add a pci_function_is_valid callback to check function if valid Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 08/12] vfio: add check aer functionality for hotplug device Cao jin
2016-04-05 11:42 ` [Qemu-devel] [patch v6 09/12] vfio: vote the function 0 to do host bus reset when aer occurred Cao jin
2016-04-05 11:42 ` [Qemu-devel] [patch v6 10/12] vfio-pci: pass the aer error to guest Cao jin
2016-04-05 11:42 ` [Qemu-devel] [patch v6 11/12] vfio: register aer resume notification handler for aer resume Cao jin
2016-04-11 21:38   ` Alex Williamson
2016-04-14  1:02     ` Chen Fan
2016-04-26  3:39       ` Chen Fan [this message]
2016-04-26 14:48         ` Alex Williamson
2016-05-06  1:38           ` Chen Fan
2016-05-06 16:39             ` Alex Williamson
2016-05-11  3:11               ` Zhou Jie
2016-05-11 20:20                 ` Alex Williamson
2016-05-24 10:49           ` Michael S. Tsirkin
2016-05-25  1:08             ` Zhou Jie
2016-05-25  2:54             ` Alex Williamson
2016-05-25  8:45               ` Michael S. Tsirkin
2016-05-25 14:22                 ` Alex Williamson
2016-04-05 11:42 ` [Qemu-devel] [patch v6 12/12] vfio: add 'aer' property to expose aercap Cao jin
