qemu-devel.nongnu.org archive mirror
From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
To: Alex Williamson <alex.williamson@redhat.com>,
	Cao jin <caoj.fnst@cn.fujitsu.com>
Cc: qemu-devel@nongnu.org, mst@redhat.com, izumi.taku@jp.fujitsu.com
Subject: Re: [Qemu-devel] [patch v6 11/12] vfio: register aer resume notification handler for aer resume
Date: Thu, 14 Apr 2016 09:02:58 +0800
Message-ID: <570EEC42.3040300@cn.fujitsu.com>
In-Reply-To: <20160411153827.3884ded1@t450s.home>


On 04/12/2016 05:38 AM, Alex Williamson wrote:
> On Tue, 5 Apr 2016 19:42:02 +0800
> Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
>
>> From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
>>
>> To support AER recovery, the host and the guest run the same AER
>> recovery code, which performs a secondary bus reset if the error
>> is fatal. The AER recovery process is:
>>    1. error_detected
>>    2. reset_link (if fatal)
>>    3. slot_reset/mmio_enabled
>>    4. resume
>>
>> This means the host performs a secondary bus reset on the physical
>> devices under the bus in step 2, which leaves those devices in D3
>> state for a short time. In QEMU, however, we register an
>> error-detected handler that is invoked when the host broadcasts the
>> error-detected event in step 1. If the guest ran reset_link while the
>> host was running reset_link at the same time, it could cause a fatal
>> error. To avoid that, we introduce a resume notifier so that the host
>> reset is guaranteed to have completed before the AER is injected into
>> the guest.
> Why is it safe to continue running the VM between the error detected
> notification and the resume notification?  We're just pushing back the
> point at which we inject the AER into the guest, potentially negating
> any benefit by allowing the VM to consume bad data.  Shouldn't we
> instead be immediately notifying the VM on error detected, but stalling
> any access to the device until resume is signaled?  How do we know that
> resume will ever be signaled?  We have both the problem that we may be
> running on an older kernel that won't support a resume notification and
> the problem that seeing a resume notification depends on the host being
> able to successfully complete a link reset after fatal error.  We can
> detect support for resume notification, but we still need a strategy
> for never receiving it.  Thanks,
That makes sense, but I haven't come up with a good approach yet. Do you
have any ideas, Alex?

Thanks,
Chen
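
The strategy Alex outlines above has three parts. The AER is injected as soon as error_detected arrives. Guest access to the device is then blocked until resume is signaled. A fallback applies when resume never arrives, either because the host kernel has no resume notification or because the host link reset fails. The sketch below models that as a small state machine, outside QEMU. Everything in it is hypothetical: the state names, the `resume_supported` flag, the 5000 ms timeout, and stopping the VM as the fallback are assumptions for illustration only. It is not code from this patch series. Probing kernel support for the new IRQ index (for example with VFIO_DEVICE_GET_IRQ_INFO) is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical recovery states for one assigned device (not QEMU names). */
typedef enum {
    AER_IDLE,        /* no error pending, guest may access the device */
    AER_STALLED,     /* AER injected, device access blocked until resume */
    AER_VM_STOPPED   /* resume cannot or did not arrive: assumed fallback */
} aer_state_t;

typedef struct {
    aer_state_t state;
    bool resume_supported;  /* assumed result of probing the host kernel */
    uint64_t deadline_ms;   /* stop waiting for resume at this time */
    unsigned injected;      /* AER messages injected into the guest */
} aer_dev_t;

/* Arbitrary timeout, chosen for illustration only. */
#define AER_RESUME_TIMEOUT_MS 5000u

/* Host signaled error_detected (recovery step 1). */
static void on_error_detected(aer_dev_t *d, uint64_t now_ms)
{
    assert(d != NULL);
    if (!d->resume_supported) {
        /* Older kernel: a resume notification will never be signaled. */
        d->state = AER_VM_STOPPED;
        return;
    }
    d->injected++;          /* notify the guest immediately ... */
    d->state = AER_STALLED; /* ... but hold off device access */
    d->deadline_ms = now_ms + AER_RESUME_TIMEOUT_MS;
}

/* Host signaled resume (recovery step 4): the link reset completed. */
static void on_resume(aer_dev_t *d)
{
    if (d->state == AER_STALLED) {
        d->state = AER_IDLE;
    }
}

/* Periodic check: the host may never complete the link reset. */
static void on_timer(aer_dev_t *d, uint64_t now_ms)
{
    if (d->state == AER_STALLED && now_ms >= d->deadline_ms) {
        d->state = AER_VM_STOPPED;
    }
}

static bool guest_access_allowed(const aer_dev_t *d)
{
    return d->state == AER_IDLE;
}
```

The timeout is driven by a periodic check rather than a blocking wait. This assumes an event-driven main loop like QEMU's, where blocking inside a handler would stall the whole VM.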


>
> Alex
>
>> Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
>> ---
>>   hw/vfio/pci.c              | 157 +++++++++++++++++++++++++++++++++++----------
>>   hw/vfio/pci.h              |   2 +
>>   linux-headers/linux/vfio.h |   1 +
>>   3 files changed, 126 insertions(+), 34 deletions(-)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 691ff5e..d79fb3d 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -2610,12 +2610,7 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
>>   static void vfio_err_notifier_handler(void *opaque)
>>   {
>>       VFIOPCIDevice *vdev = opaque;
>> -    PCIDevice *dev = &vdev->pdev;
>>       Error *local_err = NULL;
>> -    PCIEAERMsg msg = {
>> -        .severity = 0,
>> -        .source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn,
>> -    };
>>   
>>       if (!event_notifier_test_and_clear(&vdev->err_notifier)) {
>>           return;
>> @@ -2636,35 +2631,7 @@ static void vfio_err_notifier_handler(void *opaque)
>>           goto stop;
>>       }
>>   
>> -    /*
>> -     * we should read the error details from the real hardware
>> -     * configuration spaces, here we only need to do is signaling
>> -     * to guest an uncorrectable error has occurred.
>> -     */
>> -    if (dev->exp.aer_cap) {
>> -        uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
>> -        uint32_t uncor_status;
>> -        bool isfatal;
>> -
>> -        uncor_status = vfio_pci_read_config(dev,
>> -                           dev->exp.aer_cap + PCI_ERR_UNCOR_STATUS, 4);
>> -
>> -        /*
>> -         * if the error is not emitted by this device, we can
>> -         * just ignore it.
>> -         */
>> -        if (!(uncor_status & ~0UL)) {
>> -            return;
>> -        }
>> -
>> -        isfatal = uncor_status & pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
>> -
>> -        msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
>> -                                 PCI_ERR_ROOT_CMD_NONFATAL_EN;
>> -
>> -        pcie_aer_msg(dev, &msg);
>> -        return;
>> -    }
>> +    return;
>>   
>>   stop:
>>       /*
>> @@ -2757,6 +2724,126 @@ static void vfio_unregister_err_notifier(VFIOPCIDevice *vdev)
>>       event_notifier_cleanup(&vdev->err_notifier);
>>   }
>>   
>> +static void vfio_resume_notifier_handler(void *opaque)
>> +{
>> +    VFIOPCIDevice *vdev = opaque;
>> +    PCIDevice *dev = &vdev->pdev;
>> +    PCIEAERMsg msg = {
>> +        .severity = 0,
>> +        .source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn,
>> +    };
>> +
>> +    if (!event_notifier_test_and_clear(&vdev->resume_notifier)) {
>> +        return;
>> +    }
>> +
>> +    /*
>> +     * we should read the error details from the real hardware
>> +     * configuration spaces, here we only need to do is signaling
>> +     * to guest an uncorrectable error has occurred.
>> +     */
>> +    if (dev->exp.aer_cap) {
>> +        uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
>> +        uint32_t uncor_status;
>> +        bool isfatal;
>> +
>> +        uncor_status = vfio_pci_read_config(dev,
>> +                           dev->exp.aer_cap + PCI_ERR_UNCOR_STATUS, 4);
>> +
>> +        /*
>> +         * if the error is not emitted by this device, we can
>> +         * just ignore it.
>> +         */
>> +        if (!(uncor_status & ~0UL)) {
>> +            return;
>> +        }
>> +
>> +        isfatal = uncor_status & pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
>> +
>> +        msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
>> +                                 PCI_ERR_ROOT_CMD_NONFATAL_EN;
>> +
>> +        pcie_aer_msg(dev, &msg);
>> +    }
>> +}
>> +
>> +static void vfio_register_aer_resume_notifier(VFIOPCIDevice *vdev)
>> +{
>> +    int ret;
>> +    int argsz;
>> +    struct vfio_irq_set *irq_set;
>> +    int32_t *pfd;
>> +
>> +    if (!(vdev->features & VFIO_FEATURE_ENABLE_AER)) {
>> +        return;
>> +    }
>> +
>> +    if (event_notifier_init(&vdev->resume_notifier, 0)) {
>> +        error_report("vfio: Unable to init event notifier for"
>> +                     " resume notification");
>> +        return;
>> +    }
>> +
>> +    argsz = sizeof(*irq_set) + sizeof(*pfd);
>> +
>> +    irq_set = g_malloc0(argsz);
>> +    irq_set->argsz = argsz;
>> +    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
>> +                     VFIO_IRQ_SET_ACTION_TRIGGER;
>> +    irq_set->index = VFIO_PCI_RESUME_IRQ_INDEX;
>> +    irq_set->start = 0;
>> +    irq_set->count = 1;
>> +    pfd = (int32_t *)&irq_set->data;
>> +
>> +    *pfd = event_notifier_get_fd(&vdev->resume_notifier);
>> +    qemu_set_fd_handler(*pfd, vfio_resume_notifier_handler, NULL, vdev);
>> +
>> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>> +    if (ret) {
>> +        error_report("vfio: Failed to set up resume notification");
>> +        qemu_set_fd_handler(*pfd, NULL, NULL, vdev);
>> +        event_notifier_cleanup(&vdev->resume_notifier);
>> +    } else {
>> +        vdev->resume_enabled = true;
>> +    }
>> +    g_free(irq_set);
>> +}
>> +
>> +static void vfio_unregister_aer_resume_notifier(VFIOPCIDevice *vdev)
>> +{
>> +    int argsz;
>> +    struct vfio_irq_set *irq_set;
>> +    int32_t *pfd;
>> +    int ret;
>> +
>> +    if (!vdev->resume_enabled) {
>> +        return;
>> +    }
>> +
>> +    argsz = sizeof(*irq_set) + sizeof(*pfd);
>> +
>> +    irq_set = g_malloc0(argsz);
>> +    irq_set->argsz = argsz;
>> +    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
>> +                     VFIO_IRQ_SET_ACTION_TRIGGER;
>> +    irq_set->index = VFIO_PCI_RESUME_IRQ_INDEX;
>> +    irq_set->start = 0;
>> +    irq_set->count = 1;
>> +    pfd = (int32_t *)&irq_set->data;
>> +    *pfd = -1;
>> +
>> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>> +    if (ret) {
>> +        error_report("vfio: Failed to de-assign resume fd: %m");
>> +    }
>> +    g_free(irq_set);
>> +    qemu_set_fd_handler(event_notifier_get_fd(&vdev->resume_notifier),
>> +                        NULL, NULL, vdev);
>> +    event_notifier_cleanup(&vdev->resume_notifier);
>> +
>> +    vdev->resume_enabled = false;
>> +}
>> +
>>   static void vfio_req_notifier_handler(void *opaque)
>>   {
>>       VFIOPCIDevice *vdev = opaque;
>> @@ -3062,6 +3149,7 @@ static int vfio_initfn(PCIDevice *pdev)
>>       }
>>   
>>       vfio_register_err_notifier(vdev);
>> +    vfio_register_aer_resume_notifier(vdev);
>>       vfio_register_req_notifier(vdev);
>>       vfio_setup_resetfn_quirk(vdev);
>>   
>> @@ -3092,6 +3180,7 @@ static void vfio_exitfn(PCIDevice *pdev)
>>       VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>>   
>>       vfio_unregister_req_notifier(vdev);
>> +    vfio_unregister_aer_resume_notifier(vdev);
>>       vfio_unregister_err_notifier(vdev);
>>       pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
>>       vfio_disable_interrupts(vdev);
>> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
>> index 9fb0206..3ebc58f 100644
>> --- a/hw/vfio/pci.h
>> +++ b/hw/vfio/pci.h
>> @@ -119,6 +119,7 @@ typedef struct VFIOPCIDevice {
>>       VFIOVGA *vga; /* 0xa0000, 0x3b0, 0x3c0 */
>>       PCIHostDeviceAddress host;
>>       EventNotifier err_notifier;
>> +    EventNotifier resume_notifier;
>>       EventNotifier req_notifier;
>>       int (*resetfn)(struct VFIOPCIDevice *);
>>       uint32_t vendor_id;
>> @@ -144,6 +145,7 @@ typedef struct VFIOPCIDevice {
>>       bool no_kvm_msi;
>>       bool no_kvm_msix;
>>       bool single_depend_dev;
>> +    bool resume_enabled;
>>   } VFIOPCIDevice;
>>   
>>   uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
>> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
>> index 15e096c..6d1826d 100644
>> --- a/linux-headers/linux/vfio.h
>> +++ b/linux-headers/linux/vfio.h
>> @@ -345,6 +345,7 @@ enum {
>>   	VFIO_PCI_MSIX_IRQ_INDEX,
>>   	VFIO_PCI_ERR_IRQ_INDEX,
>>   	VFIO_PCI_REQ_IRQ_INDEX,
>> +	VFIO_PCI_RESUME_IRQ_INDEX,
>>   	VFIO_PCI_NUM_IRQS
>>   };
>>   
>
>
> .
>
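
The resume handler in the quoted patch builds the injected AER message from two values:

- the source ID `(pci_bus_num(dev->bus) << 8) | dev->devfn`;
- a severity chosen from the Uncorrectable Error Status and Severity registers.

Below is a minimal standalone sketch of that arithmetic, outside QEMU. The helper names are hypothetical. The two constants mirror the Root Error Command bits in Linux's pci_regs.h. Note that the patch's `!(uncor_status & ~0UL)` test is equivalent to `uncor_status == 0`.

```c
#include <assert.h>
#include <stdint.h>

/* Root Error Command enable bits, as defined in Linux pci_regs.h. */
#define PCI_ERR_ROOT_CMD_NONFATAL_EN 0x00000002u
#define PCI_ERR_ROOT_CMD_FATAL_EN    0x00000004u

/*
 * Requester ID of a PCI function: bus in bits 15:8, device (slot) in
 * bits 7:3, function in bits 2:0. The low byte is devfn.
 */
static uint16_t aer_source_id(uint8_t bus, uint8_t slot, uint8_t func)
{
    assert(slot < 32 && func < 8);
    uint8_t devfn = (uint8_t)((slot << 3) | func);
    return (uint16_t)((bus << 8) | devfn);
}

/*
 * Severity for the injected message, from the device's Uncorrectable
 * Error Status and Severity registers. Returns 0 when no status bit is
 * set, i.e. the error did not come from this device and nothing is sent.
 * Any status bit marked fatal in the severity mask makes the message fatal.
 */
static uint32_t aer_msg_severity(uint32_t uncor_status, uint32_t uncor_sever)
{
    if (uncor_status == 0) {
        return 0;
    }
    return (uncor_status & uncor_sever) ? PCI_ERR_ROOT_CMD_FATAL_EN
                                        : PCI_ERR_ROOT_CMD_NONFATAL_EN;
}
```

For example, function 1 of device 0 on bus 2 yields source ID 0x0201. A Surprise Down status bit (0x20) that is marked fatal in the severity register yields the fatal enable bit.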


Thread overview: 26+ messages
2016-04-05 11:41 [Qemu-devel] [patch v6 00/12] vfio-pci: pass the aer error to guest, part2 Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 01/12] vfio: extract vfio_get_hot_reset_info as a single function Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 02/12] vfio: squeeze out vfio_pci_do_hot_reset for support bus reset Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 03/12] vfio: add pcie extended capability support Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 04/12] vfio: add aer support for vfio device Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 05/12] vfio: refine function vfio_pci_host_match Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 06/12] vfio: add check host bus reset is support or not Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 07/12] pci: add a pci_function_is_valid callback to check function if valid Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 08/12] vfio: add check aer functionality for hotplug device Cao jin
2016-04-05 11:42 ` [Qemu-devel] [patch v6 09/12] vfio: vote the function 0 to do host bus reset when aer occurred Cao jin
2016-04-05 11:42 ` [Qemu-devel] [patch v6 10/12] vfio-pci: pass the aer error to guest Cao jin
2016-04-05 11:42 ` [Qemu-devel] [patch v6 11/12] vfio: register aer resume notification handler for aer resume Cao jin
2016-04-11 21:38   ` Alex Williamson
2016-04-14  1:02     ` Chen Fan [this message]
2016-04-26  3:39       ` Chen Fan
2016-04-26 14:48         ` Alex Williamson
2016-05-06  1:38           ` Chen Fan
2016-05-06 16:39             ` Alex Williamson
2016-05-11  3:11               ` Zhou Jie
2016-05-11 20:20                 ` Alex Williamson
2016-05-24 10:49           ` Michael S. Tsirkin
2016-05-25  1:08             ` Zhou Jie
2016-05-25  2:54             ` Alex Williamson
2016-05-25  8:45               ` Michael S. Tsirkin
2016-05-25 14:22                 ` Alex Williamson
2016-04-05 11:42 ` [Qemu-devel] [patch v6 12/12] vfio: add 'aer' property to expose aercap Cao jin
