From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:59999) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1autvm-0000kQ-9q for qemu-devel@nongnu.org; Mon, 25 Apr 2016 23:44:59 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1autvi-000150-65 for qemu-devel@nongnu.org; Mon, 25 Apr 2016 23:44:58 -0400 Received: from [59.151.112.132] (port=57411 helo=heian.cn.fujitsu.com) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1autvg-00013y-TU for qemu-devel@nongnu.org; Mon, 25 Apr 2016 23:44:54 -0400 Message-ID: <571EE2D6.4000100@cn.fujitsu.com> Date: Tue, 26 Apr 2016 11:39:02 +0800 From: Chen Fan MIME-Version: 1.0 References: <1459856523-17085-1-git-send-email-caoj.fnst@cn.fujitsu.com> <1459856523-17085-12-git-send-email-caoj.fnst@cn.fujitsu.com> <20160411153827.3884ded1@t450s.home> <570EEC42.3040300@cn.fujitsu.com> In-Reply-To: <570EEC42.3040300@cn.fujitsu.com> Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [patch v6 11/12] vfio: register aer resume notification handler for aer resume List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Alex Williamson , Cao jin Cc: izumi.taku@jp.fujitsu.com, qemu-devel@nongnu.org, mst@redhat.com On 04/14/2016 09:02 AM, Chen Fan wrote: > > On 04/12/2016 05:38 AM, Alex Williamson wrote: >> On Tue, 5 Apr 2016 19:42:02 +0800 >> Cao jin wrote: >> >>> From: Chen Fan >>> >>> for supporting aer recovery, host and guest would run the same aer >>> recovery code, that would do the secondary bus reset if the error >>> is fatal, the aer recovery process: >>> 1. error_detected >>> 2. reset_link (if fatal) >>> 3. slot_reset/mmio_enabled >>> 4. resume >>> >>> it indicates that host will do secondary bus reset to reset >>> the physical devices under bus in step 2, that would cause >>> devices in D3 status in a short time. but in qemu, we register >>> an error detected handler, that would be invoked as host broadcasts >>> the error-detected event in step 1, in order to avoid guest do >>> reset_link when host do reset_link simultaneously. it may cause >>> fatal error. we introduce a resmue notifier to assure host reset >>> completely. then do guest aer injection. >> Why is it safe to continue running the VM between the error detected >> notification and the resume notification? We're just pushing back the >> point at which we inject the AER into the guest, potentially negating >> any benefit by allowing the VM to consume bad data. Shouldn't we >> instead be immediately notifying the VM on error detected, but stalling >> any access to the device until resume is signaled? How do we know that >> resume will ever be signaled? We have both the problem that we may be >> running on an older kernel that won't support a resume notification and >> the problem that seeing a resume notification depends on the host being >> able to successfully complete a link reset after fatal error. We can >> detect support for resume notification, but we still need a strategy >> for never receiving it. Thanks, > That's make sense, but I haven't came up with a good idea. do you have > any idea, Alex? > ping... > Thanks, > Chen > > >> >> Alex >> >>> Signed-off-by: Chen Fan >>> --- >>> hw/vfio/pci.c | 157 >>> +++++++++++++++++++++++++++++++++++---------- >>> hw/vfio/pci.h | 2 + >>> linux-headers/linux/vfio.h | 1 + >>> 3 files changed, 126 insertions(+), 34 deletions(-) >>> >>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c >>> index 691ff5e..d79fb3d 100644 >>> --- a/hw/vfio/pci.c >>> +++ b/hw/vfio/pci.c >>> @@ -2610,12 +2610,7 @@ static void vfio_put_device(VFIOPCIDevice *vdev) >>> static void vfio_err_notifier_handler(void *opaque) >>> { >>> VFIOPCIDevice *vdev = opaque; >>> - PCIDevice *dev = &vdev->pdev; >>> Error *local_err = NULL; >>> - PCIEAERMsg msg = { >>> - .severity = 0, >>> - .source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn, >>> - }; >>> if (!event_notifier_test_and_clear(&vdev->err_notifier)) { >>> return; >>> @@ -2636,35 +2631,7 @@ static void vfio_err_notifier_handler(void >>> *opaque) >>> goto stop; >>> } >>> - /* >>> - * we should read the error details from the real hardware >>> - * configuration spaces, here we only need to do is signaling >>> - * to guest an uncorrectable error has occurred. >>> - */ >>> - if (dev->exp.aer_cap) { >>> - uint8_t *aer_cap = dev->config + dev->exp.aer_cap; >>> - uint32_t uncor_status; >>> - bool isfatal; >>> - >>> - uncor_status = vfio_pci_read_config(dev, >>> - dev->exp.aer_cap + PCI_ERR_UNCOR_STATUS, >>> 4); >>> - >>> - /* >>> - * if the error is not emitted by this device, we can >>> - * just ignore it. >>> - */ >>> - if (!(uncor_status & ~0UL)) { >>> - return; >>> - } >>> - >>> - isfatal = uncor_status & pci_get_long(aer_cap + >>> PCI_ERR_UNCOR_SEVER); >>> - >>> - msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN : >>> - PCI_ERR_ROOT_CMD_NONFATAL_EN; >>> - >>> - pcie_aer_msg(dev, &msg); >>> - return; >>> - } >>> + return; >>> stop: >>> /* >>> @@ -2757,6 +2724,126 @@ static void >>> vfio_unregister_err_notifier(VFIOPCIDevice *vdev) >>> event_notifier_cleanup(&vdev->err_notifier); >>> } >>> +static void vfio_resume_notifier_handler(void *opaque) >>> +{ >>> + VFIOPCIDevice *vdev = opaque; >>> + PCIDevice *dev = &vdev->pdev; >>> + PCIEAERMsg msg = { >>> + .severity = 0, >>> + .source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn, >>> + }; >>> + >>> + if (!event_notifier_test_and_clear(&vdev->resume_notifier)) { >>> + return; >>> + } >>> + >>> + /* >>> + * we should read the error details from the real hardware >>> + * configuration spaces, here we only need to do is signaling >>> + * to guest an uncorrectable error has occurred. >>> + */ >>> + if (dev->exp.aer_cap) { >>> + uint8_t *aer_cap = dev->config + dev->exp.aer_cap; >>> + uint32_t uncor_status; >>> + bool isfatal; >>> + >>> + uncor_status = vfio_pci_read_config(dev, >>> + dev->exp.aer_cap + PCI_ERR_UNCOR_STATUS, >>> 4); >>> + >>> + /* >>> + * if the error is not emitted by this device, we can >>> + * just ignore it. >>> + */ >>> + if (!(uncor_status & ~0UL)) { >>> + return; >>> + } >>> + >>> + isfatal = uncor_status & pci_get_long(aer_cap + >>> PCI_ERR_UNCOR_SEVER); >>> + >>> + msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN : >>> + PCI_ERR_ROOT_CMD_NONFATAL_EN; >>> + >>> + pcie_aer_msg(dev, &msg); >>> + } >>> +} >>> + >>> +static void vfio_register_aer_resume_notifier(VFIOPCIDevice *vdev) >>> +{ >>> + int ret; >>> + int argsz; >>> + struct vfio_irq_set *irq_set; >>> + int32_t *pfd; >>> + >>> + if (!(vdev->features & VFIO_FEATURE_ENABLE_AER)) { >>> + return; >>> + } >>> + >>> + if (event_notifier_init(&vdev->resume_notifier, 0)) { >>> + error_report("vfio: Unable to init event notifier for" >>> + " resume notification"); >>> + return; >>> + } >>> + >>> + argsz = sizeof(*irq_set) + sizeof(*pfd); >>> + >>> + irq_set = g_malloc0(argsz); >>> + irq_set->argsz = argsz; >>> + irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | >>> + VFIO_IRQ_SET_ACTION_TRIGGER; >>> + irq_set->index = VFIO_PCI_RESUME_IRQ_INDEX; >>> + irq_set->start = 0; >>> + irq_set->count = 1; >>> + pfd = (int32_t *)&irq_set->data; >>> + >>> + *pfd = event_notifier_get_fd(&vdev->resume_notifier); >>> + qemu_set_fd_handler(*pfd, vfio_resume_notifier_handler, NULL, >>> vdev); >>> + >>> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set); >>> + if (ret) { >>> + error_report("vfio: Failed to set up resume notification"); >>> + qemu_set_fd_handler(*pfd, NULL, NULL, vdev); >>> + event_notifier_cleanup(&vdev->resume_notifier); >>> + } else { >>> + vdev->resume_enabled = true; >>> + } >>> + g_free(irq_set); >>> +} >>> + >>> +static void vfio_unregister_aer_resume_notifier(VFIOPCIDevice *vdev) >>> +{ >>> + int argsz; >>> + struct vfio_irq_set *irq_set; >>> + int32_t *pfd; >>> + int ret; >>> + >>> + if (!vdev->resume_enabled) { >>> + return; >>> + } >>> + >>> + argsz = sizeof(*irq_set) + sizeof(*pfd); >>> + >>> + irq_set = g_malloc0(argsz); >>> + irq_set->argsz = argsz; >>> + irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | >>> + VFIO_IRQ_SET_ACTION_TRIGGER; >>> + irq_set->index = VFIO_PCI_RESUME_IRQ_INDEX; >>> + irq_set->start = 0; >>> + irq_set->count = 1; >>> + pfd = (int32_t *)&irq_set->data; >>> + *pfd = -1; >>> + >>> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set); >>> + if (ret) { >>> + error_report("vfio: Failed to de-assign error fd: %m"); >>> + } >>> + g_free(irq_set); >>> + qemu_set_fd_handler(event_notifier_get_fd(&vdev->resume_notifier), >>> + NULL, NULL, vdev); >>> + event_notifier_cleanup(&vdev->resume_notifier); >>> + >>> + vdev->resume_enabled = false; >>> +} >>> + >>> static void vfio_req_notifier_handler(void *opaque) >>> { >>> VFIOPCIDevice *vdev = opaque; >>> @@ -3062,6 +3149,7 @@ static int vfio_initfn(PCIDevice *pdev) >>> } >>> vfio_register_err_notifier(vdev); >>> + vfio_register_aer_resume_notifier(vdev); >>> vfio_register_req_notifier(vdev); >>> vfio_setup_resetfn_quirk(vdev); >>> @@ -3092,6 +3180,7 @@ static void vfio_exitfn(PCIDevice *pdev) >>> VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev); >>> vfio_unregister_req_notifier(vdev); >>> + vfio_unregister_aer_resume_notifier(vdev); >>> vfio_unregister_err_notifier(vdev); >>> pci_device_set_intx_routing_notifier(&vdev->pdev, NULL); >>> vfio_disable_interrupts(vdev); >>> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h >>> index 9fb0206..3ebc58f 100644 >>> --- a/hw/vfio/pci.h >>> +++ b/hw/vfio/pci.h >>> @@ -119,6 +119,7 @@ typedef struct VFIOPCIDevice { >>> VFIOVGA *vga; /* 0xa0000, 0x3b0, 0x3c0 */ >>> PCIHostDeviceAddress host; >>> EventNotifier err_notifier; >>> + EventNotifier resume_notifier; >>> EventNotifier req_notifier; >>> int (*resetfn)(struct VFIOPCIDevice *); >>> uint32_t vendor_id; >>> @@ -144,6 +145,7 @@ typedef struct VFIOPCIDevice { >>> bool no_kvm_msi; >>> bool no_kvm_msix; >>> bool single_depend_dev; >>> + bool resume_enabled; >>> } VFIOPCIDevice; >>> uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, >>> int len); >>> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h >>> index 15e096c..6d1826d 100644 >>> --- a/linux-headers/linux/vfio.h >>> +++ b/linux-headers/linux/vfio.h >>> @@ -345,6 +345,7 @@ enum { >>> VFIO_PCI_MSIX_IRQ_INDEX, >>> VFIO_PCI_ERR_IRQ_INDEX, >>> VFIO_PCI_REQ_IRQ_INDEX, >>> + VFIO_PCI_RESUME_IRQ_INDEX, >>> VFIO_PCI_NUM_IRQS >>> }; >> >> >> . >> > > > > > . >