Date: Wed, 18 Mar 2015 15:54:09 +1100
From: Gavin Shan
To: Alex Williamson
Cc: david@gibson.dropbear.id.au, qemu-ppc@nongnu.org, Gavin Shan, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [Qemu-ppc] [PATCH v2 2/3] VFIO: Disable INTx interrupt on EEH reset
Message-ID: <20150318045409.GA18622@shangw>
In-Reply-To: <1426627006.3643.342.camel@redhat.com>
References: <1426523486-9794-1-git-send-email-gwshan@linux.vnet.ibm.com>
 <1426523486-9794-2-git-send-email-gwshan@linux.vnet.ibm.com>
 <1426627006.3643.342.camel@redhat.com>

On Tue, Mar 17, 2015 at 03:16:46PM -0600, Alex Williamson wrote:
>On Tue, 2015-03-17 at 03:31 +1100, Gavin Shan wrote:
>> When Linux guest recovers from EEH error on the following Emulex
>> adapter, the MSIx interrupts are disabled and the INTx emulation
>> is enabled. One INTx interrupt is injected to the guest by host
>> because of detected pending INTx interrupts on the adapter.
>> QEMU disables mmap'ed BAR regions and starts a timer to enable those
>> regions at a later point in the INTx interrupt handler. Unfortunately,
>> "VFIOPCIDevice->intx.pending" isn't cleared, meaning those disabled
>> mmap'ed BAR regions won't be reenabled properly. It leads to EEH
>> recovery failure at guest side because of hanged MMIO access.
>>
>>   # lspci | grep Emulex
>>   0000:01:00.0 Ethernet controller: Emulex Corporation \
>>                OneConnect 10Gb NIC (be3) (rev 02)
>>   0000:01:00.1 Ethernet controller: Emulex Corporation \
>>                OneConnect 10Gb NIC (be3) (rev 02)
>>
>> The patch disables INTx interrupt before doing EEH reset to avoid
>> the issue.
>>
>> Signed-off-by: Gavin Shan
>> ---
>>  hw/vfio/pci.c | 13 +++++++++++++
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index fca1edc..bfa3d0c 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3340,6 +3340,14 @@ int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>>           * disable it so that it can be reenabled properly. Also,
>>           * the cached MSIx table should be cleared as it's not
>>           * reflecting the contents in hardware.
>> +         *
>> +         * We might have INTx interrupt whose handler disables the
>> +         * memory mapped BARs. The INTx pending state can't be
>> +         * cleared with memory BAR access in slow path. The timer
>> +         * kicked by the INTx interrupt handler won't enable those
>> +         * disabled memory mapped BARs, which leads to hanged MMIO
>> +         * register access and EEH recovery failure. We simply disable
>> +         * INTx if it has been enabled.
>>           */
>
>This feels like a quick hack for a problem we don't really understand.
>Why is INTx being fired through QEMU rather than KVM?  Why isn't the
>INTx re-enabling happening since this is exactly the scenario where it's
>supposed to work (ie. INTx occurs, BAR mmap disabled, guest accesses
>BAR, mmap re-enabled, INTx unmasked)?
>

Indeed.
It's a quick hack until I find the root cause of why the slow path
doesn't work when the fast path is disabled. I'm still tracing it and
hopefully I can find something soon.

Note that KVM IRQFD isn't enabled on the system where I was running the
experiments.

Thanks,
Gavin

>>          QLIST_FOREACH(vbasedev, &group->device_list, next) {
>>              vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>> @@ -3348,6 +3356,11 @@ int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>>              }
>>
>>              msix_reset(&vdev->pdev);
>> +
>> +            /* Disable INTx */
>> +            if (vdev->interrupt == VFIO_INT_INTx) {
>> +                vfio_disable_intx(vdev);
>> +            }
>>          }
>>
>>          break;
>
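
For anyone following the thread, the sequence Alex describes (INTx occurs,
BAR mmap disabled, guest accesses BAR, mmap re-enabled, INTx unmasked) can
be sketched as a small state model. This is only an illustration under
stated assumptions, not the code in hw/vfio/pci.c: the struct and all toy_*
names are hypothetical, and the model assumes that a trapped (slow path)
BAR access is what clears the pending state.

```c
#include <stdbool.h>

/* Toy model only: these names are hypothetical and are not QEMU's. */
struct toy_intx {
    bool pending;      /* INTx signalled, not yet serviced by the guest */
    bool mmap_enabled; /* fast path: BARs are mmap'ed for the guest */
};

static void toy_init(struct toy_intx *s)
{
    s->pending = false;
    s->mmap_enabled = true;
}

/* INTx arrives: mark it pending and force BAR accesses onto the slow path. */
static void toy_intx_fire(struct toy_intx *s)
{
    s->pending = true;
    s->mmap_enabled = false;
}

/* Guest touches a BAR. Only a trapped access is visible to the model,
 * and it is assumed to clear the pending state. */
static void toy_bar_access(struct toy_intx *s)
{
    if (!s->mmap_enabled) {
        s->pending = false;
    }
}

/* Timer callback: restore the fast path once nothing is pending.
 * Returns true when the timer has to be re-armed. */
static bool toy_timer_tick(struct toy_intx *s)
{
    if (s->pending) {
        return true;
    }
    s->mmap_enabled = true;
    return false;
}

/* Model of the workaround in the patch: drop the INTx state outright
 * so that a stale pending flag cannot keep the fast path disabled. */
static void toy_disable_intx(struct toy_intx *s)
{
    s->pending = false;
    s->mmap_enabled = true;
}
```

In this model the failure reported above corresponds to toy_timer_tick()
returning true indefinitely because nothing ever clears "pending"; why that
happens on the real system is exactly the open question in this thread.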