From: Alex Williamson <alex.williamson@nvidia.com>
To: jrhilke@google.com
Cc: Alex Williamson <alex.williamson@nvidia.com>,
Alex Williamson <alex@shazbot.org>, kvm <kvm@vger.kernel.org>,
David Matlack <dmatlack@google.com>,
linux-kernel <linux-kernel@vger.kernel.org>,
Jason Gunthorpe <jgg@nvidia.com>
Subject: [PATCH 8/8] selftests/vfio: igb: Recover after DMA-read faults
Date: Fri, 15 May 2026 16:03:15 -0600 [thread overview]
Message-ID: <20260515220330.565792-9-alex.williamson@nvidia.com> (raw)
In-Reply-To: <20260515220330.565792-1-alex.williamson@nvidia.com>
The mix_and_match test intentionally submits a TX descriptor with an
unmapped source IOVA so that the DMA read fails. On real 82576
hardware the resulting fault leaves the descriptor engine unable to
service subsequent valid descriptors, so the next memcpy in the same
test iteration times out.
The 82576 datasheet (section 4.2.1.6.1) describes CTRL.RST as the
software mechanism to recover from a hung device. Empirically
CTRL.RST alone is not sufficient in this state: the visible queue
registers are reinitialized, but the next valid memcpy still posts
descriptors without any TDH/TDT progress in the same process. A
fresh device open after the failure works, which points to a reset
scope broader than CTRL.RST being required. The 82576 advertises
PCIe FLR; VFIO_DEVICE_RESET drives FLR and supplies that scope while
preserving the selftest process and its DMA mappings.
Add igb_error_reset_and_reinit() implementing the recovery sequence:
issue VFIO_DEVICE_RESET, re-arm the kernel-side MSI-X trigger against
the still-valid eventfd via vfio_pci_irq_reenable() (this does not
touch the eventfd, which test fixtures may have cached), and
re-program the device via igb_hw_init(). FLR clears EICR and leaves
EIMS=0, so no explicit interrupt mask or cause writes are needed.
igb_hw_init() resets tx_tail/rx_tail to 0 and igb_memcpy_start() zeros
each descriptor before submission, so no ring memset is needed either.
Call this from igb_memcpy_wait() on completion timeout, preceded by a
10 ms delay so that PCIe/IOMMU/AER error handling triggered by the
just-observed DMA fault can release the device lock VFIO_DEVICE_RESET
contends for. The delay is heuristic and tied to the fault path, so
it lives at the call site rather than inside the reset helper. The
failed memcpy still returns -ETIMEDOUT; reset recovery only ensures
the next operation starts from a usable device state.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
---
.../selftests/vfio/lib/drivers/igb/igb.c | 40 +++++++++++++++++++
1 file changed, 40 insertions(+)
diff --git a/tools/testing/selftests/vfio/lib/drivers/igb/igb.c b/tools/testing/selftests/vfio/lib/drivers/igb/igb.c
index ef242ebd9d2e..07d1a907f18a 100644
--- a/tools/testing/selftests/vfio/lib/drivers/igb/igb.c
+++ b/tools/testing/selftests/vfio/lib/drivers/igb/igb.c
@@ -443,6 +443,28 @@ static void igb_memcpy_start(struct vfio_pci_device *device, iova_t src,
igb_write32(igb, IGB_TDT0, igb->tx_tail);
}
+/*
+ * Reset the device via VFIO_DEVICE_RESET (PCIe FLR on the 82576) and
+ * re-program it. VFIO_DEVICE_RESET tears down the kernel-side MSI-X
+ * trigger but leaves user-side eventfds intact, so re-arm the trigger
+ * via vfio_pci_irq_reenable() before reprogramming so any caller-cached
+ * eventfd remains valid.
+ *
+ * FLR clears device-side state to power-on reset values (datasheet
+ * 4.2.1.5.1: a PF FLR is "equivalent to a D0->D3->D0 transition"), so
+ * EIMS and EICR come back as 0 from their register-defined initial
+ * values, and igb_hw_init() resets tx_tail/rx_tail to 0. The next
+ * igb_memcpy_start() will memset each descriptor it touches before
+ * submission, so no explicit IMC/EICR writes or ring memsets are
+ * needed here.
+ */
+static void igb_error_reset_and_reinit(struct vfio_pci_device *device)
+{
+ vfio_pci_device_reset(device);
+ vfio_pci_irq_reenable(device, VFIO_PCI_MSIX_IRQ_INDEX, MSIX_VECTOR, 1);
+ igb_hw_init(device);
+}
+
static int igb_memcpy_wait(struct vfio_pci_device *device)
{
struct igb *igb = to_igb_state(device);
@@ -478,6 +500,24 @@ static int igb_memcpy_wait(struct vfio_pci_device *device)
if (rx->wb.status_error & 1)
return 0;
+ /*
+ * The descriptor never completed. On real 82576 hardware this
+ * typically follows a DMA-read fault from one of the intentional
+ * unmapped-IOVA tests; the fault leaves the descriptor engine
+ * unable to service subsequent valid descriptors. CTRL.RST alone
+ * reinitializes the queue registers but leaves the engine wedged
+ * for the current process, so a broader VFIO_DEVICE_RESET (FLR)
+ * is required.
+ *
+ * Delay before requesting reset so PCIe/IOMMU/AER error handling
+ * triggered by the just-observed DMA fault can release the device
+ * lock VFIO_DEVICE_RESET contends for. The 10 ms value is
+ * heuristic. The current memcpy still fails with -ETIMEDOUT;
+ * recovery only ensures the next memcpy starts from a usable state.
+ */
+ usleep(10000);
+ igb_error_reset_and_reinit(device);
+
return -ETIMEDOUT;
}
--
2.51.0
prev parent reply other threads:[~2026-05-15 22:04 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-15 22:03 [PATCH 0/8] selftests/vfio: igb: 82576 hardware compatibility Alex Williamson
2026-05-15 22:03 ` [PATCH 1/8] selftests/vfio: igb: Use PHY internal loopback on 82576 Alex Williamson
2026-05-15 22:03 ` [PATCH 2/8] selftests/vfio: igb: Use advanced TX and RX descriptors Alex Williamson
2026-05-15 22:03 ` [PATCH 3/8] selftests/vfio: igb: Program MSI-X interrupt routing Alex Williamson
2026-05-15 22:03 ` [PATCH 4/8] selftests/vfio: igb: Extend memcpy completion timeout for line-rate hardware Alex Williamson
2026-05-15 22:03 ` [PATCH 5/8] selftests/vfio: igb: Disable PCIe completion timeout retries Alex Williamson
2026-05-15 22:03 ` [PATCH 6/8] selftests/vfio: Add vfio_pci_irq_reenable() helper Alex Williamson
2026-05-15 22:03 ` [PATCH 7/8] selftests/vfio: igb: Factor hardware programming into igb_hw_init() Alex Williamson
2026-05-15 22:03 ` Alex Williamson [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260515220330.565792-9-alex.williamson@nvidia.com \
--to=alex.williamson@nvidia.com \
--cc=alex@shazbot.org \
--cc=dmatlack@google.com \
--cc=jgg@nvidia.com \
--cc=jrhilke@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.