Kernel KVM virtualization development
 help / color / mirror / Atom feed
From: David Matlack <dmatlack@google.com>
To: Alex Williamson <alex@shazbot.org>
Cc: Josh Hilke <jrhilke@google.com>,
	Vipin Sharma <vipinsh@google.com>,
	Jason Gunthorpe <jgg@nvidia.com>, Shuah Khan <shuah@kernel.org>,
	Tony Nguyen <anthony.l.nguyen@intel.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org
Subject: Re: [PATCH] vfio: selftests: Add driver for IGB QEMU device
Date: Thu, 14 May 2026 22:58:43 +0000	[thread overview]
Message-ID: <agZTo-OzlSbD9G5n@google.com> (raw)
In-Reply-To: <20260514164844.2698802c@shazbot.org>

On 2026-05-14 04:48 PM, Alex Williamson wrote:
> On Thu, 14 May 2026 14:49:28 -0700
> Josh Hilke <jrhilke@google.com> wrote:
> 
> > On Thu, May 14, 2026 at 9:28 AM Alex Williamson <alex@shazbot.org> wrote:
> > >
> > > On Wed, 13 May 2026 23:33:02 +0000
> > > David Matlack <dmatlack@google.com> wrote:
> > >  
> > > > On 2026-05-13 11:49 AM, Josh Hilke wrote:  
> > > > > On Tue, May 12, 2026 at 7:12 PM Josh Hilke <jrhilke@google.com> wrote:  
> > > > > > On Mon, May 11, 2026 at 4:45 PM David Matlack <dmatlack@google.com> wrote:  
> > > > > > > On 2026-05-11 09:18 PM, Josh Hilke wrote:  
> > > >  
> > > > > > > > +     retries = 100;
> > > > > > > > +     while (retries-- > 0) {
> > > > > > > > +             if (rx->wb.status_error & 1)
> > > > > > > > +                     break;
> > > > > > > > +             usleep(10);
> > > > > > > > +     }  
> > > > > > >
> > > > > > > Why bail after a certain timeout? The test may have kicked off a large
> > > > > > > count of memcpys. Is this for error detection?  
> > > > > >
> > > > > > The bailout was intended to detect errors during development.
> > > > > > Shouldn't need it anymore. I'll remove it in v2.  
> > > > >
> > > > > Sorry, I forgot: we need the timeout  to detect DMA errors for the
> > > > > memcpy_from_unmapped_iova test in vfio_pci_driver_test. The test
> > > > > triggers an IOMMU fault because the IOVA is unmapped, and the IOMMU
> > > > > aborts the DMA operation. However, the QEMU IGB implementation does
> > > > > not set an error bit, so timing out is our only method for error
> > > > > detection.  
> > > >
> > > > Hm... that's going to be tricky then. This means we would have to set
> > > > the timeout to longer than the longest possible memcpy duration to avoid
> > > > false negatives? That means we'll have to set the timeout to quite long.  
> > >
> > > FWIW, I had AI churn on trying to make this work on a physical 82576 as
> > > I have several of these in my local machines as sort of the defacto,
> > > readily available SR-IOV NIC.  The AI got up to 30/35 tests passing but
> > > is currently stuck that the queues stall in the mix-and-match test when
> > > it's trying to DMA from an unmapped IOVA.  So far none of the in-band
> > > methods to kick the queues seem to work, I'm not sure if we'll need to
> > > resort to an FLR.
> > >
> > > I'd be happy to send the changes it's made so far if you want to
> > > validate and incorporate, or have any thoughts to kicking it after the
> > > IOMMU fault.  Some of the changes are related to timeouts, where QEMU
> > > loopback is actually faster than bare metal since the physical  queues
> > > run at 1Gbps even in loopback mode.
> > >
> > > I'll also plant the seed that if we do have outstanding issues for a
> > > driver that binds to a real world device, but only works on the
> > > emulated version of that device... how do we handle that?  In part, I
> > > think it's emulated in QEMU because it is so ubiquitous.  I'm also
> > > hoping to use the same device for the new SR-IOV selftests.  Thanks,
> > >
> > > Alex  
> > 
> > I'm glad you're interested in this as well!
> > 
> > Unfortunately (and ironically) I don't have access to a physical device.
> > 
> > Regarding driver support for the real vs. emulated device, I think we
> > should prioritize supporting the emulated version. This approach
> > unlocks the ability for anyone to run VFIO/Live Update tests without
> > needing specific hardware. Once that's done, other folks can add
> > patches to update the driver if they want to use the physical device.
> > What do you think?
> > 
> > I plan to add SR-IOV support to this driver in a separate series after
> > the driver merges.
> 
> I've got it working now, I'll need to do some cleanup and verification,
> but it's not too bad (imo), several places where QEMU emulation only
> supports one mode and we don't fully configure the device or don't
> account for physical hardware limitations or timing.  The most
> significant change is in resolving the issue above, that once the queue
> gets wedged from DMA errors, VFIO_DEVICE_RESET unfortunately seems to
> be the most straightforward mechanism to get it unstuck.  That may
> result in some initialization refactoring and plumbing to restore the
> interrupt state. Thanks,

FWIW I had to do similar for the ioat driver, except it used a
device-specific reset mechanism instead of VFIO_DEVICE_RESET, so I think
this approach should be ok.

static void ioat_handle_error(struct vfio_pci_device *device)
{
	...
	ioat_reset(device);
}

static int ioat_memcpy_wait(struct vfio_pci_device *device)
{
	void *registers = ioat_channel_registers(device);
	u64 status;
	int r = 0;

	/* Wait until all operations complete. */
	for (;;) {
		status = ioat_channel_status(registers);
		...
		if (status == IOAT_CHANSTS_HALTED) {
			ioat_handle_error(device);
			return -1;
		}
	}

	...
}

  reply	other threads:[~2026-05-14 22:58 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-11 21:18 [PATCH] vfio: selftests: Add driver for IGB QEMU device Josh Hilke
2026-05-11 23:45 ` David Matlack
2026-05-13  2:12   ` Josh Hilke
2026-05-13 18:49     ` Josh Hilke
2026-05-13 23:33       ` David Matlack
2026-05-14 16:28         ` Alex Williamson
2026-05-14 21:49           ` Josh Hilke
2026-05-14 22:48             ` Alex Williamson
2026-05-14 22:58               ` David Matlack [this message]
2026-05-14 23:16               ` Josh Hilke
2026-05-13 23:29     ` David Matlack

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=agZTo-OzlSbD9G5n@google.com \
    --to=dmatlack@google.com \
    --cc=alex@shazbot.org \
    --cc=anthony.l.nguyen@intel.com \
    --cc=jgg@nvidia.com \
    --cc=jrhilke@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=shuah@kernel.org \
    --cc=vipinsh@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox