The Linux Kernel Mailing List
 help / color / mirror / Atom feed
From: Alex Williamson <alex@shazbot.org>
To: Josh Hilke <jrhilke@google.com>
Cc: David Matlack <dmatlack@google.com>,
	Vipin Sharma <vipinsh@google.com>,
	Jason Gunthorpe <jgg@nvidia.com>, Shuah Khan <shuah@kernel.org>,
	Tony Nguyen <anthony.l.nguyen@intel.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, alex@shazbot.org
Subject: Re: [PATCH] vfio: selftests: Add driver for IGB QEMU device
Date: Thu, 14 May 2026 16:48:44 -0600	[thread overview]
Message-ID: <20260514164844.2698802c@shazbot.org> (raw)
In-Reply-To: <CAAdrzjv7FiYWhXHDJQvK-c71UpKo+uwOu-dnbtka=6p2qj86zQ@mail.gmail.com>

On Thu, 14 May 2026 14:49:28 -0700
Josh Hilke <jrhilke@google.com> wrote:

> On Thu, May 14, 2026 at 9:28 AM Alex Williamson <alex@shazbot.org> wrote:
> >
> > On Wed, 13 May 2026 23:33:02 +0000
> > David Matlack <dmatlack@google.com> wrote:
> >  
> > > On 2026-05-13 11:49 AM, Josh Hilke wrote:  
> > > > On Tue, May 12, 2026 at 7:12 PM Josh Hilke <jrhilke@google.com> wrote:  
> > > > > On Mon, May 11, 2026 at 4:45 PM David Matlack <dmatlack@google.com> wrote:  
> > > > > > On 2026-05-11 09:18 PM, Josh Hilke wrote:  
> > >  
> > > > > > > +     retries = 100;
> > > > > > > +     while (retries-- > 0) {
> > > > > > > +             if (rx->wb.status_error & 1)
> > > > > > > +                     break;
> > > > > > > +             usleep(10);
> > > > > > > +     }  
> > > > > >
> > > > > > Why bail after a certain timeout? The test may have kicked off a large
> > > > > > count of memcpys. Is this for error detection?  
> > > > >
> > > > > The bailout was intended to detect errors during development.
> > > > > Shouldn't need it anymore. I'll remove it in v2.  
> > > >
> > > > Sorry, I forgot: we need the timeout  to detect DMA errors for the
> > > > memcpy_from_unmapped_iova test in vfio_pci_driver_test. The test
> > > > triggers an IOMMU fault because the IOVA is unmapped, and the IOMMU
> > > > aborts the DMA operation. However, the QEMU IGB implementation does
> > > > not set an error bit, so timing out is our only method for error
> > > > detection.  
> > >
> > > Hm... that's going to be tricky then. This means we would have to set
> > > the timeout to longer than the longest possible memcpy duration to avoid
> > > false negatives? That means we'll have to set the timeout to quite long.  
> >
> > FWIW, I had AI churn on trying to make this work on a physical 82576 as
> > I have several of these in my local machines as sort of the defacto,
> > readily available SR-IOV NIC.  The AI got up to 30/35 tests passing but
> > is currently stuck that the queues stall in the mix-and-match test when
> > it's trying to DMA from an unmapped IOVA.  So far none of the in-band
> > methods to kick the queues seem to work, I'm not sure if we'll need to
> > resort to an FLR.
> >
> > I'd be happy to send the changes it's made so far if you want to
> > validate and incorporate, or have any thoughts to kicking it after the
> > IOMMU fault.  Some of the changes are related to timeouts, where QEMU
> > loopback is actually faster than bare metal since the physical  queues
> > run at 1Gbps even in loopback mode.
> >
> > I'll also plant the seed that if we do have outstanding issues for a
> > driver that binds to a real world device, but only works on the
> > emulated version of that device... how do we handle that?  In part, I
> > think it's emulated in QEMU because it is so ubiquitous.  I'm also
> > hoping to use the same device for the new SR-IOV selftests.  Thanks,
> >
> > Alex  
> 
> I'm glad you're interested in this as well!
> 
> Unfortunately (and ironically) I don't have access to a physical device.
> 
> Regarding driver support for the real vs. emulated device, I think we
> should prioritize supporting the emulated version. This approach
> unlocks the ability for anyone to run VFIO/Live Update tests without
> needing specific hardware. Once that's done, other folks can add
> patches to update the driver if they want to use the physical device.
> What do you think?
> 
> I plan to add SR-IOV support to this driver in a separate series after
> the driver merges.

I've got it working now, I'll need to do some cleanup and verification,
but it's not too bad (imo), several places where QEMU emulation only
supports one mode and we don't fully configure the device or don't
account for physical hardware limitations or timing.  The most
significant change is in resolving the issue above, that once the queue
gets wedged from DMA errors, VFIO_DEVICE_RESET unfortunately seems to
be the most straightforward mechanism to get it unstuck.  That may
result in some initialization refactoring and plumbing to restore the
interrupt state. Thanks,

Alex

  reply	other threads:[~2026-05-14 22:48 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-11 21:18 [PATCH] vfio: selftests: Add driver for IGB QEMU device Josh Hilke
2026-05-11 23:45 ` David Matlack
2026-05-13  2:12   ` Josh Hilke
2026-05-13 18:49     ` Josh Hilke
2026-05-13 23:33       ` David Matlack
2026-05-14 16:28         ` Alex Williamson
2026-05-14 21:49           ` Josh Hilke
2026-05-14 22:48             ` Alex Williamson [this message]
2026-05-14 22:58               ` David Matlack
2026-05-14 23:16               ` Josh Hilke
2026-05-13 23:29     ` David Matlack

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260514164844.2698802c@shazbot.org \
    --to=alex@shazbot.org \
    --cc=anthony.l.nguyen@intel.com \
    --cc=dmatlack@google.com \
    --cc=jgg@nvidia.com \
    --cc=jrhilke@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=shuah@kernel.org \
    --cc=vipinsh@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox