All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Aaron Lewis <aaronlewis@google.com>,
	bhelgaas@google.com, dmatlack@google.com, vipinsh@google.com,
	kvm@vger.kernel.org, seanjc@google.com, jrhilke@google.com
Subject: Re: [RFC PATCH 0/3] vfio: selftests: Add VFIO selftest to demontrate a latency issue
Date: Thu, 26 Jun 2025 17:56:08 -0300	[thread overview]
Message-ID: <20250626205608.GO167785@nvidia.com> (raw)
In-Reply-To: <20250626132659.62178b7d.alex.williamson@redhat.com>

On Thu, Jun 26, 2025 at 01:26:59PM -0600, Alex Williamson wrote:
> On Thu, 26 Jun 2025 18:04:21 +0000
> Aaron Lewis <aaronlewis@google.com> wrote:
> 
> > This series is being sent as an RFC to help brainstorm the best way to
> > fix a latency issue it uncovers.
> > 
> > The crux of the issue is that when initializing multiple VFs from the
> > same PF the devices are reset serially rather than in parallel
> > regardless if they are initialized from different threads.  That happens
> > because a shared lock is acquired when vfio_df_ioctl_bind_iommufd() is
> > called, then a FLR (function level reset) is done which takes 100ms to
> > complete.  That in combination with trying to initialize many devices at
> > the same time results in a lot of wasted time.
> > 
> > While the PCI spec does specify that a FLR requires 100ms to ensure it
> > has time to complete, I don't see anything indicating that other VFs
> > can't be reset at the same time.
> > 
> > A couple of ideas on how to approach a fix are:
> > 
> >   1. See if the lock preventing the second thread from making forward
> >   progress can be sharded to only include the VF it protects.
> 
> I think we're talking about the dev_set mutex here, right?  I think this
> is just an oversight.  The original lock that dev_set replaced was
> devised to manage the set of devices affected by the same bus or slot
> reset.  I believe we've held the same semantics though and VFs just
> happen to fall through to the default of a bus-based dev_set.
> Obviously we cannot do a bus or slot reset of a VF, we only have FLR,
> and it especially doesn't make sense that VFs on the same "bus" from
> different PFs share this mutex.

It certainly could be.. But I am feeling a bit wary and would want to
check this carefully. We ended up using the devset for more things -
need to check where everything ended up.

Off hand I don't recall any reason why the VF should be part of the
dev set..

> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 6328c3a05bcd..261a6dc5a5fc 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -2149,7 +2149,7 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
>  		return -EBUSY;
>  	}
>  
> -	if (pci_is_root_bus(pdev->bus)) {
> +	if (pci_is_root_bus(pdev->bus) || pdev->is_virtfn) {
>  		ret = vfio_assign_device_set(&vdev->vdev, vdev);
>  	} else if (!pci_probe_reset_slot(pdev->slot)) {
>  		ret = vfio_assign_device_set(&vdev->vdev, pdev->slot);
> 
> Does that allow fully parallelized resets?

I forget all the details but if we are sure the reset of a VF is only
the VF then that does seem like the right direction

Jason

  reply	other threads:[~2025-06-26 20:56 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-26 18:04 [RFC PATCH 0/3] vfio: selftests: Add VFIO selftest to demontrate a latency issue Aaron Lewis
2025-06-26 18:04 ` [RFC PATCH 1/3] vfio: selftests: Allow run.sh to bind to more than one device Aaron Lewis
2025-06-27 22:19   ` David Matlack
2025-06-26 18:04 ` [RFC PATCH 2/3] vfio: selftests: Introduce the selftest vfio_flr_test Aaron Lewis
2025-06-27 22:37   ` David Matlack
2025-06-26 18:04 ` [RFC PATCH 3/3] vfio: selftests: Include a BPF script to pair with " Aaron Lewis
2025-06-27 22:51   ` David Matlack
2025-06-26 19:26 ` [RFC PATCH 0/3] vfio: selftests: Add VFIO selftest to demontrate a latency issue Alex Williamson
2025-06-26 20:56   ` Jason Gunthorpe [this message]
2025-06-26 21:58     ` Aaron Lewis
2025-06-26 22:10       ` Alex Williamson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250626205608.GO167785@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=aaronlewis@google.com \
    --cc=alex.williamson@redhat.com \
    --cc=bhelgaas@google.com \
    --cc=dmatlack@google.com \
    --cc=jrhilke@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=seanjc@google.com \
    --cc=vipinsh@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.