kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alex Williamson <alex.williamson@redhat.com>
To: Aaron Lewis <aaronlewis@google.com>
Cc: bhelgaas@google.com, dmatlack@google.com, vipinsh@google.com,
	kvm@vger.kernel.org, seanjc@google.com, jrhilke@google.com,
	Jason Gunthorpe <jgg@nvidia.com>
Subject: Re: [RFC PATCH 0/3] vfio: selftests: Add VFIO selftest to demontrate a latency issue
Date: Thu, 26 Jun 2025 13:26:59 -0600	[thread overview]
Message-ID: <20250626132659.62178b7d.alex.williamson@redhat.com> (raw)
In-Reply-To: <20250626180424.632628-1-aaronlewis@google.com>

On Thu, 26 Jun 2025 18:04:21 +0000
Aaron Lewis <aaronlewis@google.com> wrote:

> This series is being sent as an RFC to help brainstorm the best way to
> fix a latency issue it uncovers.
> 
> The crux of the issue is that when initializing multiple VFs from the
> same PF the devices are reset serially rather than in parallel
> regardless if they are initialized from different threads.  That happens
> because a shared lock is acquired when vfio_df_ioctl_bind_iommufd() is
> called, then a FLR (function level reset) is done which takes 100ms to
> complete.  That in combination with trying to initialize many devices at
> the same time results in a lot of wasted time.
> 
> While the PCI spec does specify that a FLR requires 100ms to ensure it
> has time to complete, I don't see anything indicating that other VFs
> can't be reset at the same time.
> 
> A couple of ideas on how to approach a fix are:
> 
>   1. See if the lock preventing the second thread from making forward
>   progress can be sharded to only include the VF it protects.

I think we're talking about the dev_set mutex here, right?  I think this
is just an oversight.  The original lock that dev_set replaced was
devised to manage the set of devices affected by the same bus or slot
reset.  I believe we've held the same semantics though and VFs just
happen to fall through to the default of a bus-based dev_set.
Obviously we cannot do a bus or slot reset of a VF, we only have FLR,
and it especially doesn't make sense that VFs on the same "bus" from
different PFs share this mutex.

Seems like we just need this:

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 6328c3a05bcd..261a6dc5a5fc 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -2149,7 +2149,7 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
 		return -EBUSY;
 	}
 
-	if (pci_is_root_bus(pdev->bus)) {
+	if (pci_is_root_bus(pdev->bus) || pdev->is_virtfn) {
 		ret = vfio_assign_device_set(&vdev->vdev, vdev);
 	} else if (!pci_probe_reset_slot(pdev->slot)) {
 		ret = vfio_assign_device_set(&vdev->vdev, pdev->slot);

Does that allow fully parallelized resets?

Meanwhile this is a good use case of selftests and further impetus for
me to get it working.  Thanks,

Alex

>   
>   2. Do the FLR for the VF in probe() and close(device_fd) rather than in
>   vfio_df_ioctl_bind_iommufd().
> 
> To demonstrate the problem the run script had to be extended to bind
> multiple devices to the vfio-driver, not just one.  E.g.
> 
>   $ ./run.sh -d 0000:17:0c.1 -d 0000:17:0c.2 -d 0000:16:01.7 -s
> 
> Also included is a selftest and BPF script.  With those, the problem can
> be reproduced with the output logging showing that one of the devices
> takes >200ms to initialize despite running from different threads.
> 
>   $ VFIO_BDF_1=0000:17:0c.1 VFIO_BDF_2=0000:17:0c.2 ./vfio_flr_test
>   [0x7f61bb888700] '0000:17:0c.2' initialized in 108.6ms.
>   [0x7f61bc089700] '0000:17:0c.1' initialized in 212.3ms.
> 
> And the BPF script indicating that the latency issues are coming from the
> mutex in vfio_df_ioctl_bind_iommufd().
> 
>   [pcie_flr] duration = 108ms
>   [vfio_df_ioctl_bind_iommufd] duration = 108ms
>   [pcie_flr] duration = 104ms
>   [vfio_df_ioctl_bind_iommufd] duration = 212ms
> 
>   [__mutex_lock] duration = 103ms
>   __mutex_lock+5
>   vfio_df_ioctl_bind_iommufd+171
>   __se_sys_ioctl+110
>   do_syscall_64+109
>   entry_SYSCALL_64_after_hwframe+120
> 
> This series can be applied on top of the VFIO selftests using the branch:
> upstream/vfio/selftests/v1.
> 
> https://github.com/dmatlack/linux/tree/vfio/selftests/v1
> 
> Aaron Lewis (3):
>   vfio: selftests: Allow run.sh to bind to more than one device
>   vfio: selftests: Introduce the selftest vfio_flr_test
>   vfio: selftests: Include a BPF script to pair with the selftest vfio_flr_test
> 
>  tools/testing/selftests/vfio/Makefile         |   1 +
>  tools/testing/selftests/vfio/run.sh           |  73 +++++++----
>  tools/testing/selftests/vfio/vfio_flr_test.c  | 120 ++++++++++++++++++
>  .../testing/selftests/vfio/vfio_flr_trace.bt  |  83 ++++++++++++
>  4 files changed, 251 insertions(+), 26 deletions(-)
>  create mode 100644 tools/testing/selftests/vfio/vfio_flr_test.c
>  create mode 100644 tools/testing/selftests/vfio/vfio_flr_trace.bt
> 


  parent reply	other threads:[~2025-06-26 19:27 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-26 18:04 [RFC PATCH 0/3] vfio: selftests: Add VFIO selftest to demontrate a latency issue Aaron Lewis
2025-06-26 18:04 ` [RFC PATCH 1/3] vfio: selftests: Allow run.sh to bind to more than one device Aaron Lewis
2025-06-27 22:19   ` David Matlack
2025-06-26 18:04 ` [RFC PATCH 2/3] vfio: selftests: Introduce the selftest vfio_flr_test Aaron Lewis
2025-06-27 22:37   ` David Matlack
2025-06-26 18:04 ` [RFC PATCH 3/3] vfio: selftests: Include a BPF script to pair with " Aaron Lewis
2025-06-27 22:51   ` David Matlack
2025-06-26 19:26 ` Alex Williamson [this message]
2025-06-26 20:56   ` [RFC PATCH 0/3] vfio: selftests: Add VFIO selftest to demontrate a latency issue Jason Gunthorpe
2025-06-26 21:58     ` Aaron Lewis
2025-06-26 22:10       ` Alex Williamson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250626132659.62178b7d.alex.williamson@redhat.com \
    --to=alex.williamson@redhat.com \
    --cc=aaronlewis@google.com \
    --cc=bhelgaas@google.com \
    --cc=dmatlack@google.com \
    --cc=jgg@nvidia.com \
    --cc=jrhilke@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=seanjc@google.com \
    --cc=vipinsh@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).