From: Alex Williamson <alex.williamson@redhat.com>
To: Aaron Lewis <aaronlewis@google.com>
Cc: bhelgaas@google.com, dmatlack@google.com, vipinsh@google.com,
kvm@vger.kernel.org, seanjc@google.com, jrhilke@google.com,
Jason Gunthorpe <jgg@nvidia.com>
Subject: Re: [RFC PATCH 0/3] vfio: selftests: Add VFIO selftest to demontrate a latency issue
Date: Thu, 26 Jun 2025 13:26:59 -0600 [thread overview]
Message-ID: <20250626132659.62178b7d.alex.williamson@redhat.com> (raw)
In-Reply-To: <20250626180424.632628-1-aaronlewis@google.com>
On Thu, 26 Jun 2025 18:04:21 +0000
Aaron Lewis <aaronlewis@google.com> wrote:
> This series is being sent as an RFC to help brainstorm the best way to
> fix a latency issue it uncovers.
>
> The crux of the issue is that when initializing multiple VFs from the
> same PF the devices are reset serially rather than in parallel
> regardless if they are initialized from different threads. That happens
> because a shared lock is acquired when vfio_df_ioctl_bind_iommufd() is
> called, then a FLR (function level reset) is done which takes 100ms to
> complete. That in combination with trying to initialize many devices at
> the same time results in a lot of wasted time.
>
> While the PCI spec does specify that a FLR requires 100ms to ensure it
> has time to complete, I don't see anything indicating that other VFs
> can't be reset at the same time.
>
> A couple of ideas on how to approach a fix are:
>
> 1. See if the lock preventing the second thread from making forward
> progress can be sharded to only include the VF it protects.
I think we're talking about the dev_set mutex here, right? I think this
is just an oversight. The original lock that dev_set replaced was
devised to manage the set of devices affected by the same bus or slot
reset. I believe we've held the same semantics though and VFs just
happen to fall through to the default of a bus-based dev_set.
Obviously we cannot do a bus or slot reset of a VF, we only have FLR,
and it especially doesn't make sense that VFs on the same "bus" from
different PFs share this mutex.
Seems like we just need this:
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 6328c3a05bcd..261a6dc5a5fc 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -2149,7 +2149,7 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
return -EBUSY;
}
- if (pci_is_root_bus(pdev->bus)) {
+ if (pci_is_root_bus(pdev->bus) || pdev->is_virtfn) {
ret = vfio_assign_device_set(&vdev->vdev, vdev);
} else if (!pci_probe_reset_slot(pdev->slot)) {
ret = vfio_assign_device_set(&vdev->vdev, pdev->slot);
Does that allow fully parallelized resets?
Meanwhile this is a good use case of selftests and further impetus for
me to get it working. Thanks,
Alex
>
> 2. Do the FLR for the VF in probe() and close(device_fd) rather than in
> vfio_df_ioctl_bind_iommufd().
>
> To demonstrate the problem the run script had to be extended to bind
> multiple devices to the vfio-driver, not just one. E.g.
>
> $ ./run.sh -d 0000:17:0c.1 -d 0000:17:0c.2 -d 0000:16:01.7 -s
>
> Also included is a selftest and BPF script. With those, the problem can
> be reproduced with the output logging showing that one of the devices
> takes >200ms to initialize despite running from different threads.
>
> $ VFIO_BDF_1=0000:17:0c.1 VFIO_BDF_2=0000:17:0c.2 ./vfio_flr_test
> [0x7f61bb888700] '0000:17:0c.2' initialized in 108.6ms.
> [0x7f61bc089700] '0000:17:0c.1' initialized in 212.3ms.
>
> And the BPF script indicating that the latency issues are coming from the
> mutex in vfio_df_ioctl_bind_iommufd().
>
> [pcie_flr] duration = 108ms
> [vfio_df_ioctl_bind_iommufd] duration = 108ms
> [pcie_flr] duration = 104ms
> [vfio_df_ioctl_bind_iommufd] duration = 212ms
>
> [__mutex_lock] duration = 103ms
> __mutex_lock+5
> vfio_df_ioctl_bind_iommufd+171
> __se_sys_ioctl+110
> do_syscall_64+109
> entry_SYSCALL_64_after_hwframe+120
>
> This series can be applied on top of the VFIO selftests using the branch:
> upstream/vfio/selftests/v1.
>
> https://github.com/dmatlack/linux/tree/vfio/selftests/v1
>
> Aaron Lewis (3):
> vfio: selftests: Allow run.sh to bind to more than one device
> vfio: selftests: Introduce the selftest vfio_flr_test
> vfio: selftests: Include a BPF script to pair with the selftest vfio_flr_test
>
> tools/testing/selftests/vfio/Makefile | 1 +
> tools/testing/selftests/vfio/run.sh | 73 +++++++----
> tools/testing/selftests/vfio/vfio_flr_test.c | 120 ++++++++++++++++++
> .../testing/selftests/vfio/vfio_flr_trace.bt | 83 ++++++++++++
> 4 files changed, 251 insertions(+), 26 deletions(-)
> create mode 100644 tools/testing/selftests/vfio/vfio_flr_test.c
> create mode 100644 tools/testing/selftests/vfio/vfio_flr_trace.bt
>
next prev parent reply other threads:[~2025-06-26 19:27 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-26 18:04 [RFC PATCH 0/3] vfio: selftests: Add VFIO selftest to demontrate a latency issue Aaron Lewis
2025-06-26 18:04 ` [RFC PATCH 1/3] vfio: selftests: Allow run.sh to bind to more than one device Aaron Lewis
2025-06-27 22:19 ` David Matlack
2025-06-26 18:04 ` [RFC PATCH 2/3] vfio: selftests: Introduce the selftest vfio_flr_test Aaron Lewis
2025-06-27 22:37 ` David Matlack
2025-06-26 18:04 ` [RFC PATCH 3/3] vfio: selftests: Include a BPF script to pair with " Aaron Lewis
2025-06-27 22:51 ` David Matlack
2025-06-26 19:26 ` Alex Williamson [this message]
2025-06-26 20:56 ` [RFC PATCH 0/3] vfio: selftests: Add VFIO selftest to demontrate a latency issue Jason Gunthorpe
2025-06-26 21:58 ` Aaron Lewis
2025-06-26 22:10 ` Alex Williamson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250626132659.62178b7d.alex.williamson@redhat.com \
--to=alex.williamson@redhat.com \
--cc=aaronlewis@google.com \
--cc=bhelgaas@google.com \
--cc=dmatlack@google.com \
--cc=jgg@nvidia.com \
--cc=jrhilke@google.com \
--cc=kvm@vger.kernel.org \
--cc=seanjc@google.com \
--cc=vipinsh@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).