kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/3] vfio: selftests: Add VFIO selftest to demontrate a latency issue
@ 2025-06-26 18:04 Aaron Lewis
  2025-06-26 18:04 ` [RFC PATCH 1/3] vfio: selftests: Allow run.sh to bind to more than one device Aaron Lewis
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Aaron Lewis @ 2025-06-26 18:04 UTC (permalink / raw)
  To: alex.williamson, bhelgaas, dmatlack, vipinsh
  Cc: kvm, seanjc, jrhilke, Aaron Lewis

This series is being sent as an RFC to help brainstorm the best way to
fix a latency issue it uncovers.

The crux of the issue is that when initializing multiple VFs from the
same PF the devices are reset serially rather than in parallel
regardless if they are initialized from different threads.  That happens
because a shared lock is acquired when vfio_df_ioctl_bind_iommufd() is
called, then a FLR (function level reset) is done which takes 100ms to
complete.  That in combination with trying to initialize many devices at
the same time results in a lot of wasted time.

While the PCI spec does specify that a FLR requires 100ms to ensure it
has time to complete, I don't see anything indicating that other VFs
can't be reset at the same time.

A couple of ideas on how to approach a fix are:

  1. See if the lock preventing the second thread from making forward
  progress can be sharded to only include the VF it protects.
  
  2. Do the FLR for the VF in probe() and close(device_fd) rather than in
  vfio_df_ioctl_bind_iommufd().

To demonstrate the problem the run script had to be extended to bind
multiple devices to the vfio-driver, not just one.  E.g.

  $ ./run.sh -d 0000:17:0c.1 -d 0000:17:0c.2 -d 0000:16:01.7 -s

Also included is a selftest and BPF script.  With those, the problem can
be reproduced with the output logging showing that one of the devices
takes >200ms to initialize despite running from different threads.

  $ VFIO_BDF_1=0000:17:0c.1 VFIO_BDF_2=0000:17:0c.2 ./vfio_flr_test
  [0x7f61bb888700] '0000:17:0c.2' initialized in 108.6ms.
  [0x7f61bc089700] '0000:17:0c.1' initialized in 212.3ms.

And the BPF script indicating that the latency issues are coming from the
mutex in vfio_df_ioctl_bind_iommufd().

  [pcie_flr] duration = 108ms
  [vfio_df_ioctl_bind_iommufd] duration = 108ms
  [pcie_flr] duration = 104ms
  [vfio_df_ioctl_bind_iommufd] duration = 212ms

  [__mutex_lock] duration = 103ms
  __mutex_lock+5
  vfio_df_ioctl_bind_iommufd+171
  __se_sys_ioctl+110
  do_syscall_64+109
  entry_SYSCALL_64_after_hwframe+120

This series can be applied on top of the VFIO selftests using the branch:
upstream/vfio/selftests/v1.

https://github.com/dmatlack/linux/tree/vfio/selftests/v1

Aaron Lewis (3):
  vfio: selftests: Allow run.sh to bind to more than one device
  vfio: selftests: Introduce the selftest vfio_flr_test
  vfio: selftests: Include a BPF script to pair with the selftest vfio_flr_test

 tools/testing/selftests/vfio/Makefile         |   1 +
 tools/testing/selftests/vfio/run.sh           |  73 +++++++----
 tools/testing/selftests/vfio/vfio_flr_test.c  | 120 ++++++++++++++++++
 .../testing/selftests/vfio/vfio_flr_trace.bt  |  83 ++++++++++++
 4 files changed, 251 insertions(+), 26 deletions(-)
 create mode 100644 tools/testing/selftests/vfio/vfio_flr_test.c
 create mode 100644 tools/testing/selftests/vfio/vfio_flr_trace.bt

-- 
2.50.0.727.gbf7dc18ff4-goog


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-06-27 22:51 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-26 18:04 [RFC PATCH 0/3] vfio: selftests: Add VFIO selftest to demontrate a latency issue Aaron Lewis
2025-06-26 18:04 ` [RFC PATCH 1/3] vfio: selftests: Allow run.sh to bind to more than one device Aaron Lewis
2025-06-27 22:19   ` David Matlack
2025-06-26 18:04 ` [RFC PATCH 2/3] vfio: selftests: Introduce the selftest vfio_flr_test Aaron Lewis
2025-06-27 22:37   ` David Matlack
2025-06-26 18:04 ` [RFC PATCH 3/3] vfio: selftests: Include a BPF script to pair with " Aaron Lewis
2025-06-27 22:51   ` David Matlack
2025-06-26 19:26 ` [RFC PATCH 0/3] vfio: selftests: Add VFIO selftest to demontrate a latency issue Alex Williamson
2025-06-26 20:56   ` Jason Gunthorpe
2025-06-26 21:58     ` Aaron Lewis
2025-06-26 22:10       ` Alex Williamson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).