public inbox for kvm@vger.kernel.org
* [PATCH 0/2] vfio/pci: vfio device address space mapping
@ 2024-05-23 19:56 Alex Williamson
  2024-05-23 19:56 ` [PATCH 1/2] vfio: Create vfio_fs_type with inode per device Alex Williamson
  2024-05-23 19:56 ` [PATCH 2/2] vfio/pci: Use unmap_mapping_range() Alex Williamson
  0 siblings, 2 replies; 21+ messages in thread
From: Alex Williamson @ 2024-05-23 19:56 UTC
  To: kvm; +Cc: Alex Williamson, ajones, yan.y.zhao, kevin.tian, jgg, peterx

Upstream commit ba168b52bf8e ("mm: use rwsem assertion macros for
mmap_lock") turns a long-standing lockdep issue, where we call
io_remap_pfn_range() from within the vm_ops fault handler callback
without the proper write lock[1], into a WARN_ON that we can no
longer put off fixing.

Attaching an address space to the vfio device file has been discussed
for some time as a way to make use of unmap_mapping_range(), which
provides an easy mechanism for zapping all vmas mapping a section of
the device file, for example mmaps to PCI BARs.  This means that we
no longer need to track those vmas for the purpose of zapping, which
removes a bunch of really ugly locking.  This vma list was also used
to avoid duplicate mappings for concurrent faults to the same vma.
As a result, we now use the more acceptable vmf_insert_pfn() which
actually manages locking correctly from the fault handler versus
io_remap_pfn_range().

The unfortunate side effect of this is that we now fault per page
rather than populate the entire vma with a single fault.  While
this overhead is fairly small for average BAR sizes, it is still
measurable.  There's potentially quite ugly code we could use to
walk the vmas in the address space to proactively reinsert mappings
to avoid this, but the simpler solution seems to be to teach
vmf_insert_pfn_{pmd,pud}() about pfnmaps such that we can extend
the faulting behavior to include vm_ops huge_fault, both vastly
reducing the number of faults and reducing TLB usage.
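
To put rough numbers on the fault counts, here is a minimal sketch of
the arithmetic.  It is not code from this series.  The mapping sizes
assume x86-64 (4 KiB PTE, 2 MiB PMD, 1 GiB PUD), the helper name
faults_to_populate() is hypothetical, and it assumes the region is
aligned to the mapping size and that every fault succeeds at that
size with no fallback.

```c
#include <stdint.h>

/* Assumed x86-64 mapping sizes; other architectures differ. */
#define PTE_SIZE (UINT64_C(1) << 12)	/* 4 KiB */
#define PMD_SIZE (UINT64_C(1) << 21)	/* 2 MiB */
#define PUD_SIZE (UINT64_C(1) << 30)	/* 1 GiB */

/*
 * Hypothetical helper: number of faults needed to populate len bytes
 * when each fault installs one mapping of map_size bytes.  Rounds up,
 * so a partial trailing chunk still costs one fault.
 */
static uint64_t faults_to_populate(uint64_t len, uint64_t map_size)
{
	return (len + map_size - 1) / map_size;
}
```

Under those assumptions, a 16 GiB BAR takes 4194304 faults at PTE
granularity, 8192 with PMD mappings, and 16 with PUD mappings.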

The above commit seems to require an iterative solution where we
introduce the address space, remove the vma tracking, and make use
of vmf_insert_pfn() in the short term and work on the mm aspects to
enable huge_fault in the long term.

This series is intended for v6.10 given the WARN_ON now encountered
for all vfio-pci uses.  Thanks,

Alex

[1] https://lore.kernel.org/all/20230508125842.28193-1-yan.y.zhao@intel.com/

Alex Williamson (2):
  vfio: Create vfio_fs_type with inode per device
  vfio/pci: Use unmap_mapping_range()

 drivers/vfio/device_cdev.c       |   7 +
 drivers/vfio/group.c             |   7 +
 drivers/vfio/pci/vfio_pci_core.c | 256 +++++++------------------------
 drivers/vfio/vfio_main.c         |  44 ++++++
 include/linux/vfio.h             |   1 +
 include/linux/vfio_pci_core.h    |   2 -
 6 files changed, 115 insertions(+), 202 deletions(-)

-- 
2.45.0



end of thread, other threads:[~2024-05-30  7:47 UTC | newest]

Thread overview: 21+ messages
2024-05-23 19:56 [PATCH 0/2] vfio/pci: vfio device address space mapping Alex Williamson
2024-05-23 19:56 ` [PATCH 1/2] vfio: Create vfio_fs_type with inode per device Alex Williamson
2024-05-24 13:24   ` Jason Gunthorpe
2024-05-29 23:59   ` Tian, Kevin
2024-05-23 19:56 ` [PATCH 2/2] vfio/pci: Use unmap_mapping_range() Alex Williamson
2024-05-24  0:39   ` Yan Zhao
2024-05-24  0:49     ` Peter Xu
2024-05-24  1:47       ` Yan Zhao
2024-05-28 18:42         ` Alex Williamson
2024-05-29  2:29           ` Yan Zhao
2024-05-29  3:12             ` Alex Williamson
2024-05-29  6:34               ` Yan Zhao
2024-05-29 16:50                 ` Alex Williamson
2024-05-30  7:46                   ` Yan Zhao
2024-05-24  8:40       ` Tian, Kevin
2024-05-24 13:22         ` Jason Gunthorpe
2024-05-24 23:15           ` Peter Xu
2024-05-24 13:42   ` Jason Gunthorpe
2024-05-30  0:09   ` Tian, Kevin
2024-05-30  2:22     ` Alex Williamson
2024-05-30  2:47       ` Tian, Kevin
