public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/9] vfio/pci: Add mmap() for DMABUFs
@ 2026-04-16 13:17 Matt Evans
From: Matt Evans @ 2026-04-16 13:17 UTC
  To: Alex Williamson, Leon Romanovsky, Jason Gunthorpe, Alex Mastro,
	Christian König
  Cc: Mahmoud Adam, David Matlack, Björn Töpel, Sumit Semwal,
	Kevin Tian, Ankit Agrawal, Pranjal Shrivastava, Alistair Popple,
	Vivek Kasireddy, linux-kernel, linux-media, dri-devel,
	linaro-mm-sig, kvm


Hi all,


This series is based on previous RFCs/discussions:

Tech topic: https://lore.kernel.org/linux-iommu/20250918214425.2677057-1-amastro@fb.com/
RFCv1:	    https://lore.kernel.org/all/20260226202211.929005-1-mattev@meta.com/
RFCv2:	    https://lore.kernel.org/kvm/20260312184613.3710705-1-mattev@meta.com/


The background/rationale is covered in more detail in the RFC cover
letters.  The TL;DR is:

The goal is to enable userspace driver designs that use VFIO to export
DMABUFs representing subsets of PCI device BARs, and "vend" those
buffers from a primary process to other subordinate processes by fd.
These processes then mmap() the buffers and their access to the device
is isolated to the exported ranges.  This is an improvement on sharing
the VFIO device fd to subordinate processes, which would allow
unfettered access.

Two changes achieve this.  First, mmap() of vfio-pci DMABUFs is
enabled.  Second, a new ioctl()-based revocation mechanism is added to
allow the primary process to forcibly revoke access to
previously-shared BAR spans, even if the subordinate processes haven't
cleanly exited.
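
As a rough illustration of the "vend by fd" flow above, here is a
minimal userspace sketch of passing an fd over an AF_UNIX socket with
SCM_RIGHTS and mapping it on the receiving side.  The helper names are
hypothetical and not part of this series.  To keep it runnable without
a VFIO device, an unlinked temporary file stands in for the exported
DMABUF; with this series applied, the fd would instead be one exported
from vfio-pci.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control-message buffer, aligned for struct cmsghdr. */
union fd_cmsg {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Primary side: send one fd over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Subordinate side: receive one fd, or -1 on failure. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	int fd;

	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/* Map len bytes of the received buffer; NULL on failure. */
static void *map_buf(int fd, size_t len)
{
	void *p;

	assert(len > 0);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	return p == MAP_FAILED ? NULL : p;
}

/* Stand-in for the exported buffer: an unlinked temp file of len bytes. */
static int make_standin_fd(size_t len)
{
	char path[] = "/tmp/standinXXXXXX";
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);
	if (ftruncate(fd, (off_t)len) != 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

The subordinate only ever holds the buffer fd, so what it can reach is
bounded by what the primary chose to export.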

(The related topic of safe delegation of iommufd control to the
subordinate processes is not addressed here, and is follow-up work.)

As well as isolation and revocation, another advantage to accessing a
BAR through a VMA backed by a DMABUF is that it's straightforward to
create the buffer with access attributes, such as write-combining.


Notes on patches
================

Feedback on the RFCs requested that, instead of creating
DMABUF-specific vm_ops and .fault paths, the series should go the
whole way and migrate the existing VFIO PCI BAR mmap() to be backed by
a DMABUF too, resulting in a common vm_ops and fault handler for
mmap()s of both the VFIO device and explicitly-exported DMABUFs.  This
has been done for vfio-pci, but not sub-drivers (nvgrace-gpu's
special-case mappings are unchanged).


 vfio/pci: Fix vfio_pci_dma_buf_cleanup() double-put

   A bug fix to a related area, whose context is a dependency for
   later patches.


 vfio/pci: Add a helper to look up PFNs for DMABUFs
 vfio/pci: Add a helper to create a DMABUF for a BAR-map VMA

   The first helper lets a DMABUF VMA fault handler determine
   arbitrary-sized PFNs from the ranges in a DMABUF.  The second
   patch refactors DMABUF export so that it can be used by both the
   existing export feature and a new helper that creates a DMABUF
   corresponding to a VFIO BAR mmap() request.
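
   To illustrate the kind of lookup such a fault-path helper has to
   do, here is a standalone sketch, not the code from the patch.  It
   assumes 4 KiB pages, page-aligned ranges, and a hypothetical
   struct phys_range and lookup_pfn(); "order" is the log2 number of
   pages the fault wants to map.  A higher-order mapping is refused
   when the aligned block would straddle a range boundary or the
   physical address is not suitably aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical: one physically contiguous, page-aligned span. */
struct phys_range {
	uint64_t paddr;
	uint64_t len;
};

/*
 * Find the PFN backing page offset pgoff of the buffer, for a mapping
 * of 2^order pages.  On success returns 0 and stores the PFN of the
 * first page of the aligned block in *pfn.  Returns -1 if pgoff is out
 * of bounds or the block cannot be mapped at this order.
 */
static int lookup_pfn(const struct phys_range *r, size_t nr,
		      uint64_t pgoff, unsigned int order, uint64_t *pfn)
{
	uint64_t npages = 1ULL << order;
	uint64_t start = pgoff & ~(npages - 1);	/* align down */
	uint64_t base = 0;	/* buffer page offset where r[i] begins */

	assert(pfn != NULL);

	for (size_t i = 0; i < nr; i++) {
		uint64_t rpages = r[i].len >> SKETCH_PAGE_SHIFT;

		if (pgoff >= base && pgoff < base + rpages) {
			uint64_t first;

			/* The whole block must sit inside this range. */
			if (start < base || start + npages > base + rpages)
				return -1;
			first = (r[i].paddr >> SKETCH_PAGE_SHIFT) +
				(start - base);
			/* Physical side must be aligned to the order. */
			if (first & (npages - 1))
				return -1;
			*pfn = first;
			return 0;
		}
		base += rpages;
	}
	return -1;
}
```

   A caller that gets -1 for a large order would typically retry at
   a smaller one, down to order 0.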


 vfio/pci: Convert BAR mmap() to use a DMABUF

   The vfio-pci core mmap() creates a DMABUF with the helper, and the
   vm_ops fault handler uses the other helper to resolve the fault.
   Because this depends on DMABUF structs/code, CONFIG_VFIO_PCI_CORE
   needs to depend on CONFIG_DMA_SHARED_BUFFER.  The
   CONFIG_VFIO_PCI_DMABUF still conditionally enables the export
   support code.

   NOTE: The user mmap()s a device fd, but the resulting VMA's vm_file
   becomes that of the DMABUF which takes ownership of the device and
   puts it on release.  This maintains the existing behaviour of a VMA
   keeping the VFIO device open.

   BAR zapping then happens via the existing vfio_pci_dma_buf_move()
   path, which now needs to unmap PTEs in the DMABUF's address_space.


 vfio/pci: Provide a user-facing name for BAR mappings

   There was a request for decent debug naming in /proc/<pid>/maps
   etc., comparable to the existing VFIO names.  Since the VMAs are
   DMABUFs, they have a "dmabuf:" prefix and can't be 100% identical
   to before.  This is a user-visible change, but this patch at least
   now gives us extra info on the BDF & BAR being mapped.


 vfio/pci: Clean up BAR zap and revocation

   In general (see NOTE!), vfio_pci_zap_bars() is now obsolete, since
   it unmaps PTEs in the VFIO device address_space, which is now
   unused.  This patch consolidates all calls (e.g. around reset) with
   the neighbouring vfio_pci_dma_buf_move()s into new functions, to
   revoke-zap/unrevoke.

   NOTE: the nvgrace-gpu driver continues to use its own private
   vm_ops, fault handler, etc. for its special memregions, and these
   DO still add PTEs to the VFIO device address_space.  So, a
   temporary flag, vdev->bar_needs_zap, maintains the old behaviour
   for this use.  At least this patch's consolidation makes it easy
   to remove the remaining zap when this need goes away.

   A FIXME is added: if nvgrace-gpu is converted to DMABUFs, remove
   the flag and final zap.


 vfio/pci: Support mmap() of a VFIO DMABUF

   Adds mmap() for a DMABUF fd exported from vfio-pci.

   It was a goal to keep the VFIO device fd lifetime behaviour
   unchanged with respect to the DMABUFs.  An application can close
   all device fds, and this will revoke/clean up all DMABUFs; no
   mappings or other access can be performed after that.  When
   enabling mmap() of the DMABUFs, this means access through the VMA
   is also revoked.  This complicates the fault handler because,
   whilst the DMABUF exists, it has no guarantee that the
   corresponding VFIO device is still alive.  This patch adds
   synchronisation ensuring the vdev is available before
   vdev->memory_lock is touched.

   (I decided against the alternative of preventing cleanup by holding
   the VFIO device open if any DMABUFs exist, because it's both a
   change of behaviour and less clean overall.)

   I've added a chonky comment in place, happy to clarify more if you
   have ideas.


 vfio/pci: Permanently revoke a DMABUF on request

   By weight, this is mostly a rename of revoked to an enum, status.
   There are now 3 states for a buffer: usable, temporarily revoked,
   and permanently revoked.  A new VFIO device ioctl is added,
   VFIO_DEVICE_PCI_DMABUF_REVOKE, which passes a DMABUF (exported from
   that device) and permanently revokes it.  Thus a userspace driver
   can guarantee any downstream consumers of a shared fd are prevented
   from accessing a BAR range, and that range can be reused.

   The code doing revocation in vfio_pci_dma_buf_move() is moved,
   unchanged, to a common function for use by _move() and the new
   ioctl path.

   Q:  I can't think of a good reason to temporarily revoke/unrevoke
   buffers from userspace, so didn't add a 'flags' field to the ioctl
   struct.  Easy to add if people think it's worthwhile for future
   use.


 vfio/pci: Add mmap() attributes to DMABUF feature

   Reserves bits [31:28] in vfio_device_feature_dma_buf to allow a
   (CPU) mapping attribute to be specified for an exported set of
   ranges.  The default is the current UC, and a new flag can specify
   CPU access as WC.

   Q:  I've taken 4 bits; the intention is for this field to be a
   scalar not a bitmap (i.e. mutually-exclusive access properties).
   Perhaps 4 is a bit too many?
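
   For reference, a scalar field in bits [31:28] could be packed and
   unpacked as sketched below.  The macro and function names are
   hypothetical, the sketch assumes the bits live in a 32-bit flags
   word, and the value chosen for WC is an assumption; only "0 means
   UC, the default" follows from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for a 4-bit scalar field in bits [31:28]. */
#define DMABUF_MAP_ATTR_SHIFT	28
#define DMABUF_MAP_ATTR_MASK	(UINT32_C(0xf) << DMABUF_MAP_ATTR_SHIFT)

enum {
	DMABUF_MAP_ATTR_UC = 0,	/* default: no attribute bits set */
	DMABUF_MAP_ATTR_WC = 1,	/* assumed encoding for write-combining */
};

/* Replace the attribute field, leaving the other flag bits alone. */
static uint32_t flags_set_map_attr(uint32_t flags, uint32_t attr)
{
	assert(attr <= 0xf);
	return (flags & ~DMABUF_MAP_ATTR_MASK) |
	       (attr << DMABUF_MAP_ATTR_SHIFT);
}

/* Extract the attribute field as a scalar value. */
static uint32_t flags_get_map_attr(uint32_t flags)
{
	return (flags & DMABUF_MAP_ATTR_MASK) >> DMABUF_MAP_ATTR_SHIFT;
}

/* A scalar field lets unknown values be rejected with one check. */
static int map_attr_valid(uint32_t attr)
{
	return attr == DMABUF_MAP_ATTR_UC || attr == DMABUF_MAP_ATTR_WC;
}
```

   With a scalar, 4 bits leave room for 16 mutually-exclusive
   attributes, versus 4 independent flags if it were a bitmap.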


Testing
=======

(The [RFC ONLY] userspace test program, for QEMU edu-plus, has been
dropped, but can be found in the GitHub branch below.)

This code has been tested with DMABUFs of single/multiple ranges,
aliasing mmap()s, aliasing ranges across DMABUFs, vm_pgoff > 0,
revocation, and shutdown/cleanup scenarios; hugepage mappings also
seem to work correctly.  I've lightly tested WC mappings too (by
observing that the resulting PTEs have the correct attributes...).
No regressions observed on the VFIO selftests, or on our internal
vfio-pci applications.


End
===

This is based on -next (next-20260414 but will merge earlier), as it
depends on Leon's series "vfio: Wait for dma-buf invalidation to
complete":

https://lore.kernel.org/linux-iommu/20260205-nocturnal-poetic-chamois-f566ad@houat/T/#m310cd07011e3a1461b6fda45e3f9b886ba76571a

These commits are on GitHub, along with "[RFC ONLY] selftests: vfio: Add
standalone vfio_dmabuf_mmap_test":

https://github.com/metamev/linux/compare/next-20260414...metamev:linux:dev/mev/vfio-dmabuf-mmap


Thanks for reading,


Matt


================================================================================
Change log:


v1:

 - Cleanup of the common DMABUF-aware VMA vm_ops fault handler and
   export code.
 - Fixed a lot of races, particularly faults racing with DMABUF
   cleanup (if the VFIO device fds close, for example).
 - Added nicer human-readable names for VFIO mmap() VMAs


RFCv2:  Respin based on the feedback/suggestions:
https://lore.kernel.org/kvm/20260312184613.3710705-1-mattev@meta.com/

 - Transform the existing VFIO BAR mmap path to also use DMABUFs
   behind the scenes, and then simply share that code for
   explicitly-mapped DMABUFs.  Jason wanted to go that direction to
   enable iommufd VFIO type 1 emulation to pick up a DMABUF for an IO
   mapping.

 - Revoke buffers using a VFIO device fd ioctl

RFCv1:
https://lore.kernel.org/all/20260226202211.929005-1-mattev@meta.com/


Matt Evans (9):
  vfio/pci: Fix vfio_pci_dma_buf_cleanup() double-put
  vfio/pci: Add a helper to look up PFNs for DMABUFs
  vfio/pci: Add a helper to create a DMABUF for a BAR-map VMA
  vfio/pci: Convert BAR mmap() to use a DMABUF
  vfio/pci: Provide a user-facing name for BAR mappings
  vfio/pci: Clean up BAR zap and revocation
  vfio/pci: Support mmap() of a VFIO DMABUF
  vfio/pci: Permanently revoke a DMABUF on request
  vfio/pci: Add mmap() attributes to DMABUF feature

 drivers/vfio/pci/Kconfig            |   3 +-
 drivers/vfio/pci/Makefile           |   3 +-
 drivers/vfio/pci/nvgrace-gpu/main.c |   5 +
 drivers/vfio/pci/vfio_pci_config.c  |  30 +-
 drivers/vfio/pci/vfio_pci_core.c    | 224 ++++++++++---
 drivers/vfio/pci/vfio_pci_dmabuf.c  | 500 +++++++++++++++++++++++-----
 drivers/vfio/pci/vfio_pci_priv.h    |  49 ++-
 include/linux/vfio_pci_core.h       |   1 +
 include/uapi/linux/vfio.h           |  42 ++-
 9 files changed, 690 insertions(+), 167 deletions(-)

-- 
2.47.3

