From: Matt Evans <mattev@meta.com>
To: "Alex Williamson" <alex@shazbot.org>,
"Leon Romanovsky" <leon@kernel.org>,
"Jason Gunthorpe" <jgg@nvidia.com>,
"Alex Mastro" <amastro@fb.com>,
"Christian König" <christian.koenig@amd.com>
Cc: "Mahmoud Adam" <mngyadam@amazon.de>,
"David Matlack" <dmatlack@google.com>,
"Björn Töpel" <bjorn@kernel.org>,
"Sumit Semwal" <sumit.semwal@linaro.org>,
"Kevin Tian" <kevin.tian@intel.com>,
"Ankit Agrawal" <ankita@nvidia.com>,
"Pranjal Shrivastava" <praan@google.com>,
"Alistair Popple" <apopple@nvidia.com>,
"Vivek Kasireddy" <vivek.kasireddy@intel.com>,
linux-kernel@vger.kernel.org, linux-media@vger.kernel.org,
dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org,
kvm@vger.kernel.org
Subject: [PATCH 2/9] vfio/pci: Add a helper to look up PFNs for DMABUFs
Date: Thu, 16 Apr 2026 06:17:45 -0700
Message-ID: <20260416131815.2729131-3-mattev@meta.com>
In-Reply-To: <20260416131815.2729131-1-mattev@meta.com>
Add vfio_pci_dma_buf_find_pfn(), which a VMA fault handler can use to
look up the PFN backing a faulting address.

This supports multi-range DMABUFs: the ranges typically describe
scattered spans, but may also describe overlapping or aliasing spans
of PFNs.

Because this is intended to be used in vfio_pci_core.c, struct
vfio_pci_dma_buf also needs to be exposed in the vfio_pci_priv.h
header.
Signed-off-by: Matt Evans <mattev@meta.com>
---
drivers/vfio/pci/vfio_pci_dmabuf.c | 124 ++++++++++++++++++++++++++---
drivers/vfio/pci/vfio_pci_priv.h | 19 +++++
2 files changed, 130 insertions(+), 13 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
index 04478b7415a0..8b6bae56bbf2 100644
--- a/drivers/vfio/pci/vfio_pci_dmabuf.c
+++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
@@ -9,19 +9,6 @@
MODULE_IMPORT_NS("DMA_BUF");
-struct vfio_pci_dma_buf {
- struct dma_buf *dmabuf;
- struct vfio_pci_core_device *vdev;
- struct list_head dmabufs_elm;
- size_t size;
- struct phys_vec *phys_vec;
- struct p2pdma_provider *provider;
- u32 nr_ranges;
- struct kref kref;
- struct completion comp;
- u8 revoked : 1;
-};
-
static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf,
struct dma_buf_attachment *attachment)
{
@@ -106,6 +93,117 @@ static const struct dma_buf_ops vfio_pci_dmabuf_ops = {
.release = vfio_pci_dma_buf_release,
};
+int vfio_pci_dma_buf_find_pfn(struct vfio_pci_dma_buf *vpdmabuf,
+ struct vm_area_struct *vma,
+ unsigned long address,
+ unsigned int order,
+ unsigned long *out_pfn)
+{
+ /*
+ * Given a VMA (start, end, pgoffs) and a fault address,
+ * search the corresponding DMABUF's phys_vec[] to find the
+ * range representing the address's offset into the VMA, and
+ * its PFN.
+ *
+ * The phys_vec[] ranges represent contiguous spans of VAs
+ * upwards from the buffer offset 0; the actual PFNs might be
+ * in any order, overlap/alias, etc. Calculate an offset of
+ * the desired page given VMA start/pgoff and address, then
+ * search upwards from 0 to find which span contains it.
+ *
+ * On success, a valid PFN for a page sized by 'order' is
+ * returned into out_pfn.
+ *
+ * Failure occurs if:
+ * - The page would cross the edge of the VMA
+ * - The page isn't entirely contained within a range
+ * - We find a range, but the final PFN isn't aligned to the
+ * requested order.
+ *
+ * (Upon failure, the caller is expected to try again with a
+ * smaller order; the tests above will always succeed for
+ * order=0 as the limit case.)
+ *
+ * It's suboptimal if DMABUFs are created with neighbouring
+ * ranges that are physically contiguous, since hugepages
+ * can't straddle range boundaries. (The construction of the
+ * ranges vector should merge such ranges.)
+ */
+
+ const unsigned long pagesize = PAGE_SIZE << order;
+ unsigned long rounded_page_addr = address & ~(pagesize - 1);
+ unsigned long rounded_page_end = rounded_page_addr + pagesize;
+ unsigned long buf_page_offset;
+ unsigned long buf_offset = 0;
+ unsigned int i;
+
+ if (rounded_page_addr < vma->vm_start || rounded_page_end > vma->vm_end) {
+ if (order > 0)
+ return -EAGAIN;
+
+ /* A fault address outside of the VMA is absurd. */
+ WARN(1, "Fault addr 0x%lx outside VMA 0x%lx-0x%lx\n",
+ address, vma->vm_start, vma->vm_end);
+ return -EFAULT;
+ }
+
+ if (unlikely(check_add_overflow(rounded_page_addr - vma->vm_start,
+ vma->vm_pgoff << PAGE_SHIFT, &buf_page_offset)))
+ return -EFAULT;
+
+ for (i = 0; i < vpdmabuf->nr_ranges; i++) {
+ size_t range_len = vpdmabuf->phys_vec[i].len;
+ phys_addr_t range_start = vpdmabuf->phys_vec[i].paddr;
+
+ /*
+ * If the current range starts after the page's span,
+ * this and any future range won't match. Bail early.
+ */
+ if (buf_page_offset + pagesize <= buf_offset)
+ break;
+
+ if (buf_page_offset >= buf_offset &&
+ buf_page_offset + pagesize <= buf_offset + range_len) {
+ /*
+ * The faulting page is wholly contained
+ * within the span represented by the range.
+ * Validate PFN alignment for the order:
+ */
+ unsigned long pfn = (range_start >> PAGE_SHIFT) +
+ ((buf_page_offset - buf_offset) >> PAGE_SHIFT);
+
+ if (IS_ALIGNED(pfn, 1 << order)) {
+ *out_pfn = pfn;
+ return 0;
+ }
+ /* Retry with smaller order */
+ return -EAGAIN;
+ }
+ buf_offset += range_len;
+ }
+
+ /*
+ * A hugepage straddling a range boundary will fail to match a
+ * range, but the address will (eventually) match when retried
+ * with a smaller page.
+ */
+ if (order > 0)
+ return -EAGAIN;
+
+ /*
+ * If we get here, the address fell outside of the span
+ * represented by the (concatenated) ranges. Setup of a
+ * mapping must ensure that the VMA is <= the total size of
+ * the ranges, so this should never happen. But, if it does,
+ * force SIGBUS for the access and warn.
+ */
+ WARN_ONCE(1, "No range for addr 0x%lx, order %d: VMA 0x%lx-0x%lx pgoff 0x%lx, %u ranges, size 0x%zx\n",
+ address, order, vma->vm_start, vma->vm_end, vma->vm_pgoff,
+ vpdmabuf->nr_ranges, vpdmabuf->size);
+
+ return -EFAULT;
+}
+
/*
* This is a temporary "private interconnect" between VFIO DMABUF and iommufd.
* It allows the two co-operating drivers to exchange the physical address of
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index fca9d0dfac90..317170a5b407 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -23,6 +23,19 @@ struct vfio_pci_ioeventfd {
bool test_mem;
};
+struct vfio_pci_dma_buf {
+ struct dma_buf *dmabuf;
+ struct vfio_pci_core_device *vdev;
+ struct list_head dmabufs_elm;
+ size_t size;
+ struct phys_vec *phys_vec;
+ struct p2pdma_provider *provider;
+ u32 nr_ranges;
+ struct kref kref;
+ struct completion comp;
+ u8 revoked : 1;
+};
+
bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev);
void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev);
@@ -114,6 +127,12 @@ static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
}
+int vfio_pci_dma_buf_find_pfn(struct vfio_pci_dma_buf *vpdmabuf,
+ struct vm_area_struct *vma,
+ unsigned long address,
+ unsigned int order,
+ unsigned long *out_pfn);
+
#ifdef CONFIG_VFIO_PCI_DMABUF
int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
struct vfio_device_feature_dma_buf __user *arg,
--
2.47.3