* [RFC PATCH 1/5] vfio: Add UAPI for ZONE_DEVICE-backed P2P registration
2026-06-10 15:18 [RFC PATCH 0/5] vfio/pci: Support ZONE_DEVICE-backed P2P Registration Pranjal Shrivastava
@ 2026-06-10 15:18 ` Pranjal Shrivastava
2026-06-10 15:18 ` [RFC PATCH 2/5] vfio/pci: Implement " Pranjal Shrivastava
` (4 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Pranjal Shrivastava @ 2026-06-10 15:18 UTC
To: linux-pci, linux-kernel, kvm
Cc: Bjorn Helgaas, Logan Gunthorpe, Alex Williamson, Jason Gunthorpe,
Kevin Tian, Pranjal Shrivastava, Ankit Agrawal, Matt Evans,
Vivek Kasireddy, Leon Romanovsky, Shivaji Kant, Samiullah Khawaja
Introduce VFIO_DEVICE_FEATURE_P2P_REGISTER to the VFIO_DEVICE_FEATURE
ioctl. This feature allows a privileged userspace process to register
a specific PCI BAR with the kernel's PCI P2P DMA provider framework.
Unlike the standard VFIO dmabuf exporter, this feature leverages the
pci-p2pdma infrastructure to manufacture ZONE_DEVICE struct pages for
the BAR.
Standard VFIO mmap() is not supported for BARs registered via this
interface. Users are instead expected to use the native p2pmem sysfs
interface for memory allocation and management.
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
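For readers following along, here is a minimal userspace-side sketch of
the request this feature would expect. It only builds the buffer; it does
not issue the ioctl. The struct and macro names prefixed with sketch_ /
SKETCH_ are local stand-ins, not kernel definitions: the header layout
mirrors struct vfio_device_feature (argsz, flags, then data), the SET bit
mirrors VFIO_DEVICE_FEATURE_SET, and the feature number 13 exists only in
this RFC. The u32 BAR-index payload is an assumption taken from the
handler in patch 2/5.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mirrors VFIO_DEVICE_FEATURE_SET from include/uapi/linux/vfio.h. */
#define SKETCH_FEATURE_SET (1u << 17)
/* Feature number proposed by this RFC; not present in released kernels. */
#define SKETCH_FEATURE_P2P_REGISTER 13u
/* Mirrors PCI_STD_NUM_BARS. */
#define SKETCH_NUM_BARS 6u

/*
 * Local mirror of struct vfio_device_feature (argsz, flags) followed by
 * the u32 BAR index that the handler in patch 2/5 copies from the data area.
 */
struct sketch_p2p_register_req {
	uint32_t argsz;
	uint32_t flags;
	uint32_t bar_index;
};

/* Fill in a SET request asking to register one standard BAR. */
static struct sketch_p2p_register_req
sketch_build_p2p_register_req(uint32_t bar_index)
{
	struct sketch_p2p_register_req req;

	/* The kernel side rejects indexes >= PCI_STD_NUM_BARS with -EINVAL. */
	assert(bar_index < SKETCH_NUM_BARS);

	memset(&req, 0, sizeof(req));
	req.argsz = sizeof(req);
	req.flags = SKETCH_FEATURE_SET | SKETCH_FEATURE_P2P_REGISTER;
	req.bar_index = bar_index;
	return req;
}
```

With real headers, the resulting buffer would be passed as the argument of
ioctl(device_fd, VFIO_DEVICE_FEATURE, &req) on an open VFIO device fd.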
include/uapi/linux/vfio.h | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 5de618a3a5ee..adbac3f965eb 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1534,6 +1534,21 @@ struct vfio_device_feature_dma_buf {
*/
#define VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 12
+/**
+ * Upon VFIO_DEVICE_FEATURE_SET register a PCI BAR with the kernel
+ * P2P DMA subsystem (pci-p2pdma).
+ *
+ * Once a BAR is registered, it will be added to the device's P2P
+ * pool and can be allocated via the standard sysfs p2pmem/allocate
+ * interface.
+ *
+ * Note: Standard VFIO mmap() of the BAR will be blocked once it is
+ * registered for native P2P.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+#define VFIO_DEVICE_FEATURE_P2P_REGISTER 13
+
/* -------- API for Type1 VFIO IOMMU -------- */
/**
--
2.54.0.1099.g489fc7bff1-goog
* [RFC PATCH 2/5] vfio/pci: Implement ZONE_DEVICE-backed P2P registration
2026-06-10 15:18 [RFC PATCH 0/5] vfio/pci: Support ZONE_DEVICE-backed P2P Registration Pranjal Shrivastava
2026-06-10 15:18 ` [RFC PATCH 1/5] vfio: Add UAPI for ZONE_DEVICE-backed P2P registration Pranjal Shrivastava
@ 2026-06-10 15:18 ` Pranjal Shrivastava
2026-06-10 15:18 ` [RFC PATCH 3/5] vfio/pci: Block mmap & dmabuf export for ZONE_DEVICE-registered BARs Pranjal Shrivastava
` (3 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Pranjal Shrivastava @ 2026-06-10 15:18 UTC
To: linux-pci, linux-kernel, kvm
Cc: Bjorn Helgaas, Logan Gunthorpe, Alex Williamson, Jason Gunthorpe,
Kevin Tian, Pranjal Shrivastava, Ankit Agrawal, Matt Evans,
Vivek Kasireddy, Leon Romanovsky, Shivaji Kant, Samiullah Khawaja
Implement the VFIO_DEVICE_FEATURE_P2P_REGISTER feature. A bitmask is added to
track the registered regions.
Post-registration, the BAR is added to the device's P2P pool and is
available to be used with standard page-based APIs.
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
drivers/vfio/pci/vfio_pci_core.c | 39 ++++++++++++++++++++++++++++++++
include/linux/vfio_pci_core.h | 1 +
2 files changed, 40 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index a28f1e99362c..1e922e3aaeb3 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1554,6 +1554,43 @@ static int vfio_pci_core_feature_token(struct vfio_pci_core_device *vdev,
return 0;
}
+static int vfio_pci_core_feature_p2p_register(struct vfio_pci_core_device *vdev,
+ u32 flags, u32 __user *arg,
+ size_t argsz)
+{
+ u32 bar_index;
+ int ret;
+
+ ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET,
+ sizeof(bar_index));
+ if (ret != 1)
+ return ret;
+
+ if (copy_from_user(&bar_index, arg, sizeof(bar_index)))
+ return -EFAULT;
+
+ if (bar_index >= PCI_STD_NUM_BARS)
+ return -EINVAL;
+
+ if (!(pci_resource_flags(vdev->pdev, bar_index) & IORESOURCE_MEM))
+ return -EINVAL;
+
+ /* Already registered */
+ if (vdev->p2p_registered_bars & (1 << bar_index))
+ return 0;
+
+ ret = pci_p2pdma_add_resource(vdev->pdev, bar_index, 0, 0);
+ if (ret && ret != -EEXIST)
+ return ret;
+
+ vdev->p2p_registered_bars |= (1 << bar_index);
+
+ pci_info(vdev->pdev, "BAR %u registered for ZONE_DEVICE P2P\n",
+ bar_index);
+
+ return 0;
+}
+
int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
void __user *arg, size_t argsz)
{
@@ -1572,6 +1609,8 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
case VFIO_DEVICE_FEATURE_DMA_BUF:
return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
+ case VFIO_DEVICE_FEATURE_P2P_REGISTER:
+ return vfio_pci_core_feature_p2p_register(vdev, flags, arg, argsz);
default:
return -ENOTTY;
}
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 5fc6ce4dd786..cae7f069e5b6 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -112,6 +112,7 @@ struct vfio_pci_core_device {
struct vfio_pci_region *region;
u8 msi_qmax;
u8 msix_bar;
+ u8 p2p_registered_bars;
u16 msix_size;
u32 msix_offset;
u32 rbar[7];
--
2.54.0.1099.g489fc7bff1-goog
* [RFC PATCH 3/5] vfio/pci: Block mmap & dmabuf export for ZONE_DEVICE-registered BARs
2026-06-10 15:18 [RFC PATCH 0/5] vfio/pci: Support ZONE_DEVICE-backed P2P Registration Pranjal Shrivastava
2026-06-10 15:18 ` [RFC PATCH 1/5] vfio: Add UAPI for ZONE_DEVICE-backed P2P registration Pranjal Shrivastava
2026-06-10 15:18 ` [RFC PATCH 2/5] vfio/pci: Implement " Pranjal Shrivastava
@ 2026-06-10 15:18 ` Pranjal Shrivastava
2026-06-10 15:18 ` [RFC PATCH 4/5] vfio/pci: Block ZONE_DEVICE registration for BARs with active DMABUFs Pranjal Shrivastava
` (2 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Pranjal Shrivastava @ 2026-06-10 15:18 UTC
To: linux-pci, linux-kernel, kvm
Cc: Bjorn Helgaas, Logan Gunthorpe, Alex Williamson, Jason Gunthorpe,
Kevin Tian, Pranjal Shrivastava, Ankit Agrawal, Matt Evans,
Vivek Kasireddy, Leon Romanovsky, Shivaji Kant, Samiullah Khawaja
Update vfio_pci_core_mmap() & vfio_pci_core_feature_dma_buf() to
enforce mutual exclusivity with ZONE_DEVICE P2P registration.
If a BAR has been registered for native P2P (via
VFIO_DEVICE_FEATURE_P2P_REGISTER), subsequent requests to mmap the BAR
or export it as a DMABUF are rejected with -EBUSY.
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
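The exclusivity rule described above can be modelled in a few lines of
plain C. This is a standalone sketch, not the kernel code: sketch_vdev,
sketch_register_bar() and sketch_mmap_allowed() are hypothetical names,
and only the per-BAR bitmask and the -EBUSY result are taken from the
series.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Mirrors PCI_STD_NUM_BARS. */
#define SKETCH_NUM_BARS 6u

/* Hypothetical stand-in for the one field of the device state used here. */
struct sketch_vdev {
	uint8_t p2p_registered_bars; /* bit N set: BAR N registered for P2P */
};

/* Mark a BAR as registered; repeated registration is harmless. */
static int sketch_register_bar(struct sketch_vdev *vdev, uint32_t bar)
{
	assert(vdev);
	if (bar >= SKETCH_NUM_BARS)
		return -EINVAL;
	vdev->p2p_registered_bars |= (uint8_t)(1u << bar);
	return 0;
}

/* Gate used by both the mmap and the dmabuf export paths in this model. */
static int sketch_mmap_allowed(const struct sketch_vdev *vdev, uint32_t bar)
{
	assert(vdev);
	if (bar >= SKETCH_NUM_BARS)
		return -EINVAL;
	if (vdev->p2p_registered_bars & (1u << bar))
		return -EBUSY;
	return 0;
}
```

A u8 is wide enough for the mask because there are only six standard BARs.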
drivers/vfio/pci/vfio_pci_core.c | 7 +++++++
drivers/vfio/pci/vfio_pci_dmabuf.c | 6 ++++++
2 files changed, 13 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 1e922e3aaeb3..9cf494b765e7 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1828,6 +1828,13 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
if (!vdev->bar_mmap_supported[index])
return -EINVAL;
+ if (vdev->p2p_registered_bars & (1 << index)) {
+ pci_warn(vdev->pdev,
+ "BAR %u registered for P2P. Use sysfs to allocate\n",
+ index);
+ return -EBUSY;
+ }
+
phys_len = PAGE_ALIGN(pci_resource_len(pdev, index));
req_len = vma->vm_end - vma->vm_start;
pgoff = vma->vm_pgoff &
diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
index c16f460c01d6..6635a8681291 100644
--- a/drivers/vfio/pci/vfio_pci_dmabuf.c
+++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
@@ -251,6 +251,12 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
IS_ERR(vfio_pci_core_get_iomap(vdev, get_dma_buf.region_index)))
return -ENODEV;
+ if (vdev->p2p_registered_bars & (1 << get_dma_buf.region_index)) {
+ pci_warn(vdev->pdev, "BAR %u registered for native P2P. Use sysfs for allocation.\n",
+ get_dma_buf.region_index);
+ return -EBUSY;
+ }
+
dma_ranges = memdup_array_user(&arg->dma_ranges, get_dma_buf.nr_ranges,
sizeof(*dma_ranges));
if (IS_ERR(dma_ranges))
--
2.54.0.1099.g489fc7bff1-goog
* [RFC PATCH 4/5] vfio/pci: Block ZONE_DEVICE registration for BARs with active DMABUFs
2026-06-10 15:18 [RFC PATCH 0/5] vfio/pci: Support ZONE_DEVICE-backed P2P Registration Pranjal Shrivastava
` (2 preceding siblings ...)
2026-06-10 15:18 ` [RFC PATCH 3/5] vfio/pci: Block mmap & dmabuf export for ZONE_DEVICE-registered BARs Pranjal Shrivastava
@ 2026-06-10 15:18 ` Pranjal Shrivastava
2026-06-10 15:18 ` [RFC PATCH 5/5] PCI/P2PDMA: Introduce a helper to release P2P resources Pranjal Shrivastava
2026-06-10 16:28 ` [RFC PATCH 0/5] vfio/pci: Support ZONE_DEVICE-backed P2P Registration Jason Gunthorpe
5 siblings, 0 replies; 12+ messages in thread
From: Pranjal Shrivastava @ 2026-06-10 15:18 UTC
To: linux-pci, linux-kernel, kvm
Cc: Bjorn Helgaas, Logan Gunthorpe, Alex Williamson, Jason Gunthorpe,
Kevin Tian, Pranjal Shrivastava, Ankit Agrawal, Matt Evans,
Vivek Kasireddy, Leon Romanovsky, Shivaji Kant, Samiullah Khawaja
Ensure that a PCI BAR cannot be registered for native ZONE_DEVICE P2P
if it is already being used as a source for exported DMABUFs.
Add region_index to struct vfio_pci_dma_buf to track the source BAR.
Introduce a new helper function, vfio_pci_bar_is_dmabuf() to scan the
device's active DMABUF list. When a registration request is received via
VFIO_DEVICE_FEATURE_P2P_REGISTER, VFIO rejects the request with -EBUSY
if any active DMABUF originates from the target BAR.
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
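As an illustration of the check described above, the following standalone
sketch walks a list of exported buffers and refuses registration when one
of them originates from the target BAR. The types and function names are
hypothetical stand-ins; the kernel version walks vdev->dmabufs and expects
memory_lock to be held, which this model leaves out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one exported DMABUF and its source BAR. */
struct sketch_dmabuf {
	uint32_t region_index;
	struct sketch_dmabuf *next;
};

/* Return true if any active export originates from the given BAR. */
static bool sketch_bar_is_dmabuf(const struct sketch_dmabuf *head, uint32_t bar)
{
	for (const struct sketch_dmabuf *d = head; d; d = d->next)
		if (d->region_index == bar)
			return true;
	return false;
}

/* Registration is refused with -EBUSY while such an export exists. */
static int sketch_p2p_register_check(const struct sketch_dmabuf *head,
				     uint32_t bar)
{
	assert(bar < 6); /* PCI_STD_NUM_BARS */
	return sketch_bar_is_dmabuf(head, bar) ? -EBUSY : 0;
}
```

The scan is linear in the number of active exports, which should be small
in practice.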
drivers/vfio/pci/vfio_pci_core.c | 6 ++++++
drivers/vfio/pci/vfio_pci_dmabuf.c | 16 ++++++++++++++++
drivers/vfio/pci/vfio_pci_priv.h | 6 ++++++
3 files changed, 28 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 9cf494b765e7..7913b8916df9 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1579,6 +1579,12 @@ static int vfio_pci_core_feature_p2p_register(struct vfio_pci_core_device *vdev,
if (vdev->p2p_registered_bars & (1 << bar_index))
return 0;
+ if (vfio_pci_bar_is_dmabuf(vdev, bar_index)) {
+ pci_warn(vdev->pdev, "BAR %u has active DMABUFs. Cannot register for P2P.\n",
+ bar_index);
+ return -EBUSY;
+ }
+
ret = pci_p2pdma_add_resource(vdev->pdev, bar_index, 0, 0);
if (ret && ret != -EEXIST)
return ret;
diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
index 6635a8681291..194d74724422 100644
--- a/drivers/vfio/pci/vfio_pci_dmabuf.c
+++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
@@ -17,6 +17,7 @@ struct vfio_pci_dma_buf {
struct phys_vec *phys_vec;
struct p2pdma_provider *provider;
u32 nr_ranges;
+ u32 region_index;
struct kref kref;
struct completion comp;
u8 revoked : 1;
@@ -279,6 +280,7 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
priv->vdev = vdev;
priv->nr_ranges = get_dma_buf.nr_ranges;
+ priv->region_index = get_dma_buf.region_index;
priv->size = length;
ret = vdev->pci_ops->get_dmabuf_phys(vdev, &priv->provider,
get_dma_buf.region_index,
@@ -410,3 +412,17 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
}
up_write(&vdev->memory_lock);
}
+
+bool vfio_pci_bar_is_dmabuf(struct vfio_pci_core_device *vdev, int index)
+{
+ struct vfio_pci_dma_buf *priv;
+
+ lockdep_assert_held(&vdev->memory_lock);
+
+ list_for_each_entry(priv, &vdev->dmabufs, dmabufs_elm) {
+ if (priv->region_index == index)
+ return true;
+ }
+ return false;
+}
+EXPORT_SYMBOL_GPL(vfio_pci_bar_is_dmabuf);
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index fca9d0dfac90..0a5da4d5edc4 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -120,6 +120,7 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
size_t argsz);
void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev);
void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked);
+bool vfio_pci_bar_is_dmabuf(struct vfio_pci_core_device *vdev, int index);
#else
static inline int
vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
@@ -128,6 +129,11 @@ vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
{
return -ENOTTY;
}
+static inline bool vfio_pci_bar_is_dmabuf(struct vfio_pci_core_device *vdev,
+ int index)
+{
+ return false;
+}
static inline void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
{
}
--
2.54.0.1099.g489fc7bff1-goog
* [RFC PATCH 5/5] PCI/P2PDMA: Introduce a helper to release P2P resources
2026-06-10 15:18 [RFC PATCH 0/5] vfio/pci: Support ZONE_DEVICE-backed P2P Registration Pranjal Shrivastava
` (3 preceding siblings ...)
2026-06-10 15:18 ` [RFC PATCH 4/5] vfio/pci: Block ZONE_DEVICE registration for BARs with active DMABUFs Pranjal Shrivastava
@ 2026-06-10 15:18 ` Pranjal Shrivastava
2026-06-10 16:28 ` [RFC PATCH 0/5] vfio/pci: Support ZONE_DEVICE-backed P2P Registration Jason Gunthorpe
5 siblings, 0 replies; 12+ messages in thread
From: Pranjal Shrivastava @ 2026-06-10 15:18 UTC
To: linux-pci, linux-kernel, kvm
Cc: Bjorn Helgaas, Logan Gunthorpe, Alex Williamson, Jason Gunthorpe,
Kevin Tian, Pranjal Shrivastava, Ankit Agrawal, Matt Evans,
Vivek Kasireddy, Leon Romanovsky, Shivaji Kant, Samiullah Khawaja
Introduce pci_p2pdma_remove_resource() to allow manual teardown of a
device's P2P DMA pool. The new API enables exclusive owners of a device,
such as vfio-pci, to cleanly release P2P resources during session closure
or hardware reset.
Update vfio-pci to call this function during vfio_pci_zap_bars(),
ensuring that any BARs registered with ZONE_DEVICE P2P are released.
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
drivers/pci/p2pdma.c | 38 ++++++++++++++++++++++++++++++++
drivers/vfio/pci/vfio_pci_core.c | 5 +++++
include/linux/pci-p2pdma.h | 1 +
3 files changed, 44 insertions(+)
diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index adb17a4f6939..2a48ffefa01c 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -26,6 +26,7 @@ struct pci_p2pdma {
bool p2pmem_published;
struct xarray map_types;
struct p2pdma_provider mem[PCI_STD_NUM_BARS];
+ struct pci_p2pdma_pagemap *pagemaps[PCI_STD_NUM_BARS];
};
struct pci_p2pdma_pagemap {
@@ -453,6 +454,8 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
if (error)
goto pages_free;
+ p2pdma->pagemaps[bar] = p2p_pgmap;
+
pci_info(pdev, "added peer-to-peer DMA memory %#llx-%#llx\n",
pgmap->range.start, pgmap->range.end);
@@ -466,6 +469,41 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
}
EXPORT_SYMBOL_GPL(pci_p2pdma_add_resource);
+/**
+ * pci_p2pdma_remove_resource - remove all p2p memory for a device
+ * @pdev: the device to remove the memory from
+ *
+ * Tear down the entire p2p DMA pool for the device. Zap any existing
+ * userspace mappings of the p2pmem/allocate file.
+ */
+void pci_p2pdma_remove_resource(struct pci_dev *pdev)
+{
+ struct pci_p2pdma *p2pdma;
+ int i;
+
+ p2pdma = rcu_dereference_protected(pdev->p2pdma, 1);
+ if (!p2pdma || !p2pdma->pool)
+ return;
+
+ for (i = 0; i < PCI_STD_NUM_BARS; i++) {
+ if (p2pdma->pagemaps[i]) {
+ devm_release_action(&pdev->dev, pci_p2pdma_unmap_mappings,
+ p2pdma->pagemaps[i]);
+ devm_memunmap_pages(&pdev->dev, &p2pdma->pagemaps[i]->pgmap);
+ devm_kfree(&pdev->dev, p2pdma->pagemaps[i]);
+ p2pdma->pagemaps[i] = NULL;
+ }
+ }
+
+ gen_pool_destroy(p2pdma->pool);
+ p2pdma->pool = NULL;
+
+ sysfs_remove_group(&pdev->dev.kobj, &p2pmem_group);
+
+ pci_info(pdev, "removed all peer-to-peer DMA memory\n");
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_remove_resource);
+
/*
* Note this function returns the parent PCI device with a
* reference taken. It is the caller's responsibility to drop
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 7913b8916df9..7b58cb344408 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1704,6 +1704,11 @@ static void vfio_pci_zap_bars(struct vfio_pci_core_device *vdev)
loff_t len = end - start;
unmap_mapping_range(core_vdev->inode->i_mapping, start, len, true);
+
+ if (vdev->p2p_registered_bars) {
+ pci_p2pdma_remove_resource(vdev->pdev);
+ vdev->p2p_registered_bars = 0;
+ }
}
void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_core_device *vdev)
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
index 873de20a2247..14ee2e59a43e 100644
--- a/include/linux/pci-p2pdma.h
+++ b/include/linux/pci-p2pdma.h
@@ -72,6 +72,7 @@ int pcim_p2pdma_init(struct pci_dev *pdev);
struct p2pdma_provider *pcim_p2pdma_provider(struct pci_dev *pdev, int bar);
int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
u64 offset);
+void pci_p2pdma_remove_resource(struct pci_dev *pdev);
int pci_p2pdma_distance_many(struct pci_dev *provider, struct device **clients,
int num_clients, bool verbose);
struct pci_dev *pci_p2pmem_find_many(struct device **clients, int num_clients);
--
2.54.0.1099.g489fc7bff1-goog
* Re: [RFC PATCH 0/5] vfio/pci: Support ZONE_DEVICE-backed P2P Registration
2026-06-10 15:18 [RFC PATCH 0/5] vfio/pci: Support ZONE_DEVICE-backed P2P Registration Pranjal Shrivastava
` (4 preceding siblings ...)
2026-06-10 15:18 ` [RFC PATCH 5/5] PCI/P2PDMA: Introduce a helper to release P2P resources Pranjal Shrivastava
@ 2026-06-10 16:28 ` Jason Gunthorpe
2026-06-10 18:32 ` Leon Romanovsky
2026-06-11 14:40 ` Pranjal Shrivastava
5 siblings, 2 replies; 12+ messages in thread
From: Jason Gunthorpe @ 2026-06-10 16:28 UTC
To: Pranjal Shrivastava
Cc: linux-pci, linux-kernel, kvm, Bjorn Helgaas, Logan Gunthorpe,
Alex Williamson, Kevin Tian, Ankit Agrawal, Matt Evans,
Vivek Kasireddy, Leon Romanovsky, Shivaji Kant, Samiullah Khawaja
On Wed, Jun 10, 2026 at 03:18:48PM +0000, Pranjal Shrivastava wrote:
> Users utilize the standard sysfs p2pmem/allocate interface for managing
> memory slices once a BAR is registered.
I'm shocked someone wants to use API, what are you expecting to do
with it??
> An alternative implementation has been explored which integrates with the
> ongoing VFIO DMABUF-mmap refactor [1]. In that approach, rather than
> registering a BAR as a system-wide P2P provider, VFIO optionally
> allocates ZONE_DEVICE pages only for specifically exported DMABUFs via a
> new VFIO_DMA_BUF_FLAG_ALLOC_STRUCT_PAGES flag.
That's probably more sensible but you can't have a DMABUF mmap
actually install non-special memory. The native vfio mmap still can,
but not mmap on the dmabuf fd. That's still workable, just keep in
mind.
What do you even intend to do with this? With the new work to tie
dmabuf directly into io_uring I really wonder if this is worth doing
for VFIO?
Jason
* Re: [RFC PATCH 0/5] vfio/pci: Support ZONE_DEVICE-backed P2P Registration
2026-06-10 16:28 ` [RFC PATCH 0/5] vfio/pci: Support ZONE_DEVICE-backed P2P Registration Jason Gunthorpe
@ 2026-06-10 18:32 ` Leon Romanovsky
2026-06-11 14:40 ` Pranjal Shrivastava
1 sibling, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2026-06-10 18:32 UTC
To: Jason Gunthorpe
Cc: Pranjal Shrivastava, linux-pci, linux-kernel, kvm, Bjorn Helgaas,
Logan Gunthorpe, Alex Williamson, Kevin Tian, Ankit Agrawal,
Matt Evans, Vivek Kasireddy, Shivaji Kant, Samiullah Khawaja
On Wed, Jun 10, 2026 at 01:28:48PM -0300, Jason Gunthorpe wrote:
> On Wed, Jun 10, 2026 at 03:18:48PM +0000, Pranjal Shrivastava wrote:
>
> > Users utilize the standard sysfs p2pmem/allocate interface for managing
> > memory slices once a BAR is registered.
>
> I'm shocked someone wants to use API, what are you expecting to do
> with it??
I was under the impression that we all want to move away from that API.
Thanks
* Re: [RFC PATCH 0/5] vfio/pci: Support ZONE_DEVICE-backed P2P Registration
2026-06-10 16:28 ` [RFC PATCH 0/5] vfio/pci: Support ZONE_DEVICE-backed P2P Registration Jason Gunthorpe
2026-06-10 18:32 ` Leon Romanovsky
@ 2026-06-11 14:40 ` Pranjal Shrivastava
2026-06-11 14:43 ` Pranjal Shrivastava
2026-06-11 22:14 ` Jason Gunthorpe
1 sibling, 2 replies; 12+ messages in thread
From: Pranjal Shrivastava @ 2026-06-11 14:40 UTC
To: Jason Gunthorpe
Cc: linux-pci, linux-kernel, kvm, Bjorn Helgaas, Logan Gunthorpe,
Alex Williamson, Kevin Tian, Ankit Agrawal, Matt Evans,
Vivek Kasireddy, Leon Romanovsky, Shivaji Kant, Samiullah Khawaja
On Wed, Jun 10, 2026 at 01:28:48PM -0300, Jason Gunthorpe wrote:
> On Wed, Jun 10, 2026 at 03:18:48PM +0000, Pranjal Shrivastava wrote:
>
> > Users utilize the standard sysfs p2pmem/allocate interface for managing
> > memory slices once a BAR is registered.
>
> I'm shocked someone wants to use API, what are you expecting to do
> with it??
Our primary use-case is PCIe BAR (DDR / HBM) -> NFS via P2PDMA while the
PCIe device is managed by a user-space driver based on vfio-pci. While
kernel drivers (e.g. drm) can register BARs with ZONE_DEVICE natively to
enable this, VFIO currently lacks an equivalent mechanism.
>
> > An alternative implementation has been explored which integrates with the
> > ongoing VFIO DMABUF-mmap refactor [1]. In that approach, rather than
> > registering a BAR as a system-wide P2P provider, VFIO optionally
> > allocates ZONE_DEVICE pages only for specifically exported DMABUFs via a
> > new VFIO_DMA_BUF_FLAG_ALLOC_STRUCT_PAGES flag.
>
> That's probably more sensible but you can't have a DMABUF mmap
> actually install non-special memory. The native vfio mmap still can,
> but not mmap on the dmabuf fd. That's still workable, just keep in
> mind.
Ack. I guess, we could have a separate mmap path in case of BARs that are
struct page backed which doesn't go through the dmabuf exporter.
That said, we don't mind moving away from the old API if VFIO mmap can
support this, we don't mind open-coding with devmem_remap_pages.
Effectively, we just need a struct page-backed BAR exporter, preferably,
via VFIO.
>
> What do you even intend to do with this? With the new work to tie
> dmabuf directly into io_uring I really wonder if this is worth doing
> for VFIO?
I agree that Pavel's io_uring series is great for the block layer [1].
However, it doesn't help with NFS + O_DIRECT, which still relies on
struct page for DMA mapping. We have a series to modernize NFS to support
P2PDMA [2][3], and while I understand native dmabuf reads/writes are coming
into play, their integration into NFS is a distant prospect (one we'll
probably have a stab at once Pavel's series settles).
Thus, we'd like to provide a standard, VFS-compatible P2P path via VFIO;
we're willing to help maintain this if needed.
Thanks,
Praan
[1] https://lore.kernel.org/all/cover.1777475843.git.asml.silence@gmail.com/
[2] https://lore.kernel.org/all/20260603053033.3300318-1-praan@google.com/
[3] [RFC] https://lore.kernel.org/all/20260401194501.2269200-1-praan@google.com/
* Re: [RFC PATCH 0/5] vfio/pci: Support ZONE_DEVICE-backed P2P Registration
2026-06-11 14:40 ` Pranjal Shrivastava
@ 2026-06-11 14:43 ` Pranjal Shrivastava
2026-06-11 22:14 ` Jason Gunthorpe
1 sibling, 0 replies; 12+ messages in thread
From: Pranjal Shrivastava @ 2026-06-11 14:43 UTC
To: Jason Gunthorpe
Cc: linux-pci, linux-kernel, kvm, Bjorn Helgaas, Logan Gunthorpe,
Alex Williamson, Kevin Tian, Ankit Agrawal, Matt Evans,
Vivek Kasireddy, Leon Romanovsky, Shivaji Kant, Samiullah Khawaja
On Thu, Jun 11, 2026 at 02:40:17PM +0000, Pranjal Shrivastava wrote:
> On Wed, Jun 10, 2026 at 01:28:48PM -0300, Jason Gunthorpe wrote:
> > On Wed, Jun 10, 2026 at 03:18:48PM +0000, Pranjal Shrivastava wrote:
> >
[...]
> > > An alternative implementation has been explored which integrates with the
> > > ongoing VFIO DMABUF-mmap refactor [1]. In that approach, rather than
> > > registering a BAR as a system-wide P2P provider, VFIO optionally
> > > allocates ZONE_DEVICE pages only for specifically exported DMABUFs via a
> > > new VFIO_DMA_BUF_FLAG_ALLOC_STRUCT_PAGES flag.
> >
> > That's probably more sensible but you can't have a DMABUF mmap
> > actually install non-special memory. The native vfio mmap still can,
> > but not mmap on the dmabuf fd. That's still workable, just keep in
> > mind.
>
> Ack. I guess, we could have a separate mmap path in case of BARs that are
> struct page backed which doesn't go through the dmabuf exporter.
>
> That said, we don't mind moving away from the old API if VFIO mmap can
> support this, we don't mind open-coding with devmem_remap_pages.
Minor correction, I meant: devm_memremap_pages
Thanks,
Praan
* Re: [RFC PATCH 0/5] vfio/pci: Support ZONE_DEVICE-backed P2P Registration
2026-06-11 14:40 ` Pranjal Shrivastava
2026-06-11 14:43 ` Pranjal Shrivastava
@ 2026-06-11 22:14 ` Jason Gunthorpe
2026-06-12 14:50 ` Pranjal Shrivastava
1 sibling, 1 reply; 12+ messages in thread
From: Jason Gunthorpe @ 2026-06-11 22:14 UTC
To: Pranjal Shrivastava
Cc: linux-pci, linux-kernel, kvm, Bjorn Helgaas, Logan Gunthorpe,
Alex Williamson, Kevin Tian, Ankit Agrawal, Matt Evans,
Vivek Kasireddy, Leon Romanovsky, Shivaji Kant, Samiullah Khawaja
On Thu, Jun 11, 2026 at 02:40:17PM +0000, Pranjal Shrivastava wrote:
> On Wed, Jun 10, 2026 at 01:28:48PM -0300, Jason Gunthorpe wrote:
> > On Wed, Jun 10, 2026 at 03:18:48PM +0000, Pranjal Shrivastava wrote:
> >
> > > Users utilize the standard sysfs p2pmem/allocate interface for managing
> > > memory slices once a BAR is registered.
> >
> > I'm shocked someone wants to use API, what are you expecting to do
> > with it??
>
> Our primary use-case is PCIe BAR (DDR / HBM) -> NFS via P2PDMA while the
> PCIe device is managed by a user-space driver based on vfio-pci. While
> kernel drivers (e.g.drm) can register BARs with ZONE_DEVICE natively to
> enable this, VFIO currently lacks an equivalent mechanism.
I mean the weird sysfs mmap API. It is only useful if the device is
basically pure memory with no functionality. You can't even learn what
MMIO offset the returned allocation gives so it is almost completely
useless.
nvme could use it because CMB is pure memory and you reference it by
its MMIO address, but that doesn't apply to VFIO..
> > > An alternative implementation has been explored which integrates with the
> > > ongoing VFIO DMABUF-mmap refactor [1]. In that approach, rather than
> > > registering a BAR as a system-wide P2P provider, VFIO optionally
> > > allocates ZONE_DEVICE pages only for specifically exported DMABUFs via a
> > > new VFIO_DMA_BUF_FLAG_ALLOC_STRUCT_PAGES flag.
> >
> > That's probably more sensible but you can't have a DMABUF mmap
> > actually install non-special memory. The native vfio mmap still can,
> > but not mmap on the dmabuf fd. That's still workable, just keep in
> > mind.
>
> Ack. I guess, we could have a separate mmap path in case of BARs that are
> struct page backed which doesn't go through the dmabuf exporter.
The dmabuf export is perfectly fine, you just have to think very
carefully about the mmap path.
I suppose if you build the proper revocation fence for zone device
pages as part of the vfio implementation it would be OK for dmabuf
mmap to expose them as well since it would have the right lifecycle
model.
That's the tricky thing with zone_device, you have to be careful to
wait for all the page references to be put back at all the right
times.
Come to think of it, since the sysfs API cannot do that in the way
VFIO wants I actually think you can't use it..
Jason
* Re: [RFC PATCH 0/5] vfio/pci: Support ZONE_DEVICE-backed P2P Registration
2026-06-11 22:14 ` Jason Gunthorpe
@ 2026-06-12 14:50 ` Pranjal Shrivastava
0 siblings, 0 replies; 12+ messages in thread
From: Pranjal Shrivastava @ 2026-06-12 14:50 UTC
To: Jason Gunthorpe
Cc: linux-pci, linux-kernel, kvm, Bjorn Helgaas, Logan Gunthorpe,
Alex Williamson, Kevin Tian, Ankit Agrawal, Matt Evans,
Vivek Kasireddy, Leon Romanovsky, Shivaji Kant, Samiullah Khawaja
On Thu, Jun 11, 2026 at 07:14:47PM -0300, Jason Gunthorpe wrote:
> On Thu, Jun 11, 2026 at 02:40:17PM +0000, Pranjal Shrivastava wrote:
> > On Wed, Jun 10, 2026 at 01:28:48PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Jun 10, 2026 at 03:18:48PM +0000, Pranjal Shrivastava wrote:
> > >
> > > > Users utilize the standard sysfs p2pmem/allocate interface for managing
> > > > memory slices once a BAR is registered.
> > >
> > > I'm shocked someone wants to use API, what are you expecting to do
> > > with it??
> >
> > Our primary use-case is PCIe BAR (DDR / HBM) -> NFS via P2PDMA while the
> > PCIe device is managed by a user-space driver based on vfio-pci. While
> > kernel drivers (e.g.drm) can register BARs with ZONE_DEVICE natively to
> > enable this, VFIO currently lacks an equivalent mechanism.
>
> I mean the weird sysfs mmap API. It is only useful if the device is
> basically pure memory with no functionality. You can't even learn what
> MMIO offset the returned allocation gives so it is almost completely
> useless.
>
> nvme could use it because CMB is pure memory and you reference it by
> its MMIO address, but that doesn't apply to VFIO..
>
Ack, I agree, sysfs allocation doesn't provide the offset-level control.
I'll pivot entirely to the DMABUF approach.
> > > > An alternative implementation has been explored which integrates with the
> > > > ongoing VFIO DMABUF-mmap refactor [1]. In that approach, rather than
> > > > registering a BAR as a system-wide P2P provider, VFIO optionally
> > > > allocates ZONE_DEVICE pages only for specifically exported DMABUFs via a
> > > > new VFIO_DMA_BUF_FLAG_ALLOC_STRUCT_PAGES flag.
> > >
> > > That's probably more sensible but you can't have a DMABUF mmap
> > > actually install non-special memory. The native vfio mmap still can,
> > > but not mmap on the dmabuf fd. That's still workable, just keep in
> > > mind.
> >
> > Ack. I guess, we could have a separate mmap path in case of BARs that are
> > struct page backed which doesn't go through the dmabuf exporter.
>
> The dmabuf export is perfectly fine, you just have to think very
> carefully about the mmap path.
>
> I suppose if you build the proper revocation fence for zone device
> pages as part of the vfio implementation it would be OK for dmabuf
> mmap to expose them as well since it would have the right lifecycle
> model.
>
Ack, I'll move forward with adding a flag to request a ZONE_DEVICE-backed
DMABUF export (the 'Alternative Approach' mentioned in the cover letter).
And yes, I agree we need to ensure the mmap path is handled carefully
with the correct lifecycle in mind.
> That's the tricky thing with zone_device, you have to be careful to
> wait for all the page references to be put back at all the right
> times.
Yeah, that's going to be tricky. I wonder whether we could have a zap model
there somehow: if the device is gone or going through a reset, could we
handle the refcounts accordingly?
>
> Come to think of it, since the sysfs API cannot do that in the way
> VFIO wants I actually think you can't use it..
Ack. Baking this into the VFIO DMABUF allows us to enforce the right
lifecycle.
My plan for RFC v2 is to add a flag like VFIO_DMA_BUF_FLAG_ZONE_DEVICE
to struct vfio_device_feature_dma_buf which allows the caller to opt-in
to ZONE_DEVICE backing specifically for that export.
Does this opt-in flag sound like a reasonable uAPI or do you see any
concerns with this direction?
Otherwise, as you noted, the lifecycle and the mmap path remain the main
problems to solve.
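To make the proposal concrete, here is a rough userspace-side sketch of
how a caller might fill the export request with such a flag. Everything
prefixed with sketch_ / SKETCH_ is a local stand-in: the field layout is
my understanding of struct vfio_device_feature_dma_buf and should be
checked against the header, the flag name is only the one floated above,
and its bit value is invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: name from the proposal, bit value invented for this sketch. */
#define SKETCH_VFIO_DMA_BUF_FLAG_ZONE_DEVICE (1u << 0)

/* Local mirror of one (offset, length) range within the region. */
struct sketch_dma_range {
	uint64_t offset;
	uint64_t length;
};

/* Local mirror of the export request; layout is an assumption. */
struct sketch_dma_buf_req {
	uint32_t region_index;
	uint32_t open_flags;
	uint32_t flags;
	uint32_t nr_ranges;
	struct sketch_dma_range dma_ranges[];
};

/*
 * Allocate a single-range export request. When want_struct_pages is
 * non-zero the caller opts in to ZONE_DEVICE backing for this export only.
 * The caller frees the result with free().
 */
static struct sketch_dma_buf_req *
sketch_dma_buf_req_alloc(uint32_t region_index, uint64_t offset,
			 uint64_t length, int want_struct_pages)
{
	struct sketch_dma_buf_req *req;

	assert(length != 0);

	req = calloc(1, sizeof(*req) + sizeof(req->dma_ranges[0]));
	if (!req)
		return NULL;

	req->region_index = region_index;
	req->nr_ranges = 1;
	req->dma_ranges[0].offset = offset;
	req->dma_ranges[0].length = length;
	if (want_struct_pages)
		req->flags |= SKETCH_VFIO_DMA_BUF_FLAG_ZONE_DEVICE;
	return req;
}
```

Keeping the opt-in per export means BARs that never need struct pages pay
nothing for them.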
Thanks,
Praan