* [RFC PATCH 0/5] vfio/pci: Support ZONE_DEVICE-backed P2P Registration
From: Pranjal Shrivastava @ 2026-06-10 15:18 UTC
To: linux-pci, linux-kernel, kvm
Cc: Bjorn Helgaas, Logan Gunthorpe, Alex Williamson, Jason Gunthorpe,
Kevin Tian, Pranjal Shrivastava, Ankit Agrawal, Matt Evans,
Vivek Kasireddy, Leon Romanovsky, Shivaji Kant, Samiullah Khawaja
Introduce ZONE_DEVICE backing for VFIO-exposed PCIe BARs.

Currently, kernel drivers can register their BARs with the P2PDMA
subsystem to enable high-performance, page-backed P2P DMA. However, when
a device is bound to vfio-pci, this capability is missing.

As a framework, VFIO should support the same set of capabilities
available to kernel drivers, provided appropriate isolation and
lifecycles are maintained.

This series adds support for VFIO to register BARs as page-backed P2P
providers, enabling userspace drivers to perform zero-copy P2P DMA via
standard POSIX APIs (e.g., O_DIRECT) which require struct page metadata.
Design
======

The proposed design enables VFIO to leverage the kernel's pci-p2pdma
infrastructure, delegating memory management and slicing to the managed
P2P pool.

1. ZONE_DEVICE Registration

A new feature, VFIO_DEVICE_FEATURE_P2P_REGISTER, is introduced to trigger
pci_p2pdma_add_resource() for a target BAR which creates the ZONE_DEVICE
struct pages and registers the BAR as a standard P2P provider.

2. Mutual Exclusivity Policy

A strict mutual exclusion policy is enforced:

- Registered BARs: Standard mmap() and DMABUF export are blocked.
- Exported BARs: Native P2P registration is blocked.

Users utilize the standard sysfs p2pmem/allocate interface for managing
memory slices once a BAR is registered.
3. Lifecycle and Resource Reclamation

A new helper, pci_p2pdma_remove_resource(), is introduced to support
manual teardown of the P2P pool. VFIO invokes this helper during device
reset or session termination to ensure the pool is destroyed and the
ZONE_DEVICE pages are unmapped.

This series is available on GitHub: https://github.com/pran005/linux/tree/vfio-p2p
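
For illustration only, the registration step from the Design section could
be driven from userspace roughly as sketched below. This is not part of the
series. It assumes the feature number 13 proposed in patch 1/5 and the bare
u32 BAR-index payload read by patch 2/5; both may change in later revisions.
The helper names p2p_register_build() and p2p_register_bar() are
hypothetical.

```c
#include <linux/vfio.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Assumption: value proposed by patch 1/5; absent from released headers. */
#ifndef VFIO_DEVICE_FEATURE_P2P_REGISTER
#define VFIO_DEVICE_FEATURE_P2P_REGISTER 13
#endif

/* Feature header followed by the u32 BAR index that patch 2/5 copies in. */
#define P2P_REGISTER_ARGSZ \
    (sizeof(struct vfio_device_feature) + sizeof(uint32_t))

/*
 * Hypothetical helper: lay out the ioctl argument in buf.
 * Returns the number of bytes used, or 0 if buf is too small.
 */
static size_t p2p_register_build(void *buf, size_t len, uint32_t bar_index)
{
    struct vfio_device_feature hdr;

    if (len < P2P_REGISTER_ARGSZ)
        return 0;

    memset(&hdr, 0, sizeof(hdr));
    hdr.argsz = (uint32_t)P2P_REGISTER_ARGSZ;
    hdr.flags = VFIO_DEVICE_FEATURE_SET | VFIO_DEVICE_FEATURE_P2P_REGISTER;

    memcpy(buf, &hdr, sizeof(hdr));
    /* The payload starts where the header's flexible data[] array begins. */
    memcpy((char *)buf + sizeof(hdr), &bar_index, sizeof(bar_index));
    return P2P_REGISTER_ARGSZ;
}

/* Hypothetical helper: issue the request on an open VFIO device fd. */
static int p2p_register_bar(int device_fd, uint32_t bar_index)
{
    uint64_t buf[2]; /* 16 bytes, suitably aligned for the header */

    if (!p2p_register_build(buf, sizeof(buf), bar_index))
        return -1;
    return ioctl(device_fd, VFIO_DEVICE_FEATURE, buf);
}
```

On success the BAR would then be managed through the sysfs p2pmem/allocate
interface rather than through mmap() on the VFIO device fd.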
Alternative Approach: Page-backed DMABUF Export
================================================

An alternative implementation has been explored which integrates with the
ongoing VFIO DMABUF-mmap refactor [1]. In that approach, rather than
registering a BAR as a system-wide P2P provider, VFIO optionally
allocates ZONE_DEVICE pages only for specifically exported DMABUFs via a
new VFIO_DMA_BUF_FLAG_ALLOC_STRUCT_PAGES flag.

A proof-of-concept of this approach is available at:
https://github.com/pran005/linux/tree/vfio-p2p-alt

[1] https://lore.kernel.org/all/20260527102319.100128-1-mattev@meta.com/
Pranjal Shrivastava (5):
vfio: Add UAPI for ZONE_DEVICE-backed P2P registration
vfio/pci: Implement ZONE_DEVICE-backed P2P registration
vfio/pci: Block mmap & dmabuf export for ZONE_DEVICE-registered BARs
vfio/pci: Block ZONE_DEVICE registration for BARs with active DMABUFs
PCI/P2PDMA: Introduce a helper to release P2P resources
drivers/pci/p2pdma.c | 38 ++++++++++++++++++++
drivers/vfio/pci/vfio_pci_core.c | 57 ++++++++++++++++++++++++++++++
drivers/vfio/pci/vfio_pci_dmabuf.c | 22 ++++++++++++
drivers/vfio/pci/vfio_pci_priv.h | 6 ++++
include/linux/pci-p2pdma.h | 1 +
include/linux/vfio_pci_core.h | 1 +
include/uapi/linux/vfio.h | 15 ++++++++
7 files changed, 140 insertions(+)
--
2.54.0.1099.g489fc7bff1-goog
* [RFC PATCH 1/5] vfio: Add UAPI for ZONE_DEVICE-backed P2P registration
From: Pranjal Shrivastava @ 2026-06-10 15:18 UTC
To: linux-pci, linux-kernel, kvm
Cc: Bjorn Helgaas, Logan Gunthorpe, Alex Williamson, Jason Gunthorpe,
Kevin Tian, Pranjal Shrivastava, Ankit Agrawal, Matt Evans,
Vivek Kasireddy, Leon Romanovsky, Shivaji Kant, Samiullah Khawaja
Introduce VFIO_DEVICE_FEATURE_P2P_REGISTER to the VFIO_DEVICE_FEATURE
ioctl. This feature allows a privileged userspace process to register
a specific PCI BAR with the kernel's PCI P2P DMA provider framework.

Unlike the standard VFIO dmabuf exporter, this feature leverages the
pci-p2pdma infrastructure to manufacture ZONE_DEVICE struct pages for
the BAR.

Standard VFIO mmap() is not supported for BARs registered via this
interface. Users are instead expected to use the native p2pmem sysfs
interface for memory allocation and management.

Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
include/uapi/linux/vfio.h | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 5de618a3a5ee..adbac3f965eb 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1534,6 +1534,21 @@ struct vfio_device_feature_dma_buf {
*/
#define VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 12
+/**
+ * Upon VFIO_DEVICE_FEATURE_SET register a PCI BAR with the kernel
+ * P2P DMA subsystem (pci-p2pdma).
+ *
+ * Once a BAR is registered, it will be added to the device's P2P
+ * pool and can be allocated via the standard sysfs p2pmem/allocate
+ * interface.
+ *
+ * Note: Standard VFIO mmap() of the BAR will be blocked once it is
+ * registered for native P2P.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+#define VFIO_DEVICE_FEATURE_P2P_REGISTER 13
+
/* -------- API for Type1 VFIO IOMMU -------- */
/**
--
2.54.0.1099.g489fc7bff1-goog
* [RFC PATCH 2/5] vfio/pci: Implement ZONE_DEVICE-backed P2P registration
From: Pranjal Shrivastava @ 2026-06-10 15:18 UTC
To: linux-pci, linux-kernel, kvm
Cc: Bjorn Helgaas, Logan Gunthorpe, Alex Williamson, Jason Gunthorpe,
Kevin Tian, Pranjal Shrivastava, Ankit Agrawal, Matt Evans,
Vivek Kasireddy, Leon Romanovsky, Shivaji Kant, Samiullah Khawaja
Implement the VFIO_DEVICE_FEATURE_P2P_REGISTER feature. A bitmask is
added to track the registered regions.

Post-registration, the BAR is added to the device's P2P pool and is
available to be used with standard page-based APIs.

Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
drivers/vfio/pci/vfio_pci_core.c | 39 ++++++++++++++++++++++++++++++++
include/linux/vfio_pci_core.h | 1 +
2 files changed, 40 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index a28f1e99362c..1e922e3aaeb3 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1554,6 +1554,43 @@ static int vfio_pci_core_feature_token(struct vfio_pci_core_device *vdev,
return 0;
}
+static int vfio_pci_core_feature_p2p_register(struct vfio_pci_core_device *vdev,
+ u32 flags, u32 __user *arg,
+ size_t argsz)
+{
+ u32 bar_index;
+ int ret;
+
+ ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET,
+ sizeof(bar_index));
+ if (ret != 1)
+ return ret;
+
+ if (copy_from_user(&bar_index, arg, sizeof(bar_index)))
+ return -EFAULT;
+
+ if (bar_index >= PCI_STD_NUM_BARS)
+ return -EINVAL;
+
+ if (!(pci_resource_flags(vdev->pdev, bar_index) & IORESOURCE_MEM))
+ return -EINVAL;
+
+ /* Already registered */
+ if (vdev->p2p_registered_bars & (1 << bar_index))
+ return 0;
+
+ ret = pci_p2pdma_add_resource(vdev->pdev, bar_index, 0, 0);
+ if (ret && ret != -EEXIST)
+ return ret;
+
+ vdev->p2p_registered_bars |= (1 << bar_index);
+
+ pci_info(vdev->pdev, "BAR %d registered for ZONE_DEVICE P2P\n",
+ bar_index);
+
+ return 0;
+}
+
int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
void __user *arg, size_t argsz)
{
@@ -1572,6 +1609,8 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
case VFIO_DEVICE_FEATURE_DMA_BUF:
return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
+ case VFIO_DEVICE_FEATURE_P2P_REGISTER:
+ return vfio_pci_core_feature_p2p_register(vdev, flags, arg, argsz);
default:
return -ENOTTY;
}
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 5fc6ce4dd786..cae7f069e5b6 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -112,6 +112,7 @@ struct vfio_pci_core_device {
struct vfio_pci_region *region;
u8 msi_qmax;
u8 msix_bar;
+ u8 p2p_registered_bars;
u16 msix_size;
u32 msix_offset;
u32 rbar[7];
--
2.54.0.1099.g489fc7bff1-goog
* [RFC PATCH 3/5] vfio/pci: Block mmap & dmabuf export for ZONE_DEVICE-registered BARs
From: Pranjal Shrivastava @ 2026-06-10 15:18 UTC
To: linux-pci, linux-kernel, kvm
Cc: Bjorn Helgaas, Logan Gunthorpe, Alex Williamson, Jason Gunthorpe,
Kevin Tian, Pranjal Shrivastava, Ankit Agrawal, Matt Evans,
Vivek Kasireddy, Leon Romanovsky, Shivaji Kant, Samiullah Khawaja
Update vfio_pci_core_mmap() & vfio_pci_core_feature_dma_buf() to
enforce mutual exclusivity with ZONE_DEVICE P2P registration.

If a BAR has been registered for native P2P (via
VFIO_DEVICE_FEATURE_P2P_REGISTER), subsequent requests to mmap the BAR
or export it as a DMABUF are rejected with -EBUSY.

Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
drivers/vfio/pci/vfio_pci_core.c | 7 +++++++
drivers/vfio/pci/vfio_pci_dmabuf.c | 6 ++++++
2 files changed, 13 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 1e922e3aaeb3..9cf494b765e7 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1828,6 +1828,13 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
if (!vdev->bar_mmap_supported[index])
return -EINVAL;
+ if (vdev->p2p_registered_bars & (1 << index)) {
+ pci_warn(vdev->pdev,
+ "BAR %d registered for P2P. Use sysfs to allocate\n",
+ index);
+ return -EBUSY;
+ }
+
phys_len = PAGE_ALIGN(pci_resource_len(pdev, index));
req_len = vma->vm_end - vma->vm_start;
pgoff = vma->vm_pgoff &
diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
index c16f460c01d6..6635a8681291 100644
--- a/drivers/vfio/pci/vfio_pci_dmabuf.c
+++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
@@ -251,6 +251,12 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
IS_ERR(vfio_pci_core_get_iomap(vdev, get_dma_buf.region_index)))
return -ENODEV;
+ if (vdev->p2p_registered_bars & (1 << get_dma_buf.region_index)) {
+ pci_warn(vdev->pdev, "BAR %d registered for native P2P. Use sysfs for allocation.\n",
+ get_dma_buf.region_index);
+ return -EBUSY;
+ }
+
dma_ranges = memdup_array_user(&arg->dma_ranges, get_dma_buf.nr_ranges,
sizeof(*dma_ranges));
if (IS_ERR(dma_ranges))
--
2.54.0.1099.g489fc7bff1-goog
* [RFC PATCH 4/5] vfio/pci: Block ZONE_DEVICE registration for BARs with active DMABUFs
From: Pranjal Shrivastava @ 2026-06-10 15:18 UTC
To: linux-pci, linux-kernel, kvm
Cc: Bjorn Helgaas, Logan Gunthorpe, Alex Williamson, Jason Gunthorpe,
Kevin Tian, Pranjal Shrivastava, Ankit Agrawal, Matt Evans,
Vivek Kasireddy, Leon Romanovsky, Shivaji Kant, Samiullah Khawaja
Ensure that a PCI BAR cannot be registered for native ZONE_DEVICE P2P
if it is already being used as a source for exported DMABUFs.

Add region_index to struct vfio_pci_dma_buf to track the source BAR.
Introduce a new helper function, vfio_pci_bar_is_dmabuf() to scan the
device's active DMABUF list. When a registration request is received via
VFIO_DEVICE_FEATURE_P2P_REGISTER, VFIO rejects the request with -EBUSY
if any active DMABUF originates from the target BAR.

Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
drivers/vfio/pci/vfio_pci_core.c | 6 ++++++
drivers/vfio/pci/vfio_pci_dmabuf.c | 16 ++++++++++++++++
drivers/vfio/pci/vfio_pci_priv.h | 6 ++++++
3 files changed, 28 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 9cf494b765e7..7913b8916df9 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1579,6 +1579,12 @@ static int vfio_pci_core_feature_p2p_register(struct vfio_pci_core_device *vdev,
if (vdev->p2p_registered_bars & (1 << bar_index))
return 0;
+ if (vfio_pci_bar_is_dmabuf(vdev, bar_index)) {
+ pci_warn(vdev->pdev, "BAR %d has active DMABUFs. Cannot register for P2P.\n",
+ bar_index);
+ return -EBUSY;
+ }
+
ret = pci_p2pdma_add_resource(vdev->pdev, bar_index, 0, 0);
if (ret && ret != -EEXIST)
return ret;
diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
index 6635a8681291..194d74724422 100644
--- a/drivers/vfio/pci/vfio_pci_dmabuf.c
+++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
@@ -17,6 +17,7 @@ struct vfio_pci_dma_buf {
struct phys_vec *phys_vec;
struct p2pdma_provider *provider;
u32 nr_ranges;
+ u32 region_index;
struct kref kref;
struct completion comp;
u8 revoked : 1;
@@ -279,6 +280,7 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
priv->vdev = vdev;
priv->nr_ranges = get_dma_buf.nr_ranges;
+ priv->region_index = get_dma_buf.region_index;
priv->size = length;
ret = vdev->pci_ops->get_dmabuf_phys(vdev, &priv->provider,
get_dma_buf.region_index,
@@ -410,3 +412,17 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
}
up_write(&vdev->memory_lock);
}
+
+bool vfio_pci_bar_is_dmabuf(struct vfio_pci_core_device *vdev, int index)
+{
+ struct vfio_pci_dma_buf *priv;
+
+ lockdep_assert_held(&vdev->memory_lock);
+
+ list_for_each_entry(priv, &vdev->dmabufs, dmabufs_elm) {
+ if (priv->region_index == index)
+ return true;
+ }
+ return false;
+}
+EXPORT_SYMBOL_GPL(vfio_pci_bar_is_dmabuf);
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index fca9d0dfac90..0a5da4d5edc4 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -120,6 +120,7 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
size_t argsz);
void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev);
void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked);
+bool vfio_pci_bar_is_dmabuf(struct vfio_pci_core_device *vdev, int index);
#else
static inline int
vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
@@ -128,6 +129,11 @@ vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
{
return -ENOTTY;
}
+static inline bool vfio_pci_bar_is_dmabuf(struct vfio_pci_core_device *vdev,
+ int index)
+{
+ return false;
+}
static inline void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
{
}
--
2.54.0.1099.g489fc7bff1-goog
* [RFC PATCH 5/5] PCI/P2PDMA: Introduce a helper to release P2P resources
From: Pranjal Shrivastava @ 2026-06-10 15:18 UTC
To: linux-pci, linux-kernel, kvm
Cc: Bjorn Helgaas, Logan Gunthorpe, Alex Williamson, Jason Gunthorpe,
Kevin Tian, Pranjal Shrivastava, Ankit Agrawal, Matt Evans,
Vivek Kasireddy, Leon Romanovsky, Shivaji Kant, Samiullah Khawaja
Introduce pci_p2pdma_remove_resource() to allow manual teardown of a
device's P2P DMA pool. The new API enables exclusive owners of a device,
such as vfio-pci, to cleanly release P2P resources during session closure
or hardware reset.

Update vfio-pci to call this function during vfio_pci_zap_bars(),
ensuring that any BARs registered with ZONE_DEVICE P2P are released.

Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
drivers/pci/p2pdma.c | 38 ++++++++++++++++++++++++++++++++
drivers/vfio/pci/vfio_pci_core.c | 5 +++++
include/linux/pci-p2pdma.h | 1 +
3 files changed, 44 insertions(+)
diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index adb17a4f6939..2a48ffefa01c 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -26,6 +26,7 @@ struct pci_p2pdma {
bool p2pmem_published;
struct xarray map_types;
struct p2pdma_provider mem[PCI_STD_NUM_BARS];
+ struct pci_p2pdma_pagemap *pagemaps[PCI_STD_NUM_BARS];
};
struct pci_p2pdma_pagemap {
@@ -453,6 +454,8 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
if (error)
goto pages_free;
+ p2pdma->pagemaps[bar] = p2p_pgmap;
+
pci_info(pdev, "added peer-to-peer DMA memory %#llx-%#llx\n",
pgmap->range.start, pgmap->range.end);
@@ -466,6 +469,41 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
}
EXPORT_SYMBOL_GPL(pci_p2pdma_add_resource);
+/**
+ * pci_p2pdma_remove_resource - remove all p2p memory for a device
+ * @pdev: the device to remove the memory from
+ *
+ * Tear down the entire p2p DMA pool for the device. Zap any existing
+ * userspace mappings of the p2pmem/allocate file.
+ */
+void pci_p2pdma_remove_resource(struct pci_dev *pdev)
+{
+ struct pci_p2pdma *p2pdma;
+ int i;
+
+ p2pdma = rcu_dereference_protected(pdev->p2pdma, 1);
+ if (!p2pdma || !p2pdma->pool)
+ return;
+
+ for (i = 0; i < PCI_STD_NUM_BARS; i++) {
+ if (p2pdma->pagemaps[i]) {
+ devm_release_action(&pdev->dev, pci_p2pdma_unmap_mappings,
+ p2pdma->pagemaps[i]);
+ devm_memunmap_pages(&pdev->dev, &p2pdma->pagemaps[i]->pgmap);
+ devm_kfree(&pdev->dev, p2pdma->pagemaps[i]);
+ p2pdma->pagemaps[i] = NULL;
+ }
+ }
+
+ gen_pool_destroy(p2pdma->pool);
+ p2pdma->pool = NULL;
+
+ sysfs_remove_group(&pdev->dev.kobj, &p2pmem_group);
+
+ pci_info(pdev, "removed all peer-to-peer DMA memory\n");
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_remove_resource);
+
/*
* Note this function returns the parent PCI device with a
* reference taken. It is the caller's responsibility to drop
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 7913b8916df9..7b58cb344408 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1704,6 +1704,11 @@ static void vfio_pci_zap_bars(struct vfio_pci_core_device *vdev)
loff_t len = end - start;
unmap_mapping_range(core_vdev->inode->i_mapping, start, len, true);
+
+ if (vdev->p2p_registered_bars) {
+ pci_p2pdma_remove_resource(vdev->pdev);
+ vdev->p2p_registered_bars = 0;
+ }
}
void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_core_device *vdev)
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
index 873de20a2247..14ee2e59a43e 100644
--- a/include/linux/pci-p2pdma.h
+++ b/include/linux/pci-p2pdma.h
@@ -72,6 +72,7 @@ int pcim_p2pdma_init(struct pci_dev *pdev);
struct p2pdma_provider *pcim_p2pdma_provider(struct pci_dev *pdev, int bar);
int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
u64 offset);
+void pci_p2pdma_remove_resource(struct pci_dev *pdev);
int pci_p2pdma_distance_many(struct pci_dev *provider, struct device **clients,
int num_clients, bool verbose);
struct pci_dev *pci_p2pmem_find_many(struct device **clients, int num_clients);
--
2.54.0.1099.g489fc7bff1-goog
* Re: [RFC PATCH 1/5] vfio: Add UAPI for ZONE_DEVICE-backed P2P registration
From: sashiko-bot @ 2026-06-10 15:31 UTC
To: Pranjal Shrivastava; +Cc: kvm, linux-pci
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Dynamic creation of ZONE_DEVICE pages for VFIO-assigned BARs introduces intractable lifetime synchronization issues, leading to Denial of Service.
- [Low] The newly introduced `VFIO_DEVICE_FEATURE_P2P_REGISTER` macro lacks a corresponding UAPI structure to specify the target PCI BAR index.
--
commit c98c2e1f824821a8273fbcf6834c8210ddf76e75
Author: Pranjal Shrivastava <praan@google.com>
vfio: Add UAPI for ZONE_DEVICE-backed P2P registration
Introduce VFIO_DEVICE_FEATURE_P2P_REGISTER to the VFIO_DEVICE_FEATURE
ioctl. This feature allows a privileged userspace process to register
a specific PCI BAR with the kernel's PCI P2P DMA provider framework.
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 5de618a3a5eee..adbac3f965eb4 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
[ ... ]
> +/**
> + * Upon VFIO_DEVICE_FEATURE_SET register a PCI BAR with the kernel
> + * P2P DMA subsystem (pci-p2pdma).
> + *
> + * Once a BAR is registered, it will be added to the device's P2P
> + * pool and can be allocated via the standard sysfs p2pmem/allocate
> + * interface.
> + *
> + * Note: Standard VFIO mmap() of the BAR will be blocked once it is
> + * registered for native P2P.
> + *
> + * Return: 0 on success, -errno on failure.
> + */
> +#define VFIO_DEVICE_FEATURE_P2P_REGISTER 13
[Severity: High]
Does this feature introduce a regression with lifetime synchronization
issues?
When VFIO_DEVICE_FEATURE_P2P_REGISTER is used, ZONE_DEVICE pages are
manufactured for the PCI BAR via devm_memremap_pages(). These pages can be
mapped into userspace using the sysfs p2pmem/allocate interface.
If userspace pins these pages via get_user_pages() (e.g., through an
O_DIRECT read, io_uring, or passing them to another device), the page
reference counts are elevated. If the VFIO device is subsequently unbound
or the file descriptor is closed while the pin is active, the driver
teardown triggers:
vfio_pci_zap_bars()
-> pci_p2pdma_remove_resource()
-> devm_memunmap_pages()
The unmap path kills the percpu refcount and waits synchronously via
wait_for_completion() for all references to drop. Because the pages remain
pinned by GUP, the references will never reach zero, causing the kernel
thread to hang indefinitely.
This blocks driver unbinding and holds global locks (such as
vdev->memory_lock and driver core locks), causing a denial of service.
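
The condition described above can be modeled with a toy reference counter.
The sketch below is a userspace illustration, not kernel code: the toy_*
names are hypothetical, and where the real unmap path would sleep until the
count drains, toy_try_teardown() returns -EBUSY so that the blocking
condition is visible.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for a pagemap and its outstanding page references. */
struct toy_pagemap {
    atomic_long pins;   /* long-lived references, e.g. pages pinned for I/O */
    atomic_bool dying;  /* set once teardown has started */
};

static void toy_init(struct toy_pagemap *pm)
{
    atomic_init(&pm->pins, 0);
    atomic_init(&pm->dying, false);
}

/* Take a reference; refused once teardown has begun. */
static bool toy_pin(struct toy_pagemap *pm)
{
    if (atomic_load(&pm->dying))
        return false;
    atomic_fetch_add(&pm->pins, 1);
    return true;
}

static void toy_unpin(struct toy_pagemap *pm)
{
    atomic_fetch_sub(&pm->pins, 1);
}

/*
 * Teardown can only finish when no pins remain. A blocking
 * implementation would wait here for as long as the pin holder likes.
 */
static int toy_try_teardown(struct toy_pagemap *pm)
{
    atomic_store(&pm->dying, true);
    if (atomic_load(&pm->pins) != 0)
        return -EBUSY;
    return 0;
}
```

In this model, a holder that never calls toy_unpin() keeps teardown from
ever completing, which corresponds to the indefinite wait described above.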
[Severity: Low]
Does this UAPI addition lack a corresponding structure to specify the target
PCI BAR index?
The implementation in vfio_pci_core_feature_p2p_register() appears to
silently assume a primitive u32 payload when calling
copy_from_user(&bar_index, arg, sizeof(bar_index)), creating a mismatch
between documented UAPI and actual kernel expectations.
Standard VFIO UAPI practices require explicit, strongly-typed structures
(e.g., struct vfio_device_feature_p2p_register { __u32 bar_index; };) to
accompany feature macros so that the ABI is well-documented and safe for
userspace developers to consume.
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260610151853.3608948-1-praan@google.com?part=1
* Re: [RFC PATCH 2/5] vfio/pci: Implement ZONE_DEVICE-backed P2P registration
From: sashiko-bot @ 2026-06-10 15:35 UTC
To: Pranjal Shrivastava; +Cc: linux-pci, kvm
Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
New issues:
- [Critical] Registering a BAR for P2P DMA does not check if the BAR contains the MSI-X table, allowing userspace to bypass VFIO MSI-X protections.
- [High] Missing synchronization in `vfio_pci_core_feature_p2p_register` allows concurrent ioctls to corrupt state and leak devm memory.
- [High] Missing synchronization and state conflict between VFIO DMABUF and P2P DMA pool registration.
- [High] P2P BAR registration state and devres allocations persist across VFIO file descriptor closures, breaking device isolation.
Pre-existing issues:
- [High] The error path in `pci_p2pdma_add_resource()` frees `p2p_pgmap` without revoking the devres unmap action.
--
commit d5df237db4cc33de44c5c2d95b83b4fda6fa953a
Author: Pranjal Shrivastava <praan@google.com>
vfio/pci: Implement ZONE_DEVICE-backed P2P registration
Implement the VFIO_DEVICE_FEATURE_P2P_REGISTER. A bitmask is added to
track the registered regions.
Post-registration, the BAR is added to the device's P2P pool and is
available to be used with standard page-based APIs.
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index a28f1e99362c1..1e922e3aaeb3a 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1554,6 +1554,43 @@ static int vfio_pci_core_feature_token(struct vfio_pci_core_device *vdev,
> return 0;
> }
>
> +static int vfio_pci_core_feature_p2p_register(struct vfio_pci_core_device *vdev,
> + u32 flags, u32 __user *arg,
> + size_t argsz)
> +{
> + u32 bar_index;
> + int ret;
> +
> + ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET,
> + sizeof(bar_index));
> + if (ret != 1)
> + return ret;
> +
> + if (copy_from_user(&bar_index, arg, sizeof(bar_index)))
> + return -EFAULT;
> +
> + if (bar_index >= PCI_STD_NUM_BARS)
> + return -EINVAL;
> +
> + if (!(pci_resource_flags(vdev->pdev, bar_index) & IORESOURCE_MEM))
> + return -EINVAL;
[Severity: Critical]
Does this code allow userspace to bypass VFIO MSI-X protections?
If the requested BAR contains the device's MSI-X table (bar_index ==
vdev->msix_bar), passing the entire BAR to pci_p2pdma_add_resource() exposes
the whole physical memory range to the global P2P DMA pool.
A malicious guest or userspace process could instruct another device to
perform P2P DMA writes directly to the MSI-X table, completely bypassing
VFIO's software protections and allowing arbitrary host interrupt injection.
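
One possible shape for such a guard is sketched below as a standalone
model. It is not taken from the series: struct msix_toy_vdev and
msix_toy_bar_allowed() are hypothetical, the -EPERM return value is an
arbitrary choice, and the model assumes msix_bar holds an out-of-range
value when the device has no MSI-X capability.

```c
#include <errno.h>
#include <stdint.h>

#define MSIX_TOY_NUM_BARS 6 /* mirrors PCI_STD_NUM_BARS */

/* Hypothetical, reduced view of the per-device state used by the check. */
struct msix_toy_vdev {
    uint8_t msix_bar;     /* BAR holding the MSI-X table */
    uint8_t mem_bar_mask; /* bit n set if BAR n is a memory BAR */
};

/* Returns 0 if the BAR may be registered, -errno otherwise. */
static int msix_toy_bar_allowed(const struct msix_toy_vdev *vdev,
                                uint32_t bar_index)
{
    if (bar_index >= MSIX_TOY_NUM_BARS)
        return -EINVAL;

    if (!(vdev->mem_bar_mask & (1u << bar_index)))
        return -EINVAL;

    /* Refuse to hand the whole BAR to the P2P pool if it holds MSI-X. */
    if (bar_index == vdev->msix_bar)
        return -EPERM;

    return 0;
}
```

Whether the whole BAR should be rejected, or only the pages overlapping the
MSI-X table excluded, is a design question this sketch does not settle.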
> +
> + /* Already registered */
> + if (vdev->p2p_registered_bars & (1 << bar_index))
> + return 0;
[Severity: High]
Does this code have a missing synchronization issue that allows concurrent
ioctls to corrupt state and leak devm memory?
The ioctl handler modifies vdev->p2p_registered_bars and calls
pci_p2pdma_add_resource() without holding any serialization lock (such as
vdev->igate or vdev->memory_lock).
Since ioctls can be invoked concurrently on the same file descriptor, multiple
threads can bypass the unregistered check simultaneously, leading to
overlapping duplicate devres allocations in pci_p2pdma_add_resource(), sysfs
duplicate name errors, and bitmask corruption.
[Severity: High]
Is there a missing synchronization and state conflict between VFIO DMABUF and
P2P DMA pool registration?
If vfio_pci_bar_is_dmabuf(vdev, bar_index) is called here to check the DMABUF
state, it is invoked without holding vdev->memory_lock. Because
vfio_pci_bar_is_dmabuf() explicitly uses
lockdep_assert_held(&vdev->memory_lock), executing this unprotected ioctl
will immediately trigger a lockdep kernel splat. Additionally, without the
lock, the state validation is racy and allows conflicting mappings.
> +
> + ret = pci_p2pdma_add_resource(vdev->pdev, bar_index, 0, 0);
[Severity: High]
This is a pre-existing issue, but does the error path in
pci_p2pdma_add_resource() free p2p_pgmap without revoking the devres unmap
action?
In pci_p2pdma_add_resource(), if gen_pool_add_owner() fails, the code jumps
to pages_free and calls devm_kfree(&pdev->dev, p2p_pgmap). However, it fails
to call devm_remove_action() for the earlier successfully added
pci_p2pdma_unmap_mappings action.
When the device is later detached, devres will execute the unmap action on a
freed p2p_pgmap pointer, causing a Use-After-Free.
> + if (ret && ret != -EEXIST)
> + return ret;
> +
> + vdev->p2p_registered_bars |= (1 << bar_index);
[Severity: High]
Do the P2P BAR registration state and devres allocations persist across VFIO
file descriptor closures, breaking device isolation?
The pci_p2pdma_add_resource() function ties memory allocations and sysfs
entries to the underlying PCI device (&pdev->dev) using devres. However, when
a user closes the VFIO file descriptor, vfio_pci_core_close_device() does not
invoke vfio_pci_zap_bars() to revoke the P2P memory resources or clear
vdev->p2p_registered_bars.
Thus, the P2P sysfs allocations leak across VM sessions until the VFIO driver
unbinds, and subsequent users inherit a polluted, already-registered state.
> +
> + pci_info(vdev->pdev, "BAR %d registered for ZONE_DEVICE P2P\n",
> + bar_index);
> +
> + return 0;
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260610151853.3608948-1-praan@google.com?part=2
* Re: [RFC PATCH 3/5] vfio/pci: Block mmap & dmabuf export for ZONE_DEVICE-registered BARs
From: sashiko-bot @ 2026-06-10 15:40 UTC
To: Pranjal Shrivastava; +Cc: linux-pci, kvm
Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [High] Mutual exclusivity between P2P registration and userspace mapping is only enforced in one direction, allowing trivial bypass.
- [High] Unsynchronized read-modify-write of the `p2p_registered_bars` bitmask can lead to lost updates, bypassing mutual exclusivity.
- [High] Time-Of-Check to Time-Of-Use (TOCTOU) race condition between P2P registration and mmap/dmabuf export.
- [Medium] Un-ratelimited `pci_warn()` calls can be triggered by unprivileged users to flood the kernel log.
--
commit 34d8b3426f32368f81ae51a26bcaccb5ac86b1c0
Author: Pranjal Shrivastava <praan@google.com>
vfio/pci: Block mmap & dmabuf export for ZONE_DEVICE-registered BARs
Update vfio_pci_core_mmap() & vfio_pci_core_feature_dma_buf() to
enforce mutual exclusivity with ZONE_DEVICE P2P registration.
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 1e922e3aaeb3a..9cf494b765e72 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1828,6 +1828,13 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
> if (!vdev->bar_mmap_supported[index])
> return -EINVAL;
>
> + if (vdev->p2p_registered_bars & (1 << index)) {
[Severity: High]
Does this check adequately enforce mutual exclusivity if the user performs
operations in the reverse order?
While this prevents mmap() after P2P registration, if a user calls mmap()
first and then issues the VFIO_DEVICE_FEATURE_P2P_REGISTER ioctl, it appears
they can have both active simultaneously. Looking at
vfio_pci_core_feature_p2p_register(), there is a check for dmabufs but it
lacks a check for active mmaps.
[Severity: High]
Is this check susceptible to a time-of-check to time-of-use race?
This check of vdev->p2p_registered_bars is performed locklessly. A concurrent
thread could potentially call VFIO_DEVICE_FEATURE_P2P_REGISTER immediately
after this check succeeds, but before the mapping is fully established.
[Severity: High]
Are updates to vdev->p2p_registered_bars synchronized to prevent lost updates?
Looking at vfio_pci_core_feature_p2p_register(), the bitmask is updated using
a non-atomic read-modify-write operation:
vdev->p2p_registered_bars |= (1 << bar_index);
If multiple threads register different BARs concurrently without locks, could
this cause a lost update? A lost update would leave the bit clear, allowing
subsequent mmaps to incorrectly bypass this mutual exclusivity check.
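The lost-update concern above can be illustrated with a minimal userspace sketch, assuming C11 atomics as a stand-in for the kernel's atomic bit operations (in-kernel code would more likely use test_and_set_bit() on an unsigned long). The variable and function names here are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for vdev->p2p_registered_bars. */
static atomic_uint registered_bars;

/*
 * Atomically set the bit for one BAR. The read-modify-write is a single
 * atomic operation, so concurrent callers setting different bits cannot
 * overwrite each other. Returns true if the bit was already set.
 */
static bool register_bar(unsigned int bar_index)
{
	unsigned int bit = 1u << bar_index;

	return atomic_fetch_or(&registered_bars, bit) & bit;
}

/* Report whether a BAR's bit is currently set. */
static bool bar_is_registered(unsigned int bar_index)
{
	return atomic_load(&registered_bars) & (1u << bar_index);
}
```

An atomic bit operation only addresses the lost update. It does not close the check-then-act window between testing the bit and establishing a mapping, which still needs a lock covering both steps.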
> + pci_warn(vdev->pdev,
> + "BAR %d registered for P2P. Use sysfs to allocate\n",
> + index);
[Severity: Medium]
Could this un-ratelimited warning allow an unprivileged user to flood the
kernel log?
Since this is triggered directly by the mmap() syscall, a user with access
to the VFIO device could repeatedly call mmap() in a loop, displacing
legitimate logs. Would it be better to use pci_warn_ratelimited() here?
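As a rough illustration of what rate limiting buys here, the following userspace sketch models a burst-per-interval limiter. It is an assumption-laden model, not the kernel's ratelimit implementation: the struct, the function name and the tick-based clock are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limiter: at most `burst` messages per `interval` ticks. */
struct ratelimit {
	unsigned long interval;
	unsigned int burst;
	unsigned long window_start;
	unsigned int printed;
};

/* Return true if a message may be emitted at time `now`. */
static bool ratelimit_allow(struct ratelimit *rl, unsigned long now)
{
	/* Start a fresh window once the current one has expired. */
	if (now - rl->window_start >= rl->interval) {
		rl->window_start = now;
		rl->printed = 0;
	}
	if (rl->printed >= rl->burst)
		return false;	/* budget exhausted, drop the message */
	rl->printed++;
	return true;
}
```

With such a limiter in front of the warning, a tight mmap() loop can emit only a bounded number of messages per window, however often the error path is hit.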
> + return -EBUSY;
> + }
> +
> phys_len = PAGE_ALIGN(pci_resource_len(pdev, index));
> req_len = vma->vm_end - vma->vm_start;
> pgoff = vma->vm_pgoff &
> diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
> index c16f460c01d68..6635a86812918 100644
> --- a/drivers/vfio/pci/vfio_pci_dmabuf.c
> +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
> @@ -251,6 +251,12 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
> IS_ERR(vfio_pci_core_get_iomap(vdev, get_dma_buf.region_index)))
> return -ENODEV;
>
> + if (vdev->p2p_registered_bars & (1 << get_dma_buf.region_index)) {
[Severity: High]
Could a concurrent thread bypass mutual exclusivity here due to a lack of
synchronization?
This check happens before acquiring vdev->memory_lock. If another thread
concurrently registers the BAR for P2P after this check passes but before
the dmabuf export completes, could both operations succeed concurrently?
> + pci_warn(vdev->pdev, "BAR %d registered for native P2P. Use sysfs for allocation.\n",
[Severity: Medium]
Similarly to the mmap() path, could this un-ratelimited warning be
triggered in a loop by userspace calling the VFIO_DEVICE_FEATURE ioctl?
> + get_dma_buf.region_index);
> + return -EBUSY;
> + }
> +
> dma_ranges = memdup_array_user(&arg->dma_ranges, get_dma_buf.nr_ranges,
> sizeof(*dma_ranges));
> if (IS_ERR(dma_ranges))
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260610151853.3608948-1-praan@google.com?part=3
* Re: [RFC PATCH 4/5] vfio/pci: Block ZONE_DEVICE registration for BARs with active DMABUFs
2026-06-10 15:18 ` [RFC PATCH 4/5] vfio/pci: Block ZONE_DEVICE registration for BARs with active DMABUFs Pranjal Shrivastava
@ 2026-06-10 15:44 ` sashiko-bot
0 siblings, 0 replies; 13+ messages in thread
From: sashiko-bot @ 2026-06-10 15:44 UTC (permalink / raw)
To: Pranjal Shrivastava; +Cc: kvm, linux-pci
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] vfio_pci_bar_is_dmabuf() is called without holding vdev->memory_lock, triggering a lockdep splat and causing a list traversal data race.
- [High] Time-of-Check to Time-of-Use (TOCTOU) race allows concurrent registration of both a native P2P resource and a DMABUF for the same PCI BAR.
- [Medium] Concurrent calls to vfio_pci_core_feature_p2p_register() can result in lost bit updates to vdev->p2p_registered_bars due to non-atomic read-modify-write.
--
commit 2c70089a882afbe16090c5204ddae6a5a5a6ab65
Author: Pranjal Shrivastava <praan@google.com>
vfio/pci: Block ZONE_DEVICE registration for BARs with active DMABUFs
This commit adds a check to ensure that a PCI region cannot be registered
for native peer-to-peer if it is already backing active exported dma-bufs.
It introduces vfio_pci_bar_is_dmabuf() to scan the device's active dma-buf
list and rejects peer-to-peer registration with -EBUSY if a match is found.
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 9cf494b765e72..7913b8916df91 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1579,6 +1579,12 @@ static int vfio_pci_core_feature_p2p_register(struct vfio_pci_core_device *vdev,
> if (vdev->p2p_registered_bars & (1 << bar_index))
> return 0;
>
> + if (vfio_pci_bar_is_dmabuf(vdev, bar_index)) {
[Severity: High]
Does this function call need to be protected by vdev->memory_lock?
Looking at the newly added vfio_pci_bar_is_dmabuf() helper, it asserts
that the memory_lock is held:
lockdep_assert_held(&vdev->memory_lock);
Calling it here unlocked will trigger a lockdep warning. Since it traverses
the vdev->dmabufs list, could this also lead to a data race or list
corruption if another thread modifies the list concurrently?
[Severity: High]
Could this lockless check result in a time-of-check to time-of-use race?
If vfio_pci_core_feature_p2p_register() and vfio_pci_core_feature_dma_buf()
are called concurrently for the same region, is it possible for one thread
to observe no active dma-bufs here, while the other thread concurrently
observes no peer-to-peer registration?
Both might proceed and commit their state, allowing the same region to
simultaneously back both exported dma-bufs and native peer-to-peer.
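One way to picture a fix for this race is to make each path check the other's state and commit its own under the same lock. The sketch below is a userspace model under that assumption: the pthread mutex stands in for vdev->memory_lock, and the struct and function names are hypothetical, not VFIO's.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical per-device state; names are illustrative only. */
struct bar_state {
	pthread_mutex_t lock;		/* stands in for vdev->memory_lock */
	unsigned int p2p_registered;	/* BARs registered for native P2P */
	unsigned int exported;		/* BARs with dma-buf or mmap users */
};

/* Claim a BAR for P2P; fails if it is already exported. */
static int claim_for_p2p(struct bar_state *s, unsigned int bar)
{
	unsigned int bit = 1u << bar;
	int ret = 0;

	pthread_mutex_lock(&s->lock);
	if (s->exported & bit)
		ret = -EBUSY;
	else
		s->p2p_registered |= bit;	/* check and commit are one step */
	pthread_mutex_unlock(&s->lock);
	return ret;
}

/* Claim a BAR for export; fails if it is already P2P-registered. */
static int claim_for_export(struct bar_state *s, unsigned int bar)
{
	unsigned int bit = 1u << bar;
	int ret = 0;

	pthread_mutex_lock(&s->lock);
	if (s->p2p_registered & bit)
		ret = -EBUSY;
	else
		s->exported |= bit;
	pthread_mutex_unlock(&s->lock);
	return ret;
}
```

Because both the test and the state change happen inside the critical section, two threads can no longer each observe the other's state as clear and then both proceed.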
> + pci_warn(vdev->pdev, "BAR %d has active DMABUFs. Cannot register for P2P.\n",
> + bar_index);
> + return -EBUSY;
> + }
> +
> ret = pci_p2pdma_add_resource(vdev->pdev, bar_index, 0, 0);
> if (ret && ret != -EEXIST)
> return ret;
[Severity: Medium]
Is there a risk of lost updates when modifying the registration state
shortly after this check?
Looking at the rest of vfio_pci_core_feature_p2p_register():
vdev->p2p_registered_bars |= (1 << bar_index);
This performs a non-atomic read-modify-write on the bitmask without a lock.
If multiple threads concurrently register different regions for peer-to-peer,
could they race and overwrite each other's modifications, silently losing a
registration?
[ ... ]
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260610151853.3608948-1-praan@google.com?part=4
* Re: [RFC PATCH 5/5] PCI/P2PDMA: Introduce a helper to release P2P resources
2026-06-10 15:18 ` [RFC PATCH 5/5] PCI/P2PDMA: Introduce a helper to release P2P resources Pranjal Shrivastava
@ 2026-06-10 15:54 ` sashiko-bot
0 siblings, 0 replies; 13+ messages in thread
From: sashiko-bot @ 2026-06-10 15:54 UTC (permalink / raw)
To: Pranjal Shrivastava; +Cc: kvm, linux-pci
Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
New issues:
- [Critical] `pci_p2pdma_remove_resource()` destroys the P2P memory pool without RCU synchronization, causing Use-After-Free and NULL pointer dereferences.
- [High] `vfio_pci_core_feature_p2p_register()` modifies P2P resources without acquiring any serialization lock, causing data races with concurrent ioctls and `vfio_pci_zap_bars()`.
- [High] `devm_memunmap_pages()` blocks indefinitely in `vfio_pci_zap_bars()` if P2P memory pages are persistently pinned by userspace, causing a local Denial of Service.
Pre-existing issues:
- [High] Error path in `pci_p2pdma_add_resource()` leaves a dangling devres action, causing a Use-After-Free during driver unbind.
--
commit f498fe90fe30c1dceda58fbbdcd2de8314934e62
Author: Pranjal Shrivastava <praan@google.com>
PCI/P2PDMA: Introduce a helper to release P2P resources
Introduce pci_p2pdma_remove_resource() to allow manual teardown of a
device's P2P DMA pool. The new API enables exclusive owners of a device,
such as vfio-pci, to cleanly release P2P resources during session closure
or hardware reset.
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index adb17a4f69396..2a48ffefa01c4 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
[ ... ]
> @@ -466,6 +469,41 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
> }
> EXPORT_SYMBOL_GPL(pci_p2pdma_add_resource);
[Severity: High]
This is a pre-existing issue, but does the error path in
pci_p2pdma_add_resource() leave a dangling devres action?
If gen_pool_add_owner() fails, the code jumps to pages_free:
pages_free:
devm_memunmap_pages(&pdev->dev, pgmap);
pgmap_free:
devm_kfree(&pdev->dev, p2p_pgmap);
return error;
The devm_add_action_or_reset() called earlier registers
pci_p2pdma_unmap_mappings() with the p2p_pgmap pointer, but it doesn't
appear to be deregistered here. When the device driver later unbinds,
will the devres core execute pci_p2pdma_unmap_mappings() with the already
freed p2p_pgmap pointer, leading to a Use-After-Free of p2p_pgmap->mem->owner?
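The dangling-action pattern described above can be modeled in userspace as follows. This is a sketch under stated assumptions: the action list, device_model and setup_resource() are hypothetical and only mimic the shape of devres. In the kernel, devm_remove_action() is the call that drops a registered action without running it; whether that is the right fix for this path is for the maintainers to judge.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a devres-style action list; not the kernel's types. */
struct action {
	void (*fn)(void *);
	void *data;
	struct action *next;
};

struct device_model {
	struct action *actions;
};

static int add_action(struct device_model *dev, void (*fn)(void *), void *data)
{
	struct action *a = malloc(sizeof(*a));

	if (!a)
		return -1;
	a->fn = fn;
	a->data = data;
	a->next = dev->actions;
	dev->actions = a;
	return 0;
}

/* Drop a registered action without running it. */
static void remove_action(struct device_model *dev, void (*fn)(void *), void *data)
{
	for (struct action **p = &dev->actions; *p; p = &(*p)->next) {
		if ((*p)->fn == fn && (*p)->data == data) {
			struct action *victim = *p;

			*p = victim->next;
			free(victim);
			return;
		}
	}
}

static int pending_actions(const struct device_model *dev)
{
	int n = 0;

	for (const struct action *a = dev->actions; a; a = a->next)
		n++;
	return n;
}

/* Run and free every remaining action, as driver unbind would. */
static void run_actions(struct device_model *dev)
{
	while (dev->actions) {
		struct action *a = dev->actions;

		dev->actions = a->next;
		a->fn(a->data);
		free(a);
	}
}

static void release_resource(void *data)
{
	free(data);
}

static int setup_resource(struct device_model *dev, int fail_late)
{
	int *res = malloc(sizeof(*res));

	if (!res)
		return -1;
	if (add_action(dev, release_resource, res)) {
		free(res);
		return -1;
	}
	if (fail_late) {
		/* Deregister first so unbind never sees the freed pointer. */
		remove_action(dev, release_resource, res);
		free(res);
		return -1;
	}
	return 0;
}
```

If the error path freed the resource without the remove_action() call, run_actions() would later invoke release_resource() on a stale pointer, which is the use-after-free the review describes.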
> +/**
> + * pci_p2pdma_remove_resource - remove all p2p memory for a device
> + * @pdev: the device to remove the memory from
> + *
> + * Tear down the entire p2p DMA pool for the device. Zap any existing
> + * userspace mappings of the p2pmem/allocate file
> + */
> +void pci_p2pdma_remove_resource(struct pci_dev *pdev)
> +{
[ ... ]
> + gen_pool_destroy(p2pdma->pool);
> + p2pdma->pool = NULL;
[Severity: Critical]
Does destroying the pool and setting it to NULL here without RCU
synchronization cause Use-After-Free and NULL pointer dereferences?
Because pdev->p2pdma is still active and accessible to RCU readers, it seems
this can race with several concurrent operations:
Concurrent sysfs readers like size_show() might read p2pdma->pool and
trigger a Use-After-Free when executing gen_pool_size().
Concurrent callers of pci_alloc_p2pmem() read p2pdma under RCU, see it is
not NULL, and could unconditionally pass the now-NULL p2pdma->pool to
gen_pool_alloc_owner(), causing a NULL pointer dereference.
Additionally, the driver detach handler pci_p2pdma_release() calls
synchronize_rcu() only if p2pdma->pool is non-NULL. Since the pool is now
NULL, it skips RCU synchronization and immediately destroys the xarray,
leading to a UAF of the p2pdma structure itself by any remaining RCU readers.
> +
> + sysfs_remove_group(&pdev->dev.kobj, &p2pmem_group);
> +
> + pci_info(pdev, "removed all peer-to-peer DMA memory\n");
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_remove_resource);
[ ... ]
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 7913b8916df91..7b58cb344408d 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1704,6 +1704,11 @@ static void vfio_pci_zap_bars(struct vfio_pci_core_device *vdev)
> loff_t len = end - start;
>
> unmap_mapping_range(core_vdev->inode->i_mapping, start, len, true);
> +
> + if (vdev->p2p_registered_bars) {
> + pci_p2pdma_remove_resource(vdev->pdev);
[Severity: High]
Is it possible for devm_memunmap_pages() to block indefinitely here while
holding a critical device rwsem?
Inside pci_p2pdma_remove_resource(), devm_memunmap_pages() synchronously
waits via wait_for_completion(&pgmap->done) for all device page references
to be released. If userspace has persistently pinned these P2P pages (e.g.,
by registering them with RDMA or io_uring), the reference count might remain
elevated, causing the kernel thread to hang in uninterruptible sleep.
Because vfio_pci_zap_bars() is called with down_write(&vdev->memory_lock)
held, this hang could prevent any future operations on the device, including
administrative driver unbinding.
[Severity: High]
Can pci_p2pdma_remove_resource() run concurrently with the ioctl handler
vfio_pci_core_feature_p2p_register()?
It looks like vfio_pci_core_feature_p2p_register() updates
vdev->p2p_registered_bars and calls pci_p2pdma_add_resource() without taking
vdev->memory_lock or any other serialization lock.
Since vfio_pci_zap_bars() directly calls pci_p2pdma_remove_resource(),
concurrent execution could cause a TOCTOU race and corrupt the p2pdma pool
state and pagemaps, as the PCI P2P subsystem assumes exclusive access from
the caller for state modifications.
> + vdev->p2p_registered_bars = 0;
> + }
> }
>
> void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_core_device *vdev)
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260610151853.3608948-1-praan@google.com?part=5
* Re: [RFC PATCH 0/5] vfio/pci: Support ZONE_DEVICE-backed P2P Registration
2026-06-10 15:18 [RFC PATCH 0/5] vfio/pci: Support ZONE_DEVICE-backed P2P Registration Pranjal Shrivastava
` (4 preceding siblings ...)
2026-06-10 15:18 ` [RFC PATCH 5/5] PCI/P2PDMA: Introduce a helper to release P2P resources Pranjal Shrivastava
@ 2026-06-10 16:28 ` Jason Gunthorpe
2026-06-10 18:32 ` Leon Romanovsky
5 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2026-06-10 16:28 UTC (permalink / raw)
To: Pranjal Shrivastava
Cc: linux-pci, linux-kernel, kvm, Bjorn Helgaas, Logan Gunthorpe,
Alex Williamson, Kevin Tian, Ankit Agrawal, Matt Evans,
Vivek Kasireddy, Leon Romanovsky, Shivaji Kant, Samiullah Khawaja
On Wed, Jun 10, 2026 at 03:18:48PM +0000, Pranjal Shrivastava wrote:
> Users utilize the standard sysfs p2pmem/allocate interface for managing
> memory slices once a BAR is registered.
I'm shocked someone wants to use this API, what are you expecting to do
with it??
> An alternative implementation has been explored which integrates with the
> ongoing VFIO DMABUF-mmap refactor [1]. In that approach, rather than
> registering a BAR as a system-wide P2P provider, VFIO optionally
> allocates ZONE_DEVICE pages only for specifically exported DMABUFs via a
> new VFIO_DMA_BUF_FLAG_ALLOC_STRUCT_PAGES flag.
That's probably more sensible but you can't have a DMABUF mmap
actually install non-special memory. The native vfio mmap still can,
but not mmap on the dmabuf fd. That's still workable, just keep it in
mind.
What do you even intend to do with this? With the new work to tie
dmabuf directly into io_uring I really wonder if this is worth doing
for VFIO?
Jason
* Re: [RFC PATCH 0/5] vfio/pci: Support ZONE_DEVICE-backed P2P Registration
2026-06-10 16:28 ` [RFC PATCH 0/5] vfio/pci: Support ZONE_DEVICE-backed P2P Registration Jason Gunthorpe
@ 2026-06-10 18:32 ` Leon Romanovsky
0 siblings, 0 replies; 13+ messages in thread
From: Leon Romanovsky @ 2026-06-10 18:32 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Pranjal Shrivastava, linux-pci, linux-kernel, kvm, Bjorn Helgaas,
Logan Gunthorpe, Alex Williamson, Kevin Tian, Ankit Agrawal,
Matt Evans, Vivek Kasireddy, Shivaji Kant, Samiullah Khawaja
On Wed, Jun 10, 2026 at 01:28:48PM -0300, Jason Gunthorpe wrote:
> On Wed, Jun 10, 2026 at 03:18:48PM +0000, Pranjal Shrivastava wrote:
>
> > Users utilize the standard sysfs p2pmem/allocate interface for managing
> > memory slices once a BAR is registered.
>
> I'm shocked someone wants to use this API, what are you expecting to do
> with it??
I was under the impression that we all want to move away from that API.
Thanks