The Linux Kernel Mailing List
* [PATCH v4 0/3] vfio/pci: Request resources and map BARs at enable time
@ 2026-05-05 17:38 Matt Evans
  2026-05-05 17:38 ` [PATCH v4 1/3] vfio/pci: Set up BAR resources and maps in vfio_pci_core_enable() Matt Evans
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Matt Evans @ 2026-05-05 17:38 UTC (permalink / raw)
  To: Alex Williamson, Kevin Tian, Jason Gunthorpe, Ankit Agrawal,
	Alistair Popple, Leon Romanovsky, Kees Cook, Shameer Kolothum,
	Yishai Hadas
  Cc: Alexey Kardashevskiy, Eric Auger, Peter Xu, Vivek Kasireddy,
	Zhi Wang, kvm, linux-kernel, virtualization

Hi,

These patches fix a potential race between concurrent calls to
vfio_pci_core_setup_barmap(), and a missing check in the DMABUF
export path that the BAR resource was requested before exporting.
Discussion of a previous series (a different approach, replaced by
this one) is here:

 https://lore.kernel.org/kvm/20260415181423.1008458-1-mattev@meta.com

Responses in that thread indicated there wasn't a strong historical
reason to require the mapping to be performed on-demand at BAR
reference time.  It's much simpler to move this earlier, to
vfio_pci_core_enable(), and that then avoids having to deal with
concurrent requests later.

The first patch requests PCI resources and pci_iomap() of the BARs
from vfio_pci_core_enable(), moving this out of
vfio_pci_core_setup_barmap().

Some callers rely on vfio_pci_core_setup_barmap() for its ioremap()
effect, and other callers use it for its resource-acquiring effect.
The function turns into a cheap check that both of these actions
succeeded, which maintains the same error behaviour as before the
fix.

The second patch adds a call to vfio_pci_core_setup_barmap() to VFIO
DMABUF export to check the resource is reserved; previously this was
able to export an unrequested resource.  Although patch 1 at first
appears to fix this by requesting resources at enable time, code using
the BAR still needs to check the resource really was acquired.

The third patch refactors vfio_pci_core_setup_barmap() plus the
various vdev->barmap[] accesses into vfio_pci_core_get_iomap(), which
returns either a pointer to the mapping or an ERR_PTR() describing why it
doesn't exist.  This is used by callers that need the mapping, but
also by other callers to check that the resource/mapping step was
successful.
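For reference, the ERR_PTR() convention that the series stores in
vdev->barmap[] can be modelled in plain userspace C.  This is only an
illustrative sketch: the kernel helpers (ERR_PTR()/IS_ERR()/PTR_ERR())
are re-implemented so it compiles standalone, and get_iomap() stands
in for vfio_pci_core_get_iomap():

```c
#include <errno.h>

/* Userspace re-implementation of the kernel's pointer/errno encoding:
 * a small negative errno is stored in the top of the pointer range. */
#define MAX_ERRNO 4095
#define NUM_BARS  6

static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline int IS_ERR(const void *p)
{
	return (unsigned long)p >= (unsigned long)-MAX_ERRNO;
}

struct model_vdev {
	void *barmap[NUM_BARS];	/* filled in at enable time */
};

/* Sketch of vfio_pci_core_get_iomap(): return the mapping, or the
 * error recorded when the resource request or iomap failed. */
static void *get_iomap(struct model_vdev *vdev, int bar)
{
	if (bar < 0 || bar >= NUM_BARS)
		return ERR_PTR(-EINVAL);
	if (!vdev->barmap[bar])
		return ERR_PTR(-ENODEV);
	return vdev->barmap[bar];
}
```

Callers that only need the resource-reserved check test IS_ERR() and
discard the pointer; callers that need the mapping use it directly.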


=== Changes ===

v4:
 - Reorder patches to put fixes at the front: First, the early BAR
   setup to avoid the race.  Then, add DMABUF check.  Then,
   refactor/tidy.

 - Adjust Fixes: of first patch to point to early VFIO PCI commit, and
   reduce the patch to only the fix (don't add new error checks).
   Use pci_dbg() instead of pci_warn() when setting up BAR
   resources.  Add barmap[] error checking to vfio_pci_core_disable().

 - Add barmap[]/BAR index error checking to vfio_pci_core_get_iomap(),
   and use WARN_ON_ONCE() since the conditions truly shouldn't happen.

v3:
  https://lore.kernel.org/kvm/20260430100340.2787446-1-mattev@meta.com/

 - Remove the separate tracking of the BAR mapping versus the
   acquisition of its resource.  Errors from a failed iomap versus a
   failed resource reservation are ERR_PTR()-encoded into barmap[bar].

 - Remove the separate test helper, and add vfio_pci_core_get_iomap().
   This gets the iomap base, or is used to check for an error/failure
   to acquire the resource.  Added comments at call sites explaining
   whether they want to just ensure the resource is reserved versus
   actually use the mapping.

v2:
  https://lore.kernel.org/kvm/20260423182517.2286030-1-mattev@meta.com/

 - Don't fail if resources can't be requested or iomapped, even for
   valid BARs, as this would change the userspace-observable error
   behaviour.  Specifically, if there was an issue with one particular
   BAR which happened to never be used, then userspace would never
   encounter an error for it.  Track iomap and resource-acquisition
   status per BAR.

 - Break out the checks for resource success from those for iomap
   success, in the form of the two new helpers.

 - Third patch to add the check to VFIO DMABUF export, because
   init-time requests can now fail.

v1:
  https://lore.kernel.org/kvm/20260421174143.3883579-1-mattev@meta.com/


Matt Evans (3):
  vfio/pci: Set up BAR resources and maps in vfio_pci_core_enable()
  vfio/pci: Check BAR resources before exporting a DMABUF
  vfio/pci: Replace vfio_pci_core_setup_barmap() with
    vfio_pci_core_get_iomap()

 drivers/vfio/pci/nvgrace-gpu/main.c | 11 +++----
 drivers/vfio/pci/vfio_pci_core.c    | 47 ++++++++++++++++++++++++-----
 drivers/vfio/pci/vfio_pci_dmabuf.c  |  6 ++--
 drivers/vfio/pci/vfio_pci_rdwr.c    | 42 +++++---------------------
 drivers/vfio/pci/virtio/legacy_io.c | 13 ++++----
 include/linux/vfio_pci_core.h       | 20 +++++++++++-
 6 files changed, 81 insertions(+), 58 deletions(-)

-- 
2.47.3


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v4 1/3] vfio/pci: Set up BAR resources and maps in vfio_pci_core_enable()
  2026-05-05 17:38 [PATCH v4 0/3] vfio/pci: Request resources and map BARs at enable time Matt Evans
@ 2026-05-05 17:38 ` Matt Evans
  2026-05-07 22:21   ` Alex Williamson
  2026-05-05 17:38 ` [PATCH v4 2/3] vfio/pci: Check BAR resources before exporting a DMABUF Matt Evans
  2026-05-05 17:38 ` [PATCH v4 3/3] vfio/pci: Replace vfio_pci_core_setup_barmap() with vfio_pci_core_get_iomap() Matt Evans
  2 siblings, 1 reply; 9+ messages in thread
From: Matt Evans @ 2026-05-05 17:38 UTC (permalink / raw)
  To: Alex Williamson, Kevin Tian, Jason Gunthorpe, Ankit Agrawal,
	Alistair Popple, Leon Romanovsky, Kees Cook, Shameer Kolothum,
	Yishai Hadas
  Cc: Alexey Kardashevskiy, Eric Auger, Peter Xu, Vivek Kasireddy,
	Zhi Wang, kvm, linux-kernel, virtualization

Previously BAR resource requests and the corresponding pci_iomap()
were performed on-demand and without synchronisation, which was racy.
Rather than add synchronisation, it's simplest to address this by
doing both activities from vfio_pci_core_enable().

The resource allocation and/or pci_iomap() can still fail; their
status is tracked and existing calls to vfio_pci_core_setup_barmap()
will fail in a similar way to before.  This keeps the point of failure
as observed by userspace the same, i.e. failures to request/map unused
BARs are benign.

Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver")
Signed-off-by: Matt Evans <mattev@meta.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 36 +++++++++++++++++++++++++++++++-
 drivers/vfio/pci/vfio_pci_rdwr.c | 26 +++++++----------------
 2 files changed, 42 insertions(+), 20 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 3f8d093aacf8..62931dc381d8 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -482,6 +482,39 @@ static int vfio_pci_core_runtime_resume(struct device *dev)
 }
 #endif /* CONFIG_PM */
 
+/*
+ * Eager-request BAR resources, and iomap them.  Soft failures are
+ * allowed, and consumers must check the barmap before use in order to
+ * give compatible user-visible behaviour with the previous on-demand
+ * allocation method.
+ */
+static void vfio_pci_core_map_bars(struct vfio_pci_core_device *vdev)
+{
+	struct pci_dev *pdev = vdev->pdev;
+	int i;
+
+	for (i = 0; i < PCI_STD_NUM_BARS; i++) {
+		int bar = i + PCI_STD_RESOURCES;
+
+		vdev->barmap[bar] = ERR_PTR(-ENODEV);
+
+		if (!pci_resource_len(pdev, i))
+			continue;
+
+		if (pci_request_selected_regions(pdev, 1 << bar, "vfio")) {
+			pci_dbg(vdev->pdev, "Failed to reserve region %d\n", bar);
+			vdev->barmap[bar] = ERR_PTR(-EBUSY);
+			continue;
+		}
+
+		vdev->barmap[bar] = pci_iomap(pdev, bar, 0);
+		if (!vdev->barmap[bar]) {
+			pci_dbg(vdev->pdev, "Failed to iomap region %d\n", bar);
+			vdev->barmap[bar] = ERR_PTR(-ENOMEM);
+		}
+	}
+}
+
 /*
  * The pci-driver core runtime PM routines always save the device state
  * before going into suspended state. If the device is going into low power
@@ -568,6 +601,7 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
 	if (!vfio_vga_disabled() && vfio_pci_is_vga(pdev))
 		vdev->has_vga = true;
 
+	vfio_pci_core_map_bars(vdev);
 
 	return 0;
 
@@ -648,7 +682,7 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
 
 	for (i = 0; i < PCI_STD_NUM_BARS; i++) {
 		bar = i + PCI_STD_RESOURCES;
-		if (!vdev->barmap[bar])
+		if (IS_ERR_OR_NULL(vdev->barmap[bar]))
 			continue;
 		pci_iounmap(pdev, vdev->barmap[bar]);
 		pci_release_selected_regions(pdev, 1 << bar);
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 4251ee03e146..3bfbb879a005 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -198,27 +198,15 @@ ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_do_io_rw);
 
+/*
+ * The barmap is set up in vfio_pci_core_enable().  Callers use this
+ * function to check that the BAR resources are requested or that the
+ * pci_iomap() was done.
+ */
 int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
 {
-	struct pci_dev *pdev = vdev->pdev;
-	int ret;
-	void __iomem *io;
-
-	if (vdev->barmap[bar])
-		return 0;
-
-	ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
-	if (ret)
-		return ret;
-
-	io = pci_iomap(pdev, bar, 0);
-	if (!io) {
-		pci_release_selected_regions(pdev, 1 << bar);
-		return -ENOMEM;
-	}
-
-	vdev->barmap[bar] = io;
-
+	if (IS_ERR(vdev->barmap[bar]))
+		return PTR_ERR(vdev->barmap[bar]);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_setup_barmap);
-- 
2.47.3



* [PATCH v4 2/3] vfio/pci: Check BAR resources before exporting a DMABUF
  2026-05-05 17:38 [PATCH v4 0/3] vfio/pci: Request resources and map BARs at enable time Matt Evans
  2026-05-05 17:38 ` [PATCH v4 1/3] vfio/pci: Set up BAR resources and maps in vfio_pci_core_enable() Matt Evans
@ 2026-05-05 17:38 ` Matt Evans
  2026-05-05 17:38 ` [PATCH v4 3/3] vfio/pci: Replace vfio_pci_core_setup_barmap() with vfio_pci_core_get_iomap() Matt Evans
  2 siblings, 0 replies; 9+ messages in thread
From: Matt Evans @ 2026-05-05 17:38 UTC (permalink / raw)
  To: Alex Williamson, Kevin Tian, Jason Gunthorpe, Ankit Agrawal,
	Alistair Popple, Leon Romanovsky, Kees Cook, Shameer Kolothum,
	Yishai Hadas
  Cc: Alexey Kardashevskiy, Eric Auger, Peter Xu, Vivek Kasireddy,
	Zhi Wang, kvm, linux-kernel, virtualization

A DMABUF exports access to BAR resources and, although they are
requested at startup time, we need to ensure they really were reserved
before exporting.  Otherwise, it's possible to access unreserved
resources through the export.

Add a check to the DMABUF-creation path.

Fixes: 5d74781ebc86c ("vfio/pci: Add dma-buf export support for MMIO regions")
Signed-off-by: Matt Evans <mattev@meta.com>
---
 drivers/vfio/pci/vfio_pci_dmabuf.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
index f87fd32e4a01..69a5c2d511e6 100644
--- a/drivers/vfio/pci/vfio_pci_dmabuf.c
+++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
@@ -244,9 +244,11 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
 		return -EINVAL;
 
 	/*
-	 * For PCI the region_index is the BAR number like everything else.
+	 * For PCI the region_index is the BAR number like everything
+	 * else.  Check that PCI resources have been claimed for it.
 	 */
-	if (get_dma_buf.region_index >= VFIO_PCI_ROM_REGION_INDEX)
+	if (get_dma_buf.region_index >= VFIO_PCI_ROM_REGION_INDEX ||
+	    vfio_pci_core_setup_barmap(vdev, get_dma_buf.region_index))
 		return -ENODEV;
 
 	dma_ranges = memdup_array_user(&arg->dma_ranges, get_dma_buf.nr_ranges,
-- 
2.47.3



* [PATCH v4 3/3] vfio/pci: Replace vfio_pci_core_setup_barmap() with vfio_pci_core_get_iomap()
  2026-05-05 17:38 [PATCH v4 0/3] vfio/pci: Request resources and map BARs at enable time Matt Evans
  2026-05-05 17:38 ` [PATCH v4 1/3] vfio/pci: Set up BAR resources and maps in vfio_pci_core_enable() Matt Evans
  2026-05-05 17:38 ` [PATCH v4 2/3] vfio/pci: Check BAR resources before exporting a DMABUF Matt Evans
@ 2026-05-05 17:38 ` Matt Evans
  2026-05-07 22:21   ` Alex Williamson
  2 siblings, 1 reply; 9+ messages in thread
From: Matt Evans @ 2026-05-05 17:38 UTC (permalink / raw)
  To: Alex Williamson, Kevin Tian, Jason Gunthorpe, Ankit Agrawal,
	Alistair Popple, Leon Romanovsky, Kees Cook, Shameer Kolothum,
	Yishai Hadas
  Cc: Alexey Kardashevskiy, Eric Auger, Peter Xu, Vivek Kasireddy,
	Zhi Wang, kvm, linux-kernel, virtualization

Since "vfio/pci: Set up BAR resources and maps in
vfio_pci_core_enable()", the resource request and iomap for the BARs
are performed early, and vfio_pci_core_setup_barmap() just checks
that those actions succeeded.

Move this logic to a new helper that checks success and returns the
iomap address, replacing the various bare vdev->barmap[] lookups.
This maintains the error behaviour of the previous on-demand
vfio_pci_core_setup_barmap() scheme.

Signed-off-by: Matt Evans <mattev@meta.com>
---
 drivers/vfio/pci/nvgrace-gpu/main.c | 11 ++++-------
 drivers/vfio/pci/vfio_pci_core.c    | 11 +++++------
 drivers/vfio/pci/vfio_pci_dmabuf.c  |  2 +-
 drivers/vfio/pci/vfio_pci_rdwr.c    | 30 ++++++++---------------------
 drivers/vfio/pci/virtio/legacy_io.c | 13 ++++++-------
 include/linux/vfio_pci_core.h       | 20 ++++++++++++++++++-
 6 files changed, 43 insertions(+), 44 deletions(-)

diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
index fa056b69f899..e153002258ce 100644
--- a/drivers/vfio/pci/nvgrace-gpu/main.c
+++ b/drivers/vfio/pci/nvgrace-gpu/main.c
@@ -184,13 +184,10 @@ static int nvgrace_gpu_open_device(struct vfio_device *core_vdev)
 
 	/*
 	 * GPU readiness is checked by reading the BAR0 registers.
-	 *
-	 * ioremap BAR0 to ensure that the BAR0 mapping is present before
-	 * register reads on first fault before establishing any GPU
-	 * memory mapping.
+	 * The BAR map was just set up by vfio_pci_core_enable(), so
+	 * check that was successful and bail early if not:
 	 */
-	ret = vfio_pci_core_setup_barmap(vdev, 0);
-	if (ret)
+	if (IS_ERR(vfio_pci_core_get_iomap(vdev, 0)))
 		goto error_exit;
 
 	if (nvdev->resmem.memlength) {
@@ -275,7 +272,7 @@ nvgrace_gpu_check_device_ready(struct nvgrace_gpu_pci_core_device *nvdev)
 	if (!__vfio_pci_memory_enabled(vdev))
 		return -EIO;
 
-	ret = nvgrace_gpu_wait_device_ready(vdev->barmap[0]);
+	ret = nvgrace_gpu_wait_device_ready(vfio_pci_core_get_iomap(vdev, 0));
 	if (ret)
 		return ret;
 
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 62931dc381d8..5c8bd13f10d0 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1761,7 +1761,7 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
 	struct pci_dev *pdev = vdev->pdev;
 	unsigned int index;
 	u64 phys_len, req_len, pgoff, req_start;
-	int ret;
+	void __iomem *bar_io;
 
 	index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
 
@@ -1795,12 +1795,11 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
 		return -EINVAL;
 
 	/*
-	 * Even though we don't make use of the barmap for the mmap,
-	 * we need to request the region and the barmap tracks that.
+	 * Ensure the BAR resource region is reserved for use.
 	 */
-	ret = vfio_pci_core_setup_barmap(vdev, index);
-	if (ret)
-		return ret;
+	bar_io = vfio_pci_core_get_iomap(vdev, index);
+	if (IS_ERR(bar_io))
+		return PTR_ERR(bar_io);
 
 	vma->vm_private_data = vdev;
 	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
index 69a5c2d511e6..46cd44b22c9c 100644
--- a/drivers/vfio/pci/vfio_pci_dmabuf.c
+++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
@@ -248,7 +248,7 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
 	 * else.  Check that PCI resources have been claimed for it.
 	 */
 	if (get_dma_buf.region_index >= VFIO_PCI_ROM_REGION_INDEX ||
-	    vfio_pci_core_setup_barmap(vdev, get_dma_buf.region_index))
+	    IS_ERR(vfio_pci_core_get_iomap(vdev, get_dma_buf.region_index)))
 		return -ENODEV;
 
 	dma_ranges = memdup_array_user(&arg->dma_ranges, get_dma_buf.nr_ranges,
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 3bfbb879a005..7f14dd46de17 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -198,19 +198,6 @@ ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_do_io_rw);
 
-/*
- * The barmap is set up in vfio_pci_core_enable().  Callers use this
- * function to check that the BAR resources are requested or that the
- * pci_iomap() was done.
- */
-int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
-{
-	if (IS_ERR(vdev->barmap[bar]))
-		return PTR_ERR(vdev->barmap[bar]);
-	return 0;
-}
-EXPORT_SYMBOL_GPL(vfio_pci_core_setup_barmap);
-
 ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 			size_t count, loff_t *ppos, bool iswrite)
 {
@@ -262,13 +249,11 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 		 */
 		max_width = VFIO_PCI_IO_WIDTH_4;
 	} else {
-		int ret = vfio_pci_core_setup_barmap(vdev, bar);
-		if (ret) {
-			done = ret;
+		io = vfio_pci_core_get_iomap(vdev, bar);
+		if (IS_ERR(io)) {
+			done = PTR_ERR(io);
 			goto out;
 		}
-
-		io = vdev->barmap[bar];
 	}
 
 	if (bar == vdev->msix_bar) {
@@ -423,6 +408,7 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset,
 	loff_t pos = offset & VFIO_PCI_OFFSET_MASK;
 	int ret, bar = VFIO_PCI_OFFSET_TO_INDEX(offset);
 	struct vfio_pci_ioeventfd *ioeventfd;
+	void __iomem *io;
 
 	/* Only support ioeventfds into BARs */
 	if (bar > VFIO_PCI_BAR5_REGION_INDEX)
@@ -440,9 +426,9 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset,
 	if (count == 8)
 		return -EINVAL;
 
-	ret = vfio_pci_core_setup_barmap(vdev, bar);
-	if (ret)
-		return ret;
+	io = vfio_pci_core_get_iomap(vdev, bar);
+	if (IS_ERR(io))
+		return PTR_ERR(io);
 
 	mutex_lock(&vdev->ioeventfds_lock);
 
@@ -479,7 +465,7 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset,
 	}
 
 	ioeventfd->vdev = vdev;
-	ioeventfd->addr = vdev->barmap[bar] + pos;
+	ioeventfd->addr = io + pos;
 	ioeventfd->data = data;
 	ioeventfd->pos = pos;
 	ioeventfd->bar = bar;
diff --git a/drivers/vfio/pci/virtio/legacy_io.c b/drivers/vfio/pci/virtio/legacy_io.c
index 1ed349a55629..c868b2177310 100644
--- a/drivers/vfio/pci/virtio/legacy_io.c
+++ b/drivers/vfio/pci/virtio/legacy_io.c
@@ -299,19 +299,18 @@ int virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
 static int virtiovf_set_notify_addr(struct virtiovf_pci_core_device *virtvdev)
 {
 	struct vfio_pci_core_device *core_device = &virtvdev->core_device;
-	int ret;
+	void __iomem *io;
 
 	/*
 	 * Setup the BAR where the 'notify' exists to be used by vfio as well
 	 * This will let us mmap it only once and use it when needed.
 	 */
-	ret = vfio_pci_core_setup_barmap(core_device,
-					 virtvdev->notify_bar);
-	if (ret)
-		return ret;
+	io = vfio_pci_core_get_iomap(core_device,
+				     virtvdev->notify_bar);
+	if (IS_ERR(io))
+		return PTR_ERR(io);
 
-	virtvdev->notify_addr = core_device->barmap[virtvdev->notify_bar] +
-			virtvdev->notify_offset;
+	virtvdev->notify_addr = io + virtvdev->notify_offset;
 	return 0;
 }
 
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 2ebba746c18f..ffd67e25bf3f 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -188,7 +188,6 @@ int vfio_pci_core_match_token_uuid(struct vfio_device *core_vdev,
 int vfio_pci_core_enable(struct vfio_pci_core_device *vdev);
 void vfio_pci_core_disable(struct vfio_pci_core_device *vdev);
 void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev);
-int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar);
 pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
 						pci_channel_state_t state);
 ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
@@ -234,6 +233,25 @@ static inline bool is_aligned_for_order(struct vm_area_struct *vma,
 			   !IS_ALIGNED(pfn, 1 << order)));
 }
 
+/*
+ * Returns a BAR's iomap base or an ERR_PTR() if, for example, the
+ * BAR isn't valid, its resource wasn't acquired, or its iomap
+ * failed.  This shall only be used after vfio_pci_core_enable()
+ * has set up the BAR maps and before vfio_pci_core_disable()
+ * tears them down.
+ */
+static inline void __iomem __must_check *
+vfio_pci_core_get_iomap(struct vfio_pci_core_device *vdev, int bar)
+{
+	if (WARN_ON_ONCE(bar < 0 || bar >= PCI_STD_NUM_BARS))
+		return ERR_PTR(-EINVAL);
+
+	if (WARN_ON_ONCE(!vdev->barmap[bar]))
+		return ERR_PTR(-ENODEV);
+
+	return vdev->barmap[bar];
+}
+
 int vfio_pci_dma_buf_iommufd_map(struct dma_buf_attachment *attachment,
 				 struct phys_vec *phys);
 
-- 
2.47.3



* Re: [PATCH v4 1/3] vfio/pci: Set up BAR resources and maps in vfio_pci_core_enable()
  2026-05-05 17:38 ` [PATCH v4 1/3] vfio/pci: Set up BAR resources and maps in vfio_pci_core_enable() Matt Evans
@ 2026-05-07 22:21   ` Alex Williamson
  2026-05-08 14:14     ` Matt Evans
  0 siblings, 1 reply; 9+ messages in thread
From: Alex Williamson @ 2026-05-07 22:21 UTC (permalink / raw)
  To: Matt Evans
  Cc: Kevin Tian, Jason Gunthorpe, Ankit Agrawal, Alistair Popple,
	Leon Romanovsky, Kees Cook, Shameer Kolothum, Yishai Hadas,
	Alexey Kardashevskiy, Eric Auger, Peter Xu, Vivek Kasireddy,
	Zhi Wang, kvm, linux-kernel, virtualization, alex

On Tue, 5 May 2026 10:38:29 -0700
Matt Evans <mattev@meta.com> wrote:

> Previously BAR resource requests and the corresponding pci_iomap()
> were performed on-demand and without synchronisation, which was racy.
> Rather than add synchronisation, it's simplest to address this by
> doing both activities from vfio_pci_core_enable().
> 
> The resource allocation and/or pci_iomap() can still fail; their
> status is tracked and existing calls to vfio_pci_core_setup_barmap()
> will fail in a similar way to before.  This keeps the point of failure
> as observed by userspace the same, i.e. failures to request/map unused
> BARs are benign.
> 
> Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver")
> Signed-off-by: Matt Evans <mattev@meta.com>
> ---
>  drivers/vfio/pci/vfio_pci_core.c | 36 +++++++++++++++++++++++++++++++-
>  drivers/vfio/pci/vfio_pci_rdwr.c | 26 +++++++----------------
>  2 files changed, 42 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 3f8d093aacf8..62931dc381d8 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -482,6 +482,39 @@ static int vfio_pci_core_runtime_resume(struct device *dev)
>  }
>  #endif /* CONFIG_PM */
>  
> +/*
> + * Eager-request BAR resources, and iomap them.  Soft failures are
> + * allowed, and consumers must check the barmap before use in order to
> + * give compatible user-visible behaviour with the previous on-demand
> + * allocation method.
> + */
> +static void vfio_pci_core_map_bars(struct vfio_pci_core_device *vdev)
> +{
> +	struct pci_dev *pdev = vdev->pdev;
> +	int i;
> +
> +	for (i = 0; i < PCI_STD_NUM_BARS; i++) {
> +		int bar = i + PCI_STD_RESOURCES;
> +
> +		vdev->barmap[bar] = ERR_PTR(-ENODEV);
> +
> +		if (!pci_resource_len(pdev, i))
> +			continue;
> +
> +		if (pci_request_selected_regions(pdev, 1 << bar, "vfio")) {
> +			pci_dbg(vdev->pdev, "Failed to reserve region %d\n", bar);
> +			vdev->barmap[bar] = ERR_PTR(-EBUSY);
> +			continue;
> +		}
> +
> +		vdev->barmap[bar] = pci_iomap(pdev, bar, 0);
> +		if (!vdev->barmap[bar]) {

Sashiko notes[1] correctly that we need to release the requested region
here.

[1]https://sashiko.dev/#/patchset/20260505173835.2324179-1-mattev@meta.com

> +			pci_dbg(vdev->pdev, "Failed to iomap region %d\n", bar);
> +			vdev->barmap[bar] = ERR_PTR(-ENOMEM);
> +		}
> +	}
> +}
> +
>  /*
>   * The pci-driver core runtime PM routines always save the device state
>   * before going into suspended state. If the device is going into low power
> @@ -568,6 +601,7 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
>  	if (!vfio_vga_disabled() && vfio_pci_is_vga(pdev))
>  		vdev->has_vga = true;
>  
> +	vfio_pci_core_map_bars(vdev);
>  
>  	return 0;
>  
> @@ -648,7 +682,7 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
>  
>  	for (i = 0; i < PCI_STD_NUM_BARS; i++) {
>  		bar = i + PCI_STD_RESOURCES;
> -		if (!vdev->barmap[bar])
> +		if (IS_ERR_OR_NULL(vdev->barmap[bar]))
>  			continue;
>  		pci_iounmap(pdev, vdev->barmap[bar]);
>  		pci_release_selected_regions(pdev, 1 << bar);
> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
> index 4251ee03e146..3bfbb879a005 100644
> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
> @@ -198,27 +198,15 @@ ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
>  }
>  EXPORT_SYMBOL_GPL(vfio_pci_core_do_io_rw);
>  
> +/*
> + * The barmap is set up in vfio_pci_core_enable().  Callers use this
> + * function to check that the BAR resources are requested or that the
> + * pci_iomap() was done.
> + */
>  int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
>  {
> -	struct pci_dev *pdev = vdev->pdev;
> -	int ret;
> -	void __iomem *io;
> -
> -	if (vdev->barmap[bar])
> -		return 0;
> -
> -	ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
> -	if (ret)
> -		return ret;
> -
> -	io = pci_iomap(pdev, bar, 0);
> -	if (!io) {
> -		pci_release_selected_regions(pdev, 1 << bar);
> -		return -ENOMEM;
> -	}
> -
> -	vdev->barmap[bar] = io;
> -
> +	if (IS_ERR(vdev->barmap[bar]))
> +		return PTR_ERR(vdev->barmap[bar]);
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(vfio_pci_core_setup_barmap);



* Re: [PATCH v4 3/3] vfio/pci: Replace vfio_pci_core_setup_barmap() with vfio_pci_core_get_iomap()
  2026-05-05 17:38 ` [PATCH v4 3/3] vfio/pci: Replace vfio_pci_core_setup_barmap() with vfio_pci_core_get_iomap() Matt Evans
@ 2026-05-07 22:21   ` Alex Williamson
  2026-05-08 15:30     ` Matt Evans
  0 siblings, 1 reply; 9+ messages in thread
From: Alex Williamson @ 2026-05-07 22:21 UTC (permalink / raw)
  To: Matt Evans
  Cc: Kevin Tian, Jason Gunthorpe, Ankit Agrawal, Alistair Popple,
	Leon Romanovsky, Kees Cook, Shameer Kolothum, Yishai Hadas,
	Alexey Kardashevskiy, Eric Auger, Peter Xu, Vivek Kasireddy,
	Zhi Wang, kvm, linux-kernel, virtualization, alex

On Tue, 5 May 2026 10:38:31 -0700
Matt Evans <mattev@meta.com> wrote:

> Since "vfio/pci: Set up barmap in vfio_pci_core_enable()", the
> resource request and iomap for the BARs was performed early, and
> vfio_pci_core_setup_barmap() just checks those actions succeeded.
> 
> Move this logic to a new helper that checks success and returns the
> iomap address, replacing the various bare vdev->barmap[] lookups.
> This maintains the error behaviour of the previous on-demand
> vfio_pci_core_setup_barmap() scheme.
> 
> Signed-off-by: Matt Evans <mattev@meta.com>
> ---
>  drivers/vfio/pci/nvgrace-gpu/main.c | 11 ++++-------
>  drivers/vfio/pci/vfio_pci_core.c    | 11 +++++------
>  drivers/vfio/pci/vfio_pci_dmabuf.c  |  2 +-
>  drivers/vfio/pci/vfio_pci_rdwr.c    | 30 ++++++++---------------------
>  drivers/vfio/pci/virtio/legacy_io.c | 13 ++++++-------
>  include/linux/vfio_pci_core.h       | 20 ++++++++++++++++++-
>  6 files changed, 43 insertions(+), 44 deletions(-)
> 
> diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
> index fa056b69f899..e153002258ce 100644
> --- a/drivers/vfio/pci/nvgrace-gpu/main.c
> +++ b/drivers/vfio/pci/nvgrace-gpu/main.c
> @@ -184,13 +184,10 @@ static int nvgrace_gpu_open_device(struct vfio_device *core_vdev)
>  
>  	/*
>  	 * GPU readiness is checked by reading the BAR0 registers.
> -	 *
> -	 * ioremap BAR0 to ensure that the BAR0 mapping is present before
> -	 * register reads on first fault before establishing any GPU
> -	 * memory mapping.
> +	 * The BAR map was just set up by vfio_pci_core_enable(), so
> +	 * check that was successful and bail early if not:
>  	 */
> -	ret = vfio_pci_core_setup_barmap(vdev, 0);
> -	if (ret)
> +	if (IS_ERR(vfio_pci_core_get_iomap(vdev, 0)))
>  		goto error_exit;

Sashiko notes we're not setting ret here.  The bots are also paranoid
about the unreachable condition that the get_iomap below could return an
ERR_PTR.  Maybe head off both by adding an __iomem pointer to the
nvgrace_gpu_pci_core_device struct and a temporary one here.  Store the
iomap in the temporary variable, use it to test for IS_ERR() and
PTR_ERR(), then set the pointer in the structure after the last error
condition here.  Add one line in close_device() to set it to NULL.  Then
just use nvdev->bar0_io below.
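A minimal userspace model of that suggestion (only the bar0_io name
comes from the text above; the struct and function names are
illustrative, and the ERR_PTR() helpers are re-implemented so the
sketch compiles standalone):

```c
#include <errno.h>
#include <stddef.h>

#define MAX_ERRNO 4095
static inline int IS_ERR(const void *p)
{
	return (unsigned long)p >= (unsigned long)-MAX_ERRNO;
}
static inline long PTR_ERR(const void *p) { return (long)p; }

/* Per-device struct grows a cached BAR0 iomap pointer. */
struct nv_model {
	void *bar0_io;
};

/* open: probe the iomap once via a temporary, propagate its error,
 * and stash the pointer only after the last error condition. */
static int nv_open(struct nv_model *nvdev, void *io)
{
	if (IS_ERR(io))
		return (int)PTR_ERR(io);
	nvdev->bar0_io = io;
	return 0;
}

/* close: one line to drop the cached pointer. */
static void nv_close(struct nv_model *nvdev)
{
	nvdev->bar0_io = NULL;
}
```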

>  
>  	if (nvdev->resmem.memlength) {
> @@ -275,7 +272,7 @@ nvgrace_gpu_check_device_ready(struct nvgrace_gpu_pci_core_device *nvdev)
>  	if (!__vfio_pci_memory_enabled(vdev))
>  		return -EIO;
>  
> -	ret = nvgrace_gpu_wait_device_ready(vdev->barmap[0]);
> +	ret = nvgrace_gpu_wait_device_ready(vfio_pci_core_get_iomap(vdev, 0));
>  	if (ret)
>  		return ret;
>  
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 62931dc381d8..5c8bd13f10d0 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1761,7 +1761,7 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
>  	struct pci_dev *pdev = vdev->pdev;
>  	unsigned int index;
>  	u64 phys_len, req_len, pgoff, req_start;
> -	int ret;
> +	void __iomem *bar_io;
>  
>  	index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
>  
> @@ -1795,12 +1795,11 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
>  		return -EINVAL;
>  
>  	/*
> -	 * Even though we don't make use of the barmap for the mmap,
> -	 * we need to request the region and the barmap tracks that.
> +	 * Ensure the BAR resource region is reserved for use.
>  	 */
> -	ret = vfio_pci_core_setup_barmap(vdev, index);
> -	if (ret)
> -		return ret;
> +	bar_io = vfio_pci_core_get_iomap(vdev, index);
> +	if (IS_ERR(bar_io))
> +		return PTR_ERR(bar_io);
>  
>  	vma->vm_private_data = vdev;
>  	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
> index 69a5c2d511e6..46cd44b22c9c 100644
> --- a/drivers/vfio/pci/vfio_pci_dmabuf.c
> +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
> @@ -248,7 +248,7 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
>  	 * else.  Check that PCI resources have been claimed for it.
>  	 */
>  	if (get_dma_buf.region_index >= VFIO_PCI_ROM_REGION_INDEX ||
> -	    vfio_pci_core_setup_barmap(vdev, get_dma_buf.region_index))
> +	    IS_ERR(vfio_pci_core_get_iomap(vdev, get_dma_buf.region_index)))
>  		return -ENODEV;
>  
>  	dma_ranges = memdup_array_user(&arg->dma_ranges, get_dma_buf.nr_ranges,
> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
> index 3bfbb879a005..7f14dd46de17 100644
> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
> @@ -198,19 +198,6 @@ ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
>  }
>  EXPORT_SYMBOL_GPL(vfio_pci_core_do_io_rw);
>  
> -/*
> - * The barmap is set up in vfio_pci_core_enable().  Callers use this
> - * function to check that the BAR resources are requested or that the
> - * pci_iomap() was done.
> - */
> -int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
> -{
> -	if (IS_ERR(vdev->barmap[bar]))
> -		return PTR_ERR(vdev->barmap[bar]);
> -	return 0;
> -}
> -EXPORT_SYMBOL_GPL(vfio_pci_core_setup_barmap);
> -
>  ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
>  			size_t count, loff_t *ppos, bool iswrite)
>  {
> @@ -262,13 +249,11 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
>  		 */
>  		max_width = VFIO_PCI_IO_WIDTH_4;
>  	} else {
> -		int ret = vfio_pci_core_setup_barmap(vdev, bar);
> -		if (ret) {
> -			done = ret;
> +		io = vfio_pci_core_get_iomap(vdev, bar);
> +		if (IS_ERR(io)) {
> +			done = PTR_ERR(io);
>  			goto out;
>  		}
> -
> -		io = vdev->barmap[bar];
>  	}
>  
>  	if (bar == vdev->msix_bar) {
> @@ -423,6 +408,7 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset,
>  	loff_t pos = offset & VFIO_PCI_OFFSET_MASK;
>  	int ret, bar = VFIO_PCI_OFFSET_TO_INDEX(offset);
>  	struct vfio_pci_ioeventfd *ioeventfd;
> +	void __iomem *io;
>  
>  	/* Only support ioeventfds into BARs */
>  	if (bar > VFIO_PCI_BAR5_REGION_INDEX)
> @@ -440,9 +426,9 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset,
>  	if (count == 8)
>  		return -EINVAL;
>  
> -	ret = vfio_pci_core_setup_barmap(vdev, bar);
> -	if (ret)
> -		return ret;
> +	io = vfio_pci_core_get_iomap(vdev, bar);
> +	if (IS_ERR(io))
> +		return PTR_ERR(io);

Sashiko seems to note a real existing error here that should also be
pulled out to a separate fix.  Given the right offset, this could
generate a negative BAR value.  The test at the end of the previous
chunk should be expanded to `if (bar < 0 || bar > ...BAR5...)`.
Do you want to pick that up in this series?  I think it's the only case
that lets that slip through.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 1/3] vfio/pci: Set up BAR resources and maps in vfio_pci_core_enable()
  2026-05-07 22:21   ` Alex Williamson
@ 2026-05-08 14:14     ` Matt Evans
  0 siblings, 0 replies; 9+ messages in thread
From: Matt Evans @ 2026-05-08 14:14 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Kevin Tian, Jason Gunthorpe, Ankit Agrawal, Alistair Popple,
	Leon Romanovsky, Kees Cook, Shameer Kolothum, Yishai Hadas,
	Alexey Kardashevskiy, Eric Auger, Peter Xu, Vivek Kasireddy,
	Zhi Wang, kvm, linux-kernel, virtualization

Hi Alex,

On 07/05/2026 23:21, Alex Williamson wrote:
> 
> On Tue, 5 May 2026 10:38:29 -0700
> Matt Evans <mattev@meta.com> wrote:
> 
>> Previously BAR resource requests and the corresponding pci_iomap()
>> were performed on-demand and without synchronisation, which was racy.
>> Rather than add synchronisation, it's simplest to address this by
>> doing both activities from vfio_pci_core_enable().
>>
>> The resource allocation and/or pci_iomap() can still fail; their
>> status is tracked and existing calls to vfio_pci_core_setup_barmap()
>> will fail in a similar way to before.  This keeps the point of failure
>> as observed by userspace the same, i.e. failures to request/map unused
>> BARs are benign.
>>
>> Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver")
>> Signed-off-by: Matt Evans <mattev@meta.com>
>> ---
>>   drivers/vfio/pci/vfio_pci_core.c | 36 +++++++++++++++++++++++++++++++-
>>   drivers/vfio/pci/vfio_pci_rdwr.c | 26 +++++++----------------
>>   2 files changed, 42 insertions(+), 20 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
>> index 3f8d093aacf8..62931dc381d8 100644
>> --- a/drivers/vfio/pci/vfio_pci_core.c
>> +++ b/drivers/vfio/pci/vfio_pci_core.c
>> @@ -482,6 +482,39 @@ static int vfio_pci_core_runtime_resume(struct device *dev)
>>   }
>>   #endif /* CONFIG_PM */
>>   
>> +/*
>> + * Eager-request BAR resources, and iomap them.  Soft failures are
>> + * allowed, and consumers must check the barmap before use in order to
>> + * give compatible user-visible behaviour with the previous on-demand
>> + * allocation method.
>> + */
>> +static void vfio_pci_core_map_bars(struct vfio_pci_core_device *vdev)
>> +{
>> +	struct pci_dev *pdev = vdev->pdev;
>> +	int i;
>> +
>> +	for (i = 0; i < PCI_STD_NUM_BARS; i++) {
>> +		int bar = i + PCI_STD_RESOURCES;
>> +
>> +		vdev->barmap[bar] = ERR_PTR(-ENODEV);
>> +
>> +		if (!pci_resource_len(pdev, i))
>> +			continue;
>> +
>> +		if (pci_request_selected_regions(pdev, 1 << bar, "vfio")) {
>> +			pci_dbg(vdev->pdev, "Failed to reserve region %d\n", bar);
>> +			vdev->barmap[bar] = ERR_PTR(-EBUSY);
>> +			continue;
>> +		}
>> +
>> +		vdev->barmap[bar] = pci_iomap(pdev, bar, 0);
>> +		if (!vdev->barmap[bar]) {
> 
> Sashiko notes[1] correctly that we need to release the requested region
> here.
> 
> [1] https://sashiko.dev/#/patchset/20260505173835.2324179-1-mattev@meta.com

Hnnng.  Right, fixed.

  -Matt

>> +			pci_dbg(vdev->pdev, "Failed to iomap region %d\n", bar);
>> +			vdev->barmap[bar] = ERR_PTR(-ENOMEM);
>> +		}
>> +	}
>> +}
>> +
>>   /*
>>    * The pci-driver core runtime PM routines always save the device state
>>    * before going into suspended state. If the device is going into low power
>> @@ -568,6 +601,7 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
>>   	if (!vfio_vga_disabled() && vfio_pci_is_vga(pdev))
>>   		vdev->has_vga = true;
>>   
>> +	vfio_pci_core_map_bars(vdev);
>>   
>>   	return 0;
>>   
>> @@ -648,7 +682,7 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
>>   
>>   	for (i = 0; i < PCI_STD_NUM_BARS; i++) {
>>   		bar = i + PCI_STD_RESOURCES;
>> -		if (!vdev->barmap[bar])
>> +		if (IS_ERR_OR_NULL(vdev->barmap[bar]))
>>   			continue;
>>   		pci_iounmap(pdev, vdev->barmap[bar]);
>>   		pci_release_selected_regions(pdev, 1 << bar);
>> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
>> index 4251ee03e146..3bfbb879a005 100644
>> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
>> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
>> @@ -198,27 +198,15 @@ ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
>>   }
>>   EXPORT_SYMBOL_GPL(vfio_pci_core_do_io_rw);
>>   
>> +/*
>> + * The barmap is set up in vfio_pci_core_enable().  Callers use this
>> + * function to check that the BAR resources are requested or that the
>> + * pci_iomap() was done.
>> + */
>>   int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
>>   {
>> -	struct pci_dev *pdev = vdev->pdev;
>> -	int ret;
>> -	void __iomem *io;
>> -
>> -	if (vdev->barmap[bar])
>> -		return 0;
>> -
>> -	ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
>> -	if (ret)
>> -		return ret;
>> -
>> -	io = pci_iomap(pdev, bar, 0);
>> -	if (!io) {
>> -		pci_release_selected_regions(pdev, 1 << bar);
>> -		return -ENOMEM;
>> -	}
>> -
>> -	vdev->barmap[bar] = io;
>> -
>> +	if (IS_ERR(vdev->barmap[bar]))
>> +		return PTR_ERR(vdev->barmap[bar]);
>>   	return 0;
>>   }
>>   EXPORT_SYMBOL_GPL(vfio_pci_core_setup_barmap);
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 3/3] vfio/pci: Replace vfio_pci_core_setup_barmap() with vfio_pci_core_get_iomap()
  2026-05-07 22:21   ` Alex Williamson
@ 2026-05-08 15:30     ` Matt Evans
  2026-05-08 17:45       ` Alex Williamson
  0 siblings, 1 reply; 9+ messages in thread
From: Matt Evans @ 2026-05-08 15:30 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Kevin Tian, Jason Gunthorpe, Ankit Agrawal, Alistair Popple,
	Leon Romanovsky, Kees Cook, Shameer Kolothum, Yishai Hadas,
	Alexey Kardashevskiy, Eric Auger, Peter Xu, Vivek Kasireddy,
	Zhi Wang, kvm, linux-kernel, virtualization

Hi Alex,

On 07/05/2026 23:21, Alex Williamson wrote:
> 
> On Tue, 5 May 2026 10:38:31 -0700
> Matt Evans <mattev@meta.com> wrote:
> 
>> Since "vfio/pci: Set up barmap in vfio_pci_core_enable()", the
>> resource request and iomap for the BARs was performed early, and
>> vfio_pci_core_setup_barmap() just checks those actions succeeded.
>>
>> Move this logic to a new helper that checks success and returns the
>> iomap address, replacing the various bare vdev->barmap[] lookups.
>> This maintains the error behaviour of the previous on-demand
>> vfio_pci_core_setup_barmap() scheme.
>>
>> Signed-off-by: Matt Evans <mattev@meta.com>
>> ---
>>   drivers/vfio/pci/nvgrace-gpu/main.c | 11 ++++-------
>>   drivers/vfio/pci/vfio_pci_core.c    | 11 +++++------
>>   drivers/vfio/pci/vfio_pci_dmabuf.c  |  2 +-
>>   drivers/vfio/pci/vfio_pci_rdwr.c    | 30 ++++++++---------------------
>>   drivers/vfio/pci/virtio/legacy_io.c | 13 ++++++-------
>>   include/linux/vfio_pci_core.h       | 20 ++++++++++++++++++-
>>   6 files changed, 43 insertions(+), 44 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
>> index fa056b69f899..e153002258ce 100644
>> --- a/drivers/vfio/pci/nvgrace-gpu/main.c
>> +++ b/drivers/vfio/pci/nvgrace-gpu/main.c
>> @@ -184,13 +184,10 @@ static int nvgrace_gpu_open_device(struct vfio_device *core_vdev)
>>   
>>   	/*
>>   	 * GPU readiness is checked by reading the BAR0 registers.
>> -	 *
>> -	 * ioremap BAR0 to ensure that the BAR0 mapping is present before
>> -	 * register reads on first fault before establishing any GPU
>> -	 * memory mapping.
>> +	 * The BAR map was just set up by vfio_pci_core_enable(), so
>> +	 * check that was successful and bail early if not:
>>   	 */
>> -	ret = vfio_pci_core_setup_barmap(vdev, 0);
>> -	if (ret)
>> +	if (IS_ERR(vfio_pci_core_get_iomap(vdev, 0)))
>>   		goto error_exit;
> 
> Sashiko notes we're not setting ret here.  The bots are also paranoid
> about the unreachable condition that the get_iomap below could return an
> ERR_PTR.  Maybe head off both by adding an __iomem pointer to the
> nvgrace_gpu_pci_core_device struct and a temporary one here.  Store the
> iomap in the temporary variable, use it to test for IS_ERR() and
> PTR_ERR(), then set the pointer in the structure after the last error
> condition here.  Add one line in the close_device to set it NULL.  Then
> just use nvdev->bar0_io below.

Right about ret.  On the 2nd, the bots could benefit from a comment on
the ...get_iomap() below saying that it "cannot fail" if this one
passes, but hey.  I can add a struct member to track it (bots can then
worry that it might be NULL, if they don't notice that
nvgrace_gpu_check_device_ready() can't happen if
nvgrace_gpu_open_device() didn't succeed, etc. etc.).
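
(For anyone following along at home, the trick the barmap slots now rely
on can be sketched in userspace.  This is a from-memory re-creation of
the kernel's ERR_PTR encoding, not the real include/linux/err.h: errno
values are stored in the top 4095 pointer values, so one slot holds
either a valid iomap or the reason the mapping failed.)

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the kernel's pointer-error encoding: errnos occupy the
 * top MAX_ERRNO addresses, so a slot like vdev->barmap[bar] can carry
 * either a valid mapping or an error code. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline int IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}
```

Note that NULL is *not* an error under IS_ERR(), which is exactly why
the teardown path in patch 1 moves to IS_ERR_OR_NULL(): it has to skip
both never-initialised slots and slots holding a failure code.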

>>   
>>   	if (nvdev->resmem.memlength) {
>> @@ -275,7 +272,7 @@ nvgrace_gpu_check_device_ready(struct nvgrace_gpu_pci_core_device *nvdev)
>>   	if (!__vfio_pci_memory_enabled(vdev))
>>   		return -EIO;
>>   
>> -	ret = nvgrace_gpu_wait_device_ready(vdev->barmap[0]);
>> +	ret = nvgrace_gpu_wait_device_ready(vfio_pci_core_get_iomap(vdev, 0));
>>   	if (ret)
>>   		return ret;
>>   
>> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
>> index 62931dc381d8..5c8bd13f10d0 100644
>> --- a/drivers/vfio/pci/vfio_pci_core.c
>> +++ b/drivers/vfio/pci/vfio_pci_core.c
>> @@ -1761,7 +1761,7 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
>>   	struct pci_dev *pdev = vdev->pdev;
>>   	unsigned int index;
>>   	u64 phys_len, req_len, pgoff, req_start;
>> -	int ret;
>> +	void __iomem *bar_io;
>>   
>>   	index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
>>   
>> @@ -1795,12 +1795,11 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
>>   		return -EINVAL;
>>   
>>   	/*
>> -	 * Even though we don't make use of the barmap for the mmap,
>> -	 * we need to request the region and the barmap tracks that.
>> +	 * Ensure the BAR resource region is reserved for use.
>>   	 */
>> -	ret = vfio_pci_core_setup_barmap(vdev, index);
>> -	if (ret)
>> -		return ret;
>> +	bar_io = vfio_pci_core_get_iomap(vdev, index);
>> +	if (IS_ERR(bar_io))
>> +		return PTR_ERR(bar_io);
>>   
>>   	vma->vm_private_data = vdev;
>>   	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
>> diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
>> index 69a5c2d511e6..46cd44b22c9c 100644
>> --- a/drivers/vfio/pci/vfio_pci_dmabuf.c
>> +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
>> @@ -248,7 +248,7 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
>>   	 * else.  Check that PCI resources have been claimed for it.
>>   	 */
>>   	if (get_dma_buf.region_index >= VFIO_PCI_ROM_REGION_INDEX ||
>> -	    vfio_pci_core_setup_barmap(vdev, get_dma_buf.region_index))
>> +	    IS_ERR(vfio_pci_core_get_iomap(vdev, get_dma_buf.region_index)))
>>   		return -ENODEV;
>>   
>>   	dma_ranges = memdup_array_user(&arg->dma_ranges, get_dma_buf.nr_ranges,
>> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
>> index 3bfbb879a005..7f14dd46de17 100644
>> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
>> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
>> @@ -198,19 +198,6 @@ ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
>>   }
>>   EXPORT_SYMBOL_GPL(vfio_pci_core_do_io_rw);
>>   
>> -/*
>> - * The barmap is set up in vfio_pci_core_enable().  Callers use this
>> - * function to check that the BAR resources are requested or that the
>> - * pci_iomap() was done.
>> - */
>> -int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
>> -{
>> -	if (IS_ERR(vdev->barmap[bar]))
>> -		return PTR_ERR(vdev->barmap[bar]);
>> -	return 0;
>> -}
>> -EXPORT_SYMBOL_GPL(vfio_pci_core_setup_barmap);
>> -
>>   ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
>>   			size_t count, loff_t *ppos, bool iswrite)
>>   {
>> @@ -262,13 +249,11 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
>>   		 */
>>   		max_width = VFIO_PCI_IO_WIDTH_4;
>>   	} else {
>> -		int ret = vfio_pci_core_setup_barmap(vdev, bar);
>> -		if (ret) {
>> -			done = ret;
>> +		io = vfio_pci_core_get_iomap(vdev, bar);
>> +		if (IS_ERR(io)) {
>> +			done = PTR_ERR(io);
>>   			goto out;
>>   		}
>> -
>> -		io = vdev->barmap[bar];
>>   	}
>>   
>>   	if (bar == vdev->msix_bar) {
>> @@ -423,6 +408,7 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset,
>>   	loff_t pos = offset & VFIO_PCI_OFFSET_MASK;
>>   	int ret, bar = VFIO_PCI_OFFSET_TO_INDEX(offset);
>>   	struct vfio_pci_ioeventfd *ioeventfd;
>> +	void __iomem *io;
>>   
>>   	/* Only support ioeventfds into BARs */
>>   	if (bar > VFIO_PCI_BAR5_REGION_INDEX)
>> @@ -440,9 +426,9 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset,
>>   	if (count == 8)
>>   		return -EINVAL;
>>   
>> -	ret = vfio_pci_core_setup_barmap(vdev, bar);
>> -	if (ret)
>> -		return ret;
>> +	io = vfio_pci_core_get_iomap(vdev, bar);
>> +	if (IS_ERR(io))
>> +		return PTR_ERR(io);
> 
> Sashiko seems to note a real existing error here that should also be
> pulled out to a separate fix.  Given the right offset, this could
> generate a negative BAR value.

Yuck, loff_t signed, yep.  Isn't the real root of this that it
never makes sense for VFIO_PCI_OFFSET_TO_INDEX() to return a negative
index here or anywhere else?

I suggest instead, to also avoid this elsewhere in future, something
like:

  #define VFIO_PCI_OFFSET_TO_INDEX(off)	((u64)(off) >> VFIO_PCI_OFFSET_SHIFT)

> The test at the end of the previous
> chunk should be expanded to `if (bar < 0 || bar > ...BAR5...)`.

Not necessary if VFIO_PCI_OFFSET_TO_INDEX() can't return < 0 (the
magnitude would be 24b so can't overflow the `int bar` it's assigned
into).
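
For the record, the hazard and the proposed cast can be shown in a
standalone sketch (VFIO_PCI_OFFSET_SHIFT is 40 in the core header; the
negative result from the signed shift is what common compilers'
arithmetic right shift produces, strictly implementation-defined):

```c
#include <assert.h>
#include <stdint.h>

#define VFIO_PCI_OFFSET_SHIFT 40

/* Current form: shifting a signed 64-bit offset (loff_t) is an
 * arithmetic shift on common compilers, preserving the sign, so a
 * crafted offset yields a negative index. */
static int64_t offset_to_index_signed(int64_t off)
{
	return off >> VFIO_PCI_OFFSET_SHIFT;
}

/* Proposed form: cast to u64 first so the shift is logical; the
 * result fits in 24 bits and can never be negative. */
static uint64_t offset_to_index_unsigned(int64_t off)
{
	return (uint64_t)off >> VFIO_PCI_OFFSET_SHIFT;
}
```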

> Do you want to pick that up in this series?  I think it's the only case
> that lets that slip through.  Thanks,

Sure, I'll post a fix.  I don't think it needs to be part of this series
though if it's just the macro, do you agree?

Do you know why drivers/gpu/drm/i915/gvt/kvmgt.c has copied
VFIO_PCI_OFFSET_TO_INDEX() and friends?  Perhaps the shift was different
(the reason drivers/vfio/pci/ism/main.c has its own versions).  The same
loff_t issue seems to exist in both of those places, unfortunately.


Matt

PS: one minor question:
Relatedly, I'd made `bar` an int following existing convention in

  vfio_pci_core_get_iomap(struct vfio_pci_core_device *vdev, int bar)

But I'll make this `unsigned int`, please flag if this violates taste
and decency.  IMO any BAR/index parameter should be unsigned; most are,
some signed remain.
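
A trivial sketch of the point (helper name hypothetical, constant value
per the vfio uapi):

```c
#include <assert.h>

#define VFIO_PCI_BAR5_REGION_INDEX 5	/* value from the vfio uapi */

/* With an unsigned index, one upper-bound comparison rejects both
 * too-large values and anything that would have been negative as a
 * signed int (it wraps to a huge unsigned value instead). */
static int bar_index_valid(unsigned int bar)
{
	return bar <= VFIO_PCI_BAR5_REGION_INDEX;
}
```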


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 3/3] vfio/pci: Replace vfio_pci_core_setup_barmap() with vfio_pci_core_get_iomap()
  2026-05-08 15:30     ` Matt Evans
@ 2026-05-08 17:45       ` Alex Williamson
  0 siblings, 0 replies; 9+ messages in thread
From: Alex Williamson @ 2026-05-08 17:45 UTC (permalink / raw)
  To: Matt Evans
  Cc: Kevin Tian, Jason Gunthorpe, Ankit Agrawal, Alistair Popple,
	Leon Romanovsky, Kees Cook, Shameer Kolothum, Yishai Hadas,
	Alexey Kardashevskiy, Eric Auger, Peter Xu, Vivek Kasireddy,
	Zhi Wang, kvm, linux-kernel, virtualization, alex

On Fri, 8 May 2026 16:30:40 +0100
Matt Evans <mattev@meta.com> wrote:

> Hi Alex,
> 
> On 07/05/2026 23:21, Alex Williamson wrote:
> > 
> > On Tue, 5 May 2026 10:38:31 -0700
> > Matt Evans <mattev@meta.com> wrote:
> >   
> >> Since "vfio/pci: Set up barmap in vfio_pci_core_enable()", the
> >> resource request and iomap for the BARs was performed early, and
> >> vfio_pci_core_setup_barmap() just checks those actions succeeded.
> >>
> >> Move this logic to a new helper that checks success and returns the
> >> iomap address, replacing the various bare vdev->barmap[] lookups.
> >> This maintains the error behaviour of the previous on-demand
> >> vfio_pci_core_setup_barmap() scheme.
> >>
> >> Signed-off-by: Matt Evans <mattev@meta.com>
> >> ---
> >>   drivers/vfio/pci/nvgrace-gpu/main.c | 11 ++++-------
> >>   drivers/vfio/pci/vfio_pci_core.c    | 11 +++++------
> >>   drivers/vfio/pci/vfio_pci_dmabuf.c  |  2 +-
> >>   drivers/vfio/pci/vfio_pci_rdwr.c    | 30 ++++++++---------------------
> >>   drivers/vfio/pci/virtio/legacy_io.c | 13 ++++++-------
> >>   include/linux/vfio_pci_core.h       | 20 ++++++++++++++++++-
> >>   6 files changed, 43 insertions(+), 44 deletions(-)
> >>
> >> diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
> >> index fa056b69f899..e153002258ce 100644
> >> --- a/drivers/vfio/pci/nvgrace-gpu/main.c
> >> +++ b/drivers/vfio/pci/nvgrace-gpu/main.c
> >> @@ -184,13 +184,10 @@ static int nvgrace_gpu_open_device(struct vfio_device *core_vdev)
> >>   
> >>   	/*
> >>   	 * GPU readiness is checked by reading the BAR0 registers.
> >> -	 *
> >> -	 * ioremap BAR0 to ensure that the BAR0 mapping is present before
> >> -	 * register reads on first fault before establishing any GPU
> >> -	 * memory mapping.
> >> +	 * The BAR map was just set up by vfio_pci_core_enable(), so
> >> +	 * check that was successful and bail early if not:
> >>   	 */
> >> -	ret = vfio_pci_core_setup_barmap(vdev, 0);
> >> -	if (ret)
> >> +	if (IS_ERR(vfio_pci_core_get_iomap(vdev, 0)))
> >>   		goto error_exit;  
> > 
> > Sashiko notes we're not setting ret here.  The bots are also paranoid
> > about the unreachable condition that the get_iomap below could return an
> > ERR_PTR.  Maybe head off both by adding an __iomem pointer to the
> > nvgrace_gpu_pci_core_device struct and a temporary one here.  Store the
> > iomap in the temporary variable, use it to test for IS_ERR() and
> > PTR_ERR(), then set the pointer in the structure after the last error
> > condition here.  Add one line in the close_device to set it NULL.  Then
> > just use nvdev->bar0_io below.  
> 
> Right about ret.  On the 2nd, the bots could benefit from a comment on
> the ...get_iomap() below saying that it "cannot fail" if this one
> passes, but hey.  I can add a struct member to track it (bots can then
> worry that it might be NULL, if they don't notice that
> nvgrace_gpu_check_device_ready() can't happen if
> nvgrace_gpu_open_device() didn't succeed, etc. etc.).

While I agree that there's always something that can be overlooked, it
does seem semantically cleaner to test the io once, when it's retrieved
into the driver structure, and then use it from that structure over a
fixed lifecycle, than to retrieve it without testing the result at each
point of use.
 
> >>   
> >>   	if (nvdev->resmem.memlength) {
> >> @@ -275,7 +272,7 @@ nvgrace_gpu_check_device_ready(struct nvgrace_gpu_pci_core_device *nvdev)
> >>   	if (!__vfio_pci_memory_enabled(vdev))
> >>   		return -EIO;
> >>   
> >> -	ret = nvgrace_gpu_wait_device_ready(vdev->barmap[0]);
> >> +	ret = nvgrace_gpu_wait_device_ready(vfio_pci_core_get_iomap(vdev, 0));
> >>   	if (ret)
> >>   		return ret;
> >>   
> >> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> >> index 62931dc381d8..5c8bd13f10d0 100644
> >> --- a/drivers/vfio/pci/vfio_pci_core.c
> >> +++ b/drivers/vfio/pci/vfio_pci_core.c
> >> @@ -1761,7 +1761,7 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
> >>   	struct pci_dev *pdev = vdev->pdev;
> >>   	unsigned int index;
> >>   	u64 phys_len, req_len, pgoff, req_start;
> >> -	int ret;
> >> +	void __iomem *bar_io;
> >>   
> >>   	index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
> >>   
> >> @@ -1795,12 +1795,11 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
> >>   		return -EINVAL;
> >>   
> >>   	/*
> >> -	 * Even though we don't make use of the barmap for the mmap,
> >> -	 * we need to request the region and the barmap tracks that.
> >> +	 * Ensure the BAR resource region is reserved for use.
> >>   	 */
> >> -	ret = vfio_pci_core_setup_barmap(vdev, index);
> >> -	if (ret)
> >> -		return ret;
> >> +	bar_io = vfio_pci_core_get_iomap(vdev, index);
> >> +	if (IS_ERR(bar_io))
> >> +		return PTR_ERR(bar_io);
> >>   
> >>   	vma->vm_private_data = vdev;
> >>   	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> >> diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
> >> index 69a5c2d511e6..46cd44b22c9c 100644
> >> --- a/drivers/vfio/pci/vfio_pci_dmabuf.c
> >> +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
> >> @@ -248,7 +248,7 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
> >>   	 * else.  Check that PCI resources have been claimed for it.
> >>   	 */
> >>   	if (get_dma_buf.region_index >= VFIO_PCI_ROM_REGION_INDEX ||
> >> -	    vfio_pci_core_setup_barmap(vdev, get_dma_buf.region_index))
> >> +	    IS_ERR(vfio_pci_core_get_iomap(vdev, get_dma_buf.region_index)))
> >>   		return -ENODEV;
> >>   
> >>   	dma_ranges = memdup_array_user(&arg->dma_ranges, get_dma_buf.nr_ranges,
> >> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
> >> index 3bfbb879a005..7f14dd46de17 100644
> >> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
> >> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
> >> @@ -198,19 +198,6 @@ ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
> >>   }
> >>   EXPORT_SYMBOL_GPL(vfio_pci_core_do_io_rw);
> >>   
> >> -/*
> >> - * The barmap is set up in vfio_pci_core_enable().  Callers use this
> >> - * function to check that the BAR resources are requested or that the
> >> - * pci_iomap() was done.
> >> - */
> >> -int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
> >> -{
> >> -	if (IS_ERR(vdev->barmap[bar]))
> >> -		return PTR_ERR(vdev->barmap[bar]);
> >> -	return 0;
> >> -}
> >> -EXPORT_SYMBOL_GPL(vfio_pci_core_setup_barmap);
> >> -
> >>   ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
> >>   			size_t count, loff_t *ppos, bool iswrite)
> >>   {
> >> @@ -262,13 +249,11 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
> >>   		 */
> >>   		max_width = VFIO_PCI_IO_WIDTH_4;
> >>   	} else {
> >> -		int ret = vfio_pci_core_setup_barmap(vdev, bar);
> >> -		if (ret) {
> >> -			done = ret;
> >> +		io = vfio_pci_core_get_iomap(vdev, bar);
> >> +		if (IS_ERR(io)) {
> >> +			done = PTR_ERR(io);
> >>   			goto out;
> >>   		}
> >> -
> >> -		io = vdev->barmap[bar];
> >>   	}
> >>   
> >>   	if (bar == vdev->msix_bar) {
> >> @@ -423,6 +408,7 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset,
> >>   	loff_t pos = offset & VFIO_PCI_OFFSET_MASK;
> >>   	int ret, bar = VFIO_PCI_OFFSET_TO_INDEX(offset);
> >>   	struct vfio_pci_ioeventfd *ioeventfd;
> >> +	void __iomem *io;
> >>   
> >>   	/* Only support ioeventfds into BARs */
> >>   	if (bar > VFIO_PCI_BAR5_REGION_INDEX)
> >> @@ -440,9 +426,9 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset,
> >>   	if (count == 8)
> >>   		return -EINVAL;
> >>   
> >> -	ret = vfio_pci_core_setup_barmap(vdev, bar);
> >> -	if (ret)
> >> -		return ret;
> >> +	io = vfio_pci_core_get_iomap(vdev, bar);
> >> +	if (IS_ERR(io))
> >> +		return PTR_ERR(io);  
> > 
> > Sashiko seems to note a real existing error here that should also be
> > pulled out to a separate fix.  Given the right offset, this could
> > generate a negative BAR value.  
> 
> Yuck, loff_t signed, yep.  Isn't the real root of this that it
> never makes sense for VFIO_PCI_OFFSET_TO_INDEX() to return a negative
> index here or anywhere else?

Yes

> I suggest instead, to also avoid this elsewhere in future, something
> like:
> 
>   #define VFIO_PCI_OFFSET_TO_INDEX(off)	((u64)(off) >> VFIO_PCI_OFFSET_SHIFT)

Sure, that has better coverage.

> > The test at the end of the previous
> > chunk should be expanded to `if (bar < 0 || bar > ...BAR5...)`.  
> 
> Not necessary if VFIO_PCI_OFFSET_TO_INDEX() can't return < 0 (the
> magnitude would be 24b so can't overflow the `int bar` it's assigned
> into).

Yep.

> > Do you want to pick that up in this series?  I think it's the only case
> > that lets that slip through.  Thanks,  
> 
> Sure, I'll post a fix.  I don't think it needs to be part of this series
> though if it's just the macro, do you agree?

I was hoping to collect the fixes from this series for v7.1-rc
regardless, so either way.

> Do you know why drivers/gpu/drm/i915/gvt/kvmgt.c has copied
> VFIO_PCI_OFFSET_TO_INDEX() and friends?  Perhaps the shift was different
> (the reason drivers/vfio/pci/ism/main.c has its own versions).  The same
> loff_t issue seems to exist in both of those places, unfortunately.

I think because it was previously defined in a drivers/vfio/pci/ header
that couldn't be cleanly included.  The region shift is implementation,
not API, so drivers are free to define their own region spacing, see
for instance the new ISM driver that needs >40bits per region.  We're
likely going to move to a maple tree for defining regions in the future
so that we can more easily account for such large BARs as they become
more common.  Jason linked a branch with a rough draft of this not long
ago.

> PS: with minor question:
> Relatedly, I'd made `bar` an int following existing convention in
> 
>   vfio_pci_core_get_iomap(struct vfio_pci_core_device *vdev, int bar)
> 
> But I'll make this `unsigned int`, please flag if this violates taste
> and decency.  IMO any BAR/index parameter should be unsigned; most are,
> some signed remain.

Yep, I think that would make sense and avoids needing two tests to make
sure it's in range.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2026-05-08 17:45 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-05-05 17:38 [PATCH v4 0/3] vfio/pci: Request resources and map BARs at enable time Matt Evans
2026-05-05 17:38 ` [PATCH v4 1/3] vfio/pci: Set up BAR resources and maps in vfio_pci_core_enable() Matt Evans
2026-05-07 22:21   ` Alex Williamson
2026-05-08 14:14     ` Matt Evans
2026-05-05 17:38 ` [PATCH v4 2/3] vfio/pci: Check BAR resources before exporting a DMABUF Matt Evans
2026-05-05 17:38 ` [PATCH v4 3/3] vfio/pci: Replace vfio_pci_core_setup_barmap() with vfio_pci_core_get_iomap() Matt Evans
2026-05-07 22:21   ` Alex Williamson
2026-05-08 15:30     ` Matt Evans
2026-05-08 17:45       ` Alex Williamson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox