* [PATCH v4 0/3] vfio/pci: Request resources and map BARs at enable time
@ 2026-05-05 17:38 Matt Evans
2026-05-05 17:38 ` [PATCH v4 1/3] vfio/pci: Set up BAR resources and maps in vfio_pci_core_enable() Matt Evans
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Matt Evans @ 2026-05-05 17:38 UTC (permalink / raw)
To: Alex Williamson, Kevin Tian, Jason Gunthorpe, Ankit Agrawal,
Alistair Popple, Leon Romanovsky, Kees Cook, Shameer Kolothum,
Yishai Hadas
Cc: Alexey Kardashevskiy, Eric Auger, Peter Xu, Vivek Kasireddy,
Zhi Wang, kvm, linux-kernel, virtualization
Hi,
These patches fix a potential race between concurrent calls to
vfio_pci_core_setup_barmap(), and a missing check in the DMABUF export
path that the BAR resource was requested. Discussion on a previous
series (a different approach, replaced by this one) is here:
https://lore.kernel.org/kvm/20260415181423.1008458-1-mattev@meta.com
Responses in that thread indicated there wasn't a strong historical
reason to require the mapping to be performed on-demand at BAR
reference time. It's much simpler to move this earlier, to
vfio_pci_core_enable(), and that then avoids having to deal with
concurrent requests later.
The first patch requests PCI resources and pci_iomap() of the BARs
from vfio_pci_core_enable(), moving this out of
vfio_pci_core_setup_barmap().
Some callers rely on vfio_pci_core_setup_barmap() for its ioremap()
effect, and other callers use it for its resource-acquiring effect.
The function becomes a cheap check that both of these actions
succeeded, which maintains the same error behaviour as before the
fix.
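The per-BAR soft-failure scheme can be modelled in plain C. This is a
hypothetical userspace sketch, not the kernel code: the struct fields,
failure-injection flags, and function names here are invented for
illustration, with ERR_PTR()/PTR_ERR() reimplemented locally to mimic
the kernel's encoding.

```c
#include <errno.h>

#define NUM_BARS 6

/* Local stand-ins for the kernel's ERR_PTR()/PTR_ERR() helpers. */
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }

/* Model BAR: its size and whether the request or map step fails,
 * standing in for pci_resource_len()/pci_request_selected_regions()/
 * pci_iomap(). */
struct model_bar {
	unsigned long len;
	int request_fails;
	int iomap_fails;
	char mmio[16];	/* fake mapping target */
};

/* Mirrors the shape of vfio_pci_core_map_bars(): every slot ends up
 * holding either a usable mapping or an ERR_PTR() recording why it is
 * unusable, and a failure on one BAR never aborts setup of the rest. */
static void model_map_bars(struct model_bar bars[NUM_BARS],
			   void *barmap[NUM_BARS])
{
	for (int i = 0; i < NUM_BARS; i++) {
		barmap[i] = ERR_PTR(-ENODEV);	/* unimplemented BAR */
		if (!bars[i].len)
			continue;
		if (bars[i].request_fails) {
			barmap[i] = ERR_PTR(-EBUSY);
			continue;
		}
		if (bars[i].iomap_fails) {
			barmap[i] = ERR_PTR(-ENOMEM);
			continue;
		}
		barmap[i] = bars[i].mmio;
	}
}

/* Returns 0 when every slot carries the expected status. */
static int model_selftest(void)
{
	struct model_bar bars[NUM_BARS] = {
		[0] = { .len = 4096 },
		[1] = { .len = 4096, .request_fails = 1 },
		[2] = { .len = 4096, .iomap_fails = 1 },
		/* BARs 3-5 unimplemented (len == 0) */
	};
	void *barmap[NUM_BARS];

	model_map_bars(bars, barmap);

	if (barmap[0] != (void *)bars[0].mmio) return 1;
	if (PTR_ERR(barmap[1]) != -EBUSY) return 2;
	if (PTR_ERR(barmap[2]) != -ENOMEM) return 3;
	if (PTR_ERR(barmap[3]) != -ENODEV) return 4;
	return 0;
}
```

The point of the sketch is that a consumer can later distinguish "BAR
doesn't exist" from "request failed" from "map failed" by decoding the
slot, which is what keeps the userspace-visible errors unchanged.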
The second patch adds a call to vfio_pci_core_setup_barmap() to the
VFIO DMABUF export path to check that the resource is reserved;
previously it was possible to export an unrequested resource.
Although patch 1 at first appears to fix this by requesting resources
at enable time, code using the BAR still needs to check that the
resource really was acquired.
The third patch refactors vfio_pci_core_setup_barmap() plus the various
vdev->barmap[] accesses into vfio_pci_core_get_iomap() which returns
either a pointer to the mapping or an ERR_PTR() describing why it
doesn't exist. This is used by callers that need the mapping, but
also by other callers to check that the resource/mapping step was
successful.
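The accessor pattern from the third patch can also be sketched in
userspace C. Again a hypothetical model rather than the kernel code —
the ERR_PTR()/IS_ERR()/PTR_ERR() helpers and the struct are local
re-creations mimicking the kernel's semantics:

```c
#include <errno.h>

#define MODEL_NUM_BARS 6

/* Local re-creations of the kernel's err.h helpers: errnos are folded
 * into the top 4095 values of the pointer space. */
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline int IS_ERR(const void *p)
{
	return (unsigned long)p >= (unsigned long)-4095;
}

struct model_vdev {
	void *barmap[MODEL_NUM_BARS];
};

/* Mirrors the shape of vfio_pci_core_get_iomap(): returns the mapping
 * base, or an ERR_PTR() describing why it doesn't exist.  Callers that
 * only need the resource reserved just test IS_ERR(); callers that
 * need the mapping use the returned pointer. */
static void *model_get_iomap(struct model_vdev *vdev, int bar)
{
	if (bar < 0 || bar >= MODEL_NUM_BARS)
		return ERR_PTR(-EINVAL);
	if (!vdev->barmap[bar])
		return ERR_PTR(-ENODEV);
	return vdev->barmap[bar];
}

/* Returns 0 when both the success and failure paths decode correctly. */
static int model_selftest(void)
{
	static char fake_mmio[16];
	struct model_vdev v = {
		.barmap = { fake_mmio, ERR_PTR(-EBUSY) },
	};

	if (model_get_iomap(&v, 0) != (void *)fake_mmio) return 1;
	if (!IS_ERR(model_get_iomap(&v, 1))) return 2;
	if (PTR_ERR(model_get_iomap(&v, 1)) != -EBUSY) return 3;
	if (PTR_ERR(model_get_iomap(&v, 2)) != -ENODEV) return 4;
	if (PTR_ERR(model_get_iomap(&v, 6)) != -EINVAL) return 5;
	return 0;
}
```

A single accessor like this replaces both the bare vdev->barmap[]
reads and the separate "did setup succeed?" check, which is why the
series can drop vfio_pci_core_setup_barmap() entirely.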
=== Changes ===
v4:
- Reorder patches to put the fixes at the front: first the early BAR
setup to avoid the race, then the DMABUF check, then the
refactor/tidy.
- Adjust the Fixes: tag of the first patch to point to the early VFIO
PCI commit, and reduce the patch to only the fix (don't add new
error checks). Use pci_dbg() instead of pci_warn() when setting up
BAR resources. Add barmap[] error checking to vfio_pci_core_disable().
- Add barmap[]/BAR index error checking to vfio_pci_core_get_iomap(),
and use WARN_ON_ONCE() since the conditions truly shouldn't happen.
v3:
https://lore.kernel.org/kvm/20260430100340.2787446-1-mattev@meta.com/
- Remove the separate tracking of the BAR mapping versus the
acquisition of its resource. Errors from a failing iomap versus
resource reservation are ERR_PTR()-encoded into barmap[bar].
- Remove the separate test helper, and add vfio_pci_core_get_iomap().
This gets the iomap base or is used to check for an error/failure to
acquire the resource. Added comments at call sites explaining
whether they want to just ensure the resource is reserved versus
actually use the mapping.
v2:
https://lore.kernel.org/kvm/20260423182517.2286030-1-mattev@meta.com/
- Don't fail if resources can't be requested or iomapped, even for
valid BARs, as this would change the userspace-observable error
behaviour. Specifically, if there was an issue with one particular
BAR which happened to never be used, then userspace would never
encounter an error for it. Track iomap and resource-acquisition
status per BAR.
- Break out the checks for resource success from those for iomap
success, in the form of the two new helpers.
- Third patch to add the check to VFIO DMABUF export, because
init-time requests can now fail.
v1:
https://lore.kernel.org/kvm/20260421174143.3883579-1-mattev@meta.com/
Matt Evans (3):
vfio/pci: Set up BAR resources and maps in vfio_pci_core_enable()
vfio/pci: Check BAR resources before exporting a DMABUF
vfio/pci: Replace vfio_pci_core_setup_barmap() with
vfio_pci_core_get_iomap()
drivers/vfio/pci/nvgrace-gpu/main.c | 11 +++----
drivers/vfio/pci/vfio_pci_core.c | 47 ++++++++++++++++++++++++-----
drivers/vfio/pci/vfio_pci_dmabuf.c | 6 ++--
drivers/vfio/pci/vfio_pci_rdwr.c | 42 +++++---------------------
drivers/vfio/pci/virtio/legacy_io.c | 13 ++++----
include/linux/vfio_pci_core.h | 20 +++++++++++-
6 files changed, 81 insertions(+), 58 deletions(-)
--
2.47.3
^ permalink raw reply [flat|nested] 9+ messages in thread* [PATCH v4 1/3] vfio/pci: Set up BAR resources and maps in vfio_pci_core_enable() 2026-05-05 17:38 [PATCH v4 0/3] vfio/pci: Request resources and map BARs at enable time Matt Evans @ 2026-05-05 17:38 ` Matt Evans 2026-05-07 22:21 ` Alex Williamson 2026-05-05 17:38 ` [PATCH v4 2/3] vfio/pci: Check BAR resources before exporting a DMABUF Matt Evans 2026-05-05 17:38 ` [PATCH v4 3/3] vfio/pci: Replace vfio_pci_core_setup_barmap() with vfio_pci_core_get_iomap() Matt Evans 2 siblings, 1 reply; 9+ messages in thread From: Matt Evans @ 2026-05-05 17:38 UTC (permalink / raw) To: Alex Williamson, Kevin Tian, Jason Gunthorpe, Ankit Agrawal, Alistair Popple, Leon Romanovsky, Kees Cook, Shameer Kolothum, Yishai Hadas Cc: Alexey Kardashevskiy, Eric Auger, Peter Xu, Vivek Kasireddy, Zhi Wang, kvm, linux-kernel, virtualization Previously BAR resource requests and the corresponding pci_iomap() were performed on-demand and without synchronisation, which was racy. Rather than add synchronisation, it's simplest to address this by doing both activities from vfio_pci_core_enable(). The resource allocation and/or pci_iomap() can still fail; their status is tracked and existing calls to vfio_pci_core_setup_barmap() will fail in a similar way to before. This keeps the point of failure as observed by userspace the same, i.e. failures to request/map unused BARs are benign. 
Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver") Signed-off-by: Matt Evans <mattev@meta.com> --- drivers/vfio/pci/vfio_pci_core.c | 36 +++++++++++++++++++++++++++++++- drivers/vfio/pci/vfio_pci_rdwr.c | 26 +++++++---------------- 2 files changed, 42 insertions(+), 20 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c index 3f8d093aacf8..62931dc381d8 100644 --- a/drivers/vfio/pci/vfio_pci_core.c +++ b/drivers/vfio/pci/vfio_pci_core.c @@ -482,6 +482,39 @@ static int vfio_pci_core_runtime_resume(struct device *dev) } #endif /* CONFIG_PM */ +/* + * Eager-request BAR resources, and iomap them. Soft failures are + * allowed, and consumers must check the barmap before use in order to + * give compatible user-visible behaviour with the previous on-demand + * allocation method. + */ +static void vfio_pci_core_map_bars(struct vfio_pci_core_device *vdev) +{ + struct pci_dev *pdev = vdev->pdev; + int i; + + for (i = 0; i < PCI_STD_NUM_BARS; i++) { + int bar = i + PCI_STD_RESOURCES; + + vdev->barmap[bar] = ERR_PTR(-ENODEV); + + if (!pci_resource_len(pdev, i)) + continue; + + if (pci_request_selected_regions(pdev, 1 << bar, "vfio")) { + pci_dbg(vdev->pdev, "Failed to reserve region %d\n", bar); + vdev->barmap[bar] = ERR_PTR(-EBUSY); + continue; + } + + vdev->barmap[bar] = pci_iomap(pdev, bar, 0); + if (!vdev->barmap[bar]) { + pci_dbg(vdev->pdev, "Failed to iomap region %d\n", bar); + vdev->barmap[bar] = ERR_PTR(-ENOMEM); + } + } +} + /* * The pci-driver core runtime PM routines always save the device state * before going into suspended state. 
If the device is going into low power @@ -568,6 +601,7 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev) if (!vfio_vga_disabled() && vfio_pci_is_vga(pdev)) vdev->has_vga = true; + vfio_pci_core_map_bars(vdev); return 0; @@ -648,7 +682,7 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev) for (i = 0; i < PCI_STD_NUM_BARS; i++) { bar = i + PCI_STD_RESOURCES; - if (!vdev->barmap[bar]) + if (IS_ERR_OR_NULL(vdev->barmap[bar])) continue; pci_iounmap(pdev, vdev->barmap[bar]); pci_release_selected_regions(pdev, 1 << bar); diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c index 4251ee03e146..3bfbb879a005 100644 --- a/drivers/vfio/pci/vfio_pci_rdwr.c +++ b/drivers/vfio/pci/vfio_pci_rdwr.c @@ -198,27 +198,15 @@ ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem, } EXPORT_SYMBOL_GPL(vfio_pci_core_do_io_rw); +/* + * The barmap is set up in vfio_pci_core_enable(). Callers use this + * function to check that the BAR resources are requested or that the + * pci_iomap() was done. + */ int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar) { - struct pci_dev *pdev = vdev->pdev; - int ret; - void __iomem *io; - - if (vdev->barmap[bar]) - return 0; - - ret = pci_request_selected_regions(pdev, 1 << bar, "vfio"); - if (ret) - return ret; - - io = pci_iomap(pdev, bar, 0); - if (!io) { - pci_release_selected_regions(pdev, 1 << bar); - return -ENOMEM; - } - - vdev->barmap[bar] = io; - + if (IS_ERR(vdev->barmap[bar])) + return PTR_ERR(vdev->barmap[bar]); return 0; } EXPORT_SYMBOL_GPL(vfio_pci_core_setup_barmap); -- 2.47.3 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v4 1/3] vfio/pci: Set up BAR resources and maps in vfio_pci_core_enable() 2026-05-05 17:38 ` [PATCH v4 1/3] vfio/pci: Set up BAR resources and maps in vfio_pci_core_enable() Matt Evans @ 2026-05-07 22:21 ` Alex Williamson 2026-05-08 14:14 ` Matt Evans 0 siblings, 1 reply; 9+ messages in thread From: Alex Williamson @ 2026-05-07 22:21 UTC (permalink / raw) To: Matt Evans Cc: Kevin Tian, Jason Gunthorpe, Ankit Agrawal, Alistair Popple, Leon Romanovsky, Kees Cook, Shameer Kolothum, Yishai Hadas, Alexey Kardashevskiy, Eric Auger, Peter Xu, Vivek Kasireddy, Zhi Wang, kvm, linux-kernel, virtualization, alex On Tue, 5 May 2026 10:38:29 -0700 Matt Evans <mattev@meta.com> wrote: > Previously BAR resource requests and the corresponding pci_iomap() > were performed on-demand and without synchronisation, which was racy. > Rather than add synchronisation, it's simplest to address this by > doing both activities from vfio_pci_core_enable(). > > The resource allocation and/or pci_iomap() can still fail; their > status is tracked and existing calls to vfio_pci_core_setup_barmap() > will fail in a similar way to before. This keeps the point of failure > as observed by userspace the same, i.e. failures to request/map unused > BARs are benign. > > Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver") > Signed-off-by: Matt Evans <mattev@meta.com> > --- > drivers/vfio/pci/vfio_pci_core.c | 36 +++++++++++++++++++++++++++++++- > drivers/vfio/pci/vfio_pci_rdwr.c | 26 +++++++---------------- > 2 files changed, 42 insertions(+), 20 deletions(-) > > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c > index 3f8d093aacf8..62931dc381d8 100644 > --- a/drivers/vfio/pci/vfio_pci_core.c > +++ b/drivers/vfio/pci/vfio_pci_core.c > @@ -482,6 +482,39 @@ static int vfio_pci_core_runtime_resume(struct device *dev) > } > #endif /* CONFIG_PM */ > > +/* > + * Eager-request BAR resources, and iomap them. 
Soft failures are > + * allowed, and consumers must check the barmap before use in order to > + * give compatible user-visible behaviour with the previous on-demand > + * allocation method. > + */ > +static void vfio_pci_core_map_bars(struct vfio_pci_core_device *vdev) > +{ > + struct pci_dev *pdev = vdev->pdev; > + int i; > + > + for (i = 0; i < PCI_STD_NUM_BARS; i++) { > + int bar = i + PCI_STD_RESOURCES; > + > + vdev->barmap[bar] = ERR_PTR(-ENODEV); > + > + if (!pci_resource_len(pdev, i)) > + continue; > + > + if (pci_request_selected_regions(pdev, 1 << bar, "vfio")) { > + pci_dbg(vdev->pdev, "Failed to reserve region %d\n", bar); > + vdev->barmap[bar] = ERR_PTR(-EBUSY); > + continue; > + } > + > + vdev->barmap[bar] = pci_iomap(pdev, bar, 0); > + if (!vdev->barmap[bar]) { Sashiko notes[1] correctly that we need to release the requested region here. [1]https://sashiko.dev/#/patchset/20260505173835.2324179-1-mattev@meta.com > + pci_dbg(vdev->pdev, "Failed to iomap region %d\n", bar); > + vdev->barmap[bar] = ERR_PTR(-ENOMEM); > + } > + } > +} > + > /* > * The pci-driver core runtime PM routines always save the device state > * before going into suspended state. 
If the device is going into low power > @@ -568,6 +601,7 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev) > if (!vfio_vga_disabled() && vfio_pci_is_vga(pdev)) > vdev->has_vga = true; > > + vfio_pci_core_map_bars(vdev); > > return 0; > > @@ -648,7 +682,7 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev) > > for (i = 0; i < PCI_STD_NUM_BARS; i++) { > bar = i + PCI_STD_RESOURCES; > - if (!vdev->barmap[bar]) > + if (IS_ERR_OR_NULL(vdev->barmap[bar])) > continue; > pci_iounmap(pdev, vdev->barmap[bar]); > pci_release_selected_regions(pdev, 1 << bar); > diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c > index 4251ee03e146..3bfbb879a005 100644 > --- a/drivers/vfio/pci/vfio_pci_rdwr.c > +++ b/drivers/vfio/pci/vfio_pci_rdwr.c > @@ -198,27 +198,15 @@ ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem, > } > EXPORT_SYMBOL_GPL(vfio_pci_core_do_io_rw); > > +/* > + * The barmap is set up in vfio_pci_core_enable(). Callers use this > + * function to check that the BAR resources are requested or that the > + * pci_iomap() was done. > + */ > int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar) > { > - struct pci_dev *pdev = vdev->pdev; > - int ret; > - void __iomem *io; > - > - if (vdev->barmap[bar]) > - return 0; > - > - ret = pci_request_selected_regions(pdev, 1 << bar, "vfio"); > - if (ret) > - return ret; > - > - io = pci_iomap(pdev, bar, 0); > - if (!io) { > - pci_release_selected_regions(pdev, 1 << bar); > - return -ENOMEM; > - } > - > - vdev->barmap[bar] = io; > - > + if (IS_ERR(vdev->barmap[bar])) > + return PTR_ERR(vdev->barmap[bar]); > return 0; > } > EXPORT_SYMBOL_GPL(vfio_pci_core_setup_barmap); ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v4 1/3] vfio/pci: Set up BAR resources and maps in vfio_pci_core_enable() 2026-05-07 22:21 ` Alex Williamson @ 2026-05-08 14:14 ` Matt Evans 0 siblings, 0 replies; 9+ messages in thread From: Matt Evans @ 2026-05-08 14:14 UTC (permalink / raw) To: Alex Williamson Cc: Kevin Tian, Jason Gunthorpe, Ankit Agrawal, Alistair Popple, Leon Romanovsky, Kees Cook, Shameer Kolothum, Yishai Hadas, Alexey Kardashevskiy, Eric Auger, Peter Xu, Vivek Kasireddy, Zhi Wang, kvm, linux-kernel, virtualization Hi Alex, On 07/05/2026 23:21, Alex Williamson wrote: > > On Tue, 5 May 2026 10:38:29 -0700 > Matt Evans <mattev@meta.com> wrote: > >> Previously BAR resource requests and the corresponding pci_iomap() >> were performed on-demand and without synchronisation, which was racy. >> Rather than add synchronisation, it's simplest to address this by >> doing both activities from vfio_pci_core_enable(). >> >> The resource allocation and/or pci_iomap() can still fail; their >> status is tracked and existing calls to vfio_pci_core_setup_barmap() >> will fail in a similar way to before. This keeps the point of failure >> as observed by userspace the same, i.e. failures to request/map unused >> BARs are benign. >> >> Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver") >> Signed-off-by: Matt Evans <mattev@meta.com> >> --- >> drivers/vfio/pci/vfio_pci_core.c | 36 +++++++++++++++++++++++++++++++- >> drivers/vfio/pci/vfio_pci_rdwr.c | 26 +++++++---------------- >> 2 files changed, 42 insertions(+), 20 deletions(-) >> >> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c >> index 3f8d093aacf8..62931dc381d8 100644 >> --- a/drivers/vfio/pci/vfio_pci_core.c >> +++ b/drivers/vfio/pci/vfio_pci_core.c >> @@ -482,6 +482,39 @@ static int vfio_pci_core_runtime_resume(struct device *dev) >> } >> #endif /* CONFIG_PM */ >> >> +/* >> + * Eager-request BAR resources, and iomap them. 
Soft failures are >> + * allowed, and consumers must check the barmap before use in order to >> + * give compatible user-visible behaviour with the previous on-demand >> + * allocation method. >> + */ >> +static void vfio_pci_core_map_bars(struct vfio_pci_core_device *vdev) >> +{ >> + struct pci_dev *pdev = vdev->pdev; >> + int i; >> + >> + for (i = 0; i < PCI_STD_NUM_BARS; i++) { >> + int bar = i + PCI_STD_RESOURCES; >> + >> + vdev->barmap[bar] = ERR_PTR(-ENODEV); >> + >> + if (!pci_resource_len(pdev, i)) >> + continue; >> + >> + if (pci_request_selected_regions(pdev, 1 << bar, "vfio")) { >> + pci_dbg(vdev->pdev, "Failed to reserve region %d\n", bar); >> + vdev->barmap[bar] = ERR_PTR(-EBUSY); >> + continue; >> + } >> + >> + vdev->barmap[bar] = pci_iomap(pdev, bar, 0); >> + if (!vdev->barmap[bar]) { > > Sashiko notes[1] correctly that we need to release the requested region > here. > > [1]https://urldefense.com/v3/__https://sashiko.dev/*/patchset/20260505173835.2324179-1-mattev@meta.com__;Iw!!Bt8RZUm9aw!75pHBGTcV8AYGiGGjzomqZLfDp7iR_j2JC6qCiJufoo7TxJTPuViQZjqp7I3ZRPPxwj1YtYSNQ$ Hnnng. Right, fixed. -Matt >> + pci_dbg(vdev->pdev, "Failed to iomap region %d\n", bar); >> + vdev->barmap[bar] = ERR_PTR(-ENOMEM); >> + } >> + } >> +} >> + >> /* >> * The pci-driver core runtime PM routines always save the device state >> * before going into suspended state. 
If the device is going into low power >> @@ -568,6 +601,7 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev) >> if (!vfio_vga_disabled() && vfio_pci_is_vga(pdev)) >> vdev->has_vga = true; >> >> + vfio_pci_core_map_bars(vdev); >> >> return 0; >> >> @@ -648,7 +682,7 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev) >> >> for (i = 0; i < PCI_STD_NUM_BARS; i++) { >> bar = i + PCI_STD_RESOURCES; >> - if (!vdev->barmap[bar]) >> + if (IS_ERR_OR_NULL(vdev->barmap[bar])) >> continue; >> pci_iounmap(pdev, vdev->barmap[bar]); >> pci_release_selected_regions(pdev, 1 << bar); >> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c >> index 4251ee03e146..3bfbb879a005 100644 >> --- a/drivers/vfio/pci/vfio_pci_rdwr.c >> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c >> @@ -198,27 +198,15 @@ ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem, >> } >> EXPORT_SYMBOL_GPL(vfio_pci_core_do_io_rw); >> >> +/* >> + * The barmap is set up in vfio_pci_core_enable(). Callers use this >> + * function to check that the BAR resources are requested or that the >> + * pci_iomap() was done. >> + */ >> int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar) >> { >> - struct pci_dev *pdev = vdev->pdev; >> - int ret; >> - void __iomem *io; >> - >> - if (vdev->barmap[bar]) >> - return 0; >> - >> - ret = pci_request_selected_regions(pdev, 1 << bar, "vfio"); >> - if (ret) >> - return ret; >> - >> - io = pci_iomap(pdev, bar, 0); >> - if (!io) { >> - pci_release_selected_regions(pdev, 1 << bar); >> - return -ENOMEM; >> - } >> - >> - vdev->barmap[bar] = io; >> - >> + if (IS_ERR(vdev->barmap[bar])) >> + return PTR_ERR(vdev->barmap[bar]); >> return 0; >> } >> EXPORT_SYMBOL_GPL(vfio_pci_core_setup_barmap); > ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v4 2/3] vfio/pci: Check BAR resources before exporting a DMABUF 2026-05-05 17:38 [PATCH v4 0/3] vfio/pci: Request resources and map BARs at enable time Matt Evans 2026-05-05 17:38 ` [PATCH v4 1/3] vfio/pci: Set up BAR resources and maps in vfio_pci_core_enable() Matt Evans @ 2026-05-05 17:38 ` Matt Evans 2026-05-05 17:38 ` [PATCH v4 3/3] vfio/pci: Replace vfio_pci_core_setup_barmap() with vfio_pci_core_get_iomap() Matt Evans 2 siblings, 0 replies; 9+ messages in thread From: Matt Evans @ 2026-05-05 17:38 UTC (permalink / raw) To: Alex Williamson, Kevin Tian, Jason Gunthorpe, Ankit Agrawal, Alistair Popple, Leon Romanovsky, Kees Cook, Shameer Kolothum, Yishai Hadas Cc: Alexey Kardashevskiy, Eric Auger, Peter Xu, Vivek Kasireddy, Zhi Wang, kvm, linux-kernel, virtualization A DMABUF exports access to BAR resources and, although they are requested at startup time, we need to ensure they really were reserved before exporting. Otherwise, it's possible to access unreserved resources through the export. Add a check to the DMABUF-creation path. Fixes: 5d74781ebc86c ("vfio/pci: Add dma-buf export support for MMIO regions") Signed-off-by: Matt Evans <mattev@meta.com> --- drivers/vfio/pci/vfio_pci_dmabuf.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c index f87fd32e4a01..69a5c2d511e6 100644 --- a/drivers/vfio/pci/vfio_pci_dmabuf.c +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c @@ -244,9 +244,11 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags, return -EINVAL; /* - * For PCI the region_index is the BAR number like everything else. + * For PCI the region_index is the BAR number like everything + * else. Check that PCI resources have been claimed for it. 
*/ - if (get_dma_buf.region_index >= VFIO_PCI_ROM_REGION_INDEX) + if (get_dma_buf.region_index >= VFIO_PCI_ROM_REGION_INDEX || + vfio_pci_core_setup_barmap(vdev, get_dma_buf.region_index)) return -ENODEV; dma_ranges = memdup_array_user(&arg->dma_ranges, get_dma_buf.nr_ranges, -- 2.47.3 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH v4 3/3] vfio/pci: Replace vfio_pci_core_setup_barmap() with vfio_pci_core_get_iomap() 2026-05-05 17:38 [PATCH v4 0/3] vfio/pci: Request resources and map BARs at enable time Matt Evans 2026-05-05 17:38 ` [PATCH v4 1/3] vfio/pci: Set up BAR resources and maps in vfio_pci_core_enable() Matt Evans 2026-05-05 17:38 ` [PATCH v4 2/3] vfio/pci: Check BAR resources before exporting a DMABUF Matt Evans @ 2026-05-05 17:38 ` Matt Evans 2026-05-07 22:21 ` Alex Williamson 2 siblings, 1 reply; 9+ messages in thread From: Matt Evans @ 2026-05-05 17:38 UTC (permalink / raw) To: Alex Williamson, Kevin Tian, Jason Gunthorpe, Ankit Agrawal, Alistair Popple, Leon Romanovsky, Kees Cook, Shameer Kolothum, Yishai Hadas Cc: Alexey Kardashevskiy, Eric Auger, Peter Xu, Vivek Kasireddy, Zhi Wang, kvm, linux-kernel, virtualization Since "vfio/pci: Set up barmap in vfio_pci_core_enable()", the resource request and iomap for the BARs was performed early, and vfio_pci_core_setup_barmap() just checks those actions succeeded. Move this logic to a new helper that checks success and returns the iomap address, replacing the various bare vdev->barmap[] lookups. This maintains the error behaviour of the previous on-demand vfio_pci_core_setup_barmap() scheme. 
Signed-off-by: Matt Evans <mattev@meta.com> --- drivers/vfio/pci/nvgrace-gpu/main.c | 11 ++++------- drivers/vfio/pci/vfio_pci_core.c | 11 +++++------ drivers/vfio/pci/vfio_pci_dmabuf.c | 2 +- drivers/vfio/pci/vfio_pci_rdwr.c | 30 ++++++++--------------------- drivers/vfio/pci/virtio/legacy_io.c | 13 ++++++------- include/linux/vfio_pci_core.h | 20 ++++++++++++++++++- 6 files changed, 43 insertions(+), 44 deletions(-) diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c index fa056b69f899..e153002258ce 100644 --- a/drivers/vfio/pci/nvgrace-gpu/main.c +++ b/drivers/vfio/pci/nvgrace-gpu/main.c @@ -184,13 +184,10 @@ static int nvgrace_gpu_open_device(struct vfio_device *core_vdev) /* * GPU readiness is checked by reading the BAR0 registers. - * - * ioremap BAR0 to ensure that the BAR0 mapping is present before - * register reads on first fault before establishing any GPU - * memory mapping. + * The BAR map was just set up by vfio_pci_core_enable(), so + * check that was successful and bail early if not: */ - ret = vfio_pci_core_setup_barmap(vdev, 0); - if (ret) + if (IS_ERR(vfio_pci_core_get_iomap(vdev, 0))) goto error_exit; if (nvdev->resmem.memlength) { @@ -275,7 +272,7 @@ nvgrace_gpu_check_device_ready(struct nvgrace_gpu_pci_core_device *nvdev) if (!__vfio_pci_memory_enabled(vdev)) return -EIO; - ret = nvgrace_gpu_wait_device_ready(vdev->barmap[0]); + ret = nvgrace_gpu_wait_device_ready(vfio_pci_core_get_iomap(vdev, 0)); if (ret) return ret; diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c index 62931dc381d8..5c8bd13f10d0 100644 --- a/drivers/vfio/pci/vfio_pci_core.c +++ b/drivers/vfio/pci/vfio_pci_core.c @@ -1761,7 +1761,7 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma struct pci_dev *pdev = vdev->pdev; unsigned int index; u64 phys_len, req_len, pgoff, req_start; - int ret; + void __iomem *bar_io; index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT); @@ 
-1795,12 +1795,11 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma return -EINVAL; /* - * Even though we don't make use of the barmap for the mmap, - * we need to request the region and the barmap tracks that. + * Ensure the BAR resource region is reserved for use. */ - ret = vfio_pci_core_setup_barmap(vdev, index); - if (ret) - return ret; + bar_io = vfio_pci_core_get_iomap(vdev, index); + if (IS_ERR(bar_io)) + return PTR_ERR(bar_io); vma->vm_private_data = vdev; vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c index 69a5c2d511e6..46cd44b22c9c 100644 --- a/drivers/vfio/pci/vfio_pci_dmabuf.c +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c @@ -248,7 +248,7 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags, * else. Check that PCI resources have been claimed for it. */ if (get_dma_buf.region_index >= VFIO_PCI_ROM_REGION_INDEX || - vfio_pci_core_setup_barmap(vdev, get_dma_buf.region_index)) + IS_ERR(vfio_pci_core_get_iomap(vdev, get_dma_buf.region_index))) return -ENODEV; dma_ranges = memdup_array_user(&arg->dma_ranges, get_dma_buf.nr_ranges, diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c index 3bfbb879a005..7f14dd46de17 100644 --- a/drivers/vfio/pci/vfio_pci_rdwr.c +++ b/drivers/vfio/pci/vfio_pci_rdwr.c @@ -198,19 +198,6 @@ ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem, } EXPORT_SYMBOL_GPL(vfio_pci_core_do_io_rw); -/* - * The barmap is set up in vfio_pci_core_enable(). Callers use this - * function to check that the BAR resources are requested or that the - * pci_iomap() was done. 
- */ -int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar) -{ - if (IS_ERR(vdev->barmap[bar])) - return PTR_ERR(vdev->barmap[bar]); - return 0; -} -EXPORT_SYMBOL_GPL(vfio_pci_core_setup_barmap); - ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf, size_t count, loff_t *ppos, bool iswrite) { @@ -262,13 +249,11 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf, */ max_width = VFIO_PCI_IO_WIDTH_4; } else { - int ret = vfio_pci_core_setup_barmap(vdev, bar); - if (ret) { - done = ret; + io = vfio_pci_core_get_iomap(vdev, bar); + if (IS_ERR(io)) { + done = PTR_ERR(io); goto out; } - - io = vdev->barmap[bar]; } if (bar == vdev->msix_bar) { @@ -423,6 +408,7 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset, loff_t pos = offset & VFIO_PCI_OFFSET_MASK; int ret, bar = VFIO_PCI_OFFSET_TO_INDEX(offset); struct vfio_pci_ioeventfd *ioeventfd; + void __iomem *io; /* Only support ioeventfds into BARs */ if (bar > VFIO_PCI_BAR5_REGION_INDEX) @@ -440,9 +426,9 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset, if (count == 8) return -EINVAL; - ret = vfio_pci_core_setup_barmap(vdev, bar); - if (ret) - return ret; + io = vfio_pci_core_get_iomap(vdev, bar); + if (IS_ERR(io)) + return PTR_ERR(io); mutex_lock(&vdev->ioeventfds_lock); @@ -479,7 +465,7 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset, } ioeventfd->vdev = vdev; - ioeventfd->addr = vdev->barmap[bar] + pos; + ioeventfd->addr = io + pos; ioeventfd->data = data; ioeventfd->pos = pos; ioeventfd->bar = bar; diff --git a/drivers/vfio/pci/virtio/legacy_io.c b/drivers/vfio/pci/virtio/legacy_io.c index 1ed349a55629..c868b2177310 100644 --- a/drivers/vfio/pci/virtio/legacy_io.c +++ b/drivers/vfio/pci/virtio/legacy_io.c @@ -299,19 +299,18 @@ int virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev, static int virtiovf_set_notify_addr(struct virtiovf_pci_core_device *virtvdev) 
{ struct vfio_pci_core_device *core_device = &virtvdev->core_device; - int ret; + void __iomem *io; /* * Setup the BAR where the 'notify' exists to be used by vfio as well * This will let us mmap it only once and use it when needed. */ - ret = vfio_pci_core_setup_barmap(core_device, - virtvdev->notify_bar); - if (ret) - return ret; + io = vfio_pci_core_get_iomap(core_device, + virtvdev->notify_bar); + if (IS_ERR(io)) + return PTR_ERR(io); - virtvdev->notify_addr = core_device->barmap[virtvdev->notify_bar] + - virtvdev->notify_offset; + virtvdev->notify_addr = io + virtvdev->notify_offset; return 0; } diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h index 2ebba746c18f..ffd67e25bf3f 100644 --- a/include/linux/vfio_pci_core.h +++ b/include/linux/vfio_pci_core.h @@ -188,7 +188,6 @@ int vfio_pci_core_match_token_uuid(struct vfio_device *core_vdev, int vfio_pci_core_enable(struct vfio_pci_core_device *vdev); void vfio_pci_core_disable(struct vfio_pci_core_device *vdev); void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev); -int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar); pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev, pci_channel_state_t state); ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem, @@ -234,6 +233,25 @@ static inline bool is_aligned_for_order(struct vm_area_struct *vma, !IS_ALIGNED(pfn, 1 << order))); } +/* + * Returns a BAR's iomap base or an ERR_PTR() if, for example, the + * BAR isn't valid, its resource wasn't acquired, or its iomap + * failed. This shall only be used after vfio_pci_core_enable() + * has set up the BAR maps and before vfio_pci_core_disable() + * tears them down. 
+ */ +static inline void __iomem __must_check * +vfio_pci_core_get_iomap(struct vfio_pci_core_device *vdev, int bar) +{ + if (WARN_ON_ONCE(bar < 0 || bar >= PCI_STD_NUM_BARS)) + return ERR_PTR(-EINVAL); + + if (WARN_ON_ONCE(!vdev->barmap[bar])) + return ERR_PTR(-ENODEV); + + return vdev->barmap[bar]; +} + int vfio_pci_dma_buf_iommufd_map(struct dma_buf_attachment *attachment, struct phys_vec *phys); -- 2.47.3 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v4 3/3] vfio/pci: Replace vfio_pci_core_setup_barmap() with vfio_pci_core_get_iomap() 2026-05-05 17:38 ` [PATCH v4 3/3] vfio/pci: Replace vfio_pci_core_setup_barmap() with vfio_pci_core_get_iomap() Matt Evans @ 2026-05-07 22:21 ` Alex Williamson 2026-05-08 15:30 ` Matt Evans 0 siblings, 1 reply; 9+ messages in thread From: Alex Williamson @ 2026-05-07 22:21 UTC (permalink / raw) To: Matt Evans Cc: Kevin Tian, Jason Gunthorpe, Ankit Agrawal, Alistair Popple, Leon Romanovsky, Kees Cook, Shameer Kolothum, Yishai Hadas, Alexey Kardashevskiy, Eric Auger, Peter Xu, Vivek Kasireddy, Zhi Wang, kvm, linux-kernel, virtualization, alex On Tue, 5 May 2026 10:38:31 -0700 Matt Evans <mattev@meta.com> wrote: > Since "vfio/pci: Set up barmap in vfio_pci_core_enable()", the > resource request and iomap for the BARs was performed early, and > vfio_pci_core_setup_barmap() just checks those actions succeeded. > > Move this logic to a new helper that checks success and returns the > iomap address, replacing the various bare vdev->barmap[] lookups. > This maintains the error behaviour of the previous on-demand > vfio_pci_core_setup_barmap() scheme. > > Signed-off-by: Matt Evans <mattev@meta.com> > --- > drivers/vfio/pci/nvgrace-gpu/main.c | 11 ++++------- > drivers/vfio/pci/vfio_pci_core.c | 11 +++++------ > drivers/vfio/pci/vfio_pci_dmabuf.c | 2 +- > drivers/vfio/pci/vfio_pci_rdwr.c | 30 ++++++++--------------------- > drivers/vfio/pci/virtio/legacy_io.c | 13 ++++++------- > include/linux/vfio_pci_core.h | 20 ++++++++++++++++++- > 6 files changed, 43 insertions(+), 44 deletions(-) > > diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c > index fa056b69f899..e153002258ce 100644 > --- a/drivers/vfio/pci/nvgrace-gpu/main.c > +++ b/drivers/vfio/pci/nvgrace-gpu/main.c > @@ -184,13 +184,10 @@ static int nvgrace_gpu_open_device(struct vfio_device *core_vdev) > > /* > * GPU readiness is checked by reading the BAR0 registers. 
> - * > - * ioremap BAR0 to ensure that the BAR0 mapping is present before > - * register reads on first fault before establishing any GPU > - * memory mapping. > + * The BAR map was just set up by vfio_pci_core_enable(), so > + * check that was successful and bail early if not: > */ > - ret = vfio_pci_core_setup_barmap(vdev, 0); > - if (ret) > + if (IS_ERR(vfio_pci_core_get_iomap(vdev, 0))) > goto error_exit; Sashiko notes we're not setting ret here. The bots are also paranoid about the unreachable condition that the get_iomap below could return an ERR_PTR. Maybe head off both by adding an __iomem pointer to the nvgrace_gpu_pci_core_device struct and a temporary one here. Store the iomap in the temporary variable, use it to test for IS_ERR() and PTR_ERR(), then set the pointer in the structure after the last error condition here. Add one line in the close_device to set it NULL. Then just use nvdev->bar0_io below. > > if (nvdev->resmem.memlength) { > @@ -275,7 +272,7 @@ nvgrace_gpu_check_device_ready(struct nvgrace_gpu_pci_core_device *nvdev) > if (!__vfio_pci_memory_enabled(vdev)) > return -EIO; > > - ret = nvgrace_gpu_wait_device_ready(vdev->barmap[0]); > + ret = nvgrace_gpu_wait_device_ready(vfio_pci_core_get_iomap(vdev, 0)); > if (ret) > return ret; > > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c > index 62931dc381d8..5c8bd13f10d0 100644 > --- a/drivers/vfio/pci/vfio_pci_core.c > +++ b/drivers/vfio/pci/vfio_pci_core.c > @@ -1761,7 +1761,7 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma > struct pci_dev *pdev = vdev->pdev; > unsigned int index; > u64 phys_len, req_len, pgoff, req_start; > - int ret; > + void __iomem *bar_io; > > index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT); > > @@ -1795,12 +1795,11 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma > return -EINVAL; > > /* > - * Even though we don't make use of the barmap for the mmap, > - * 
we need to request the region and the barmap tracks that. > + * Ensure the BAR resource region is reserved for use. > */ > - ret = vfio_pci_core_setup_barmap(vdev, index); > - if (ret) > - return ret; > + bar_io = vfio_pci_core_get_iomap(vdev, index); > + if (IS_ERR(bar_io)) > + return PTR_ERR(bar_io); > > vma->vm_private_data = vdev; > vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); > diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c > index 69a5c2d511e6..46cd44b22c9c 100644 > --- a/drivers/vfio/pci/vfio_pci_dmabuf.c > +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c > @@ -248,7 +248,7 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags, > * else. Check that PCI resources have been claimed for it. > */ > if (get_dma_buf.region_index >= VFIO_PCI_ROM_REGION_INDEX || > - vfio_pci_core_setup_barmap(vdev, get_dma_buf.region_index)) > + IS_ERR(vfio_pci_core_get_iomap(vdev, get_dma_buf.region_index))) > return -ENODEV; > > dma_ranges = memdup_array_user(&arg->dma_ranges, get_dma_buf.nr_ranges, > diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c > index 3bfbb879a005..7f14dd46de17 100644 > --- a/drivers/vfio/pci/vfio_pci_rdwr.c > +++ b/drivers/vfio/pci/vfio_pci_rdwr.c > @@ -198,19 +198,6 @@ ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem, > } > EXPORT_SYMBOL_GPL(vfio_pci_core_do_io_rw); > > -/* > - * The barmap is set up in vfio_pci_core_enable(). Callers use this > - * function to check that the BAR resources are requested or that the > - * pci_iomap() was done. 
> - */ > -int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar) > -{ > - if (IS_ERR(vdev->barmap[bar])) > - return PTR_ERR(vdev->barmap[bar]); > - return 0; > -} > -EXPORT_SYMBOL_GPL(vfio_pci_core_setup_barmap); > - > ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf, > size_t count, loff_t *ppos, bool iswrite) > { > @@ -262,13 +249,11 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf, > */ > max_width = VFIO_PCI_IO_WIDTH_4; > } else { > - int ret = vfio_pci_core_setup_barmap(vdev, bar); > - if (ret) { > - done = ret; > + io = vfio_pci_core_get_iomap(vdev, bar); > + if (IS_ERR(io)) { > + done = PTR_ERR(io); > goto out; > } > - > - io = vdev->barmap[bar]; > } > > if (bar == vdev->msix_bar) { > @@ -423,6 +408,7 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset, > loff_t pos = offset & VFIO_PCI_OFFSET_MASK; > int ret, bar = VFIO_PCI_OFFSET_TO_INDEX(offset); > struct vfio_pci_ioeventfd *ioeventfd; > + void __iomem *io; > > /* Only support ioeventfds into BARs */ > if (bar > VFIO_PCI_BAR5_REGION_INDEX) > @@ -440,9 +426,9 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset, > if (count == 8) > return -EINVAL; > > - ret = vfio_pci_core_setup_barmap(vdev, bar); > - if (ret) > - return ret; > + io = vfio_pci_core_get_iomap(vdev, bar); > + if (IS_ERR(io)) > + return PTR_ERR(io); Sashiko seems to note a real existing error here that should also be pulled out to a separate fix. Given the right offset, this could generate a negative BAR value. The test at the end of the previous chunk should should be expanded to `if (bar < 0 || bar > ...BAR5...)`. Do you want to pick that up in this series? I think it's the only case that lets that slip through. Thanks, Alex ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v4 3/3] vfio/pci: Replace vfio_pci_core_setup_barmap() with vfio_pci_core_get_iomap() 2026-05-07 22:21 ` Alex Williamson @ 2026-05-08 15:30 ` Matt Evans 2026-05-08 17:45 ` Alex Williamson 0 siblings, 1 reply; 9+ messages in thread From: Matt Evans @ 2026-05-08 15:30 UTC (permalink / raw) To: Alex Williamson Cc: Kevin Tian, Jason Gunthorpe, Ankit Agrawal, Alistair Popple, Leon Romanovsky, Kees Cook, Shameer Kolothum, Yishai Hadas, Alexey Kardashevskiy, Eric Auger, Peter Xu, Vivek Kasireddy, Zhi Wang, kvm, linux-kernel, virtualization Hi Alex, On 07/05/2026 23:21, Alex Williamson wrote: > > On Tue, 5 May 2026 10:38:31 -0700 > Matt Evans <mattev@meta.com> wrote: > >> Since "vfio/pci: Set up barmap in vfio_pci_core_enable()", the >> resource request and iomap for the BARs was performed early, and >> vfio_pci_core_setup_barmap() just checks those actions succeeded. >> >> Move this logic to a new helper that checks success and returns the >> iomap address, replacing the various bare vdev->barmap[] lookups. >> This maintains the error behaviour of the previous on-demand >> vfio_pci_core_setup_barmap() scheme. >> >> Signed-off-by: Matt Evans <mattev@meta.com> >> --- >> drivers/vfio/pci/nvgrace-gpu/main.c | 11 ++++------- >> drivers/vfio/pci/vfio_pci_core.c | 11 +++++------ >> drivers/vfio/pci/vfio_pci_dmabuf.c | 2 +- >> drivers/vfio/pci/vfio_pci_rdwr.c | 30 ++++++++--------------------- >> drivers/vfio/pci/virtio/legacy_io.c | 13 ++++++------- >> include/linux/vfio_pci_core.h | 20 ++++++++++++++++++- >> 6 files changed, 43 insertions(+), 44 deletions(-) >> >> diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c >> index fa056b69f899..e153002258ce 100644 >> --- a/drivers/vfio/pci/nvgrace-gpu/main.c >> +++ b/drivers/vfio/pci/nvgrace-gpu/main.c >> @@ -184,13 +184,10 @@ static int nvgrace_gpu_open_device(struct vfio_device *core_vdev) >> >> /* >> * GPU readiness is checked by reading the BAR0 registers. 
>> - * >> - * ioremap BAR0 to ensure that the BAR0 mapping is present before >> - * register reads on first fault before establishing any GPU >> - * memory mapping. >> + * The BAR map was just set up by vfio_pci_core_enable(), so >> + * check that was successful and bail early if not: >> */ >> - ret = vfio_pci_core_setup_barmap(vdev, 0); >> - if (ret) >> + if (IS_ERR(vfio_pci_core_get_iomap(vdev, 0))) >> goto error_exit; > > Sashiko notes we're not setting ret here. The bots are also paranoid > about the unreachable condition that the get_iomap below could return an > ERR_PTR. Maybe head off both by adding an __iomem pointer to the > nvgrace_gpu_pci_core_device struct and a temporary one here. Store the > iomap in the temporary variable, use it to test for IS_ERR() and > PTR_ERR(), then set the pointer in the structure after the last error > condition here. Add one line in the close_device to set it NULL. Then > just use nvdev->bar0_io below. Right about ret. On the 2nd, the bots could benefit from a comment on the ...get_iomap() below saying that it "cannot fail" if this one passes, but hey. I can add a struct member to track it (bots can then worry that it might be NULL, if they don't notice that nvgrace_gpu_check_device_ready() can't happen if nvgrace_gpu_open_device() didn't succeed, etc. etc.). 
>> >> if (nvdev->resmem.memlength) { >> @@ -275,7 +272,7 @@ nvgrace_gpu_check_device_ready(struct nvgrace_gpu_pci_core_device *nvdev) >> if (!__vfio_pci_memory_enabled(vdev)) >> return -EIO; >> >> - ret = nvgrace_gpu_wait_device_ready(vdev->barmap[0]); >> + ret = nvgrace_gpu_wait_device_ready(vfio_pci_core_get_iomap(vdev, 0)); >> if (ret) >> return ret; >> >> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c >> index 62931dc381d8..5c8bd13f10d0 100644 >> --- a/drivers/vfio/pci/vfio_pci_core.c >> +++ b/drivers/vfio/pci/vfio_pci_core.c >> @@ -1761,7 +1761,7 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma >> struct pci_dev *pdev = vdev->pdev; >> unsigned int index; >> u64 phys_len, req_len, pgoff, req_start; >> - int ret; >> + void __iomem *bar_io; >> >> index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT); >> >> @@ -1795,12 +1795,11 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma >> return -EINVAL; >> >> /* >> - * Even though we don't make use of the barmap for the mmap, >> - * we need to request the region and the barmap tracks that. >> + * Ensure the BAR resource region is reserved for use. >> */ >> - ret = vfio_pci_core_setup_barmap(vdev, index); >> - if (ret) >> - return ret; >> + bar_io = vfio_pci_core_get_iomap(vdev, index); >> + if (IS_ERR(bar_io)) >> + return PTR_ERR(bar_io); >> >> vma->vm_private_data = vdev; >> vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); >> diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c >> index 69a5c2d511e6..46cd44b22c9c 100644 >> --- a/drivers/vfio/pci/vfio_pci_dmabuf.c >> +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c >> @@ -248,7 +248,7 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags, >> * else. Check that PCI resources have been claimed for it. 
>> */ >> if (get_dma_buf.region_index >= VFIO_PCI_ROM_REGION_INDEX || >> - vfio_pci_core_setup_barmap(vdev, get_dma_buf.region_index)) >> + IS_ERR(vfio_pci_core_get_iomap(vdev, get_dma_buf.region_index))) >> return -ENODEV; >> >> dma_ranges = memdup_array_user(&arg->dma_ranges, get_dma_buf.nr_ranges, >> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c >> index 3bfbb879a005..7f14dd46de17 100644 >> --- a/drivers/vfio/pci/vfio_pci_rdwr.c >> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c >> @@ -198,19 +198,6 @@ ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem, >> } >> EXPORT_SYMBOL_GPL(vfio_pci_core_do_io_rw); >> >> -/* >> - * The barmap is set up in vfio_pci_core_enable(). Callers use this >> - * function to check that the BAR resources are requested or that the >> - * pci_iomap() was done. >> - */ >> -int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar) >> -{ >> - if (IS_ERR(vdev->barmap[bar])) >> - return PTR_ERR(vdev->barmap[bar]); >> - return 0; >> -} >> -EXPORT_SYMBOL_GPL(vfio_pci_core_setup_barmap); >> - >> ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf, >> size_t count, loff_t *ppos, bool iswrite) >> { >> @@ -262,13 +249,11 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf, >> */ >> max_width = VFIO_PCI_IO_WIDTH_4; >> } else { >> - int ret = vfio_pci_core_setup_barmap(vdev, bar); >> - if (ret) { >> - done = ret; >> + io = vfio_pci_core_get_iomap(vdev, bar); >> + if (IS_ERR(io)) { >> + done = PTR_ERR(io); >> goto out; >> } >> - >> - io = vdev->barmap[bar]; >> } >> >> if (bar == vdev->msix_bar) { >> @@ -423,6 +408,7 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset, >> loff_t pos = offset & VFIO_PCI_OFFSET_MASK; >> int ret, bar = VFIO_PCI_OFFSET_TO_INDEX(offset); >> struct vfio_pci_ioeventfd *ioeventfd; >> + void __iomem *io; >> >> /* Only support ioeventfds into BARs */ >> if (bar > 
VFIO_PCI_BAR5_REGION_INDEX) >> @@ -440,9 +426,9 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset, >> if (count == 8) >> return -EINVAL; >> >> - ret = vfio_pci_core_setup_barmap(vdev, bar); >> - if (ret) >> - return ret; >> + io = vfio_pci_core_get_iomap(vdev, bar); >> + if (IS_ERR(io)) >> + return PTR_ERR(io); > > Sashiko seems to note a real existing error here that should also be > pulled out to a separate fix. Given the right offset, this could > generate a negative BAR value. Yuck, loff_t signed, yep. Isn't the real root of this that it never makes sense for VFIO_PCI_OFFSET_TO_INDEX() to return a negative index here or anywhere else? I suggest instead, to also avoid this elsewhere in future, something like: #define VFIO_PCI_OFFSET_TO_INDEX(off) ((u64)(off) >> VFIO_PCI_OFFSET_SHIFT) > The test at the end of the previous > chunk should should be expanded to `if (bar < 0 || bar > ...BAR5...)`. Not necessary if VFIO_PCI_OFFSET_TO_INDEX() can't return < 0 (the magnitude would be 24b so can't overflow the `int bar` it's assigned into). > Do you want to pick that up in this series? I think it's the only case > that lets that slip through. Thanks, Sure, I'll post a fix. I don't think it needs to be part of this series though if it's just the macro, do you agree? Do you know why drivers/gpu/drm/i915/gvt/kvmgt.c has copied VFIO_PCI_OFFSET_TO_INDEX() and friends? Perhaps the shift was different (the reason drivers/vfio/pci/ism/main.c has its own versions). The same loff_t issue seems to exist in both of those places, unfortunately. Matt PS: with minor question: Relatedly, I'd made `bar` an int following existing convention in vfio_pci_core_get_iomap(struct vfio_pci_core_device *vdev, int bar) But I'll make this `unsigned int`, please flag if this violates taste and decency. IMO any BAR/index parameter should be unsigned; most are, some signed remain. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v4 3/3] vfio/pci: Replace vfio_pci_core_setup_barmap() with vfio_pci_core_get_iomap() 2026-05-08 15:30 ` Matt Evans @ 2026-05-08 17:45 ` Alex Williamson 0 siblings, 0 replies; 9+ messages in thread From: Alex Williamson @ 2026-05-08 17:45 UTC (permalink / raw) To: Matt Evans Cc: Kevin Tian, Jason Gunthorpe, Ankit Agrawal, Alistair Popple, Leon Romanovsky, Kees Cook, Shameer Kolothum, Yishai Hadas, Alexey Kardashevskiy, Eric Auger, Peter Xu, Vivek Kasireddy, Zhi Wang, kvm, linux-kernel, virtualization, alex On Fri, 8 May 2026 16:30:40 +0100 Matt Evans <mattev@meta.com> wrote: > Hi Alex, > > On 07/05/2026 23:21, Alex Williamson wrote: > > > > On Tue, 5 May 2026 10:38:31 -0700 > > Matt Evans <mattev@meta.com> wrote: > > > >> Since "vfio/pci: Set up barmap in vfio_pci_core_enable()", the > >> resource request and iomap for the BARs was performed early, and > >> vfio_pci_core_setup_barmap() just checks those actions succeeded. > >> > >> Move this logic to a new helper that checks success and returns the > >> iomap address, replacing the various bare vdev->barmap[] lookups. > >> This maintains the error behaviour of the previous on-demand > >> vfio_pci_core_setup_barmap() scheme. 
> >> > >> Signed-off-by: Matt Evans <mattev@meta.com> > >> --- > >> drivers/vfio/pci/nvgrace-gpu/main.c | 11 ++++------- > >> drivers/vfio/pci/vfio_pci_core.c | 11 +++++------ > >> drivers/vfio/pci/vfio_pci_dmabuf.c | 2 +- > >> drivers/vfio/pci/vfio_pci_rdwr.c | 30 ++++++++--------------------- > >> drivers/vfio/pci/virtio/legacy_io.c | 13 ++++++------- > >> include/linux/vfio_pci_core.h | 20 ++++++++++++++++++- > >> 6 files changed, 43 insertions(+), 44 deletions(-) > >> > >> diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c > >> index fa056b69f899..e153002258ce 100644 > >> --- a/drivers/vfio/pci/nvgrace-gpu/main.c > >> +++ b/drivers/vfio/pci/nvgrace-gpu/main.c > >> @@ -184,13 +184,10 @@ static int nvgrace_gpu_open_device(struct vfio_device *core_vdev) > >> > >> /* > >> * GPU readiness is checked by reading the BAR0 registers. > >> - * > >> - * ioremap BAR0 to ensure that the BAR0 mapping is present before > >> - * register reads on first fault before establishing any GPU > >> - * memory mapping. > >> + * The BAR map was just set up by vfio_pci_core_enable(), so > >> + * check that was successful and bail early if not: > >> */ > >> - ret = vfio_pci_core_setup_barmap(vdev, 0); > >> - if (ret) > >> + if (IS_ERR(vfio_pci_core_get_iomap(vdev, 0))) > >> goto error_exit; > > > > Sashiko notes we're not setting ret here. The bots are also paranoid > > about the unreachable condition that the get_iomap below could return an > > ERR_PTR. Maybe head off both by adding an __iomem pointer to the > > nvgrace_gpu_pci_core_device struct and a temporary one here. Store the > > iomap in the temporary variable, use it to test for IS_ERR() and > > PTR_ERR(), then set the pointer in the structure after the last error > > condition here. Add one line in the close_device to set it NULL. Then > > just use nvdev->bar0_io below. > > Right about ret. 
On the 2nd, the bots could benefit from a comment on > the ...get_iomap() below saying that it "cannot fail" if this one > passes, but hey. I can add a struct member to track it (bots can then > worry that it might be NULL, if they don't notice that > nvgrace_gpu_check_device_ready() can't happen if > nvgrace_gpu_open_device() didn't succeed, etc. etc.). While I agree that there's always something that can be overlooked, it does seem semantically cleaner that the io is tested when it's retrieved for the driver structure and used from that structure in a fixed lifecycle than retrieved without testing the results at time of use. > >> > >> if (nvdev->resmem.memlength) { > >> @@ -275,7 +272,7 @@ nvgrace_gpu_check_device_ready(struct nvgrace_gpu_pci_core_device *nvdev) > >> if (!__vfio_pci_memory_enabled(vdev)) > >> return -EIO; > >> > >> - ret = nvgrace_gpu_wait_device_ready(vdev->barmap[0]); > >> + ret = nvgrace_gpu_wait_device_ready(vfio_pci_core_get_iomap(vdev, 0)); > >> if (ret) > >> return ret; > >> > >> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c > >> index 62931dc381d8..5c8bd13f10d0 100644 > >> --- a/drivers/vfio/pci/vfio_pci_core.c > >> +++ b/drivers/vfio/pci/vfio_pci_core.c > >> @@ -1761,7 +1761,7 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma > >> struct pci_dev *pdev = vdev->pdev; > >> unsigned int index; > >> u64 phys_len, req_len, pgoff, req_start; > >> - int ret; > >> + void __iomem *bar_io; > >> > >> index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT); > >> > >> @@ -1795,12 +1795,11 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma > >> return -EINVAL; > >> > >> /* > >> - * Even though we don't make use of the barmap for the mmap, > >> - * we need to request the region and the barmap tracks that. > >> + * Ensure the BAR resource region is reserved for use. 
> >> */ > >> - ret = vfio_pci_core_setup_barmap(vdev, index); > >> - if (ret) > >> - return ret; > >> + bar_io = vfio_pci_core_get_iomap(vdev, index); > >> + if (IS_ERR(bar_io)) > >> + return PTR_ERR(bar_io); > >> > >> vma->vm_private_data = vdev; > >> vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); > >> diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c > >> index 69a5c2d511e6..46cd44b22c9c 100644 > >> --- a/drivers/vfio/pci/vfio_pci_dmabuf.c > >> +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c > >> @@ -248,7 +248,7 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags, > >> * else. Check that PCI resources have been claimed for it. > >> */ > >> if (get_dma_buf.region_index >= VFIO_PCI_ROM_REGION_INDEX || > >> - vfio_pci_core_setup_barmap(vdev, get_dma_buf.region_index)) > >> + IS_ERR(vfio_pci_core_get_iomap(vdev, get_dma_buf.region_index))) > >> return -ENODEV; > >> > >> dma_ranges = memdup_array_user(&arg->dma_ranges, get_dma_buf.nr_ranges, > >> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c > >> index 3bfbb879a005..7f14dd46de17 100644 > >> --- a/drivers/vfio/pci/vfio_pci_rdwr.c > >> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c > >> @@ -198,19 +198,6 @@ ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem, > >> } > >> EXPORT_SYMBOL_GPL(vfio_pci_core_do_io_rw); > >> > >> -/* > >> - * The barmap is set up in vfio_pci_core_enable(). Callers use this > >> - * function to check that the BAR resources are requested or that the > >> - * pci_iomap() was done. 
> >> - */ > >> -int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar) > >> -{ > >> - if (IS_ERR(vdev->barmap[bar])) > >> - return PTR_ERR(vdev->barmap[bar]); > >> - return 0; > >> -} > >> -EXPORT_SYMBOL_GPL(vfio_pci_core_setup_barmap); > >> - > >> ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf, > >> size_t count, loff_t *ppos, bool iswrite) > >> { > >> @@ -262,13 +249,11 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf, > >> */ > >> max_width = VFIO_PCI_IO_WIDTH_4; > >> } else { > >> - int ret = vfio_pci_core_setup_barmap(vdev, bar); > >> - if (ret) { > >> - done = ret; > >> + io = vfio_pci_core_get_iomap(vdev, bar); > >> + if (IS_ERR(io)) { > >> + done = PTR_ERR(io); > >> goto out; > >> } > >> - > >> - io = vdev->barmap[bar]; > >> } > >> > >> if (bar == vdev->msix_bar) { > >> @@ -423,6 +408,7 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset, > >> loff_t pos = offset & VFIO_PCI_OFFSET_MASK; > >> int ret, bar = VFIO_PCI_OFFSET_TO_INDEX(offset); > >> struct vfio_pci_ioeventfd *ioeventfd; > >> + void __iomem *io; > >> > >> /* Only support ioeventfds into BARs */ > >> if (bar > VFIO_PCI_BAR5_REGION_INDEX) > >> @@ -440,9 +426,9 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset, > >> if (count == 8) > >> return -EINVAL; > >> > >> - ret = vfio_pci_core_setup_barmap(vdev, bar); > >> - if (ret) > >> - return ret; > >> + io = vfio_pci_core_get_iomap(vdev, bar); > >> + if (IS_ERR(io)) > >> + return PTR_ERR(io); > > > > Sashiko seems to note a real existing error here that should also be > > pulled out to a separate fix. Given the right offset, this could > > generate a negative BAR value. > > Yuck, loff_t signed, yep. Isn't the real root of this that it > never makes sense for VFIO_PCI_OFFSET_TO_INDEX() to return a negative > index here or anywhere else? 
Yes > I suggest instead, to also avoid this elsewhere in future, something > like: > > #define VFIO_PCI_OFFSET_TO_INDEX(off) ((u64)(off) >> VFIO_PCI_OFFSET_SHIFT) Sure, that has better coverage. > > The test at the end of the previous > > chunk should should be expanded to `if (bar < 0 || bar > ...BAR5...)`. > > Not necessary if VFIO_PCI_OFFSET_TO_INDEX() can't return < 0 (the > magnitude would be 24b so can't overflow the `int bar` it's assigned > into). Yep. > > Do you want to pick that up in this series? I think it's the only case > > that lets that slip through. Thanks, > > Sure, I'll post a fix. I don't think it needs to be part of this series > though if it's just the macro, do you agree? I was hoping to collect the fixes from this series for v7.1-rc regardless, so either way. > Do you know why drivers/gpu/drm/i915/gvt/kvmgt.c has copied > VFIO_PCI_OFFSET_TO_INDEX() and friends? Perhaps the shift was different > (the reason drivers/vfio/pci/ism/main.c has its own versions). The same > loff_t issue seems to exist in both of those places, unfortunately. I think because it was previously defined in a drivers/vfio/pci/ header that couldn't be cleanly included. The region shift is implementation, not API, so drivers are free to define their own region spacing, see for instance the new ISM driver that needs >40bits per region. We're likely going to move to a maple tree for defining regions in the future so that we can more easily account for such large BARs as they become more common. Jason linked a branch with a rough draft of this not long ago. > PS: with minor question: > Relatedly, I'd made `bar` an int following existing convention in > > vfio_pci_core_get_iomap(struct vfio_pci_core_device *vdev, int bar) > > But I'll make this `unsigned int`, please flag if this violates taste > and decency. IMO any BAR/index parameter should be unsigned; most are, > some signed remain. 
Yep, I think that would make sense and avoids needing two tests to make sure it's in range. Thanks, Alex ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread [~2026-05-08 17:45 UTC | newest]

Thread overview: 9+ messages

2026-05-05 17:38 [PATCH v4 0/3] vfio/pci: Request resources and map BARs at enable time Matt Evans
2026-05-05 17:38 ` [PATCH v4 1/3] vfio/pci: Set up BAR resources and maps in vfio_pci_core_enable() Matt Evans
2026-05-07 22:21   ` Alex Williamson
2026-05-08 14:14     ` Matt Evans
2026-05-05 17:38 ` [PATCH v4 2/3] vfio/pci: Check BAR resources before exporting a DMABUF Matt Evans
2026-05-05 17:38 ` [PATCH v4 3/3] vfio/pci: Replace vfio_pci_core_setup_barmap() with vfio_pci_core_get_iomap() Matt Evans
2026-05-07 22:21   ` Alex Williamson
2026-05-08 15:30     ` Matt Evans
2026-05-08 17:45       ` Alex Williamson