From: Matt Evans <mattev@meta.com>
To: Alex Williamson <alex@shazbot.org>
Cc: Kevin Tian <kevin.tian@intel.com>, Jason Gunthorpe <jgg@ziepe.ca>,
Ankit Agrawal <ankita@nvidia.com>,
Alistair Popple <apopple@nvidia.com>,
Leon Romanovsky <leon@kernel.org>, Kees Cook <kees@kernel.org>,
Shameer Kolothum <skolothumtho@nvidia.com>,
Yishai Hadas <yishaih@nvidia.com>,
Alexey Kardashevskiy <aik@ozlabs.ru>,
Eric Auger <eric.auger@redhat.com>, Peter Xu <peterx@redhat.com>,
Vivek Kasireddy <vivek.kasireddy@intel.com>,
Zhi Wang <zhiw@nvidia.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
virtualization@lists.linux.dev
Subject: Re: [PATCH v4 3/3] vfio/pci: Replace vfio_pci_core_setup_barmap() with vfio_pci_core_get_iomap()
Date: Fri, 8 May 2026 16:30:40 +0100
Message-ID: <015b1e9c-0a2d-472a-b750-9154800832ee@meta.com>
In-Reply-To: <20260507162141.072483ce@shazbot.org>

Hi Alex,

On 07/05/2026 23:21, Alex Williamson wrote:
>
> On Tue, 5 May 2026 10:38:31 -0700
> Matt Evans <mattev@meta.com> wrote:
>
>> Since "vfio/pci: Set up barmap in vfio_pci_core_enable()", the
>> resource request and iomap for the BARs was performed early, and
>> vfio_pci_core_setup_barmap() just checks those actions succeeded.
>>
>> Move this logic to a new helper that checks success and returns the
>> iomap address, replacing the various bare vdev->barmap[] lookups.
>> This maintains the error behaviour of the previous on-demand
>> vfio_pci_core_setup_barmap() scheme.
>>
>> Signed-off-by: Matt Evans <mattev@meta.com>
>> ---
>> drivers/vfio/pci/nvgrace-gpu/main.c | 11 ++++-------
>> drivers/vfio/pci/vfio_pci_core.c | 11 +++++------
>> drivers/vfio/pci/vfio_pci_dmabuf.c | 2 +-
>> drivers/vfio/pci/vfio_pci_rdwr.c | 30 ++++++++---------------------
>> drivers/vfio/pci/virtio/legacy_io.c | 13 ++++++-------
>> include/linux/vfio_pci_core.h | 20 ++++++++++++++++++-
>> 6 files changed, 43 insertions(+), 44 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
>> index fa056b69f899..e153002258ce 100644
>> --- a/drivers/vfio/pci/nvgrace-gpu/main.c
>> +++ b/drivers/vfio/pci/nvgrace-gpu/main.c
>> @@ -184,13 +184,10 @@ static int nvgrace_gpu_open_device(struct vfio_device *core_vdev)
>>
>> /*
>> * GPU readiness is checked by reading the BAR0 registers.
>> - *
>> - * ioremap BAR0 to ensure that the BAR0 mapping is present before
>> - * register reads on first fault before establishing any GPU
>> - * memory mapping.
>> + * The BAR map was just set up by vfio_pci_core_enable(), so
>> + * check that was successful and bail early if not:
>> */
>> - ret = vfio_pci_core_setup_barmap(vdev, 0);
>> - if (ret)
>> + if (IS_ERR(vfio_pci_core_get_iomap(vdev, 0)))
>> goto error_exit;
>
> Sashiko notes we're not setting ret here. The bots are also paranoid
> about the unreachable condition that the get_iomap below could return an
> ERR_PTR. Maybe head off both by adding an __iomem pointer to the
> nvgrace_gpu_pci_core_device struct and a temporary one here. Store the
> iomap in the temporary variable, use it to test for IS_ERR() and
> PTR_ERR(), then set the pointer in the structure after the last error
> condition here. Add one line in the close_device to set it NULL. Then
> just use nvdev->bar0_io below.

Right about ret. On the second point, a comment on the ...get_iomap()
call below saying that it "cannot fail" if this one passes might be
enough for the bots, but hey. I can add a struct member to track it
(the bots can then worry that it might be NULL, if they don't notice
that nvgrace_gpu_check_device_ready() can't run unless
nvgrace_gpu_open_device() succeeded, etc. etc.).

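For concreteness, here is a rough userspace sketch of the pattern suggested above: hold the iomap in a temporary, test it with IS_ERR()/PTR_ERR(), store it in the device struct only after the last error path, and clear it on close. Everything in it is a stand-in and an assumption: the ERR_PTR helpers are simplified copies of the kernel ones, and `struct core_dev`, `struct nv_dev`, `get_iomap()`, `nv_open_device()` and `nv_close_device()` are hypothetical names, not the real nvgrace-gpu code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's ERR_PTR helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

#define NR_BARS 6

/* Hypothetical, trimmed-down stand-ins for the real device structures. */
struct core_dev {
	void *barmap[NR_BARS];	/* valid mapping or ERR_PTR(), set at enable time */
};

struct nv_dev {
	struct core_dev core;
	void *bar0_io;		/* cached BAR0 mapping; NULL while closed */
};

/* Hypothetical helper: return the stored mapping or an ERR_PTR(). */
static void *get_iomap(struct core_dev *vdev, unsigned int bar)
{
	if (bar >= NR_BARS)
		return ERR_PTR(-EINVAL);
	return vdev->barmap[bar];
}

static int nv_open_device(struct nv_dev *nvdev)
{
	void *io = get_iomap(&nvdev->core, 0);

	if (IS_ERR(io))
		return (int)PTR_ERR(io);	/* propagate the real error code */

	/* ... any further fallible setup would go here ... */

	nvdev->bar0_io = io;	/* publish only after the last error path */
	return 0;
}

static void nv_close_device(struct nv_dev *nvdev)
{
	nvdev->bar0_io = NULL;
}
```

With this shape, later users read `nvdev->bar0_io` directly and never see an ERR_PTR(), at the cost of one extra pointer per device.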
>>
>> if (nvdev->resmem.memlength) {
>> @@ -275,7 +272,7 @@ nvgrace_gpu_check_device_ready(struct nvgrace_gpu_pci_core_device *nvdev)
>> if (!__vfio_pci_memory_enabled(vdev))
>> return -EIO;
>>
>> - ret = nvgrace_gpu_wait_device_ready(vdev->barmap[0]);
>> + ret = nvgrace_gpu_wait_device_ready(vfio_pci_core_get_iomap(vdev, 0));
>> if (ret)
>> return ret;
>>
>> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
>> index 62931dc381d8..5c8bd13f10d0 100644
>> --- a/drivers/vfio/pci/vfio_pci_core.c
>> +++ b/drivers/vfio/pci/vfio_pci_core.c
>> @@ -1761,7 +1761,7 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
>> struct pci_dev *pdev = vdev->pdev;
>> unsigned int index;
>> u64 phys_len, req_len, pgoff, req_start;
>> - int ret;
>> + void __iomem *bar_io;
>>
>> index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
>>
>> @@ -1795,12 +1795,11 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
>> return -EINVAL;
>>
>> /*
>> - * Even though we don't make use of the barmap for the mmap,
>> - * we need to request the region and the barmap tracks that.
>> + * Ensure the BAR resource region is reserved for use.
>> */
>> - ret = vfio_pci_core_setup_barmap(vdev, index);
>> - if (ret)
>> - return ret;
>> + bar_io = vfio_pci_core_get_iomap(vdev, index);
>> + if (IS_ERR(bar_io))
>> + return PTR_ERR(bar_io);
>>
>> vma->vm_private_data = vdev;
>> vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
>> diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
>> index 69a5c2d511e6..46cd44b22c9c 100644
>> --- a/drivers/vfio/pci/vfio_pci_dmabuf.c
>> +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
>> @@ -248,7 +248,7 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
>> * else. Check that PCI resources have been claimed for it.
>> */
>> if (get_dma_buf.region_index >= VFIO_PCI_ROM_REGION_INDEX ||
>> - vfio_pci_core_setup_barmap(vdev, get_dma_buf.region_index))
>> + IS_ERR(vfio_pci_core_get_iomap(vdev, get_dma_buf.region_index)))
>> return -ENODEV;
>>
>> dma_ranges = memdup_array_user(&arg->dma_ranges, get_dma_buf.nr_ranges,
>> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
>> index 3bfbb879a005..7f14dd46de17 100644
>> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
>> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
>> @@ -198,19 +198,6 @@ ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
>> }
>> EXPORT_SYMBOL_GPL(vfio_pci_core_do_io_rw);
>>
>> -/*
>> - * The barmap is set up in vfio_pci_core_enable(). Callers use this
>> - * function to check that the BAR resources are requested or that the
>> - * pci_iomap() was done.
>> - */
>> -int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
>> -{
>> - if (IS_ERR(vdev->barmap[bar]))
>> - return PTR_ERR(vdev->barmap[bar]);
>> - return 0;
>> -}
>> -EXPORT_SYMBOL_GPL(vfio_pci_core_setup_barmap);
>> -
>> ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
>> size_t count, loff_t *ppos, bool iswrite)
>> {
>> @@ -262,13 +249,11 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
>> */
>> max_width = VFIO_PCI_IO_WIDTH_4;
>> } else {
>> - int ret = vfio_pci_core_setup_barmap(vdev, bar);
>> - if (ret) {
>> - done = ret;
>> + io = vfio_pci_core_get_iomap(vdev, bar);
>> + if (IS_ERR(io)) {
>> + done = PTR_ERR(io);
>> goto out;
>> }
>> -
>> - io = vdev->barmap[bar];
>> }
>>
>> if (bar == vdev->msix_bar) {
>> @@ -423,6 +408,7 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset,
>> loff_t pos = offset & VFIO_PCI_OFFSET_MASK;
>> int ret, bar = VFIO_PCI_OFFSET_TO_INDEX(offset);
>> struct vfio_pci_ioeventfd *ioeventfd;
>> + void __iomem *io;
>>
>> /* Only support ioeventfds into BARs */
>> if (bar > VFIO_PCI_BAR5_REGION_INDEX)
>> @@ -440,9 +426,9 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset,
>> if (count == 8)
>> return -EINVAL;
>>
>> - ret = vfio_pci_core_setup_barmap(vdev, bar);
>> - if (ret)
>> - return ret;
>> + io = vfio_pci_core_get_iomap(vdev, bar);
>> + if (IS_ERR(io))
>> + return PTR_ERR(io);
>
> Sashiko seems to note a real existing error here that should also be
> pulled out to a separate fix. Given the right offset, this could
> generate a negative BAR value.

Yuck, loff_t is signed, yep. Isn't the real root of this that it never
makes sense for VFIO_PCI_OFFSET_TO_INDEX() to return a negative index,
here or anywhere else?

I suggest instead, to also avoid this elsewhere in future, something
like:

  #define VFIO_PCI_OFFSET_TO_INDEX(off) ((u64)(off) >> VFIO_PCI_OFFSET_SHIFT)

> The test at the end of the previous
> chunk should be expanded to `if (bar < 0 || bar > ...BAR5...)`.

Not necessary if VFIO_PCI_OFFSET_TO_INDEX() can't return < 0 (the
result would be at most 24 bits wide, so it can't overflow the
`int bar` it's assigned into).

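To make the signedness argument concrete, a small userspace sketch follows. It assumes a shift of 40 (consistent with the 24-bit result mentioned above) and a BAR5 index of 5; `index_signed()`, `index_unsigned()` and `passes_bar_check()` are hypothetical names for illustration only. Note that right-shifting a negative signed value is implementation-defined in C; the sketch relies on the arithmetic shift that GCC and Clang perform.

```c
#include <assert.h>
#include <stdint.h>

#define OFFSET_SHIFT	40	/* assumed to match VFIO_PCI_OFFSET_SHIFT */
#define BAR5_INDEX	5	/* assumed to match VFIO_PCI_BAR5_REGION_INDEX */

/* Signed variant: shifts the loff_t-like value directly. */
static int index_signed(int64_t off)
{
	return (int)(off >> OFFSET_SHIFT);
}

/* Unsigned variant: casts to a 64-bit unsigned type before shifting. */
static int index_unsigned(int64_t off)
{
	return (int)((uint64_t)off >> OFFSET_SHIFT);
}

/* Upper-bound-only check, like `if (bar > ...BAR5...) return -EINVAL;`. */
static int passes_bar_check(int bar)
{
	return !(bar > BAR5_INDEX);
}
```

With a negative offset the signed variant yields a negative index, which slips past an upper-bound-only check. The unsigned variant yields at most 2^24 - 1, which fits in an `int` and is rejected by the same check.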
> Do you want to pick that up in this series? I think it's the only case
> that lets that slip through. Thanks,

Sure, I'll post a fix. I don't think it needs to be part of this series
though, if it's just the macro; do you agree?

Do you know why drivers/gpu/drm/i915/gvt/kvmgt.c has copied
VFIO_PCI_OFFSET_TO_INDEX() and friends? Perhaps the shift was different
(the reason drivers/vfio/pci/ism/main.c has its own versions). The same
loff_t issue seems to exist in both of those places, unfortunately.

Matt

PS: one minor, related question. I'd made `bar` an int, following
existing convention, in

  vfio_pci_core_get_iomap(struct vfio_pci_core_device *vdev, int bar)

but I'll make this `unsigned int`; please flag if this violates taste
and decency. IMO any BAR/index parameter should be unsigned; most are,
but some signed ones remain.
Thread overview: 9+ messages
2026-05-05 17:38 [PATCH v4 0/3] vfio/pci: Request resources and map BARs at enable time Matt Evans
2026-05-05 17:38 ` [PATCH v4 1/3] vfio/pci: Set up BAR resources and maps in vfio_pci_core_enable() Matt Evans
2026-05-07 22:21 ` Alex Williamson
2026-05-08 14:14 ` Matt Evans
2026-05-05 17:38 ` [PATCH v4 2/3] vfio/pci: Check BAR resources before exporting a DMABUF Matt Evans
2026-05-05 17:38 ` [PATCH v4 3/3] vfio/pci: Replace vfio_pci_core_setup_barmap() with vfio_pci_core_get_iomap() Matt Evans
2026-05-07 22:21 ` Alex Williamson
2026-05-08 15:30 ` Matt Evans [this message]
2026-05-08 17:45 ` Alex Williamson