The Linux Kernel Mailing List
From: Matt Evans <mattev@meta.com>
To: Alex Williamson <alex@shazbot.org>
Cc: Kevin Tian <kevin.tian@intel.com>, Jason Gunthorpe <jgg@ziepe.ca>,
	Ankit Agrawal <ankita@nvidia.com>,
	Alistair Popple <apopple@nvidia.com>,
	Leon Romanovsky <leon@kernel.org>, Kees Cook <kees@kernel.org>,
	Shameer Kolothum <skolothumtho@nvidia.com>,
	Yishai Hadas <yishaih@nvidia.com>,
	Alexey Kardashevskiy <aik@ozlabs.ru>,
	Eric Auger <eric.auger@redhat.com>, Peter Xu <peterx@redhat.com>,
	Vivek Kasireddy <vivek.kasireddy@intel.com>,
	Zhi Wang <zhiw@nvidia.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux.dev
Subject: Re: [PATCH v4 3/3] vfio/pci: Replace vfio_pci_core_setup_barmap() with vfio_pci_core_get_iomap()
Date: Fri, 8 May 2026 16:30:40 +0100
Message-ID: <015b1e9c-0a2d-472a-b750-9154800832ee@meta.com>
In-Reply-To: <20260507162141.072483ce@shazbot.org>

Hi Alex,

On 07/05/2026 23:21, Alex Williamson wrote:
> 
> On Tue, 5 May 2026 10:38:31 -0700
> Matt Evans <mattev@meta.com> wrote:
> 
>> Since "vfio/pci: Set up barmap in vfio_pci_core_enable()", the
>> resource request and iomap for the BARs are performed early, and
>> vfio_pci_core_setup_barmap() just checks that those actions succeeded.
>>
>> Move this logic to a new helper that checks success and returns the
>> iomap address, replacing the various bare vdev->barmap[] lookups.
>> This maintains the error behaviour of the previous on-demand
>> vfio_pci_core_setup_barmap() scheme.
>>
>> Signed-off-by: Matt Evans <mattev@meta.com>
>> ---
>>   drivers/vfio/pci/nvgrace-gpu/main.c | 11 ++++-------
>>   drivers/vfio/pci/vfio_pci_core.c    | 11 +++++------
>>   drivers/vfio/pci/vfio_pci_dmabuf.c  |  2 +-
>>   drivers/vfio/pci/vfio_pci_rdwr.c    | 30 ++++++++---------------------
>>   drivers/vfio/pci/virtio/legacy_io.c | 13 ++++++-------
>>   include/linux/vfio_pci_core.h       | 20 ++++++++++++++++++-
>>   6 files changed, 43 insertions(+), 44 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
>> index fa056b69f899..e153002258ce 100644
>> --- a/drivers/vfio/pci/nvgrace-gpu/main.c
>> +++ b/drivers/vfio/pci/nvgrace-gpu/main.c
>> @@ -184,13 +184,10 @@ static int nvgrace_gpu_open_device(struct vfio_device *core_vdev)
>>   
>>   	/*
>>   	 * GPU readiness is checked by reading the BAR0 registers.
>> -	 *
>> -	 * ioremap BAR0 to ensure that the BAR0 mapping is present before
>> -	 * register reads on first fault before establishing any GPU
>> -	 * memory mapping.
>> +	 * The BAR map was just set up by vfio_pci_core_enable(), so
>> +	 * check that was successful and bail early if not:
>>   	 */
>> -	ret = vfio_pci_core_setup_barmap(vdev, 0);
>> -	if (ret)
>> +	if (IS_ERR(vfio_pci_core_get_iomap(vdev, 0)))
>>   		goto error_exit;
> 
> Sashiko notes we're not setting ret here.  The bots are also paranoid
> about the unreachable condition that the get_iomap below could return an
> ERR_PTR.  Maybe head off both by adding an __iomem pointer to the
> nvgrace_gpu_pci_core_device struct and a temporary one here.  Store the
> iomap in the temporary variable, use it to test for IS_ERR() and
> PTR_ERR(), then set the pointer in the structure after the last error
> condition here.  Add one line in the close_device to set it NULL.  Then
> just use nvdev->bar0_io below.

Right about ret.  On the second point, a comment on the ...get_iomap()
call below saying that it "cannot fail" if this one passes might have
satisfied the bots, but hey.  I can add a struct member to track it
instead (the bots can then worry that it might be NULL, if they don't
notice that nvgrace_gpu_check_device_ready() can't run unless
nvgrace_gpu_open_device() succeeded, etc. etc.).
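
For illustration, the pattern suggested above might look roughly like
the user-space sketch below.  Everything in it is a stand-in: the
ERR_PTR helpers only mimic linux/err.h, the structs are trimmed-down
hypothetical versions of the real ones, and get_iomap() (including its
bounds check) is an assumed shape for vfio_pci_core_get_iomap(), whose
body is not shown in this mail.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for the kernel's ERR_PTR helpers (linux/err.h). */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

#define NR_BARS 6

/* Hypothetical, trimmed-down device structs; not the real layouts. */
struct core_dev {
	void *barmap[NR_BARS];	/* valid mapping or ERR_PTR(), set at enable */
};

struct nv_dev {
	struct core_dev core;
	void *bar0_io;		/* cached BAR0 mapping, NULL while closed */
};

/* Assumed shape of the helper: return the map or the stored error. */
static void *get_iomap(struct core_dev *vdev, unsigned int bar)
{
	if (bar >= NR_BARS)
		return ERR_PTR(-EINVAL);
	return vdev->barmap[bar];
}

static int nv_open_device(struct nv_dev *nvdev)
{
	void *io = get_iomap(&nvdev->core, 0);

	if (IS_ERR(io))
		return (int)PTR_ERR(io);	/* propagate the stored error code */

	/* ... any further fallible setup would go here ... */

	nvdev->bar0_io = io;	/* publish only after the last error path */
	return 0;
}

static void nv_close_device(struct nv_dev *nvdev)
{
	nvdev->bar0_io = NULL;
}
```

The temporary keeps the error code available for the return value, and
later readers of bar0_io never see an ERR_PTR because the member is only
written once open can no longer fail.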

>>   
>>   	if (nvdev->resmem.memlength) {
>> @@ -275,7 +272,7 @@ nvgrace_gpu_check_device_ready(struct nvgrace_gpu_pci_core_device *nvdev)
>>   	if (!__vfio_pci_memory_enabled(vdev))
>>   		return -EIO;
>>   
>> -	ret = nvgrace_gpu_wait_device_ready(vdev->barmap[0]);
>> +	ret = nvgrace_gpu_wait_device_ready(vfio_pci_core_get_iomap(vdev, 0));
>>   	if (ret)
>>   		return ret;
>>   
>> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
>> index 62931dc381d8..5c8bd13f10d0 100644
>> --- a/drivers/vfio/pci/vfio_pci_core.c
>> +++ b/drivers/vfio/pci/vfio_pci_core.c
>> @@ -1761,7 +1761,7 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
>>   	struct pci_dev *pdev = vdev->pdev;
>>   	unsigned int index;
>>   	u64 phys_len, req_len, pgoff, req_start;
>> -	int ret;
>> +	void __iomem *bar_io;
>>   
>>   	index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
>>   
>> @@ -1795,12 +1795,11 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
>>   		return -EINVAL;
>>   
>>   	/*
>> -	 * Even though we don't make use of the barmap for the mmap,
>> -	 * we need to request the region and the barmap tracks that.
>> +	 * Ensure the BAR resource region is reserved for use.
>>   	 */
>> -	ret = vfio_pci_core_setup_barmap(vdev, index);
>> -	if (ret)
>> -		return ret;
>> +	bar_io = vfio_pci_core_get_iomap(vdev, index);
>> +	if (IS_ERR(bar_io))
>> +		return PTR_ERR(bar_io);
>>   
>>   	vma->vm_private_data = vdev;
>>   	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
>> diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
>> index 69a5c2d511e6..46cd44b22c9c 100644
>> --- a/drivers/vfio/pci/vfio_pci_dmabuf.c
>> +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
>> @@ -248,7 +248,7 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
>>   	 * else.  Check that PCI resources have been claimed for it.
>>   	 */
>>   	if (get_dma_buf.region_index >= VFIO_PCI_ROM_REGION_INDEX ||
>> -	    vfio_pci_core_setup_barmap(vdev, get_dma_buf.region_index))
>> +	    IS_ERR(vfio_pci_core_get_iomap(vdev, get_dma_buf.region_index)))
>>   		return -ENODEV;
>>   
>>   	dma_ranges = memdup_array_user(&arg->dma_ranges, get_dma_buf.nr_ranges,
>> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
>> index 3bfbb879a005..7f14dd46de17 100644
>> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
>> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
>> @@ -198,19 +198,6 @@ ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
>>   }
>>   EXPORT_SYMBOL_GPL(vfio_pci_core_do_io_rw);
>>   
>> -/*
>> - * The barmap is set up in vfio_pci_core_enable().  Callers use this
>> - * function to check that the BAR resources are requested or that the
>> - * pci_iomap() was done.
>> - */
>> -int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
>> -{
>> -	if (IS_ERR(vdev->barmap[bar]))
>> -		return PTR_ERR(vdev->barmap[bar]);
>> -	return 0;
>> -}
>> -EXPORT_SYMBOL_GPL(vfio_pci_core_setup_barmap);
>> -
>>   ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
>>   			size_t count, loff_t *ppos, bool iswrite)
>>   {
>> @@ -262,13 +249,11 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
>>   		 */
>>   		max_width = VFIO_PCI_IO_WIDTH_4;
>>   	} else {
>> -		int ret = vfio_pci_core_setup_barmap(vdev, bar);
>> -		if (ret) {
>> -			done = ret;
>> +		io = vfio_pci_core_get_iomap(vdev, bar);
>> +		if (IS_ERR(io)) {
>> +			done = PTR_ERR(io);
>>   			goto out;
>>   		}
>> -
>> -		io = vdev->barmap[bar];
>>   	}
>>   
>>   	if (bar == vdev->msix_bar) {
>> @@ -423,6 +408,7 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset,
>>   	loff_t pos = offset & VFIO_PCI_OFFSET_MASK;
>>   	int ret, bar = VFIO_PCI_OFFSET_TO_INDEX(offset);
>>   	struct vfio_pci_ioeventfd *ioeventfd;
>> +	void __iomem *io;
>>   
>>   	/* Only support ioeventfds into BARs */
>>   	if (bar > VFIO_PCI_BAR5_REGION_INDEX)
>> @@ -440,9 +426,9 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset,
>>   	if (count == 8)
>>   		return -EINVAL;
>>   
>> -	ret = vfio_pci_core_setup_barmap(vdev, bar);
>> -	if (ret)
>> -		return ret;
>> +	io = vfio_pci_core_get_iomap(vdev, bar);
>> +	if (IS_ERR(io))
>> +		return PTR_ERR(io);
> 
> Sashiko seems to note a real existing error here that should also be
> pulled out to a separate fix.  Given the right offset, this could
> generate a negative BAR value.

Yuck, loff_t is signed, yep.  Isn't the real root of this that it
never makes sense for VFIO_PCI_OFFSET_TO_INDEX() to return a negative
index, here or anywhere else?

I suggest instead, to also avoid this elsewhere in future, something
like:

  #define VFIO_PCI_OFFSET_TO_INDEX(off)	((u64)(off) >> VFIO_PCI_OFFSET_SHIFT)

> The test at the end of the previous
> chunk should be expanded to `if (bar < 0 || bar > ...BAR5...)`.

Not necessary if VFIO_PCI_OFFSET_TO_INDEX() can't return < 0 (the
result would be at most 24 bits wide, so it can't overflow the
`int bar` it's assigned into).
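
As a rough user-space illustration of the difference, the sketch below
compares the two shifts.  OFFSET_SHIFT is assumed to be 40 (consistent
with the 24-bit result mentioned above), BAR5_INDEX stands in for
VFIO_PCI_BAR5_REGION_INDEX, and the function names are hypothetical.
Note that right-shifting a negative signed value is
implementation-defined in C; common compilers sign-extend, which is
what lets a negative index through.

```c
#include <stdint.h>

#define OFFSET_SHIFT	40	/* assumed value of VFIO_PCI_OFFSET_SHIFT */
#define BAR5_INDEX	5	/* stand-in for VFIO_PCI_BAR5_REGION_INDEX */

/* Signed shift: a negative offset typically yields a negative index. */
static inline int index_signed(int64_t off)
{
	return (int)(off >> OFFSET_SHIFT);
}

/* Unsigned cast first: the result is 0 .. 2^24 - 1, never negative. */
static inline int index_unsigned(int64_t off)
{
	return (int)((uint64_t)off >> OFFSET_SHIFT);
}

/* The existing upper-bound-only test from vfio_pci_ioeventfd(). */
static inline int bar_check_passes(int bar)
{
	return !(bar > BAR5_INDEX);
}
```

With the unsigned variant, an offset of -1 maps to index 0xFFFFFF, which
the existing `bar > ...BAR5...` test already rejects, so no extra
`bar < 0` check is needed.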

> Do you want to pick that up in this series?  I think it's the only case
> that lets that slip through.  Thanks,

Sure, I'll post a fix.  I don't think it needs to be part of this series
though if it's just the macro, do you agree?

Do you know why drivers/gpu/drm/i915/gvt/kvmgt.c has copied
VFIO_PCI_OFFSET_TO_INDEX() and friends?  Perhaps the shift was different
(the reason drivers/vfio/pci/ism/main.c has its own versions).  The same
loff_t issue seems to exist in both of those places, unfortunately.


Matt

PS: a minor related question.  I'd made `bar` an int, following
existing convention, in

  vfio_pci_core_get_iomap(struct vfio_pci_core_device *vdev, int bar)

but I'll make this `unsigned int`; please flag if this violates taste
and decency.  IMO any BAR/index parameter should be unsigned; most are,
but some signed ones remain.


Thread overview: 9+ messages
2026-05-05 17:38 [PATCH v4 0/3] vfio/pci: Request resources and map BARs at enable time Matt Evans
2026-05-05 17:38 ` [PATCH v4 1/3] vfio/pci: Set up BAR resources and maps in vfio_pci_core_enable() Matt Evans
2026-05-07 22:21   ` Alex Williamson
2026-05-08 14:14     ` Matt Evans
2026-05-05 17:38 ` [PATCH v4 2/3] vfio/pci: Check BAR resources before exporting a DMABUF Matt Evans
2026-05-05 17:38 ` [PATCH v4 3/3] vfio/pci: Replace vfio_pci_core_setup_barmap() with vfio_pci_core_get_iomap() Matt Evans
2026-05-07 22:21   ` Alex Williamson
2026-05-08 15:30     ` Matt Evans [this message]
2026-05-08 17:45       ` Alex Williamson
