public inbox for kvm@vger.kernel.org
From: Matt Evans <mattev@meta.com>
To: Alex Williamson <alex@shazbot.org>
Cc: Kevin Tian <kevin.tian@intel.com>, Jason Gunthorpe <jgg@ziepe.ca>,
	Ankit Agrawal <ankita@nvidia.com>,
	Alistair Popple <apopple@nvidia.com>,
	Leon Romanovsky <leon@kernel.org>, Kees Cook <kees@kernel.org>,
	Shameer Kolothum <skolothumtho@nvidia.com>,
	Yishai Hadas <yishaih@nvidia.com>,
	Alexey Kardashevskiy <aik@ozlabs.ru>,
	Eric Auger <eric.auger@redhat.com>, Peter Xu <peterx@redhat.com>,
	Vivek Kasireddy <vivek.kasireddy@intel.com>,
	Zhi Wang <zhiw@nvidia.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux.dev
Subject: Re: [PATCH v3 1/3] vfio/pci: Set up bar resources and maps in vfio_pci_core_enable()
Date: Tue, 5 May 2026 17:40:14 +0100
Message-ID: <ab327008-2b5f-4fbb-8045-736ff90ea5ab@meta.com>
In-Reply-To: <20260430141341.163ed827@shazbot.org>

Hi Alex,

On 30/04/2026 21:13, Alex Williamson wrote:
> 
> On Thu, 30 Apr 2026 03:03:20 -0700
> Matt Evans <mattev@meta.com> wrote:
> 
>> Previously BAR resource requests and the corresponding pci_iomap()
>> were performed on-demand and without synchronisation, which was racy.
>> Rather than add synchronisation, it's simplest to address this by
>> doing both activities from vfio_pci_core_enable().
>>
>> The resource allocation and/or pci_iomap() can still fail; their
>> status is tracked and existing calls to vfio_pci_core_setup_barmap()
>> will fail in a similar way to before.  This keeps the point of failure
>> as observed by userspace the same, i.e. failures to request/map unused
>> BARs are benign.
>>
>> Fixes: 7f5764e179c6 ("vfio: use vfio_pci_core_setup_barmap to map bar in mmap")
>> Fixes: 0d77ed3589ac0 ("vfio/pci: Pull BAR mapping setup from read-write path")
> 
> Neither of these introduced races, they only moved what they were
> already doing into a function or made use of that shared function for
> what they were already doing.  I'm inclined to believe the raciness
> existed from the introduction, 89e1f7d4c66d.
> 
>> Signed-off-by: Matt Evans <mattev@meta.com>
>> ---
>>   drivers/vfio/pci/vfio_pci_core.c | 33 ++++++++++++++++++++++++++++++++
>>   drivers/vfio/pci/vfio_pci_rdwr.c | 29 ++++++++++++----------------
>>   2 files changed, 45 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
>> index 3f8d093aacf8..eab4f2626b39 100644
>> --- a/drivers/vfio/pci/vfio_pci_core.c
>> +++ b/drivers/vfio/pci/vfio_pci_core.c
>> @@ -482,6 +482,38 @@ static int vfio_pci_core_runtime_resume(struct device *dev)
>>   }
>>   #endif /* CONFIG_PM */
>>   
>> +static void vfio_pci_core_map_bars(struct vfio_pci_core_device *vdev)
>> +{
>> +	struct pci_dev *pdev = vdev->pdev;
>> +	int i;
>> +
>> +	/*
>> +	 * Eager-request BAR resources, and iomap.  Soft failures are
>> +	 * allowed, and consumers must check the barmap before use in
>> +	 * order to give compatible user-visible behaviour with the
>> +	 * previous on-demand allocation method.
>> +	 */
>> +	for (i = 0; i < PCI_STD_NUM_BARS; i++) {
>> +		int bar = i + PCI_STD_RESOURCES;
>> +		void __iomem *io = ERR_PTR(-ENODEV);
> 
> It would collapse the nesting depth to just do:
> 
> 		vdev->barmap[bar] = ERR_PTR(-ENODEV);
> 
> 		if (!pci_resource_len(pdev, i))
> 			continue;
> 
> 		if (pci_request_selected_regions(pdev, 1 << bar, "vfio")) {
> 			pci_dbg(vdev->pdev, "Failed to reserve region %d\n", bar);
> 			vdev->barmap[bar] = ERR_PTR(-EBUSY);
> 			continue;
> 		}
> 
> 		vdev->barmap[bar] = pci_iomap(pdev, bar, 0);
> 		if (!vdev->barmap[bar]) {
> 			pci_dbg(vdev->pdev, "Failed to iomap region %d\n", bar);
> 			vdev->barmap[bar] = ERR_PTR(-ENOMEM);
> 		}
> 
> It's debatable what level to use for the errors, but we were previously
> silent on this, so going all the way to pci_warn() seems unnecessary.

Okay, I've flattened the loop as you suggest and replaced the
pci_warn() calls with pci_dbg().

>> +
>> +		if (pci_resource_len(pdev, i) > 0) {
>> +			if (pci_request_selected_regions(pdev, 1 << bar, "vfio")) {
>> +				pci_warn(vdev->pdev, "Failed to reserve region %d\n", bar);
>> +				io = ERR_PTR(-EBUSY);
>> +			} else {
>> +				io = pci_iomap(pdev, bar, 0);
>> +				if (!io) {
>> +					pci_warn(vdev->pdev, "Failed to iomap region %d\n",
>> +						 bar);
>> +					io = ERR_PTR(-ENOMEM);
>> +				}
>> +			}
>> +		}
>> +		vdev->barmap[bar] = io;
>> +	}
>> +}
>> +
>>   /*
>>    * The pci-driver core runtime PM routines always save the device state
>>    * before going into suspended state. If the device is going into low power
>> @@ -568,6 +600,7 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
>>   	if (!vfio_vga_disabled() && vfio_pci_is_vga(pdev))
>>   		vdev->has_vga = true;
>>   
>> +	vfio_pci_core_map_bars(vdev);
>>   
>>   	return 0;
> 
> You're missing the barmap test in vfio_pci_core_disable() now, it's
> still testing for NULL, which is (almost?) never true.  It needs to
> convert to IS_ERR_OR_NULL().

Argh, yes it does, thank you.  (That's the second such catch, the first
being the !IS_ERR() typo you spotted in patch #3 :(  Thanks for that one
too; it slipped past my usual testing routine.)
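
For illustration only, here is a minimal userspace sketch of why the
teardown test has to change once failed BARs are recorded as ERR_PTR()
values.  It is not the kernel code: ERR_PTR()/IS_ERR()/IS_ERR_OR_NULL()
are local stand-ins for the <linux/err.h> helpers, and struct mock_vdev,
NUM_BARS and mock_unmap_bars() are hypothetical names invented for the
example.  The real loop would call pci_iounmap() and
pci_release_selected_regions() where the comment indicates.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's <linux/err.h> helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(intptr_t error)
{
	return (void *)error;
}

static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline bool IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

#define NUM_BARS 6	/* assumption: mirrors PCI_STD_NUM_BARS */

/* Hypothetical, cut-down stand-in for struct vfio_pci_core_device. */
struct mock_vdev {
	void *barmap[NUM_BARS];
};

/*
 * Release every BAR that was really mapped and return how many were
 * released.  Slots holding NULL or an ERR_PTR() are skipped and left
 * untouched; a plain NULL test would wrongly try to unmap the
 * ERR_PTR() slots.
 */
static int mock_unmap_bars(struct mock_vdev *vdev)
{
	int released = 0;

	assert(vdev != NULL);

	for (int bar = 0; bar < NUM_BARS; bar++) {
		if (IS_ERR_OR_NULL(vdev->barmap[bar]))
			continue;

		/* Kernel: pci_iounmap() + pci_release_selected_regions(). */
		vdev->barmap[bar] = NULL;
		released++;
	}

	return released;
}
```

With two slots mapped and the rest holding NULL or ERR_PTR() values,
only the two mapped slots are released, and a second call releases
nothing.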

>> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
>> index 4251ee03e146..f66ad3d96481 100644
>> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
>> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
>> @@ -200,25 +200,20 @@ EXPORT_SYMBOL_GPL(vfio_pci_core_do_io_rw);
>>   
>>   int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
>>   {
>> -	struct pci_dev *pdev = vdev->pdev;
>> -	int ret;
>> -	void __iomem *io;
>> -
>> -	if (vdev->barmap[bar])
>> -		return 0;
>> -
>> -	ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
>> -	if (ret)
>> -		return ret;
>> -
>> -	io = pci_iomap(pdev, bar, 0);
>> -	if (!io) {
>> -		pci_release_selected_regions(pdev, 1 << bar);
>> -		return -ENOMEM;
>> -	}
>> +	/*
>> +	 * The barmap is set up in vfio_pci_core_enable().  Callers
>> +	 * use this function to check that the BAR resources are
>> +	 * requested or that the pci_iomap() was done.
>> +	 */
> 
> Looks like a function level comment to be placed above the function
> definition.  TBH, the comment in the previous function could also be
> pulled up as a function level comment.
> 
>> +	if (bar < 0 || bar >= PCI_STD_NUM_BARS)
> 
> Maybe `if ((unsigned)bar >= PCI_STD_NUM_BARS)` but really author
> preference here.
> 
>> +		return -EINVAL;
>>   
>> -	vdev->barmap[bar] = io;
>> +	/* Did vfio_pci_core_map_bars() set it up yet? */
>> +	if (!vdev->barmap[bar])
>> +		return -ENODEV;
> 
> What hits this?  Should it be a WARN_ON_ONCE?  It would need to be a use
> case that accesses barmap outside of the window between enable and
> disable, where I think we're defining the contract that it's only valid
> between those events.  Both this and the range check could move to the
> iomap implementation to keep the Fixes: patch reasonably small since
> afaik they're not triggered.  The BAR range test could be WARN_ON_ONCE
> as well, only driver bugs should hit it.  Thanks,

I've reduced fix patch #1 to just an IS_ERR() test, dropping the NULL
and range checks as you suggest.  And indeed WARN_ON_ONCE() is a good
idea for those, as only a serious driver bug would trigger these
conditions (still worth testing for, though).
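
To make the split concrete, here is a rough userspace sketch under
stated assumptions; it is not the v4 code.  mock_setup_barmap() shows
the reduced IS_ERR()-only check for the fix patch, and mock_get_iomap()
shows where the range and NULL checks could live, using the unsigned
cast suggested above.  The ERR_PTR()/PTR_ERR()/IS_ERR() helpers are
local stand-ins for <linux/err.h>, and struct mock_vdev, NUM_BARS and
both function names and signatures are hypothetical.  In the kernel the
two checks in mock_get_iomap() would be wrapped in WARN_ON_ONCE().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's <linux/err.h> helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(intptr_t error)
{
	return (void *)error;
}

static inline intptr_t PTR_ERR(const void *ptr)
{
	return (intptr_t)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

#define NUM_BARS 6	/* assumption: mirrors PCI_STD_NUM_BARS */

/* Hypothetical, cut-down stand-in for struct vfio_pci_core_device. */
struct mock_vdev {
	void *barmap[NUM_BARS];
};

/* Reduced check: report the error recorded at enable time, if any. */
static int mock_setup_barmap(const struct mock_vdev *vdev, int bar)
{
	assert(vdev != NULL);

	if (IS_ERR(vdev->barmap[bar]))
		return (int)PTR_ERR(vdev->barmap[bar]);

	return 0;
}

/* Accessor carrying the checks that only a driver bug should hit. */
static void *mock_get_iomap(const struct mock_vdev *vdev, int bar)
{
	assert(vdev != NULL);

	/* One unsigned compare rejects both bar < 0 and bar >= NUM_BARS. */
	if ((unsigned int)bar >= NUM_BARS)	/* kernel: WARN_ON_ONCE() */
		return ERR_PTR(-EINVAL);

	/* NULL means the call fell outside the enable/disable window. */
	if (!vdev->barmap[bar])			/* kernel: WARN_ON_ONCE() */
		return ERR_PTR(-ENODEV);

	/* Either a valid mapping or the ERR_PTR() stored at enable time. */
	return vdev->barmap[bar];
}
```

Callers of the accessor then only need a single IS_ERR() test on the
returned pointer, whichever of the failure cases applied.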

Also ack on your suggestions for patch #2: making the call to
nvgrace_gpu_wait_device_ready() more minimal, and ordering the two
fixes first in the series.  Posting v4 shortly, cheers!


Thanks,


Matt

