kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] vfio/pci: Separate SR-IOV VF dev_set
@ 2025-06-26 22:56 Alex Williamson
  2025-06-30  6:32 ` Tian, Kevin
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Alex Williamson @ 2025-06-26 22:56 UTC (permalink / raw)
  To: alex.williamson, kvm
  Cc: aaronlewis, jgg, bhelgaas, dmatlack, vipinsh, seanjc, jrhilke,
	kevin.tian

In the below noted Fixes commit we introduced a reflck mutex to allow
better scaling between devices for open and close.  The reflck was
based on the hot reset granularity, device level for root bus devices
which cannot support hot reset or bus/slot reset otherwise.  Overlooked
in this were SR-IOV VFs, where there's also no bus reset option, but
the default for a non-root-bus, non-slot-based device is bus level
reflck granularity.

The reflck mutex has since become the dev_set mutex and is our defacto
serialization for various operations and ioctls.  It still seems to be
the case though that sets of vfio-pci devices really only need
serialization relative to hot resets affecting the entire set, which
is not relevant to SR-IOV VFs.  As described in the Closes link below,
this serialization contributes to startup latency when multiple VFs
sharing the same "bus" are opened concurrently.

Mark the device itself as the basis of the dev_set for SR-IOV VFs.

Reported-by: Aaron Lewis <aaronlewis@google.com>
Closes: https://lore.kernel.org/all/20250626180424.632628-1-aaronlewis@google.com
Tested-by: Aaron Lewis <aaronlewis@google.com>
Fixes: e309df5b0c9e ("vfio/pci: Parallelize device open and release")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 6328c3a05bcd..261a6dc5a5fc 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -2149,7 +2149,7 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
 		return -EBUSY;
 	}
 
-	if (pci_is_root_bus(pdev->bus)) {
+	if (pci_is_root_bus(pdev->bus) || pdev->is_virtfn) {
 		ret = vfio_assign_device_set(&vdev->vdev, vdev);
 	} else if (!pci_probe_reset_slot(pdev->slot)) {
 		ret = vfio_assign_device_set(&vdev->vdev, pdev->slot);
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* RE: [PATCH] vfio/pci: Separate SR-IOV VF dev_set
  2025-06-26 22:56 [PATCH] vfio/pci: Separate SR-IOV VF dev_set Alex Williamson
@ 2025-06-30  6:32 ` Tian, Kevin
  2025-06-30 13:15 ` Yi Liu
  2025-07-02 16:00 ` Jason Gunthorpe
  2 siblings, 0 replies; 13+ messages in thread
From: Tian, Kevin @ 2025-06-30  6:32 UTC (permalink / raw)
  To: Alex Williamson, kvm@vger.kernel.org
  Cc: aaronlewis@google.com, jgg@nvidia.com, bhelgaas@google.com,
	dmatlack@google.com, vipinsh@google.com, seanjc@google.com,
	jrhilke@google.com

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, June 27, 2025 6:56 AM
> 
> In the below noted Fixes commit we introduced a reflck mutex to allow
> better scaling between devices for open and close.  The reflck was
> based on the hot reset granularity, device level for root bus devices
> which cannot support hot reset or bus/slot reset otherwise.  Overlooked
> in this were SR-IOV VFs, where there's also no bus reset option, but
> the default for a non-root-bus, non-slot-based device is bus level
> reflck granularity.
> 
> The reflck mutex has since become the dev_set mutex and is our defacto
> serialization for various operations and ioctls.  It still seems to be
> the case though that sets of vfio-pci devices really only need
> serialization relative to hot resets affecting the entire set, which
> is not relevant to SR-IOV VFs.  As described in the Closes link below,
> this serialization contributes to startup latency when multiple VFs
> sharing the same "bus" are opened concurrently.
> 
> Mark the device itself as the basis of the dev_set for SR-IOV VFs.
> 
> Reported-by: Aaron Lewis <aaronlewis@google.com>
> Closes: https://lore.kernel.org/all/20250626180424.632628-1-
> aaronlewis@google.com
> Tested-by: Aaron Lewis <aaronlewis@google.com>
> Fixes: e309df5b0c9e ("vfio/pci: Parallelize device open and release")
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] vfio/pci: Separate SR-IOV VF dev_set
  2025-06-26 22:56 [PATCH] vfio/pci: Separate SR-IOV VF dev_set Alex Williamson
  2025-06-30  6:32 ` Tian, Kevin
@ 2025-06-30 13:15 ` Yi Liu
  2025-06-30 14:57   ` Alex Williamson
  2025-07-02 16:00 ` Jason Gunthorpe
  2 siblings, 1 reply; 13+ messages in thread
From: Yi Liu @ 2025-06-30 13:15 UTC (permalink / raw)
  To: Alex Williamson, kvm
  Cc: aaronlewis, jgg, bhelgaas, dmatlack, vipinsh, seanjc, jrhilke,
	kevin.tian

On 2025/6/27 06:56, Alex Williamson wrote:
> In the below noted Fixes commit we introduced a reflck mutex to allow
> better scaling between devices for open and close.  The reflck was
> based on the hot reset granularity, device level for root bus devices
> which cannot support hot reset or bus/slot reset otherwise.  Overlooked
> in this were SR-IOV VFs, where there's also no bus reset option, but
> the default for a non-root-bus, non-slot-based device is bus level
> reflck granularity.
> 
> The reflck mutex has since become the dev_set mutex and is our defacto
> serialization for various operations and ioctls.  It still seems to be
> the case though that sets of vfio-pci devices really only need

a nit: not sure if mentioning 2cd8b14aaa66 which convers reflck to dev_set
mutex is helpful. Perhaps, it's welcomed by people working on backporting. :)

> serialization relative to hot resets affecting the entire set, which
> is not relevant to SR-IOV VFs.  As described in the Closes link below,
> this serialization contributes to startup latency when multiple VFs
> sharing the same "bus" are opened concurrently.
> 
> Mark the device itself as the basis of the dev_set for SR-IOV VFs.
> 
> Reported-by: Aaron Lewis <aaronlewis@google.com>
> Closes: https://lore.kernel.org/all/20250626180424.632628-1-aaronlewis@google.com
> Tested-by: Aaron Lewis <aaronlewis@google.com>
> Fixes: e309df5b0c9e ("vfio/pci: Parallelize device open and release")
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>   drivers/vfio/pci/vfio_pci_core.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Yi Liu <yi.l.liu@intel.com>

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] vfio/pci: Separate SR-IOV VF dev_set
  2025-06-30 13:15 ` Yi Liu
@ 2025-06-30 14:57   ` Alex Williamson
  0 siblings, 0 replies; 13+ messages in thread
From: Alex Williamson @ 2025-06-30 14:57 UTC (permalink / raw)
  To: Yi Liu
  Cc: kvm, aaronlewis, jgg, bhelgaas, dmatlack, vipinsh, seanjc,
	jrhilke, kevin.tian

On Mon, 30 Jun 2025 21:15:58 +0800
Yi Liu <yi.l.liu@intel.com> wrote:

> On 2025/6/27 06:56, Alex Williamson wrote:
> > In the below noted Fixes commit we introduced a reflck mutex to allow
> > better scaling between devices for open and close.  The reflck was
> > based on the hot reset granularity, device level for root bus devices
> > which cannot support hot reset or bus/slot reset otherwise.  Overlooked
> > in this were SR-IOV VFs, where there's also no bus reset option, but
> > the default for a non-root-bus, non-slot-based device is bus level
> > reflck granularity.
> > 
> > The reflck mutex has since become the dev_set mutex and is our defacto
> > serialization for various operations and ioctls.  It still seems to be
> > the case though that sets of vfio-pci devices really only need  
> 
> a nit: not sure if mentioning 2cd8b14aaa66 which convers reflck to dev_set
> mutex is helpful. Perhaps, it's welcomed by people working on backporting. :)

Sure, I'll just insert a reference here on commit: "... has since become
the dev_set mutex (via commit 2cd8b14aaa66 ("vfio/pci: Move to the
device set infrastructure")) and is our defacto..."

> > serialization relative to hot resets affecting the entire set, which
> > is not relevant to SR-IOV VFs.  As described in the Closes link below,
> > this serialization contributes to startup latency when multiple VFs
> > sharing the same "bus" are opened concurrently.
> > 
> > Mark the device itself as the basis of the dev_set for SR-IOV VFs.
> > 
> > Reported-by: Aaron Lewis <aaronlewis@google.com>
> > Closes: https://lore.kernel.org/all/20250626180424.632628-1-aaronlewis@google.com
> > Tested-by: Aaron Lewis <aaronlewis@google.com>
> > Fixes: e309df5b0c9e ("vfio/pci: Parallelize device open and release")
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> >   drivers/vfio/pci/vfio_pci_core.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)  
> 
> Reviewed-by: Yi Liu <yi.l.liu@intel.com>

Thanks,
Alex


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] vfio/pci: Separate SR-IOV VF dev_set
  2025-06-26 22:56 [PATCH] vfio/pci: Separate SR-IOV VF dev_set Alex Williamson
  2025-06-30  6:32 ` Tian, Kevin
  2025-06-30 13:15 ` Yi Liu
@ 2025-07-02 16:00 ` Jason Gunthorpe
  2025-07-02 17:50   ` Alex Williamson
  2 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2025-07-02 16:00 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, aaronlewis, bhelgaas, dmatlack, vipinsh, seanjc, jrhilke,
	kevin.tian

On Thu, Jun 26, 2025 at 04:56:18PM -0600, Alex Williamson wrote:
> @@ -2149,7 +2149,7 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
>  		return -EBUSY;
>  	}
>  
> -	if (pci_is_root_bus(pdev->bus)) {
> +	if (pci_is_root_bus(pdev->bus) || pdev->is_virtfn) {
>  		ret = vfio_assign_device_set(&vdev->vdev, vdev);
>  	} else if (!pci_probe_reset_slot(pdev->slot)) {
>  		ret = vfio_assign_device_set(&vdev->vdev, pdev->slot);

What about the logic in vfio_pci_dev_set_resettable()?

I guess in most cases vfio_pci_dev_set_resettable() == NULL already
for VFs since it would be rare that the PFs and VFs are all under
VFIO.

But it could happen and this would permanently block hot reset? Maybe
just mention it in the commit?

I guess the commit message is also trying to say that we don't use
VFIO_DEVICE_PCI_HOT_RESET if VFs are present on either the VF or
PF - and this change will block that.

All VF resets should go through VFIO_DEVICE_RESETF. If you want to
slot/bus reset the PF then you have to disable SRIOV first.

?

Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] vfio/pci: Separate SR-IOV VF dev_set
  2025-07-02 16:00 ` Jason Gunthorpe
@ 2025-07-02 17:50   ` Alex Williamson
  2025-07-02 17:55     ` Jason Gunthorpe
  0 siblings, 1 reply; 13+ messages in thread
From: Alex Williamson @ 2025-07-02 17:50 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvm, aaronlewis, bhelgaas, dmatlack, vipinsh, seanjc, jrhilke,
	kevin.tian

On Wed, 2 Jul 2025 13:00:31 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Thu, Jun 26, 2025 at 04:56:18PM -0600, Alex Williamson wrote:
> > @@ -2149,7 +2149,7 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
> >  		return -EBUSY;
> >  	}
> >  
> > -	if (pci_is_root_bus(pdev->bus)) {
> > +	if (pci_is_root_bus(pdev->bus) || pdev->is_virtfn) {
> >  		ret = vfio_assign_device_set(&vdev->vdev, vdev);
> >  	} else if (!pci_probe_reset_slot(pdev->slot)) {
> >  		ret = vfio_assign_device_set(&vdev->vdev, pdev->slot);  
> 
> What about the logic in vfio_pci_dev_set_resettable()?

IIRC, VFs are going to fail the pci_probe_reset_slot() and
pci_probe_reset_bus() tests because they don't have a pdev->slot or
pdev->bus->self respectively.

> I guess in most cases vfio_pci_dev_set_resettable() == NULL already
> for VFs since it would be rare that the PFs and VFs are all under
> VFIO.

Regardless, the VF will return NULL.

> But it could happen and this would permanently block hot reset? Maybe
> just mention it in the commit?

There is no bridge from which to initiate an SBR for a VF, there would
only be the bridge above the PF.  We require SR-IOV is disabled when
the PF is opened, so the dev_set of the PF would never have included the
VFs.  I haven't tried it, but it may be possible to trigger a hot reset
on a user owned PF while there are open VFs.  If that is possible, I
wonder if it isn't just a userspace problem though, it doesn't seem
there's anything fundamentally wrong with it from a vfio perspective.
The vf-token already indicates at the kernel level that there is
collaboration between PF and VF userspace drivers.

> I guess the commit message is also trying to say that we don't use
> VFIO_DEVICE_PCI_HOT_RESET if VFs are present on either the VF or
> PF - and this change will block that.

The hot reset ioctl has never been available to VFs, there's no bridge
from which to initiate an SBR without traversing above the PF.
 
> All VF resets should go through VFIO_DEVICE_RESETF. If you want to
> slot/bus reset the PF then you have to disable SRIOV first.

I'm not positive on the latter, but AFAIK it's always been the case
that only FLR is available on VFs and this doesn't change that.  Am I
still overlooking something that concerns you?  Thanks,

Alex


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] vfio/pci: Separate SR-IOV VF dev_set
  2025-07-02 17:50   ` Alex Williamson
@ 2025-07-02 17:55     ` Jason Gunthorpe
  2025-07-03  6:10       ` Tian, Kevin
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2025-07-02 17:55 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, aaronlewis, bhelgaas, dmatlack, vipinsh, seanjc, jrhilke,
	kevin.tian

On Wed, Jul 02, 2025 at 11:50:32AM -0600, Alex Williamson wrote:
> On Wed, 2 Jul 2025 13:00:31 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > On Thu, Jun 26, 2025 at 04:56:18PM -0600, Alex Williamson wrote:
> > > @@ -2149,7 +2149,7 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
> > >  		return -EBUSY;
> > >  	}
> > >  
> > > -	if (pci_is_root_bus(pdev->bus)) {
> > > +	if (pci_is_root_bus(pdev->bus) || pdev->is_virtfn) {
> > >  		ret = vfio_assign_device_set(&vdev->vdev, vdev);
> > >  	} else if (!pci_probe_reset_slot(pdev->slot)) {
> > >  		ret = vfio_assign_device_set(&vdev->vdev, pdev->slot);  
> > 
> > What about the logic in vfio_pci_dev_set_resettable()?
> 
> IIRC, VFs are going to fail the pci_probe_reset_slot() and
> pci_probe_reset_bus() tests because they don't have a pdev->slot or
> pdev->bus->self respectively.
> 
> > I guess in most cases vfio_pci_dev_set_resettable() == NULL already
> > for VFs since it would be rare that the PFs and VFs are all under
> > VFIO.
> 
> Regardless, the VF will return NULL.

The PF could have returned non-NULL?

> > But it could happen and this would permanently block hot reset? Maybe
> > just mention it in the commit?
> 
> There is no bridge from which to initiate an SBR for a VF, there would
> only be the bridge above the PF.  We require SR-IOV is disabled when
> the PF is opened, so the dev_set of the PF would never have included the
> VFs.

I thought we can turn on SRIOV within VFIO and then attach VFIO to the
VFs which would have included the whole lot in the same dev sset?

> I haven't tried it, but it may be possible to trigger a hot reset
> on a user owned PF while there are open VFs.  If that is possible, I
> wonder if it isn't just a userspace problem though, it doesn't seem
> there's anything fundamentally wrong with it from a vfio perspective.
> The vf-token already indicates at the kernel level that there is
> collaboration between PF and VF userspace drivers.

I think it will disable SRIOV and that will leave something of a
mess. Arguably we should be blocking resets that disable SRIOV inside
vfio?

> I'm not positive on the latter, but AFAIK it's always been the case
> that only FLR is available on VFs and this doesn't change that.  Am I
> still overlooking something that concerns you?  Thanks,

No, it makes sense, it is just not entirely obvious on some of these
details.

I also did not find any locking concerns, so

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH] vfio/pci: Separate SR-IOV VF dev_set
  2025-07-02 17:55     ` Jason Gunthorpe
@ 2025-07-03  6:10       ` Tian, Kevin
  2025-07-03 13:23         ` Jason Gunthorpe
  0 siblings, 1 reply; 13+ messages in thread
From: Tian, Kevin @ 2025-07-03  6:10 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson
  Cc: kvm@vger.kernel.org, aaronlewis@google.com, bhelgaas@google.com,
	dmatlack@google.com, vipinsh@google.com, seanjc@google.com,
	jrhilke@google.com

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, July 3, 2025 1:56 AM
> 
> On Wed, Jul 02, 2025 at 11:50:32AM -0600, Alex Williamson wrote:
> > I haven't tried it, but it may be possible to trigger a hot reset
> > on a user owned PF while there are open VFs.  If that is possible, I
> > wonder if it isn't just a userspace problem though, it doesn't seem
> > there's anything fundamentally wrong with it from a vfio perspective.
> > The vf-token already indicates at the kernel level that there is
> > collaboration between PF and VF userspace drivers.
> 
> I think it will disable SRIOV and that will leave something of a
> mess. Arguably we should be blocking resets that disable SRIOV inside
> vfio?
> 

Is there any reset which doesn't disable SRIOV? According to PCIe
spec both conventional reset and FLR targeting a PF clears the
VF enable bit.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] vfio/pci: Separate SR-IOV VF dev_set
  2025-07-03  6:10       ` Tian, Kevin
@ 2025-07-03 13:23         ` Jason Gunthorpe
  2025-07-03 20:29           ` Alex Williamson
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2025-07-03 13:23 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Alex Williamson, kvm@vger.kernel.org, aaronlewis@google.com,
	bhelgaas@google.com, dmatlack@google.com, vipinsh@google.com,
	seanjc@google.com, jrhilke@google.com

On Thu, Jul 03, 2025 at 06:10:19AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Thursday, July 3, 2025 1:56 AM
> > 
> > On Wed, Jul 02, 2025 at 11:50:32AM -0600, Alex Williamson wrote:
> > > I haven't tried it, but it may be possible to trigger a hot reset
> > > on a user owned PF while there are open VFs.  If that is possible, I
> > > wonder if it isn't just a userspace problem though, it doesn't seem
> > > there's anything fundamentally wrong with it from a vfio perspective.
> > > The vf-token already indicates at the kernel level that there is
> > > collaboration between PF and VF userspace drivers.
> > 
> > I think it will disable SRIOV and that will leave something of a
> > mess. Arguably we should be blocking resets that disable SRIOV inside
> > vfio?
> > 
> 
> Is there any reset which doesn't disable SRIOV? According to PCIe
> spec both conventional reset and FLR targeting a PF clears the
> VF enable bit.

This is my understanding, I think there might be a little hole here in
the vfio SRIOV support?

Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] vfio/pci: Separate SR-IOV VF dev_set
  2025-07-03 13:23         ` Jason Gunthorpe
@ 2025-07-03 20:29           ` Alex Williamson
  2025-07-03 23:35             ` Jason Gunthorpe
  0 siblings, 1 reply; 13+ messages in thread
From: Alex Williamson @ 2025-07-03 20:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Tian, Kevin, kvm@vger.kernel.org, aaronlewis@google.com,
	bhelgaas@google.com, dmatlack@google.com, vipinsh@google.com,
	seanjc@google.com, jrhilke@google.com

On Thu, 3 Jul 2025 10:23:50 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Thu, Jul 03, 2025 at 06:10:19AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Thursday, July 3, 2025 1:56 AM
> > > 
> > > On Wed, Jul 02, 2025 at 11:50:32AM -0600, Alex Williamson wrote:  
> > > > I haven't tried it, but it may be possible to trigger a hot reset
> > > > on a user owned PF while there are open VFs.  If that is possible, I
> > > > wonder if it isn't just a userspace problem though, it doesn't seem
> > > > there's anything fundamentally wrong with it from a vfio perspective.
> > > > The vf-token already indicates at the kernel level that there is
> > > > collaboration between PF and VF userspace drivers.  
> > > 
> > > I think it will disable SRIOV and that will leave something of a
> > > mess. Arguably we should be blocking resets that disable SRIOV inside
> > > vfio?
> > >   
> > 
> > Is there any reset which doesn't disable SRIOV? According to PCIe
> > spec both conventional reset and FLR targeting a PF clears the
> > VF enable bit.  
> 
> This is my understanding, I think there might be a little hole here in
> the vfio SRIOV support?

I wrote a test case and we don't prevent a vfio-pci userspace driver
from resetting the PF while also having open a VF, but I'm also not
sure what problem that causes.

pci_restore_state() calls pci_restore_iov_state(), so VF Enable does get
cleared by the reset (we don't actively tear down SR-IOV before reset),
but it's restored.  VFs are not technically on a subordinate bus, so
none of those check prevent a bus reset.  I think this is why we have
the VF token, we cannot guarantee that a userspace owner of the PF
doesn't do something stupid while the VF is in use.

Also, PF->bus != VF->bus, so VFs don't get added to the PF dev_set.
The PF will do a hot reset with just the PF group fd and of course FLR
doesn't require proof of ownership of other devices.  Again, I don't
think giving each VF its own dev_set changes anything in this respect.

Should we do more here?  What problem are we solving?  Thanks,

Alex


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] vfio/pci: Separate SR-IOV VF dev_set
  2025-07-03 20:29           ` Alex Williamson
@ 2025-07-03 23:35             ` Jason Gunthorpe
  2025-07-15 18:42               ` Alex Williamson
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2025-07-03 23:35 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Tian, Kevin, kvm@vger.kernel.org, aaronlewis@google.com,
	bhelgaas@google.com, dmatlack@google.com, vipinsh@google.com,
	seanjc@google.com, jrhilke@google.com

On Thu, Jul 03, 2025 at 02:29:04PM -0600, Alex Williamson wrote:

> > > Is there any reset which doesn't disable SRIOV? According to PCIe
> > > spec both conventional reset and FLR targeting a PF clears the
> > > VF enable bit.  
> > 
> > This is my understanding, I think there might be a little hole here in
> > the vfio SRIOV support?
> 
> I wrote a test case and we don't prevent a vfio-pci userspace driver
> from resetting the PF while also having open a VF, but I'm also not
> sure what problem that causes.
> 
> pci_restore_state() calls pci_restore_iov_state(), so VF Enable does get
> cleared by the reset (we don't actively tear down SR-IOV before reset),
> but it's restored. 

Oh interesting, I did not know that happened. Makes sense.

> Also, PF->bus != VF->bus, 

Unrelated, but I've been looking at this and I haven't tried it yet,
but it looked to me like:

	bus = virtfn_add_bus(dev->bus, pci_iov_virtfn_bus(dev, id));
 [.. inside virtfn_add_bus ]
	child = pci_find_bus(pci_domain_nr(bus), busnr);
	if (child)
		return child;

Will re-use the bus of the PF if they happen to have the same bus
numbers. I thought the virtual busses come up if the VF RID calculation:

	return dev->bus->number + ((dev->devfn + dev->sriov->offset +
				    dev->sriov->stride * vf_id) >> 8);

Exceeds the primary bus?

> so VFs don't get added to the PF dev_set.
> The PF will do a hot reset with just the PF group fd and of course FLR
> doesn't require proof of ownership of other devices.  

If the semantic of the hot reset check was to require userspace to
prove ownership of everything that gets reset it does seem like any
reset on a PF should also have ownership of the VFs. I think the UUID
is probably good enough for this though?

So that seems OK.

> Should we do more here?  What problem are we solving?  Thanks,

I think it is OK. I was originally worried the SRIOV would not come
back, but since it does that part seems fine.

Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] vfio/pci: Separate SR-IOV VF dev_set
  2025-07-03 23:35             ` Jason Gunthorpe
@ 2025-07-15 18:42               ` Alex Williamson
  2025-07-15 18:53                 ` Jason Gunthorpe
  0 siblings, 1 reply; 13+ messages in thread
From: Alex Williamson @ 2025-07-15 18:42 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Tian, Kevin, kvm@vger.kernel.org, aaronlewis@google.com,
	bhelgaas@google.com, dmatlack@google.com, vipinsh@google.com,
	seanjc@google.com, jrhilke@google.com

On Thu, 3 Jul 2025 20:35:33 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Thu, Jul 03, 2025 at 02:29:04PM -0600, Alex Williamson wrote:
> 
> > > > Is there any reset which doesn't disable SRIOV? According to PCIe
> > > > spec both conventional reset and FLR targeting a PF clears the
> > > > VF enable bit.    
> > > 
> > > This is my understanding, I think there might be a little hole here in
> > > the vfio SRIOV support?  
> > 
> > I wrote a test case and we don't prevent a vfio-pci userspace driver
> > from resetting the PF while also having open a VF, but I'm also not
> > sure what problem that causes.
> > 
> > pci_restore_state() calls pci_restore_iov_state(), so VF Enable does get
> > cleared by the reset (we don't actively tear down SR-IOV before reset),
> > but it's restored.   
> 
> Oh interesting, I did not know that happened. Makes sense.
> 
> > Also, PF->bus != VF->bus,   
> 
> Unrelated, but I've been looking at this and I haven't tried it yet,
> but it looked to me like:
> 
> 	bus = virtfn_add_bus(dev->bus, pci_iov_virtfn_bus(dev, id));
>  [.. inside virtfn_add_bus ]
> 	child = pci_find_bus(pci_domain_nr(bus), busnr);
> 	if (child)
> 		return child;
> 
> Will re-use the bus of the PF if they happen to have the same bus
> numbers. I thought the virtual busses come up if the VF RID calculation:
> 
> 	return dev->bus->number + ((dev->devfn + dev->sriov->offset +
> 				    dev->sriov->stride * vf_id) >> 8);
> 
> Exceeds the primary bus?

I tried it, I've got 82576 NICs in both a system with and without ARI
hierarchy.  Without:

 VF offset: 384, stride: 2

With:

 VF offset: 128, stride: 2

So the former places VFs on the N+1 bus (new struct pci_bus) from the PF
while the latter use the same bus number and struct.  Therefore my
previous inequality is not necessarily correct, the VF and PF could use
the same struct pci_bus.

In the ARI case, I believe you were right that the PF and VFs would have
then been sharing a dev_set.

So that does seem to be a visible difference as a result of this
change, in an ARI hierarchy, it would have previously been necessary to
supply the VF group FDs as proof of ownership of affected devices for a
hot-reset of the PF.  A non-ARI hierarchy would not have required that.
With this, neither require that.

Technically the VFs are affected by a PF bus reset, but unlike other
devices the VFs are also potentially affected by lots of things that
might happen to the VF and that's why we have the VF-token concept.  So
I kind of have a hard time getting bent out of shape by this.

I went ahead and added this to my next branch because I think your
impression is that this is generally ok based on the vf-token, but if
this raises new concerns I can drop it and we can discuss further.
Thanks,

Alex


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] vfio/pci: Separate SR-IOV VF dev_set
  2025-07-15 18:42               ` Alex Williamson
@ 2025-07-15 18:53                 ` Jason Gunthorpe
  0 siblings, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2025-07-15 18:53 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Tian, Kevin, kvm@vger.kernel.org, aaronlewis@google.com,
	bhelgaas@google.com, dmatlack@google.com, vipinsh@google.com,
	seanjc@google.com, jrhilke@google.com

On Tue, Jul 15, 2025 at 12:42:23PM -0600, Alex Williamson wrote:

> I went ahead and added this to my next branch because I think your
> impression is that this is generally ok based on the vf-token, but if
> this raises new concerns I can drop it and we can discuss further.

Yeah, I think vf token will be good enough

Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2025-07-15 18:53 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-26 22:56 [PATCH] vfio/pci: Separate SR-IOV VF dev_set Alex Williamson
2025-06-30  6:32 ` Tian, Kevin
2025-06-30 13:15 ` Yi Liu
2025-06-30 14:57   ` Alex Williamson
2025-07-02 16:00 ` Jason Gunthorpe
2025-07-02 17:50   ` Alex Williamson
2025-07-02 17:55     ` Jason Gunthorpe
2025-07-03  6:10       ` Tian, Kevin
2025-07-03 13:23         ` Jason Gunthorpe
2025-07-03 20:29           ` Alex Williamson
2025-07-03 23:35             ` Jason Gunthorpe
2025-07-15 18:42               ` Alex Williamson
2025-07-15 18:53                 ` Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).