From: Alex Williamson <alex.williamson@redhat.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: "Tian, Kevin" <kevin.tian@intel.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"aaronlewis@google.com" <aaronlewis@google.com>,
"bhelgaas@google.com" <bhelgaas@google.com>,
"dmatlack@google.com" <dmatlack@google.com>,
"vipinsh@google.com" <vipinsh@google.com>,
"seanjc@google.com" <seanjc@google.com>,
"jrhilke@google.com" <jrhilke@google.com>
Subject: Re: [PATCH] vfio/pci: Separate SR-IOV VF dev_set
Date: Tue, 15 Jul 2025 12:42:23 -0600
Message-ID: <20250715124223.67a36d2a.alex.williamson@redhat.com>
In-Reply-To: <20250703233533.GI1209783@nvidia.com>

On Thu, 3 Jul 2025 20:35:33 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Thu, Jul 03, 2025 at 02:29:04PM -0600, Alex Williamson wrote:
>
> > > > Is there any reset which doesn't disable SRIOV? According to PCIe
> > > > spec both conventional reset and FLR targeting a PF clears the
> > > > VF enable bit.
> > >
> > > This is my understanding, I think there might be a little hole here in
> > > the vfio SRIOV support?
> >
> > I wrote a test case and we don't prevent a vfio-pci userspace driver
> > from resetting the PF while also having open a VF, but I'm also not
> > sure what problem that causes.
> >
> > pci_restore_state() calls pci_restore_iov_state(), so VF Enable does get
> > cleared by the reset (we don't actively tear down SR-IOV before reset),
> > but it's restored.
>
> Oh interesting, I did not know that happened. Makes sense.
>
> > Also, PF->bus != VF->bus,
>
> Unrelated, but I've been looking at this and I haven't tried it yet,
> but it looked to me like:
>
> 	bus = virtfn_add_bus(dev->bus, pci_iov_virtfn_bus(dev, id));
> [.. inside virtfn_add_bus ]
> 	child = pci_find_bus(pci_domain_nr(bus), busnr);
> 	if (child)
> 		return child;
>
> Will re-use the bus of the PF if they happen to have the same bus
> numbers. I thought the virtual busses come up if the VF RID calculation:
>
> 	return dev->bus->number + ((dev->devfn + dev->sriov->offset +
> 				    dev->sriov->stride * vf_id) >> 8);
>
> Exceeds the primary bus?

I tried it. I have 82576 NICs in two systems, one with an ARI hierarchy
and one without. Without ARI:

  VF offset: 384, stride: 2

With ARI:

  VF offset: 128, stride: 2

So the former places VFs on the N+1 bus (a new struct pci_bus) relative
to the PF, while the latter uses the same bus number and struct pci_bus.
Therefore my previous inequality is not necessarily correct: the VF and
PF could use the same struct pci_bus.
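
To make the arithmetic concrete, here is a minimal standalone sketch of
the bus-number expression quoted above, applied to the two offset/stride
pairs. It is not the kernel code: the helper name vf_bus_number() and its
flat parameter list are hypothetical, and the PF location used in the
example (bus 1, devfn 0) is an assumption chosen only for illustration.

```c
/*
 * Standalone restatement of the quoted expression:
 *   bus + ((devfn + offset + stride * vf_id) >> 8)
 * The function name and parameters are illustrative, not kernel API.
 */
static unsigned int vf_bus_number(unsigned int pf_bus, unsigned int pf_devfn,
				  unsigned int offset, unsigned int stride,
				  unsigned int vf_id)
{
	/* A routing ID carries devfn in its low 8 bits; overflow spills into the bus number. */
	return pf_bus + ((pf_devfn + offset + stride * vf_id) >> 8);
}
```

Under the assumed PF at bus 1, devfn 0, vf_bus_number(1, 0, 384, 2, 0)
evaluates to 2, so the first VF lands on bus N+1, while
vf_bus_number(1, 0, 128, 2, 0) evaluates to 1, so the first VF stays on
the PF's own bus number.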

In the ARI case, I believe you were right that the PF and VFs would
previously have been sharing a dev_set.

So that does seem to be a visible difference as a result of this
change. In an ARI hierarchy, it would previously have been necessary to
supply the VF group FDs as proof of ownership of affected devices for a
hot-reset of the PF. A non-ARI hierarchy would not have required that.
With this change, neither requires that.

Technically the VFs are affected by a PF bus reset, but unlike other
devices the VFs are also potentially affected by lots of things that
might happen to the PF, and that's why we have the VF-token concept. So
I kind of have a hard time getting bent out of shape by this.

I went ahead and added this to my next branch because I think your
impression is that this is generally ok based on the VF-token, but if
this raises new concerns I can drop it and we can discuss further.

Thanks,
Alex