* [PATCH v2 0/2] rust: pci: don't probe() VFs in nova-core @ 2025-10-02 2:00 John Hubbard 2025-10-02 2:00 ` [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs John Hubbard 2025-10-02 2:00 ` [PATCH v2 2/2] gpu: nova-core: declare that VFs are not (yet) supported John Hubbard 0 siblings, 2 replies; 32+ messages in thread From: John Hubbard @ 2025-10-02 2:00 UTC (permalink / raw) To: Danilo Krummrich Cc: Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Jason Gunthorpe, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML, John Hubbard Changes in v2: 1) Completely rewrote the series, using Danilo's suggested approach of "driver supports VF probing or not": the PCI driver core will skip probing any device's VFs if the driver has coded itself as "not supporting VFs". 2) Included Joel's clarification, as a comment in the code, about what happens if VFs are probed. As with v1 [1], I've based this on top of today's driver-core-next [2], because the first patch belongs there, and the second patch applies cleanly to either driver-core-next or drm-rust-next. So this seems like the easiest to work with. [1]] https://lore.kernel.org/20250930220759.288528-1-jhubbard@nvidia.com [2] https://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core.git/ Cc: Alexandre Courbot <acourbot@nvidia.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Joel Fernandes <joelagnelf@nvidia.com> Cc: Zhi Wang <zhiw@nvidia.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Danilo Krummrich <dakr@kernel.org> John Hubbard (2): rust: pci: skip probing VFs if driver doesn't support VFs gpu: nova-core: declare that VFs are not (yet) supported drivers/gpu/nova-core/driver.rs | 5 +++++ drivers/pci/pci-driver.c | 3 +++ include/linux/pci.h | 1 + rust/kernel/pci.rs | 4 ++++ 4 files changed, 13 insertions(+) base-commit: 6d97171ac6585de698df019b0bfea3f123fd8385 -- 2.51.0 ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-02 2:00 [PATCH v2 0/2] rust: pci: don't probe() VFs in nova-core John Hubbard @ 2025-10-02 2:00 ` John Hubbard 2025-10-02 11:49 ` Danilo Krummrich 2025-10-02 12:11 ` Jason Gunthorpe 2025-10-02 2:00 ` [PATCH v2 2/2] gpu: nova-core: declare that VFs are not (yet) supported John Hubbard 1 sibling, 2 replies; 32+ messages in thread From: John Hubbard @ 2025-10-02 2:00 UTC (permalink / raw) To: Danilo Krummrich Cc: Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Jason Gunthorpe, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML, John Hubbard Add a "supports_vf" flag to struct pci_driver to let drivers declare Virtual Function (VF) support. If a driver does not support VFs, then the PCI driver core will not probe() any VFs for that driver's devices. On the Rust side, add a const "SUPPORTS_VF" Driver trait, defaulting to false: drivers must explicitly opt into VF support. Cc: Alexandre Courbot <acourbot@nvidia.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Joel Fernandes <joelagnelf@nvidia.com> Cc: Zhi Wang <zhiw@nvidia.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Suggested-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: John Hubbard <jhubbard@nvidia.com> --- drivers/pci/pci-driver.c | 3 +++ include/linux/pci.h | 1 + rust/kernel/pci.rs | 4 ++++ 3 files changed, 8 insertions(+) diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 63665240ae87..588666cc7254 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -412,6 +412,9 @@ static int __pci_device_probe(struct pci_driver *drv, struct pci_dev *pci_dev) if (drv->probe) { error = -ENODEV; + if (pci_dev->is_virtfn && !drv->supports_vf) + return error; + id = pci_match_device(drv, pci_dev); if (id) error = pci_call_probe(drv, pci_dev, id); diff --git a/include/linux/pci.h b/include/linux/pci.h index 59876de13860..92510886a086 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -983,6 +983,7 @@ struct pci_driver { struct device_driver driver; struct pci_dynids dynids; bool driver_managed_dma; + bool supports_vf; /* Will bind to Virtual Functions */ }; #define to_pci_driver(__drv) \ diff --git a/rust/kernel/pci.rs b/rust/kernel/pci.rs index 7fcc5f6022c1..c5d036770e65 100644 --- a/rust/kernel/pci.rs +++ b/rust/kernel/pci.rs @@ -47,6 +47,7 @@ unsafe fn register( (*pdrv.get()).probe = Some(Self::probe_callback); (*pdrv.get()).remove = Some(Self::remove_callback); (*pdrv.get()).id_table = T::ID_TABLE.as_ptr(); + (*pdrv.get()).supports_vf = T::SUPPORTS_VF; } // SAFETY: `pdrv` is guaranteed to be a valid `RegType`. @@ -268,6 +269,9 @@ pub trait Driver: Send { /// The table of device ids supported by the driver. const ID_TABLE: IdTable<Self::IdInfo>; + /// Whether the driver supports Virtual Functions. + const SUPPORTS_VF: bool = false; + /// PCI driver probe. /// /// Called when a new pci device is added or discovered. Implementers should -- 2.51.0 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-02 2:00 ` [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs John Hubbard @ 2025-10-02 11:49 ` Danilo Krummrich 2025-10-02 12:11 ` Jason Gunthorpe 1 sibling, 0 replies; 32+ messages in thread From: Danilo Krummrich @ 2025-10-02 11:49 UTC (permalink / raw) To: John Hubbard Cc: Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Jason Gunthorpe, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML On Thu Oct 2, 2025 at 4:00 AM CEST, John Hubbard wrote: > Add a "supports_vf" flag to struct pci_driver to let drivers declare > Virtual Function (VF) support. If a driver does not support VFs, then > the PCI driver core will not probe() any VFs for that driver's devices. > > On the Rust side, add a const "SUPPORTS_VF" Driver trait, defaulting to > false: drivers must explicitly opt into VF support. > > Cc: Alexandre Courbot <acourbot@nvidia.com> > Cc: Alistair Popple <apopple@nvidia.com> > Cc: Joel Fernandes <joelagnelf@nvidia.com> > Cc: Zhi Wang <zhiw@nvidia.com> > Cc: Alex Williamson <alex.williamson@redhat.com> > Cc: Jason Gunthorpe <jgg@nvidia.com> > Suggested-by: Danilo Krummrich <dakr@kernel.org> > Signed-off-by: John Hubbard <jhubbard@nvidia.com> > --- > drivers/pci/pci-driver.c | 3 +++ > include/linux/pci.h | 1 + > rust/kernel/pci.rs | 4 ++++ > 3 files changed, 8 insertions(+) > > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c > index 63665240ae87..588666cc7254 100644 > --- a/drivers/pci/pci-driver.c > +++ b/drivers/pci/pci-driver.c > @@ -412,6 +412,9 @@ static int __pci_device_probe(struct pci_driver *drv, struct pci_dev *pci_dev) > if (drv->probe) { > error = -ENODEV; > > + if (pci_dev->is_virtfn && !drv->supports_vf) > + return error; > + > id = pci_match_device(drv, pci_dev); > if (id) > error = pci_call_probe(drv, pci_dev, id); > diff --git a/include/linux/pci.h b/include/linux/pci.h > index 59876de13860..92510886a086 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -983,6 +983,7 @@ struct pci_driver { > struct device_driver driver; > struct pci_dynids dynids; > bool driver_managed_dma; > + bool supports_vf; /* Will bind to Virtual Functions */ I don't see any driver changes in this patch, are we sure this doesn't break any existing drivers given that this defaults to false? Obviously, the safe call would be to invert the logic, such that it defaults to VFs being supported, though I clearly do prefer the opt-in. Also, in C this always defaults to false, whereas in Rust we have the choice to make it true by default, hence in C we'd need to invert the semantics, which is not desirable either. > }; > > #define to_pci_driver(__drv) \ > diff --git a/rust/kernel/pci.rs b/rust/kernel/pci.rs > index 7fcc5f6022c1..c5d036770e65 100644 > --- a/rust/kernel/pci.rs > +++ b/rust/kernel/pci.rs > @@ -47,6 +47,7 @@ unsafe fn register( > (*pdrv.get()).probe = Some(Self::probe_callback); > (*pdrv.get()).remove = Some(Self::remove_callback); > (*pdrv.get()).id_table = T::ID_TABLE.as_ptr(); > + (*pdrv.get()).supports_vf = T::SUPPORTS_VF; > } > > // SAFETY: `pdrv` is guaranteed to be a valid `RegType`. > @@ -268,6 +269,9 @@ pub trait Driver: Send { > /// The table of device ids supported by the driver. > const ID_TABLE: IdTable<Self::IdInfo>; > > + /// Whether the driver supports Virtual Functions. > + const SUPPORTS_VF: bool = false; > + > /// PCI driver probe. > /// > /// Called when a new pci device is added or discovered. Implementers should > -- > 2.51.0 ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-02 2:00 ` [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs John Hubbard 2025-10-02 11:49 ` Danilo Krummrich @ 2025-10-02 12:11 ` Jason Gunthorpe 2025-10-02 12:18 ` Danilo Krummrich 1 sibling, 1 reply; 32+ messages in thread From: Jason Gunthorpe @ 2025-10-02 12:11 UTC (permalink / raw) To: John Hubbard Cc: Danilo Krummrich, Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML On Wed, Oct 01, 2025 at 07:00:09PM -0700, John Hubbard wrote: > Add a "supports_vf" flag to struct pci_driver to let drivers declare > Virtual Function (VF) support. If a driver does not support VFs, then > the PCI driver core will not probe() any VFs for that driver's devices. > > On the Rust side, add a const "SUPPORTS_VF" Driver trait, defaulting to > false: drivers must explicitly opt into VF support. As I said in the other thread - please no. Linux drivers are expected to run on their VFs. This temporary weirdness of novacore should not be elevated to a core behavior that people will misuse. Jason ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-02 12:11 ` Jason Gunthorpe @ 2025-10-02 12:18 ` Danilo Krummrich 2025-10-02 12:39 ` Jason Gunthorpe 0 siblings, 1 reply; 32+ messages in thread From: Danilo Krummrich @ 2025-10-02 12:18 UTC (permalink / raw) To: Jason Gunthorpe Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML On Thu Oct 2, 2025 at 2:11 PM CEST, Jason Gunthorpe wrote: > On Wed, Oct 01, 2025 at 07:00:09PM -0700, John Hubbard wrote: >> Add a "supports_vf" flag to struct pci_driver to let drivers declare >> Virtual Function (VF) support. If a driver does not support VFs, then >> the PCI driver core will not probe() any VFs for that driver's devices. >> >> On the Rust side, add a const "SUPPORTS_VF" Driver trait, defaulting to >> false: drivers must explicitly opt into VF support. > > As I said in the other thread - please no. > > Linux drivers are expected to run on their VFs. The consequence would be that drivers for HW that can export VFs would need to be rejected upstream if they only support the PF, but no VFs. IMHO, that's an unreasonable requirement. Don't get me wrong, I agree that it'd be great if all drivers would (be able to) support the corresponding VFs. But in practice that's not always the case, so I'd rather have a common way to let drivers assert what they support over going back to the "old model" of probing drivers, where we just rely on probe() to fail. > This temporary > weirdness of novacore should not be elevated to a core behavior that > people will misuse. It's not just nova-core, please see [1]. [1] https://lore.kernel.org/lkml/DD7TP31FEE92.2E0AKAHUOHVVF@kernel.org/ ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-02 12:18 ` Danilo Krummrich @ 2025-10-02 12:39 ` Jason Gunthorpe 2025-10-02 13:03 ` Danilo Krummrich 0 siblings, 1 reply; 32+ messages in thread From: Jason Gunthorpe @ 2025-10-02 12:39 UTC (permalink / raw) To: Danilo Krummrich Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML On Thu, Oct 02, 2025 at 02:18:36PM +0200, Danilo Krummrich wrote: > On Thu Oct 2, 2025 at 2:11 PM CEST, Jason Gunthorpe wrote: > > On Wed, Oct 01, 2025 at 07:00:09PM -0700, John Hubbard wrote: > >> Add a "supports_vf" flag to struct pci_driver to let drivers declare > >> Virtual Function (VF) support. If a driver does not support VFs, then > >> the PCI driver core will not probe() any VFs for that driver's devices. > >> > >> On the Rust side, add a const "SUPPORTS_VF" Driver trait, defaulting to > >> false: drivers must explicitly opt into VF support. > > > > As I said in the other thread - please no. > > > > Linux drivers are expected to run on their VFs. > > The consequence would be that drivers for HW that can export VFs would need to > be rejected upstream if they only support the PF, but no VFs. IMHO, that's an > unreasonable requirement. Not rejected, they just need to open code a simple isvf check and fail during probe if they really have a (hopefully temporary) problem. This not really a realistic case. Linux running in the VM *should* have drivers that operate the VF, and those existing drivers *should* work in the PF context. Drivers that work in VM but not in a host should not be encouraged!! AFAICT this is even true for novacore, the driver should "work" but the VF won't be provisioned today so it should fail startup in some way. eg "no vram" or something like that. > > This temporary > > weirdness of novacore should not be elevated to a core behavior that > > people will misuse. > > It's not just nova-core, please see [1]. > > [1] https://lore.kernel.org/lkml/DD7TP31FEE92.2E0AKAHUOHVVF@kernel.org/ I responded there, I don't think the reasons those were added to ICE and then cargo-culted are very good, not good enough to justify adding it to the core code. Jason ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-02 12:39 ` Jason Gunthorpe @ 2025-10-02 13:03 ` Danilo Krummrich 2025-10-02 13:56 ` Jason Gunthorpe 0 siblings, 1 reply; 32+ messages in thread From: Danilo Krummrich @ 2025-10-02 13:03 UTC (permalink / raw) To: Jason Gunthorpe Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML On Thu Oct 2, 2025 at 2:39 PM CEST, Jason Gunthorpe wrote: > On Thu, Oct 02, 2025 at 02:18:36PM +0200, Danilo Krummrich wrote: >> On Thu Oct 2, 2025 at 2:11 PM CEST, Jason Gunthorpe wrote: >> > On Wed, Oct 01, 2025 at 07:00:09PM -0700, John Hubbard wrote: >> >> Add a "supports_vf" flag to struct pci_driver to let drivers declare >> >> Virtual Function (VF) support. If a driver does not support VFs, then >> >> the PCI driver core will not probe() any VFs for that driver's devices. >> >> >> >> On the Rust side, add a const "SUPPORTS_VF" Driver trait, defaulting to >> >> false: drivers must explicitly opt into VF support. >> > >> > As I said in the other thread - please no. >> > >> > Linux drivers are expected to run on their VFs. >> >> The consequence would be that drivers for HW that can export VFs would need to >> be rejected upstream if they only support the PF, but no VFs. IMHO, that's an >> unreasonable requirement. > > Not rejected, they just need to open code a simple isvf check and fail > during probe if they really have a (hopefully temporary) problem. The question is whether it is due to a (temporary) problem, or if it is by design. I think it's not unreasonable to have a driver for the PF and a separate driver for the VFs if they are different enough; the drivers can still share common code of course. Surely, you can argue that if they have different enough requirements they should have different device IDs, but "different enough requirements" is pretty vague and it's not under our control either. > This not really a realistic case. Linux running in the VM *should* > have drivers that operate the VF, and those existing drivers *should* > work in the PF context. > > Drivers that work in VM but not in a host should not be encouraged!! I agree, we should indeed encourage HW manufacturers to design the HW in a way that a single driver works in both cases, i.e. less less code to maintain, less surface for bugs, etc., if that is what you mean. But, if there is another solution for VFs already, e.g. in the case of nova-core vGPU, why restrict drivers from opt-out of VFs. (In a previous reply I mentioned I prefer opt-in, but you convinced me that it should rather be opt-out.) > AFAICT this is even true for novacore, the driver should "work" but > the VF won't be provisioned today so it should fail startup in some > way. eg "no vram" or something like that. > >> > This temporary >> > weirdness of novacore should not be elevated to a core behavior that >> > people will misuse. >> >> It's not just nova-core, please see [1]. >> >> [1] https://lore.kernel.org/lkml/DD7TP31FEE92.2E0AKAHUOHVVF@kernel.org/ > > I responded there, I don't think the reasons those were added to ICE > and then cargo-culted are very good, not good enough to justify adding > it to the core code. Indeed, the justification of ICE is clearly wrong. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-02 13:03 ` Danilo Krummrich @ 2025-10-02 13:56 ` Jason Gunthorpe 2025-10-02 15:11 ` Danilo Krummrich 0 siblings, 1 reply; 32+ messages in thread From: Jason Gunthorpe @ 2025-10-02 13:56 UTC (permalink / raw) To: Danilo Krummrich Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML On Thu, Oct 02, 2025 at 03:03:38PM +0200, Danilo Krummrich wrote: > I think it's not unreasonable to have a driver for the PF and a separate driver > for the VFs if they are different enough; the drivers can still share common > code of course. This isn't feasible without different PCI IDs. ICE does this for example where they have two totally different drivers, and of course two different PCI IDs. > Surely, you can argue that if they have different enough requirements they > should have different device IDs, but "different enough requirements" is pretty > vague and it's not under our control either. If you want two drivers in Linux you need two PCI IDs. We can't reliably select different drivers based on VFness because VFness is wiped out during virtualization. > But, if there is another solution for VFs already, e.g. in the case of nova-core > vGPU, why restrict drivers from opt-out of VFs. (In a previous reply I mentioned > I prefer opt-in, but you convinced me that it should rather be > opt-out.) I think nova-core has a temporary (OOT even!) issue that should be resolved - that doesn't justify adding core kernel infrastructure that will encourage more drivers to go away from our kernel design goals of drivers working equally in host and VM. Nova core should work in that it probes, detects an unprovisioned VF and then fails probe. This edge case could perhaps arise in a VM anyhow and needs to be handled anyhow. Also, this is all pointless until nova-core gets an sriov_configure callback - you can't even turn on SRIOV without that! Jason ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-02 13:56 ` Jason Gunthorpe @ 2025-10-02 15:11 ` Danilo Krummrich 2025-10-02 15:23 ` Jason Gunthorpe 0 siblings, 1 reply; 32+ messages in thread From: Danilo Krummrich @ 2025-10-02 15:11 UTC (permalink / raw) To: Jason Gunthorpe Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML On Thu Oct 2, 2025 at 3:56 PM CEST, Jason Gunthorpe wrote: > On Thu, Oct 02, 2025 at 03:03:38PM +0200, Danilo Krummrich wrote: > >> I think it's not unreasonable to have a driver for the PF and a separate driver >> for the VFs if they are different enough; the drivers can still share common >> code of course. > > This isn't feasible without different PCI IDs. At least on the host you can obviously differentiate them. >> Surely, you can argue that if they have different enough requirements they >> should have different device IDs, but "different enough requirements" is pretty >> vague and it's not under our control either. > > If you want two drivers in Linux you need two PCI IDs. > > We can't reliably select different drivers based on VFness because > VFness is wiped out during virtualization. Sure, but I thought the whole point is that some VFs are not given directly to the VM, but have some kind of intermediate layer, such as vGPU. I.e. some kind of hybrid approach between full pass-through and mediated devices? >> But, if there is another solution for VFs already, e.g. in the case of nova-core >> vGPU, why restrict drivers from opt-out of VFs. (In a previous reply I mentioned >> I prefer opt-in, but you convinced me that it should rather be >> opt-out.) > > I think nova-core has a temporary (OOT even!) issue that should be > resolved - that doesn't justify adding core kernel infrastructure that > will encourage more drivers to go away from our kernel design goals of > drivers working equally in host and VM. My understanding is that vGPU will ensure that the device exposed to the VM will be set up to be (at least mostly) compatible with nova-core's PF code paths? So, there is a semantical difference between vGPU and nova-core that makes a differentiation between VF and PF meaningful and justified. But maybe this understanding is not correct. If so, please educate me. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-02 15:11 ` Danilo Krummrich @ 2025-10-02 15:23 ` Jason Gunthorpe 2025-10-02 16:05 ` Danilo Krummrich 0 siblings, 1 reply; 32+ messages in thread From: Jason Gunthorpe @ 2025-10-02 15:23 UTC (permalink / raw) To: Danilo Krummrich Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML On Thu, Oct 02, 2025 at 05:11:01PM +0200, Danilo Krummrich wrote: > On Thu Oct 2, 2025 at 3:56 PM CEST, Jason Gunthorpe wrote: > >> Surely, you can argue that if they have different enough requirements they > >> should have different device IDs, but "different enough requirements" is pretty > >> vague and it's not under our control either. > > > > If you want two drivers in Linux you need two PCI IDs. > > > > We can't reliably select different drivers based on VFness because > > VFness is wiped out during virtualization. > > Sure, but I thought the whole point is that some VFs are not given directly to > the VM, but have some kind of intermediate layer, such as vGPU. I.e. some kind > of hybrid approach between full pass-through and mediated devices? We'd call the intermediate layer "mediation", and it is abnormal to have something significant there because it impacts performance and increases the hypervisor attack surface. Usually you just give the whole VF mostly as-is. The ideal vfio driver is the common one because the raw VF doesn't need any meddling. > >> But, if there is another solution for VFs already, e.g. in the case of nova-core > >> vGPU, why restrict drivers from opt-out of VFs. (In a previous reply I mentioned > >> I prefer opt-in, but you convinced me that it should rather be > >> opt-out.) > > > > I think nova-core has a temporary (OOT even!) issue that should be > > resolved - that doesn't justify adding core kernel infrastructure that > > will encourage more drivers to go away from our kernel design goals of > > drivers working equally in host and VM. > > My understanding is that vGPU will ensure that the device exposed to the VM will > be set up to be (at least mostly) compatible with nova-core's PF code paths? This is not what I've been told, the VF driver has significant programming model differences in the NVIDIA model, and supports different commands. If you look at the VFIO driver RFC it basically does no mediation, it isn't intercepting MMIO - the guest sees the BARs directly. Most of the code is "profiling" from what I can tell. Some config space meddling. I think Zhi could make it more clear what mediation the actual VFIO part of the driver is even doing and why.. Keep in mind for NVIDIA vGPU this has changed alot over the years - I am talking here about the SRIOV driver proposed recently that is specifically designed to substantially remove the hypervisor mediation. > So, there is a semantical difference between vGPU and nova-core that makes a > differentiation between VF and PF meaningful and justified. Aa was said earlier, there is a register in the MMIO that identifies if the vGPU programming model is used, that is what nova core should key on to determine how to act. Jason ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-02 15:23 ` Jason Gunthorpe @ 2025-10-02 16:05 ` Danilo Krummrich 2025-10-02 17:05 ` Jason Gunthorpe 0 siblings, 1 reply; 32+ messages in thread From: Danilo Krummrich @ 2025-10-02 16:05 UTC (permalink / raw) To: Jason Gunthorpe Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML On Thu Oct 2, 2025 at 5:23 PM CEST, Jason Gunthorpe wrote: > This is not what I've been told, the VF driver has significant > programming model differences in the NVIDIA model, and supports > different commands. Ok, that means there are some more fundamental differences between the host PF and the "VM PF" code that we have to deal with. But that doesn't necessarily require that the VF parts of the host have to be in nova-core as well, i.e. with the information we have we can differentiate between PF, VF and PF in the VM (indicated by a device register). > If you look at the VFIO driver RFC it basically does no mediation, it > isn't intercepting MMIO - the guest sees the BARs directly. Most of > the code is "profiling" from what I can tell. Some config space > meddling. Sure, there is no mediation in that sense, but it needs quite some setup regardless, no? I thought there is a significant amount of semantics that is different between booting the PF and the VF on the host. Also, the idea was to use a layered approach, i.e. let nova-core serve as an abstraction layer, where the DRM and VFIO parts can be layered on top of. Are you suggesting to merge vGPU into nova-core? (The VF specific firmware interfaces, should be abstracted in nova-core, so, technically, we will have some vGPU specific code in nova-core.) ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-02 16:05 ` Danilo Krummrich @ 2025-10-02 17:05 ` Jason Gunthorpe 2025-10-02 17:37 ` Danilo Krummrich 0 siblings, 1 reply; 32+ messages in thread From: Jason Gunthorpe @ 2025-10-02 17:05 UTC (permalink / raw) To: Danilo Krummrich Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML On Thu, Oct 02, 2025 at 06:05:28PM +0200, Danilo Krummrich wrote: > On Thu Oct 2, 2025 at 5:23 PM CEST, Jason Gunthorpe wrote: > > This is not what I've been told, the VF driver has significant > > programming model differences in the NVIDIA model, and supports > > different commands. > > Ok, that means there are some more fundamental differences between the host PF > and the "VM PF" code that we have to deal with. That was my understanding. > But that doesn't necessarily require that the VF parts of the host have to be in > nova-core as well, i.e. with the information we have we can differentiate > between PF, VF and PF in the VM (indicated by a device register). I'm not entirely sure what you mean by this.. The driver to operate the function in "vGPU" mode as indicated by the register has to be in nova-core, since there is only one device ID. > > If you look at the VFIO driver RFC it basically does no mediation, it > > isn't intercepting MMIO - the guest sees the BARs directly. Most of > > the code is "profiling" from what I can tell. Some config space > > meddling. > > Sure, there is no mediation in that sense, but it needs quite some setup > regardless, no? > > I thought there is a significant amount of semantics that is different between > booting the PF and the VF on the host. I think it would be good to have Zhi clarify more of this, but from what I understand are at least three activites comingled all together: 1) Boot the PF in "vGPU" mode so it can enable SRIOV 2) Enable SRIOV and profile VFs to allocate HW resources to them 3) VFIO variant driver to convert the VF into a "VM PF" with whatever mediation and enhancement needed From a broad perspective we in the kernel have put #2 outside VFIO because all of that is actually run through the PF and doesn't use the VF at all. #3 is the vfio driver and I would like it if vfio drivers restrained themselves to focus on the mediation, live migration and things like that which are directly related to VFIO.. > Also, the idea was to use a layered approach, i.e. let nova-core > serve as an abstraction layer, where the DRM and VFIO parts can be > layered on top of. Yes, I think everyone is good with some version of this.. A big question in my mind is where do you put #2, and what uapi does it provide. It has to layer on top of nova-core because it has to use the PF to do profiling. I'm not a fan of the vfio based sysfs as uAPI for #2, for reasons touched on in this thread. NIC drivers are using fwctl and devlink for profiling, managed by the PF driver. I think I'd want to here reasons why those interfaces cannot be used here. Jason ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-02 17:05 ` Jason Gunthorpe @ 2025-10-02 17:37 ` Danilo Krummrich 2025-10-02 17:40 ` Danilo Krummrich 2025-10-02 17:56 ` Jason Gunthorpe 0 siblings, 2 replies; 32+ messages in thread From: Danilo Krummrich @ 2025-10-02 17:37 UTC (permalink / raw) To: Jason Gunthorpe Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML On Thu Oct 2, 2025 at 7:05 PM CEST, Jason Gunthorpe wrote: > On Thu, Oct 02, 2025 at 06:05:28PM +0200, Danilo Krummrich wrote: >> On Thu Oct 2, 2025 at 5:23 PM CEST, Jason Gunthorpe wrote: >> > This is not what I've been told, the VF driver has significant >> > programming model differences in the NVIDIA model, and supports >> > different commands. >> >> Ok, that means there are some more fundamental differences between the host PF >> and the "VM PF" code that we have to deal with. > > That was my understanding. > >> But that doesn't necessarily require that the VF parts of the host have to be in >> nova-core as well, i.e. with the information we have we can differentiate >> between PF, VF and PF in the VM (indicated by a device register). > > I'm not entirely sure what you mean by this.. > > The driver to operate the function in "vGPU" mode as indicated by the > register has to be in nova-core, since there is only one device ID. Yes, the PF driver on the host and the PF (from VM perspective) driver in the VM have to be that same. But the VF driver on the host can still be a seaparate one. >> > If you look at the VFIO driver RFC it basically does no mediation, it >> > isn't intercepting MMIO - the guest sees the BARs directly. Most of >> > the code is "profiling" from what I can tell. Some config space >> > meddling. >> >> Sure, there is no mediation in that sense, but it needs quite some setup >> regardless, no? >> >> I thought there is a significant amount of semantics that is different between >> booting the PF and the VF on the host. > > I think it would be good to have Zhi clarify more of this, but from > what I understand are at least three activites comingled all together: > > 1) Boot the PF in "vGPU" mode so it can enable SRIOV Ok, this might be where the confusion above comes from. When I talk about nova-core in vGPU mode I mean nova-core running in the VM on the (from VM perspective) PF. But you seem to mean nova-core running on the host PF with vGPU on top? That of course has to be in nova-core. > 2) Enable SRIOV and profile VFs to allocate HW resources to them I think that's partially in nova-core and partially in vGPU; nova-core providing the abstraction of the corresponding firmware / hardware interfaces and vGPU controlling the semantics of the resource handling? This is what I thought vGPU has a secondary part for where it binds to nova-core through the auxiliary bus, i.e. vGPU consisting out of two drivers actually; the VFIO parts and a "per VF resource controller". > 3) VFIO variant driver to convert the VF into a "VM PF" with whatever > mediation and enhancement needed That should be vGPU only land. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-02 17:37 ` Danilo Krummrich @ 2025-10-02 17:40 ` Danilo Krummrich 2025-10-02 17:49 ` John Hubbard 2025-10-02 17:56 ` Jason Gunthorpe 1 sibling, 1 reply; 32+ messages in thread From: Danilo Krummrich @ 2025-10-02 17:40 UTC (permalink / raw) To: Jason Gunthorpe Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML On Thu Oct 2, 2025 at 7:37 PM CEST, Danilo Krummrich wrote: > On Thu Oct 2, 2025 at 7:05 PM CEST, Jason Gunthorpe wrote: >> On Thu, Oct 02, 2025 at 06:05:28PM +0200, Danilo Krummrich wrote: >>> On Thu Oct 2, 2025 at 5:23 PM CEST, Jason Gunthorpe wrote: >>> > This is not what I've been told, the VF driver has significant >>> > programming model differences in the NVIDIA model, and supports >>> > different commands. >>> >>> Ok, that means there are some more fundamental differences between the host PF >>> and the "VM PF" code that we have to deal with. >> >> That was my understanding. >> >>> But that doesn't necessarily require that the VF parts of the host have to be in >>> nova-core as well, i.e. with the information we have we can differentiate >>> between PF, VF and PF in the VM (indicated by a device register). >> >> I'm not entirely sure what you mean by this.. >> >> The driver to operate the function in "vGPU" mode as indicated by the >> register has to be in nova-core, since there is only one device ID. > > Yes, the PF driver on the host and the PF (from VM perspective) driver in the VM > have to be that same. But the VF driver on the host can still be a seaparate > one. > >>> > If you look at the VFIO driver RFC it basically does no mediation, it >>> > isn't intercepting MMIO - the guest sees the BARs directly. Most of >>> > the code is "profiling" from what I can tell. Some config space >>> > meddling. >>> >>> Sure, there is no mediation in that sense, but it needs quite some setup >>> regardless, no? >>> >>> I thought there is a significant amount of semantics that is different between >>> booting the PF and the VF on the host. >> >> I think it would be good to have Zhi clarify more of this, but from >> what I understand are at least three activites comingled all together: >> >> 1) Boot the PF in "vGPU" mode so it can enable SRIOV > > Ok, this might be where the confusion above comes from. When I talk about > nova-core in vGPU mode I mean nova-core running in the VM on the (from VM > perspective) PF. > > But you seem to mean nova-core running on the host PF with vGPU on top? That of > course has to be in nova-core. > >> 2) Enable SRIOV and profile VFs to allocate HW resources to them > > I think that's partially in nova-core and partially in vGPU; nova-core providing > the abstraction of the corresponding firmware / hardware interfaces and vGPU > controlling the semantics of the resource handling? > > This is what I thought vGPU has a secondary part for where it binds to nova-core > through the auxiliary bus, i.e. vGPU consisting out of two drivers actually; the > VFIO parts and a "per VF resource controller". Forgot to add: But I think Zhi explained that this is not necessary and can be controlled by the VFIO driver, i.e. the PCI driver that binds to the VF itself. >> 3) VFIO variant driver to convert the VF into a "VM PF" with whatever >> mediation and enhancement needed > > That should be vGPU only land. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-02 17:40 ` Danilo Krummrich @ 2025-10-02 17:49 ` John Hubbard 2025-10-02 18:05 ` Jason Gunthorpe 0 siblings, 1 reply; 32+ messages in thread From: John Hubbard @ 2025-10-02 17:49 UTC (permalink / raw) To: Danilo Krummrich, Jason Gunthorpe Cc: Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML On 10/2/25 10:40 AM, Danilo Krummrich wrote: > On Thu Oct 2, 2025 at 7:37 PM CEST, Danilo Krummrich wrote: >> On Thu Oct 2, 2025 at 7:05 PM CEST, Jason Gunthorpe wrote: >>> On Thu, Oct 02, 2025 at 06:05:28PM +0200, Danilo Krummrich wrote: >>>> On Thu Oct 2, 2025 at 5:23 PM CEST, Jason Gunthorpe wrote: >>>>> This is not what I've been told, the VF driver has significant >>>>> programming model differences in the NVIDIA model, and supports >>>>> different commands. >>>> >>>> Ok, that means there are some more fundamental differences between the host PF >>>> and the "VM PF" code that we have to deal with. >>> >>> That was my understanding. >>> >>>> But that doesn't necessarily require that the VF parts of the host have to be in >>>> nova-core as well, i.e. with the information we have we can differentiate >>>> between PF, VF and PF in the VM (indicated by a device register). >>> >>> I'm not entirely sure what you mean by this.. >>> >>> The driver to operate the function in "vGPU" mode as indicated by the >>> register has to be in nova-core, since there is only one device ID. >> >> Yes, the PF driver on the host and the PF (from VM perspective) driver in the VM >> have to be that same. But the VF driver on the host can still be a seaparate >> one. >> >>>>> If you look at the VFIO driver RFC it basically does no mediation, it >>>>> isn't intercepting MMIO - the guest sees the BARs directly. Most of >>>>> the code is "profiling" from what I can tell. Some config space >>>>> meddling. >>>> >>>> Sure, there is no mediation in that sense, but it needs quite some setup >>>> regardless, no? >>>> >>>> I thought there is a significant amount of semantics that is different between >>>> booting the PF and the VF on the host. >>> >>> I think it would be good to have Zhi clarify more of this, but from >>> what I understand are at least three activites comingled all together: >>> >>> 1) Boot the PF in "vGPU" mode so it can enable SRIOV >> For this, we could pass a kernel module parameter to nova-core. >> Ok, this might be where the confusion above comes from. When I talk about >> nova-core in vGPU mode I mean nova-core running in the VM on the (from VM >> perspective) PF. >> >> But you seem to mean nova-core running on the host PF with vGPU on top? That of >> course has to be in nova-core. >> >>> 2) Enable SRIOV and profile VFs to allocate HW resources to them >> >> I think that's partially in nova-core and partially in vGPU; nova-core providing >> the abstraction of the corresponding firmware / hardware interfaces and vGPU >> controlling the semantics of the resource handling? >> >> This is what I thought vGPU has a secondary part for where it binds to nova-core >> through the auxiliary bus, i.e. vGPU consisting out of two drivers actually; the >> VFIO parts and a "per VF resource controller". > > Forgot to add: But I think Zhi explained that this is not necessary and can be > controlled by the VFIO driver, i.e. the PCI driver that binds to the VF itself. Yes, this is the direction that I originally (3 whole days ago, haha) had in mind, after talking with Zhi and a few others: nova-core handles PFs, and the VFIO driver handles the VFs, and use the "is virtual" logic to sort them out. Looking forward to Zhi's reaction to the other approach that you and Jason have been debating. This is all very educational to me, as a VFIO newbie. :) > >>> 3) VFIO variant driver to convert the VF into a "VM PF" with whatever >>> mediation and enhancement needed >> >> That should be vGPU only land. > thanks, -- John Hubbard ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-02 17:49 ` John Hubbard @ 2025-10-02 18:05 ` Jason Gunthorpe 2025-10-02 18:09 ` John Hubbard 2025-10-02 18:17 ` Danilo Krummrich 0 siblings, 2 replies; 32+ messages in thread From: Jason Gunthorpe @ 2025-10-02 18:05 UTC (permalink / raw) To: John Hubbard Cc: Danilo Krummrich, Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML On Thu, Oct 02, 2025 at 10:49:21AM -0700, John Hubbard wrote: > > Forgot to add: But I think Zhi explained that this is not necessary and can be > > controlled by the VFIO driver, i.e. the PCI driver that binds to the VF itself. > > Yes, this is the direction that I originally (3 whole days ago, haha) had in mind, > after talking with Zhi and a few others: nova-core handles PFs, and the VFIO driver > handles the VFs, and use the "is virtual" logic to sort them out. To be clear, no matter what the VFIO driver bound to the VF should not become entangled with any aux devices. The VFIO VF driver uses pci_iov_get_pf_drvdata() to reach into the PF to request the PF's help. Eg for live migration or things of that nature. My point here is that generally we don't put profiling code in the VFIO driver and then use pci_iov_get_pf_drvdata() to access the PF do actually do the profiling. The VF cannot/should not control profiling of itself - that would be a security problem once it is assigned to a VM. So the profiling resides entirely inside the PF world and should operate without VFIO. As I've said this design is compatible with VFs for containers and so on. So it is the strongly preferred design pattern. Jason ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-02 18:05 ` Jason Gunthorpe @ 2025-10-02 18:09 ` John Hubbard 2025-10-02 18:32 ` Jason Gunthorpe 2025-10-02 18:17 ` Danilo Krummrich 1 sibling, 1 reply; 32+ messages in thread From: John Hubbard @ 2025-10-02 18:09 UTC (permalink / raw) To: Jason Gunthorpe Cc: Danilo Krummrich, Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML On 10/2/25 11:05 AM, Jason Gunthorpe wrote: > On Thu, Oct 02, 2025 at 10:49:21AM -0700, John Hubbard wrote: >>> Forgot to add: But I think Zhi explained that this is not necessary and can be >>> controlled by the VFIO driver, i.e. the PCI driver that binds to the VF itself. >> >> Yes, this is the direction that I originally (3 whole days ago, haha) had in mind, >> after talking with Zhi and a few others: nova-core handles PFs, and the VFIO driver >> handles the VFs, and use the "is virtual" logic to sort them out. > > To be clear, no matter what the VFIO driver bound to the VF should not > become entangled with any aux devices. I was fine until you said "aux devices". :) What does that mean in this context? > > The VFIO VF driver uses pci_iov_get_pf_drvdata() to reach into the PF > to request the PF's help. Eg for live migration or things of that > nature. > > My point here is that generally we don't put profiling code in the > VFIO driver and then use pci_iov_get_pf_drvdata() to access the PF do > actually do the profiling. > > The VF cannot/should not control profiling of itself - that would be a > security problem once it is assigned to a VM. > > So the profiling resides entirely inside the PF world and should > operate without VFIO. As I've said this design is compatible with VFs > for containers and so on. So it is the strongly preferred design > pattern. > OK, that all makes sense. thanks, -- John Hubbard ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-02 18:09 ` John Hubbard @ 2025-10-02 18:32 ` Jason Gunthorpe 0 siblings, 0 replies; 32+ messages in thread From: Jason Gunthorpe @ 2025-10-02 18:32 UTC (permalink / raw) To: John Hubbard Cc: Danilo Krummrich, Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML On Thu, Oct 02, 2025 at 11:09:40AM -0700, John Hubbard wrote: > On 10/2/25 11:05 AM, Jason Gunthorpe wrote: > > On Thu, Oct 02, 2025 at 10:49:21AM -0700, John Hubbard wrote: > >>> Forgot to add: But I think Zhi explained that this is not necessary and can be > >>> controlled by the VFIO driver, i.e. the PCI driver that binds to the VF itself. > >> > >> Yes, this is the direction that I originally (3 whole days ago, haha) had in mind, > >> after talking with Zhi and a few others: nova-core handles PFs, and the VFIO driver > >> handles the VFs, and use the "is virtual" logic to sort them out. > > > > To be clear, no matter what the VFIO driver bound to the VF should not > > become entangled with any aux devices. > > I was fine until you said "aux devices". :) What does that mean in this > context? Something using struct auxiliary_device Jason ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-02 18:05 ` Jason Gunthorpe 2025-10-02 18:09 ` John Hubbard @ 2025-10-02 18:17 ` Danilo Krummrich 2025-10-02 18:31 ` Jason Gunthorpe 1 sibling, 1 reply; 32+ messages in thread From: Danilo Krummrich @ 2025-10-02 18:17 UTC (permalink / raw) To: Jason Gunthorpe Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML On 10/2/25 8:05 PM, Jason Gunthorpe wrote: > On Thu, Oct 02, 2025 at 10:49:21AM -0700, John Hubbard wrote: >>> Forgot to add: But I think Zhi explained that this is not necessary and can be >>> controlled by the VFIO driver, i.e. the PCI driver that binds to the VF itself. >> >> Yes, this is the direction that I originally (3 whole days ago, haha) had in mind, >> after talking with Zhi and a few others: nova-core handles PFs, and the VFIO driver >> handles the VFs, and use the "is virtual" logic to sort them out. > > To be clear, no matter what the VFIO driver bound to the VF should not > become entangled with any aux devices. > > The VFIO VF driver uses pci_iov_get_pf_drvdata() to reach into the PF > to request the PF's help. Eg for live migration or things of that > nature. Ick! The VF driver should never mess with the PF driver's private data. Instead the PF driver should provide an API for the VF driver to get things done on behalf. It also has the implication that we need to guarantee that PF driver unbind will also unbind all VFs, but we need this guarantee anyways. I.e. when the VFIO driver calls into nova-core we want the guarantee that we're in a scope where the PF driver is bound. > My point here is that generally we don't put profiling code in the > VFIO driver and then use pci_iov_get_pf_drvdata() to access the PF do > actually do the profiling. > > The VF cannot/should not control profiling of itself - that would be a > security problem once it is assigned to a VM. As mentioned above, if at all I think the PF driver has to provide an API for that. > So the profiling resides entirely inside the PF world and should > operate without VFIO. Perfectly fine with me, I'm open to both approaches. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-02 18:17 ` Danilo Krummrich @ 2025-10-02 18:31 ` Jason Gunthorpe 2025-10-02 18:42 ` Danilo Krummrich 0 siblings, 1 reply; 32+ messages in thread From: Jason Gunthorpe @ 2025-10-02 18:31 UTC (permalink / raw) To: Danilo Krummrich Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML On Thu, Oct 02, 2025 at 08:17:10PM +0200, Danilo Krummrich wrote: > On 10/2/25 8:05 PM, Jason Gunthorpe wrote: > > On Thu, Oct 02, 2025 at 10:49:21AM -0700, John Hubbard wrote: > >>> Forgot to add: But I think Zhi explained that this is not necessary and can be > >>> controlled by the VFIO driver, i.e. the PCI driver that binds to the VF itself. > >> > >> Yes, this is the direction that I originally (3 whole days ago, haha) had in mind, > >> after talking with Zhi and a few others: nova-core handles PFs, and the VFIO driver > >> handles the VFs, and use the "is virtual" logic to sort them out. > > > > To be clear, no matter what the VFIO driver bound to the VF should not > > become entangled with any aux devices. > > > > The VFIO VF driver uses pci_iov_get_pf_drvdata() to reach into the PF > > to request the PF's help. Eg for live migration or things of that > > nature. > > Ick! The VF driver should never mess with the PF driver's private data. > Instead the PF driver should provide an API for the VF driver to get things done > on behalf. This exactly how this function is used. The core PF driver provides an API: struct mlx5_core_dev *mlx5_vf_get_core_dev(struct pci_dev *pdev) Which takes in the VF as pdev and internally it invokes: mdev = pci_iov_get_pf_drvdata(pdev, &mlx5_core_driver); Which asserts the PF is bound to the right driver, which asserts the format of the drvdata is known, and now we have a proper handle to use in the rest of API surface the VF driver will use. By forcing the ops into the signature we strongly encourage people to follow this design pattern and provide APIs, otherwise you have to export the ops to call pci_iov_get_pf_drvdata() There really isn't another good way to obtain the 'struct mlx5_core_dev' from a simple VF pci_dev. > It also has the implication that we need to guarantee that PF driver unbind will > also unbind all VFs, but we need this guarantee anyways. This is another reason why pci_iov_get_pf_drvdata() exists. /** * pci_iov_get_pf_drvdata - Return the drvdata of a PF * @dev: VF pci_dev * @pf_driver: Device driver required to own the PF * * This must be called from a context that ensures that a VF driver is attached. * The value returned is invalid once the VF driver completes its remove() * callback. * * Locking is achieved by the driver core. A VF driver cannot be probed until * pci_enable_sriov() is called and pci_disable_sriov() does not return until * all VF drivers have completed their remove(). * * The PF driver must call pci_disable_sriov() before it begins to destroy the * drvdata. */ Meaning nova-core has to guarentee to call pci_disable_sriov() before remove completes or before a failing probe returns as part of implementing SRIOV support. Userspace cannot create VFs until those calls are done. Jason ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-02 18:31 ` Jason Gunthorpe @ 2025-10-02 18:42 ` Danilo Krummrich 2025-10-02 18:56 ` Jason Gunthorpe 0 siblings, 1 reply; 32+ messages in thread From: Danilo Krummrich @ 2025-10-02 18:42 UTC (permalink / raw) To: Jason Gunthorpe Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML On 10/2/25 8:31 PM, Jason Gunthorpe wrote: > This exactly how this function is used. > > The core PF driver provides an API: > > struct mlx5_core_dev *mlx5_vf_get_core_dev(struct pci_dev *pdev) > > Which takes in the VF as pdev and internally it invokes: > > mdev = pci_iov_get_pf_drvdata(pdev, &mlx5_core_driver); Oh, I see, that makes sense then. Thanks for clarifying. I think I already had in mind how this would look like in the Rust abstraction, and there we don't need pci_iov_get_pf_drvdata() to achieve the same thing. > /** > * pci_iov_get_pf_drvdata - Return the drvdata of a PF > * @dev: VF pci_dev > * @pf_driver: Device driver required to own the PF > * > * This must be called from a context that ensures that a VF driver is attached. > * The value returned is invalid once the VF driver completes its remove() > * callback. > * > * Locking is achieved by the driver core. A VF driver cannot be probed until > * pci_enable_sriov() is called and pci_disable_sriov() does not return until > * all VF drivers have completed their remove(). > * > * The PF driver must call pci_disable_sriov() before it begins to destroy the > * drvdata. > */ > > Meaning nova-core has to guarentee to call pci_disable_sriov() before > remove completes or before a failing probe returns as part of > implementing SRIOV support. Yes, I already thought about this. In the context of adding support for SR-IOV in the Rust abstractions I'm planning on sending an RFC to let the subsystem provide this guarantee instead (at least under certain conditions). This will allow us to assert the device to be bound by the type system in the Rust PCI abstraction, rather than having the driver to provide a guarantee to call pci_disable_sriov() manually. :) ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-02 18:42 ` Danilo Krummrich @ 2025-10-02 18:56 ` Jason Gunthorpe 2025-10-02 19:36 ` Danilo Krummrich 0 siblings, 1 reply; 32+ messages in thread From: Jason Gunthorpe @ 2025-10-02 18:56 UTC (permalink / raw) To: Danilo Krummrich Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML On Thu, Oct 02, 2025 at 08:42:58PM +0200, Danilo Krummrich wrote: > On 10/2/25 8:31 PM, Jason Gunthorpe wrote: > > This exactly how this function is used. > > > > The core PF driver provides an API: > > > > struct mlx5_core_dev *mlx5_vf_get_core_dev(struct pci_dev *pdev) > > > > Which takes in the VF as pdev and internally it invokes: > > > > mdev = pci_iov_get_pf_drvdata(pdev, &mlx5_core_driver); > > Oh, I see, that makes sense then. Thanks for clarifying. I think I already had > in mind how this would look like in the Rust abstraction, and there we don't > need pci_iov_get_pf_drvdata() to achieve the same thing. I'm skeptical, there is nothing about rust that should avoid having to us pci_iov_get_pf_drvdata().. It does a number of safety checks related to the linux driver model that are not optional. Don't forget in linux you actually can bind VFIO to the PF, start SRIOV and then bind VFs to other drivers which then fail pci_iov_get_pf_drvdata(). Blindly converting a struct device to an instance memory without this check will be buggy. > Yes, I already thought about this. In the context of adding support for SR-IOV > in the Rust abstractions I'm planning on sending an RFC to let the subsystem > provide this guarantee instead (at least under certain conditions). Certain conditions may be workable, some drivers seem to have preferences not to call disable, though I think that is wrong :\ Jason ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-02 18:56 ` Jason Gunthorpe @ 2025-10-02 19:36 ` Danilo Krummrich 2025-10-02 21:04 ` Jason Gunthorpe 0 siblings, 1 reply; 32+ messages in thread From: Danilo Krummrich @ 2025-10-02 19:36 UTC (permalink / raw) To: Jason Gunthorpe Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML On Thu Oct 2, 2025 at 8:56 PM CEST, Jason Gunthorpe wrote: > On Thu, Oct 02, 2025 at 08:42:58PM +0200, Danilo Krummrich wrote: >> On 10/2/25 8:31 PM, Jason Gunthorpe wrote: >> > This exactly how this function is used. >> > >> > The core PF driver provides an API: >> > >> > struct mlx5_core_dev *mlx5_vf_get_core_dev(struct pci_dev *pdev) >> > >> > Which takes in the VF as pdev and internally it invokes: >> > >> > mdev = pci_iov_get_pf_drvdata(pdev, &mlx5_core_driver); >> >> Oh, I see, that makes sense then. Thanks for clarifying. I think I already had >> in mind how this would look like in the Rust abstraction, and there we don't >> need pci_iov_get_pf_drvdata() to achieve the same thing. > > I'm skeptical, there is nothing about rust that should avoid having to > us pci_iov_get_pf_drvdata().. It does a number of safety checks > related to the linux driver model that are not optional. The checks will be the same, but using pci_iov_get_pf_drvdata() directly is not workable because of how the abstractions are layered. If we want to obtain the driver's private data from a device outside the scope of bus callbacks, we always need to ensure that the device is guaranteed to be bound and we also need to prove the type of the private data, since a device structure can't be generic over its bound driver. Usually that's not an issue because other entry points into the driver, e.g. subsystem callbacks have their own private data through the class device, IRQs have their own private data in the IRQ registration, etc. >> Yes, I already thought about this. In the context of adding support for SR-IOV >> in the Rust abstractions I'm planning on sending an RFC to let the subsystem >> provide this guarantee instead (at least under certain conditions). > > Certain conditions may be workable, some drivers seem to have > preferences not to call disable, though I think that is wrong :\ I fully agree! I was told that this is because apparently some PF drivers are only loaded to enable SR-IOV and then removed to shrink the potential attack surface. Personally, I think that's slightly paranoid, if the driver would not do anything else than enable / disable SR-IOV, but I think we can work around this use-case if people really want it. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-02 19:36 ` Danilo Krummrich @ 2025-10-02 21:04 ` Jason Gunthorpe 2025-10-02 21:14 ` Danilo Krummrich 0 siblings, 1 reply; 32+ messages in thread From: Jason Gunthorpe @ 2025-10-02 21:04 UTC (permalink / raw) To: Danilo Krummrich Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML On Thu, Oct 02, 2025 at 09:36:17PM +0200, Danilo Krummrich wrote: > If we want to obtain the driver's private data from a device outside the scope > of bus callbacks, we always need to ensure that the device is guaranteed to be > bound and we also need to prove the type of the private data, since a device > structure can't be generic over its bound driver. pci_iov_get_pf_drvdata() does both of these things - this is what it is for. Please don't open code it :( > > Certain conditions may be workable, some drivers seem to have > > preferences not to call disable, though I think that is wrong :\ > > I fully agree! I was told that this is because apparently some PF drivers are > only loaded to enable SR-IOV and then removed to shrink the potential attack > surface. Personally, I think that's slightly paranoid, if the driver would not > do anything else than enable / disable SR-IOV, but I think we can work around > this use-case if people really want it. I've heard worse reasons than that. If that is the interest I'd suggest they should just use VFIO and leave a userspace stub process.. Jason ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-02 21:04 ` Jason Gunthorpe @ 2025-10-02 21:14 ` Danilo Krummrich 2025-10-02 21:32 ` Danilo Krummrich 0 siblings, 1 reply; 32+ messages in thread From: Danilo Krummrich @ 2025-10-02 21:14 UTC (permalink / raw) To: Jason Gunthorpe Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML On 10/2/25 11:04 PM, Jason Gunthorpe wrote: > On Thu, Oct 02, 2025 at 09:36:17PM +0200, Danilo Krummrich wrote: >> If we want to obtain the driver's private data from a device outside the scope >> of bus callbacks, we always need to ensure that the device is guaranteed to be >> bound and we also need to prove the type of the private data, since a device >> structure can't be generic over its bound driver. > > pci_iov_get_pf_drvdata() does both of these things - this is what it > is for. Please don't open code it :( It makes no sense to use it, both of those things will be ensured in a more generic way in the base device implementation already (which is what I meant with layering). Both requirements are not specific to PCI, or the specific VF -> PF use-case. In order to guarantee soundness both of those things have to be guaranteed for any access to the driver's private data. I will send some patches soon, I think this will make it obvious. :) >>> Certain conditions may be workable, some drivers seem to have >>> preferences not to call disable, though I think that is wrong :\ >> >> I fully agree! I was told that this is because apparently some PF drivers are >> only loaded to enable SR-IOV and then removed to shrink the potential attack >> surface. Personally, I think that's slightly paranoid, if the driver would not >> do anything else than enable / disable SR-IOV, but I think we can work around >> this use-case if people really want it. > > I've heard worse reasons than that. If that is the interest I'd > suggest they should just use VFIO and leave a userspace stub > process.. I'm not sure I follow your proposal, can you elaborate? ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-02 21:14 ` Danilo Krummrich @ 2025-10-02 21:32 ` Danilo Krummrich 2025-10-02 23:40 ` Jason Gunthorpe 0 siblings, 1 reply; 32+ messages in thread From: Danilo Krummrich @ 2025-10-02 21:32 UTC (permalink / raw) To: Jason Gunthorpe Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML On Thu Oct 2, 2025 at 11:14 PM CEST, Danilo Krummrich wrote: > On 10/2/25 11:04 PM, Jason Gunthorpe wrote: >> On Thu, Oct 02, 2025 at 09:36:17PM +0200, Danilo Krummrich wrote: >>> If we want to obtain the driver's private data from a device outside the scope >>> of bus callbacks, we always need to ensure that the device is guaranteed to be >>> bound and we also need to prove the type of the private data, since a device >>> structure can't be generic over its bound driver. >> >> pci_iov_get_pf_drvdata() does both of these things - this is what it >> is for. Please don't open code it :( > > It makes no sense to use it, both of those things will be ensured in a more > generic way in the base device implementation already (which is what I meant > with layering). > > Both requirements are not specific to PCI, or the specific VF -> PF use-case. > > In order to guarantee soundness both of those things have to be guaranteed for > any access to the driver's private data. Actually, let me elaborate a bit more on this: In C when a driver calls dev_get_drvdata() it asserts two things: (1) The device is bound to the driver calling this function. (2) It casts the returned void * to the correct type. Obviously, relying on this in Rust would be fundamentally unsafe. Hence, the idea is to implement Device::drvdata_borrow::<T>(), which takes a &Device<Bound> as argument, which proves that the device must be bound. The T must be the driver specific driver type, i.e. the type that implements e.g. the pci::Driver trait. With that, Device::drvdata_borrow() can internally check whether the asserted type T matches the unique type ID that we store in the device in probe(). This way we can prove that the device is actually bound and that we cast to the correct type. Furthermore, the returned reference to the driver's private data can't out-live the lifetime of the given &Device<Bound>, so we're also guaranteed that the device won't be unbound while we still have a reference to the driver's private data. So, when we call pdev.physfn().drvdata_borrow::<NovaCore>() the checks are included already. > I will send some patches soon, I think this will make it obvious. :) >>>> Certain conditions may be workable, some drivers seem to have >>>> preferences not to call disable, though I think that is wrong :\ >>> >>> I fully agree! I was told that this is because apparently some PF drivers are >>> only loaded to enable SR-IOV and then removed to shrink the potential attack >>> surface. Personally, I think that's slightly paranoid, if the driver would not >>> do anything else than enable / disable SR-IOV, but I think we can work around >>> this use-case if people really want it. >> >> I've heard worse reasons than that. If that is the interest I'd >> suggest they should just use VFIO and leave a userspace stub >> process.. > > I'm not sure I follow your proposal, can you elaborate? ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-02 21:32 ` Danilo Krummrich @ 2025-10-02 23:40 ` Jason Gunthorpe 2025-10-03 0:57 ` Danilo Krummrich 0 siblings, 1 reply; 32+ messages in thread From: Jason Gunthorpe @ 2025-10-02 23:40 UTC (permalink / raw) To: Danilo Krummrich Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML On Thu, Oct 02, 2025 at 11:32:44PM +0200, Danilo Krummrich wrote: > So, when we call pdev.physfn().drvdata_borrow::<NovaCore>() the checks are > included already. I'm not keen on hiding this reasoning inside an physfn() accessor like this. ie one that returns a Device<Bound>. The reasoning for this is tricky and special. We have enough cases where physfn won't be a bound driver. I think it is big stretch just to declare that unconditionally safe. There is a reason pci_iov_get_pf_drvdata() has such a big comment.. So I'd rather see you follow the C design and have an explicit helper function to convert a VF bound device to a PF bound device and check the owner, basically split up pci_iov_get_pf_drvdata() into a part to get the struct device and an inline to get the drvdata. Rust still has an ops pointer it can pass in so it can be consistent with the C code even if it does another check inside its drvdata_borrow. This way we keep the reasoning and explanation in one place. Jason ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-02 23:40 ` Jason Gunthorpe @ 2025-10-03 0:57 ` Danilo Krummrich 2025-10-03 11:52 ` Jason Gunthorpe 0 siblings, 1 reply; 32+ messages in thread From: Danilo Krummrich @ 2025-10-03 0:57 UTC (permalink / raw) To: Jason Gunthorpe Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML On Fri Oct 3, 2025 at 1:40 AM CEST, Jason Gunthorpe wrote: > On Thu, Oct 02, 2025 at 11:32:44PM +0200, Danilo Krummrich wrote: > >> So, when we call pdev.physfn().drvdata_borrow::<NovaCore>() the checks are >> included already. > > I'm not keen on hiding this reasoning inside an physfn() accessor like > this. ie one that returns a Device<Bound>. The reasoning for this is > tricky and special. We have enough cases where physfn won't be a bound > driver. I think it is big stretch just to declare that unconditionally > safe. In this example physfn() would return a &pci::Device<Bound>. This is what I mentioned previously; I want the subsystem to guarantee (at least under certain circumstances) that if the VF device is bound that the PF device must be bound as well. > There is a reason pci_iov_get_pf_drvdata() has such a big comment.. > > So I'd rather see you follow the C design and have an explicit helper > function to convert a VF bound device to a PF bound device Well, this is exactly what physfn() does (with the precondition that we can get the guarantee from the subsystem). > and check > the owner, basically split up pci_iov_get_pf_drvdata() into a part to > get the struct device and an inline to get the drvdata. Rust still has an > ops pointer it can pass in so it can be consistent with the C code Which ops pointer are you referring to? Do you mean the struct pci_driver pointer? If so, no we can't access this one. We could make it accessible, but it would result into horrible code, wouldn't make it possible to implement the check generically for any device (which we need anyways) and would have some other bad implications. I try to be as consistent as possible with C code, but sometimes it just doesn't fit at all and would even hurt. This is one of those cases. To give at least some background: A driver structure (like struct pci_driver) is embedded in a module structure, which is a global static and intentionally not directly accessible for drivers. Even if we'd make it accessible, the driver field within a module structure depends on the exact implementation, i.e. it depends on whether a module is declared "by hand", or whether it is generated by a module_driver!() macro (e.g. module_pci_driver!(). It probably also helps to have a look at samples/rust/rust_driver_auxiliary.rs, which open codes driver registrations, since it registers two drivers in the same module for the purpose of illustrating an auxiliary connection, i.e. it doesn't use a module_driver!() macro. The struct struct auxiliary_driver resides within the driver::Registration<auxiliary::Adapter<T>>, where driver::Registration is a generic type for registering drivers, auxiliary::Adapter defines the auxiliary specific bits for this registration and T is the driver specific type that implements the auxiliary::Driver trait, containing the bus callbacks etc. > even if it does another check inside its drvdata_borrow. It's the exact same check; just a better fitting implementation. I.e. instead of checking a pointer which is hard to access, we check if the type ID we gave to the device matches the driver specific type (the T in driver::Registration<auxiliary::Adapter<T>>). Semantically, there is no difference, but it results in less cumbersome and more flexible code. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-03 0:57 ` Danilo Krummrich @ 2025-10-03 11:52 ` Jason Gunthorpe 0 siblings, 0 replies; 32+ messages in thread From: Jason Gunthorpe @ 2025-10-03 11:52 UTC (permalink / raw) To: Danilo Krummrich Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML On Fri, Oct 03, 2025 at 02:57:53AM +0200, Danilo Krummrich wrote: > Which ops pointer are you referring to? Do you mean the struct pci_driver > pointer? If so, no we can't access this one. We could make it accessible, but it > would result into horrible code, wouldn't make it possible to implement the > check generically for any device (which we need anyways) and would have some > other bad implications. Yes pci_driver. You must confirm the attached driver is following the right SRIOV protocol otherwise you can't claim the result is bound. This is where we are with PCI today at least. It sounds like you plan to come with patches changing how SRIOV enablement works in PCI susbsytem, if that also comes with a way to detect that the driver is following the rules without using pci_driver then great. But that would make me feel more strongly that this should be a core helper and the existing users should be converted to the new mechanism so we don't have two approaches here. > Even if we'd make it accessible, the driver field within a module structure > depends on the exact implementation, i.e. it depends on whether a module is > declared "by hand", or whether it is generated by a module_driver!() macro (e.g. > module_pci_driver!(). It is kind of a bad place to end up, drivers do occasionally use their driver pointers for little things, like this for example. It is not common but it is a technique.. Jason ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-02 17:37 ` Danilo Krummrich 2025-10-02 17:40 ` Danilo Krummrich @ 2025-10-02 17:56 ` Jason Gunthorpe 2025-10-02 18:55 ` Danilo Krummrich 1 sibling, 1 reply; 32+ messages in thread From: Jason Gunthorpe @ 2025-10-02 17:56 UTC (permalink / raw) To: Danilo Krummrich Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML On Thu, Oct 02, 2025 at 07:37:45PM +0200, Danilo Krummrich wrote: > > The driver to operate the function in "vGPU" mode as indicated by the > > register has to be in nova-core, since there is only one device ID. > > Yes, the PF driver on the host and the PF (from VM perspective) driver in the VM > have to be that same. But the VF driver on the host can still be a seaparate > one. In most cases it is going to be bound to a vfio driver.. However, if you actually want a DRM subsystem device on the VF without a VM I don't know why you'd use a different driver than the one used by the VM on the very same VF, with the very same register programming model.. > > I think it would be good to have Zhi clarify more of this, but from > > what I understand are at least three activites comingled all together: > > > > 1) Boot the PF in "vGPU" mode so it can enable SRIOV > > Ok, this might be where the confusion above comes from. When I talk about > nova-core in vGPU mode I mean nova-core running in the VM on the (from VM > perspective) PF. I would call this nova-core running on a VF (assigned to a VM) Not sure "vgpu" is a helpful word here, lets try to talk about what .ko's and struct device_drivers's the various codes should live in.. > But you seem to mean nova-core running on the host PF with vGPU on top? That of > course has to be in nova-core. Yes, #1 would be implemented as part of nova-core.ko and it's pci_driver. As I understand it around firmware loading nova-core has to tell the FW if it wants to enable "vGPU" mode or not. If it doesn't then the sriov_configure op should be inhibited and #2 disabled. If it does then sriov_configure should work, #2 is enabled, and DRM on the PF is disabled. > > 2) Enable SRIOV and profile VFs to allocate HW resources to them > > I think that's partially in nova-core and partially in vGPU; nova-core providing > the abstraction of the corresponding firmware / hardware interfaces and vGPU > controlling the semantics of the resource handling? > This is what I thought vGPU has a secondary part for where it binds to nova-core > through the auxiliary bus, i.e. vGPU consisting out of two drivers actually; the > VFIO parts and a "per VF resource controller". This is certainly one option, you can put #2 in an aux driver of the PF in a nova-sriov.ko module that is fully divorced from VFIO. It might go along with a nova-fwctl.ko module too. You could also just embed it in nova-core.ko and have it activate when the PF is booted in "vGPU" mode. Broadly I would suggest the latter. aux devices make most sense to cross subsystems. Micro splitting a single driver with aux devices will make more of a mess than required. Though a good motivating reason would be if nova-srvio.ko is large. > > 3) VFIO variant driver to convert the VF into a "VM PF" with whatever > > mediation and enhancement needed > > That should be vGPU only land. I think it is clear this part should be in a vfio-pci-nova.ko Then you have two more: 4) A PCI driver in a VM that creates a DRM subsystem device This is nova-core.ko + nova-drm.ko 5) A VF driver that creates a DRM subsystem device without a VM Zhi says the device can't do this, but lets assume it could, then I would expect this to be nova-core.ko + nova-drm.ko, same as #4. Jason ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs 2025-10-02 17:56 ` Jason Gunthorpe @ 2025-10-02 18:55 ` Danilo Krummrich 0 siblings, 0 replies; 32+ messages in thread From: Danilo Krummrich @ 2025-10-02 18:55 UTC (permalink / raw) To: Jason Gunthorpe Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML On Thu Oct 2, 2025 at 7:56 PM CEST, Jason Gunthorpe wrote: > This is certainly one option, you can put #2 in an aux driver of the > PF in a nova-sriov.ko module that is fully divorced from VFIO. It > might go along with a nova-fwctl.ko module too. > > You could also just embed it in nova-core.ko and have it activate when > the PF is booted in "vGPU" mode. > > Broadly I would suggest the latter. aux devices make most sense to > cross subsystems. Micro splitting a single driver with aux devices > will make more of a mess than required. Though a good motivating > reason would be if nova-srvio.ko is large. As mentioned in the other sub-thread, I'm fine with either approach, but I think I'd also prefer folding it into nova-core. > Then you have two more: > > 4) A PCI driver in a VM that creates a DRM subsystem device > > This is nova-core.ko + nova-drm.ko > > 5) A VF driver that creates a DRM subsystem device without a VM > > Zhi says the device can't do this, but lets assume it could, then I > would expect this to be nova-core.ko + nova-drm.ko, same as #4. Indeed, but there'd probably be an overlap between the logic in the VFIO driver and the logic required in nova-core to make this use-case happen I suspect. ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v2 2/2] gpu: nova-core: declare that VFs are not (yet) supported 2025-10-02 2:00 [PATCH v2 0/2] rust: pci: don't probe() VFs in nova-core John Hubbard 2025-10-02 2:00 ` [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs John Hubbard @ 2025-10-02 2:00 ` John Hubbard 1 sibling, 0 replies; 32+ messages in thread From: John Hubbard @ 2025-10-02 2:00 UTC (permalink / raw) To: Danilo Krummrich Cc: Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple, Zhi Wang, Surath Mitra, David Airlie, Simona Vetter, Alex Williamson, Jason Gunthorpe, Bjorn Helgaas, Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML, John Hubbard nova-core does not yet support PCIe Virtual Functions (VFs). Until it does, declare to the Rust PCI driver core that VFs are not supported. Because the Rust PCI driver core defaults to Driver::SUPPORTS_VF = false, this change is not strictly necessary for functionality. Its purpose is to provide a self-documentating line of code that can be easily changed, when VF support is being added. Cc: Alexandre Courbot <acourbot@nvidia.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Joel Fernandes <joelagnelf@nvidia.com> Cc: Zhi Wang <zhiw@nvidia.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Suggested-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: John Hubbard <jhubbard@nvidia.com> --- drivers/gpu/nova-core/driver.rs | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs index 5d23a91f51dd..4c19b0067862 100644 --- a/drivers/gpu/nova-core/driver.rs +++ b/drivers/gpu/nova-core/driver.rs @@ -51,6 +51,11 @@ impl pci::Driver for NovaCore { type IdInfo = (); const ID_TABLE: pci::IdTable<Self::IdInfo> = &PCI_TABLE; + // PCI probe() will report the same device ID for a PF and its associated VFs. This will cause + // failures when trying to bind to the VFs, until NovaCore adds support to handle that + // situation. Until then, tell the PCI driver core that we don't support VFs. + const SUPPORTS_VF: bool = false; + fn probe(pdev: &pci::Device<Core>, _info: &Self::IdInfo) -> Result<Pin<KBox<Self>>> { dev_dbg!(pdev.as_ref(), "Probe Nova Core GPU driver.\n"); -- 2.51.0 ^ permalink raw reply related [flat|nested] 32+ messages in thread
end of thread, other threads:[~2025-10-03 11:52 UTC | newest] Thread overview: 32+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-10-02 2:00 [PATCH v2 0/2] rust: pci: don't probe() VFs in nova-core John Hubbard 2025-10-02 2:00 ` [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs John Hubbard 2025-10-02 11:49 ` Danilo Krummrich 2025-10-02 12:11 ` Jason Gunthorpe 2025-10-02 12:18 ` Danilo Krummrich 2025-10-02 12:39 ` Jason Gunthorpe 2025-10-02 13:03 ` Danilo Krummrich 2025-10-02 13:56 ` Jason Gunthorpe 2025-10-02 15:11 ` Danilo Krummrich 2025-10-02 15:23 ` Jason Gunthorpe 2025-10-02 16:05 ` Danilo Krummrich 2025-10-02 17:05 ` Jason Gunthorpe 2025-10-02 17:37 ` Danilo Krummrich 2025-10-02 17:40 ` Danilo Krummrich 2025-10-02 17:49 ` John Hubbard 2025-10-02 18:05 ` Jason Gunthorpe 2025-10-02 18:09 ` John Hubbard 2025-10-02 18:32 ` Jason Gunthorpe 2025-10-02 18:17 ` Danilo Krummrich 2025-10-02 18:31 ` Jason Gunthorpe 2025-10-02 18:42 ` Danilo Krummrich 2025-10-02 18:56 ` Jason Gunthorpe 2025-10-02 19:36 ` Danilo Krummrich 2025-10-02 21:04 ` Jason Gunthorpe 2025-10-02 21:14 ` Danilo Krummrich 2025-10-02 21:32 ` Danilo Krummrich 2025-10-02 23:40 ` Jason Gunthorpe 2025-10-03 0:57 ` Danilo Krummrich 2025-10-03 11:52 ` Jason Gunthorpe 2025-10-02 17:56 ` Jason Gunthorpe 2025-10-02 18:55 ` Danilo Krummrich 2025-10-02 2:00 ` [PATCH v2 2/2] gpu: nova-core: declare that VFs are not (yet) supported John Hubbard
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).