* [PATCH v2 0/2] rust: pci: don't probe() VFs in nova-core
@ 2025-10-02 2:00 John Hubbard
2025-10-02 2:00 ` [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs John Hubbard
2025-10-02 2:00 ` [PATCH v2 2/2] gpu: nova-core: declare that VFs are not (yet) supported John Hubbard
0 siblings, 2 replies; 32+ messages in thread
From: John Hubbard @ 2025-10-02 2:00 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple,
Zhi Wang, Surath Mitra, David Airlie, Simona Vetter,
Alex Williamson, Jason Gunthorpe, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML, John Hubbard
Changes in v2:
1) Completely rewrote the series, using Danilo's suggested approach of
"driver supports VF probing or not": the PCI driver core will skip
probing any device's VFs if the driver has coded itself as "not
supporting VFs".
2) Included Joel's clarification, as a comment in the code, about what
happens if VFs are probed.
As with v1 [1], I've based this on top of today's driver-core-next [2],
because the first patch belongs there, and the second patch applies
cleanly to either driver-core-next or drm-rust-next. So this seems like
the easiest to work with.
[1]] https://lore.kernel.org/20250930220759.288528-1-jhubbard@nvidia.com
[2] https://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core.git/
Cc: Alexandre Courbot <acourbot@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Joel Fernandes <joelagnelf@nvidia.com>
Cc: Zhi Wang <zhiw@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Danilo Krummrich <dakr@kernel.org>
John Hubbard (2):
rust: pci: skip probing VFs if driver doesn't support VFs
gpu: nova-core: declare that VFs are not (yet) supported
drivers/gpu/nova-core/driver.rs | 5 +++++
drivers/pci/pci-driver.c | 3 +++
include/linux/pci.h | 1 +
rust/kernel/pci.rs | 4 ++++
4 files changed, 13 insertions(+)
base-commit: 6d97171ac6585de698df019b0bfea3f123fd8385
--
2.51.0
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-02 2:00 [PATCH v2 0/2] rust: pci: don't probe() VFs in nova-core John Hubbard
@ 2025-10-02 2:00 ` John Hubbard
2025-10-02 11:49 ` Danilo Krummrich
2025-10-02 12:11 ` Jason Gunthorpe
2025-10-02 2:00 ` [PATCH v2 2/2] gpu: nova-core: declare that VFs are not (yet) supported John Hubbard
1 sibling, 2 replies; 32+ messages in thread
From: John Hubbard @ 2025-10-02 2:00 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple,
Zhi Wang, Surath Mitra, David Airlie, Simona Vetter,
Alex Williamson, Jason Gunthorpe, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML, John Hubbard
Add a "supports_vf" flag to struct pci_driver to let drivers declare
Virtual Function (VF) support. If a driver does not support VFs, then
the PCI driver core will not probe() any VFs for that driver's devices.
On the Rust side, add a const "SUPPORTS_VF" Driver trait, defaulting to
false: drivers must explicitly opt into VF support.
Cc: Alexandre Courbot <acourbot@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Joel Fernandes <joelagnelf@nvidia.com>
Cc: Zhi Wang <zhiw@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
drivers/pci/pci-driver.c | 3 +++
include/linux/pci.h | 1 +
rust/kernel/pci.rs | 4 ++++
3 files changed, 8 insertions(+)
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 63665240ae87..588666cc7254 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -412,6 +412,9 @@ static int __pci_device_probe(struct pci_driver *drv, struct pci_dev *pci_dev)
if (drv->probe) {
error = -ENODEV;
+ if (pci_dev->is_virtfn && !drv->supports_vf)
+ return error;
+
id = pci_match_device(drv, pci_dev);
if (id)
error = pci_call_probe(drv, pci_dev, id);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 59876de13860..92510886a086 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -983,6 +983,7 @@ struct pci_driver {
struct device_driver driver;
struct pci_dynids dynids;
bool driver_managed_dma;
+ bool supports_vf; /* Will bind to Virtual Functions */
};
#define to_pci_driver(__drv) \
diff --git a/rust/kernel/pci.rs b/rust/kernel/pci.rs
index 7fcc5f6022c1..c5d036770e65 100644
--- a/rust/kernel/pci.rs
+++ b/rust/kernel/pci.rs
@@ -47,6 +47,7 @@ unsafe fn register(
(*pdrv.get()).probe = Some(Self::probe_callback);
(*pdrv.get()).remove = Some(Self::remove_callback);
(*pdrv.get()).id_table = T::ID_TABLE.as_ptr();
+ (*pdrv.get()).supports_vf = T::SUPPORTS_VF;
}
// SAFETY: `pdrv` is guaranteed to be a valid `RegType`.
@@ -268,6 +269,9 @@ pub trait Driver: Send {
/// The table of device ids supported by the driver.
const ID_TABLE: IdTable<Self::IdInfo>;
+ /// Whether the driver supports Virtual Functions.
+ const SUPPORTS_VF: bool = false;
+
/// PCI driver probe.
///
/// Called when a new pci device is added or discovered. Implementers should
--
2.51.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v2 2/2] gpu: nova-core: declare that VFs are not (yet) supported
2025-10-02 2:00 [PATCH v2 0/2] rust: pci: don't probe() VFs in nova-core John Hubbard
2025-10-02 2:00 ` [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs John Hubbard
@ 2025-10-02 2:00 ` John Hubbard
1 sibling, 0 replies; 32+ messages in thread
From: John Hubbard @ 2025-10-02 2:00 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple,
Zhi Wang, Surath Mitra, David Airlie, Simona Vetter,
Alex Williamson, Jason Gunthorpe, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML, John Hubbard
nova-core does not yet support PCIe Virtual Functions (VFs). Until it
does, declare to the Rust PCI driver core that VFs are not supported.
Because the Rust PCI driver core defaults to Driver::SUPPORTS_VF =
false, this change is not strictly necessary for functionality. Its
purpose is to provide a self-documentating line of code that can be
easily changed, when VF support is being added.
Cc: Alexandre Courbot <acourbot@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Joel Fernandes <joelagnelf@nvidia.com>
Cc: Zhi Wang <zhiw@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
drivers/gpu/nova-core/driver.rs | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
index 5d23a91f51dd..4c19b0067862 100644
--- a/drivers/gpu/nova-core/driver.rs
+++ b/drivers/gpu/nova-core/driver.rs
@@ -51,6 +51,11 @@ impl pci::Driver for NovaCore {
type IdInfo = ();
const ID_TABLE: pci::IdTable<Self::IdInfo> = &PCI_TABLE;
+ // PCI probe() will report the same device ID for a PF and its associated VFs. This will cause
+ // failures when trying to bind to the VFs, until NovaCore adds support to handle that
+ // situation. Until then, tell the PCI driver core that we don't support VFs.
+ const SUPPORTS_VF: bool = false;
+
fn probe(pdev: &pci::Device<Core>, _info: &Self::IdInfo) -> Result<Pin<KBox<Self>>> {
dev_dbg!(pdev.as_ref(), "Probe Nova Core GPU driver.\n");
--
2.51.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-02 2:00 ` [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs John Hubbard
@ 2025-10-02 11:49 ` Danilo Krummrich
2025-10-02 12:11 ` Jason Gunthorpe
1 sibling, 0 replies; 32+ messages in thread
From: Danilo Krummrich @ 2025-10-02 11:49 UTC (permalink / raw)
To: John Hubbard
Cc: Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple,
Zhi Wang, Surath Mitra, David Airlie, Simona Vetter,
Alex Williamson, Jason Gunthorpe, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML
On Thu Oct 2, 2025 at 4:00 AM CEST, John Hubbard wrote:
> Add a "supports_vf" flag to struct pci_driver to let drivers declare
> Virtual Function (VF) support. If a driver does not support VFs, then
> the PCI driver core will not probe() any VFs for that driver's devices.
>
> On the Rust side, add a const "SUPPORTS_VF" Driver trait, defaulting to
> false: drivers must explicitly opt into VF support.
>
> Cc: Alexandre Courbot <acourbot@nvidia.com>
> Cc: Alistair Popple <apopple@nvidia.com>
> Cc: Joel Fernandes <joelagnelf@nvidia.com>
> Cc: Zhi Wang <zhiw@nvidia.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Jason Gunthorpe <jgg@nvidia.com>
> Suggested-by: Danilo Krummrich <dakr@kernel.org>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
> drivers/pci/pci-driver.c | 3 +++
> include/linux/pci.h | 1 +
> rust/kernel/pci.rs | 4 ++++
> 3 files changed, 8 insertions(+)
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 63665240ae87..588666cc7254 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -412,6 +412,9 @@ static int __pci_device_probe(struct pci_driver *drv, struct pci_dev *pci_dev)
> if (drv->probe) {
> error = -ENODEV;
>
> + if (pci_dev->is_virtfn && !drv->supports_vf)
> + return error;
> +
> id = pci_match_device(drv, pci_dev);
> if (id)
> error = pci_call_probe(drv, pci_dev, id);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 59876de13860..92510886a086 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -983,6 +983,7 @@ struct pci_driver {
> struct device_driver driver;
> struct pci_dynids dynids;
> bool driver_managed_dma;
> + bool supports_vf; /* Will bind to Virtual Functions */
I don't see any driver changes in this patch, are we sure this doesn't break any
existing drivers given that this defaults to false?
Obviously, the safe call would be to invert the logic, such that it defaults to
VFs being supported, though I clearly do prefer the opt-in.
Also, in C this always defaults to false, whereas in Rust we have the choice to
make it true by default, hence in C we'd need to invert the semantics, which is
not desirable either.
> };
>
> #define to_pci_driver(__drv) \
> diff --git a/rust/kernel/pci.rs b/rust/kernel/pci.rs
> index 7fcc5f6022c1..c5d036770e65 100644
> --- a/rust/kernel/pci.rs
> +++ b/rust/kernel/pci.rs
> @@ -47,6 +47,7 @@ unsafe fn register(
> (*pdrv.get()).probe = Some(Self::probe_callback);
> (*pdrv.get()).remove = Some(Self::remove_callback);
> (*pdrv.get()).id_table = T::ID_TABLE.as_ptr();
> + (*pdrv.get()).supports_vf = T::SUPPORTS_VF;
> }
>
> // SAFETY: `pdrv` is guaranteed to be a valid `RegType`.
> @@ -268,6 +269,9 @@ pub trait Driver: Send {
> /// The table of device ids supported by the driver.
> const ID_TABLE: IdTable<Self::IdInfo>;
>
> + /// Whether the driver supports Virtual Functions.
> + const SUPPORTS_VF: bool = false;
> +
> /// PCI driver probe.
> ///
> /// Called when a new pci device is added or discovered. Implementers should
> --
> 2.51.0
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-02 2:00 ` [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs John Hubbard
2025-10-02 11:49 ` Danilo Krummrich
@ 2025-10-02 12:11 ` Jason Gunthorpe
2025-10-02 12:18 ` Danilo Krummrich
1 sibling, 1 reply; 32+ messages in thread
From: Jason Gunthorpe @ 2025-10-02 12:11 UTC (permalink / raw)
To: John Hubbard
Cc: Danilo Krummrich, Alexandre Courbot, Joel Fernandes, Timur Tabi,
Alistair Popple, Zhi Wang, Surath Mitra, David Airlie,
Simona Vetter, Alex Williamson, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML
On Wed, Oct 01, 2025 at 07:00:09PM -0700, John Hubbard wrote:
> Add a "supports_vf" flag to struct pci_driver to let drivers declare
> Virtual Function (VF) support. If a driver does not support VFs, then
> the PCI driver core will not probe() any VFs for that driver's devices.
>
> On the Rust side, add a const "SUPPORTS_VF" Driver trait, defaulting to
> false: drivers must explicitly opt into VF support.
As I said in the other thread - please no.
Linux drivers are expected to run on their VFs. This temporary
weirdness of novacore should not be elevated to a core behavior that
people will misuse.
Jason
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-02 12:11 ` Jason Gunthorpe
@ 2025-10-02 12:18 ` Danilo Krummrich
2025-10-02 12:39 ` Jason Gunthorpe
0 siblings, 1 reply; 32+ messages in thread
From: Danilo Krummrich @ 2025-10-02 12:18 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi,
Alistair Popple, Zhi Wang, Surath Mitra, David Airlie,
Simona Vetter, Alex Williamson, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML
On Thu Oct 2, 2025 at 2:11 PM CEST, Jason Gunthorpe wrote:
> On Wed, Oct 01, 2025 at 07:00:09PM -0700, John Hubbard wrote:
>> Add a "supports_vf" flag to struct pci_driver to let drivers declare
>> Virtual Function (VF) support. If a driver does not support VFs, then
>> the PCI driver core will not probe() any VFs for that driver's devices.
>>
>> On the Rust side, add a const "SUPPORTS_VF" Driver trait, defaulting to
>> false: drivers must explicitly opt into VF support.
>
> As I said in the other thread - please no.
>
> Linux drivers are expected to run on their VFs.
The consequence would be that drivers for HW that can export VFs would need to
be rejected upstream if they only support the PF, but no VFs. IMHO, that's an
unreasonable requirement.
Don't get me wrong, I agree that it'd be great if all drivers would (be able to)
support the corresponding VFs.
But in practice that's not always the case, so I'd rather have a common way to
let drivers assert what they support over going back to the "old model" of
probing drivers, where we just rely on probe() to fail.
> This temporary
> weirdness of novacore should not be elevated to a core behavior that
> people will misuse.
It's not just nova-core, please see [1].
[1] https://lore.kernel.org/lkml/DD7TP31FEE92.2E0AKAHUOHVVF@kernel.org/
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-02 12:18 ` Danilo Krummrich
@ 2025-10-02 12:39 ` Jason Gunthorpe
2025-10-02 13:03 ` Danilo Krummrich
0 siblings, 1 reply; 32+ messages in thread
From: Jason Gunthorpe @ 2025-10-02 12:39 UTC (permalink / raw)
To: Danilo Krummrich
Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi,
Alistair Popple, Zhi Wang, Surath Mitra, David Airlie,
Simona Vetter, Alex Williamson, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML
On Thu, Oct 02, 2025 at 02:18:36PM +0200, Danilo Krummrich wrote:
> On Thu Oct 2, 2025 at 2:11 PM CEST, Jason Gunthorpe wrote:
> > On Wed, Oct 01, 2025 at 07:00:09PM -0700, John Hubbard wrote:
> >> Add a "supports_vf" flag to struct pci_driver to let drivers declare
> >> Virtual Function (VF) support. If a driver does not support VFs, then
> >> the PCI driver core will not probe() any VFs for that driver's devices.
> >>
> >> On the Rust side, add a const "SUPPORTS_VF" Driver trait, defaulting to
> >> false: drivers must explicitly opt into VF support.
> >
> > As I said in the other thread - please no.
> >
> > Linux drivers are expected to run on their VFs.
>
> The consequence would be that drivers for HW that can export VFs would need to
> be rejected upstream if they only support the PF, but no VFs. IMHO, that's an
> unreasonable requirement.
Not rejected, they just need to open code a simple isvf check and fail
during probe if they really have a (hopefully temporary) problem.
This not really a realistic case. Linux running in the VM *should*
have drivers that operate the VF, and those existing drivers *should*
work in the PF context.
Drivers that work in VM but not in a host should not be encouraged!!
AFAICT this is even true for novacore, the driver should "work" but
the VF won't be provisioned today so it should fail startup in some
way. eg "no vram" or something like that.
> > This temporary
> > weirdness of novacore should not be elevated to a core behavior that
> > people will misuse.
>
> It's not just nova-core, please see [1].
>
> [1] https://lore.kernel.org/lkml/DD7TP31FEE92.2E0AKAHUOHVVF@kernel.org/
I responded there, I don't think the reasons those were added to ICE
and then cargo-culted are very good, not good enough to justify adding
it to the core code.
Jason
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-02 12:39 ` Jason Gunthorpe
@ 2025-10-02 13:03 ` Danilo Krummrich
2025-10-02 13:56 ` Jason Gunthorpe
0 siblings, 1 reply; 32+ messages in thread
From: Danilo Krummrich @ 2025-10-02 13:03 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi,
Alistair Popple, Zhi Wang, Surath Mitra, David Airlie,
Simona Vetter, Alex Williamson, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML
On Thu Oct 2, 2025 at 2:39 PM CEST, Jason Gunthorpe wrote:
> On Thu, Oct 02, 2025 at 02:18:36PM +0200, Danilo Krummrich wrote:
>> On Thu Oct 2, 2025 at 2:11 PM CEST, Jason Gunthorpe wrote:
>> > On Wed, Oct 01, 2025 at 07:00:09PM -0700, John Hubbard wrote:
>> >> Add a "supports_vf" flag to struct pci_driver to let drivers declare
>> >> Virtual Function (VF) support. If a driver does not support VFs, then
>> >> the PCI driver core will not probe() any VFs for that driver's devices.
>> >>
>> >> On the Rust side, add a const "SUPPORTS_VF" Driver trait, defaulting to
>> >> false: drivers must explicitly opt into VF support.
>> >
>> > As I said in the other thread - please no.
>> >
>> > Linux drivers are expected to run on their VFs.
>>
>> The consequence would be that drivers for HW that can export VFs would need to
>> be rejected upstream if they only support the PF, but no VFs. IMHO, that's an
>> unreasonable requirement.
>
> Not rejected, they just need to open code a simple isvf check and fail
> during probe if they really have a (hopefully temporary) problem.
The question is whether it is due to a (temporary) problem, or if it is by
design.
I think it's not unreasonable to have a driver for the PF and a separate driver
for the VFs if they are different enough; the drivers can still share common
code of course.
Surely, you can argue that if they have different enough requirements they
should have different device IDs, but "different enough requirements" is pretty
vague and it's not under our control either.
> This not really a realistic case. Linux running in the VM *should*
> have drivers that operate the VF, and those existing drivers *should*
> work in the PF context.
>
> Drivers that work in VM but not in a host should not be encouraged!!
I agree, we should indeed encourage HW manufacturers to design the HW in a way
that a single driver works in both cases, i.e. less less code to maintain, less
surface for bugs, etc., if that is what you mean.
But, if there is another solution for VFs already, e.g. in the case of nova-core
vGPU, why restrict drivers from opt-out of VFs. (In a previous reply I mentioned
I prefer opt-in, but you convinced me that it should rather be opt-out.)
> AFAICT this is even true for novacore, the driver should "work" but
> the VF won't be provisioned today so it should fail startup in some
> way. eg "no vram" or something like that.
>
>> > This temporary
>> > weirdness of novacore should not be elevated to a core behavior that
>> > people will misuse.
>>
>> It's not just nova-core, please see [1].
>>
>> [1] https://lore.kernel.org/lkml/DD7TP31FEE92.2E0AKAHUOHVVF@kernel.org/
>
> I responded there, I don't think the reasons those were added to ICE
> and then cargo-culted are very good, not good enough to justify adding
> it to the core code.
Indeed, the justification of ICE is clearly wrong.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-02 13:03 ` Danilo Krummrich
@ 2025-10-02 13:56 ` Jason Gunthorpe
2025-10-02 15:11 ` Danilo Krummrich
0 siblings, 1 reply; 32+ messages in thread
From: Jason Gunthorpe @ 2025-10-02 13:56 UTC (permalink / raw)
To: Danilo Krummrich
Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi,
Alistair Popple, Zhi Wang, Surath Mitra, David Airlie,
Simona Vetter, Alex Williamson, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML
On Thu, Oct 02, 2025 at 03:03:38PM +0200, Danilo Krummrich wrote:
> I think it's not unreasonable to have a driver for the PF and a separate driver
> for the VFs if they are different enough; the drivers can still share common
> code of course.
This isn't feasible without different PCI IDs. ICE does this for
example where they have two totally different drivers, and of course
two different PCI IDs.
> Surely, you can argue that if they have different enough requirements they
> should have different device IDs, but "different enough requirements" is pretty
> vague and it's not under our control either.
If you want two drivers in Linux you need two PCI IDs.
We can't reliably select different drivers based on VFness because
VFness is wiped out during virtualization.
> But, if there is another solution for VFs already, e.g. in the case of nova-core
> vGPU, why restrict drivers from opt-out of VFs. (In a previous reply I mentioned
> I prefer opt-in, but you convinced me that it should rather be
> opt-out.)
I think nova-core has a temporary (OOT even!) issue that should be
resolved - that doesn't justify adding core kernel infrastructure that
will encourage more drivers to go away from our kernel design goals of
drivers working equally in host and VM.
Nova core should work in that it probes, detects an unprovisioned VF
and then fails probe. This edge case could perhaps arise in a VM
anyhow and needs to be handled anyhow.
Also, this is all pointless until nova-core gets an sriov_configure
callback - you can't even turn on SRIOV without that!
Jason
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-02 13:56 ` Jason Gunthorpe
@ 2025-10-02 15:11 ` Danilo Krummrich
2025-10-02 15:23 ` Jason Gunthorpe
0 siblings, 1 reply; 32+ messages in thread
From: Danilo Krummrich @ 2025-10-02 15:11 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi,
Alistair Popple, Zhi Wang, Surath Mitra, David Airlie,
Simona Vetter, Alex Williamson, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML
On Thu Oct 2, 2025 at 3:56 PM CEST, Jason Gunthorpe wrote:
> On Thu, Oct 02, 2025 at 03:03:38PM +0200, Danilo Krummrich wrote:
>
>> I think it's not unreasonable to have a driver for the PF and a separate driver
>> for the VFs if they are different enough; the drivers can still share common
>> code of course.
>
> This isn't feasible without different PCI IDs.
At least on the host you can obviously differentiate them.
>> Surely, you can argue that if they have different enough requirements they
>> should have different device IDs, but "different enough requirements" is pretty
>> vague and it's not under our control either.
>
> If you want two drivers in Linux you need two PCI IDs.
>
> We can't reliably select different drivers based on VFness because
> VFness is wiped out during virtualization.
Sure, but I thought the whole point is that some VFs are not given directly to
the VM, but have some kind of intermediate layer, such as vGPU. I.e. some kind
of hybrid approach between full pass-through and mediated devices?
>> But, if there is another solution for VFs already, e.g. in the case of nova-core
>> vGPU, why restrict drivers from opt-out of VFs. (In a previous reply I mentioned
>> I prefer opt-in, but you convinced me that it should rather be
>> opt-out.)
>
> I think nova-core has a temporary (OOT even!) issue that should be
> resolved - that doesn't justify adding core kernel infrastructure that
> will encourage more drivers to go away from our kernel design goals of
> drivers working equally in host and VM.
My understanding is that vGPU will ensure that the device exposed to the VM will
be set up to be (at least mostly) compatible with nova-core's PF code paths?
So, there is a semantical difference between vGPU and nova-core that makes a
differentiation between VF and PF meaningful and justified.
But maybe this understanding is not correct. If so, please educate me.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-02 15:11 ` Danilo Krummrich
@ 2025-10-02 15:23 ` Jason Gunthorpe
2025-10-02 16:05 ` Danilo Krummrich
0 siblings, 1 reply; 32+ messages in thread
From: Jason Gunthorpe @ 2025-10-02 15:23 UTC (permalink / raw)
To: Danilo Krummrich
Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi,
Alistair Popple, Zhi Wang, Surath Mitra, David Airlie,
Simona Vetter, Alex Williamson, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML
On Thu, Oct 02, 2025 at 05:11:01PM +0200, Danilo Krummrich wrote:
> On Thu Oct 2, 2025 at 3:56 PM CEST, Jason Gunthorpe wrote:
> >> Surely, you can argue that if they have different enough requirements they
> >> should have different device IDs, but "different enough requirements" is pretty
> >> vague and it's not under our control either.
> >
> > If you want two drivers in Linux you need two PCI IDs.
> >
> > We can't reliably select different drivers based on VFness because
> > VFness is wiped out during virtualization.
>
> Sure, but I thought the whole point is that some VFs are not given directly to
> the VM, but have some kind of intermediate layer, such as vGPU. I.e. some kind
> of hybrid approach between full pass-through and mediated devices?
We'd call the intermediate layer "mediation", and it is abnormal to
have something significant there because it impacts performance and
increases the hypervisor attack surface. Usually you just give the
whole VF mostly as-is.
The ideal vfio driver is the common one because the raw VF doesn't
need any meddling.
> >> But, if there is another solution for VFs already, e.g. in the case of nova-core
> >> vGPU, why restrict drivers from opt-out of VFs. (In a previous reply I mentioned
> >> I prefer opt-in, but you convinced me that it should rather be
> >> opt-out.)
> >
> > I think nova-core has a temporary (OOT even!) issue that should be
> > resolved - that doesn't justify adding core kernel infrastructure that
> > will encourage more drivers to go away from our kernel design goals of
> > drivers working equally in host and VM.
>
> My understanding is that vGPU will ensure that the device exposed to the VM will
> be set up to be (at least mostly) compatible with nova-core's PF code paths?
This is not what I've been told, the VF driver has significant
programming model differences in the NVIDIA model, and supports
different commands.
If you look at the VFIO driver RFC it basically does no mediation, it
isn't intercepting MMIO - the guest sees the BARs directly. Most of
the code is "profiling" from what I can tell. Some config space
meddling.
I think Zhi could make it more clear what mediation the actual VFIO
part of the driver is even doing and why..
Keep in mind for NVIDIA vGPU this has changed alot over the years - I
am talking here about the SRIOV driver proposed recently that is
specifically designed to substantially remove the hypervisor
mediation.
> So, there is a semantical difference between vGPU and nova-core that makes a
> differentiation between VF and PF meaningful and justified.
Aa was said earlier, there is a register in the MMIO that identifies
if the vGPU programming model is used, that is what nova core should
key on to determine how to act.
Jason
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-02 15:23 ` Jason Gunthorpe
@ 2025-10-02 16:05 ` Danilo Krummrich
2025-10-02 17:05 ` Jason Gunthorpe
0 siblings, 1 reply; 32+ messages in thread
From: Danilo Krummrich @ 2025-10-02 16:05 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi,
Alistair Popple, Zhi Wang, Surath Mitra, David Airlie,
Simona Vetter, Alex Williamson, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML
On Thu Oct 2, 2025 at 5:23 PM CEST, Jason Gunthorpe wrote:
> This is not what I've been told, the VF driver has significant
> programming model differences in the NVIDIA model, and supports
> different commands.
Ok, that means there are some more fundamental differences between the host PF
and the "VM PF" code that we have to deal with.
But that doesn't necessarily require that the VF parts of the host have to be in
nova-core as well, i.e. with the information we have we can differentiate
between PF, VF and PF in the VM (indicated by a device register).
> If you look at the VFIO driver RFC it basically does no mediation, it
> isn't intercepting MMIO - the guest sees the BARs directly. Most of
> the code is "profiling" from what I can tell. Some config space
> meddling.
Sure, there is no mediation in that sense, but it needs quite some setup
regardless, no?
I thought there is a significant amount of semantics that is different between
booting the PF and the VF on the host.
Also, the idea was to use a layered approach, i.e. let nova-core serve as an
abstraction layer, where the DRM and VFIO parts can be layered on top of.
Are you suggesting to merge vGPU into nova-core?
(The VF specific firmware interfaces, should be abstracted in nova-core, so,
technically, we will have some vGPU specific code in nova-core.)
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-02 16:05 ` Danilo Krummrich
@ 2025-10-02 17:05 ` Jason Gunthorpe
2025-10-02 17:37 ` Danilo Krummrich
0 siblings, 1 reply; 32+ messages in thread
From: Jason Gunthorpe @ 2025-10-02 17:05 UTC (permalink / raw)
To: Danilo Krummrich
Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi,
Alistair Popple, Zhi Wang, Surath Mitra, David Airlie,
Simona Vetter, Alex Williamson, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML
On Thu, Oct 02, 2025 at 06:05:28PM +0200, Danilo Krummrich wrote:
> On Thu Oct 2, 2025 at 5:23 PM CEST, Jason Gunthorpe wrote:
> > This is not what I've been told, the VF driver has significant
> > programming model differences in the NVIDIA model, and supports
> > different commands.
>
> Ok, that means there are some more fundamental differences between the host PF
> and the "VM PF" code that we have to deal with.
That was my understanding.
> But that doesn't necessarily require that the VF parts of the host have to be in
> nova-core as well, i.e. with the information we have we can differentiate
> between PF, VF and PF in the VM (indicated by a device register).
I'm not entirely sure what you mean by this..
The driver to operate the function in "vGPU" mode as indicated by the
register has to be in nova-core, since there is only one device ID.
> > If you look at the VFIO driver RFC it basically does no mediation, it
> > isn't intercepting MMIO - the guest sees the BARs directly. Most of
> > the code is "profiling" from what I can tell. Some config space
> > meddling.
>
> Sure, there is no mediation in that sense, but it needs quite some setup
> regardless, no?
>
> I thought there is a significant amount of semantics that is different between
> booting the PF and the VF on the host.
I think it would be good to have Zhi clarify more of this, but from
what I understand are at least three activites comingled all together:
1) Boot the PF in "vGPU" mode so it can enable SRIOV
2) Enable SRIOV and profile VFs to allocate HW resources to them
3) VFIO variant driver to convert the VF into a "VM PF" with whatever
mediation and enhancement needed
From a broad perspective we in the kernel have put #2 outside VFIO
because all of that is actually run through the PF and doesn't use the
VF at all.
#3 is the vfio driver and I would like it if vfio drivers restrained
themselves to focus on the mediation, live migration and things like
that which are directly related to VFIO..
> Also, the idea was to use a layered approach, i.e. let nova-core
> serve as an abstraction layer, where the DRM and VFIO parts can be
> layered on top of.
Yes, I think everyone is good with some version of this..
A big question in my mind is where do you put #2, and what uapi does
it provide. It has to layer on top of nova-core because it has to use
the PF to do profiling.
I'm not a fan of the vfio based sysfs as uAPI for #2, for reasons
touched on in this thread.
NIC drivers are using fwctl and devlink for profiling, managed by the
PF driver. I think I'd want to here reasons why those interfaces
cannot be used here.
Jason
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-02 17:05 ` Jason Gunthorpe
@ 2025-10-02 17:37 ` Danilo Krummrich
2025-10-02 17:40 ` Danilo Krummrich
2025-10-02 17:56 ` Jason Gunthorpe
0 siblings, 2 replies; 32+ messages in thread
From: Danilo Krummrich @ 2025-10-02 17:37 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi,
Alistair Popple, Zhi Wang, Surath Mitra, David Airlie,
Simona Vetter, Alex Williamson, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML
On Thu Oct 2, 2025 at 7:05 PM CEST, Jason Gunthorpe wrote:
> On Thu, Oct 02, 2025 at 06:05:28PM +0200, Danilo Krummrich wrote:
>> On Thu Oct 2, 2025 at 5:23 PM CEST, Jason Gunthorpe wrote:
>> > This is not what I've been told, the VF driver has significant
>> > programming model differences in the NVIDIA model, and supports
>> > different commands.
>>
>> Ok, that means there are some more fundamental differences between the host PF
>> and the "VM PF" code that we have to deal with.
>
> That was my understanding.
>
>> But that doesn't necessarily require that the VF parts of the host have to be in
>> nova-core as well, i.e. with the information we have we can differentiate
>> between PF, VF and PF in the VM (indicated by a device register).
>
> I'm not entirely sure what you mean by this..
>
> The driver to operate the function in "vGPU" mode as indicated by the
> register has to be in nova-core, since there is only one device ID.
Yes, the PF driver on the host and the PF (from VM perspective) driver in the VM
have to be that same. But the VF driver on the host can still be a seaparate
one.
>> > If you look at the VFIO driver RFC it basically does no mediation, it
>> > isn't intercepting MMIO - the guest sees the BARs directly. Most of
>> > the code is "profiling" from what I can tell. Some config space
>> > meddling.
>>
>> Sure, there is no mediation in that sense, but it needs quite some setup
>> regardless, no?
>>
>> I thought there is a significant amount of semantics that is different between
>> booting the PF and the VF on the host.
>
> I think it would be good to have Zhi clarify more of this, but from
> what I understand are at least three activites comingled all together:
>
> 1) Boot the PF in "vGPU" mode so it can enable SRIOV
Ok, this might be where the confusion above comes from. When I talk about
nova-core in vGPU mode I mean nova-core running in the VM on the (from VM
perspective) PF.
But you seem to mean nova-core running on the host PF with vGPU on top? That of
course has to be in nova-core.
> 2) Enable SRIOV and profile VFs to allocate HW resources to them
I think that's partially in nova-core and partially in vGPU; nova-core providing
the abstraction of the corresponding firmware / hardware interfaces and vGPU
controlling the semantics of the resource handling?
This is what I thought vGPU has a secondary part for where it binds to nova-core
through the auxiliary bus, i.e. vGPU consisting out of two drivers actually; the
VFIO parts and a "per VF resource controller".
> 3) VFIO variant driver to convert the VF into a "VM PF" with whatever
> mediation and enhancement needed
That should be vGPU only land.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-02 17:37 ` Danilo Krummrich
@ 2025-10-02 17:40 ` Danilo Krummrich
2025-10-02 17:49 ` John Hubbard
2025-10-02 17:56 ` Jason Gunthorpe
1 sibling, 1 reply; 32+ messages in thread
From: Danilo Krummrich @ 2025-10-02 17:40 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi,
Alistair Popple, Zhi Wang, Surath Mitra, David Airlie,
Simona Vetter, Alex Williamson, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML
On Thu Oct 2, 2025 at 7:37 PM CEST, Danilo Krummrich wrote:
> On Thu Oct 2, 2025 at 7:05 PM CEST, Jason Gunthorpe wrote:
>> On Thu, Oct 02, 2025 at 06:05:28PM +0200, Danilo Krummrich wrote:
>>> On Thu Oct 2, 2025 at 5:23 PM CEST, Jason Gunthorpe wrote:
>>> > This is not what I've been told, the VF driver has significant
>>> > programming model differences in the NVIDIA model, and supports
>>> > different commands.
>>>
>>> Ok, that means there are some more fundamental differences between the host PF
>>> and the "VM PF" code that we have to deal with.
>>
>> That was my understanding.
>>
>>> But that doesn't necessarily require that the VF parts of the host have to be in
>>> nova-core as well, i.e. with the information we have we can differentiate
>>> between PF, VF and PF in the VM (indicated by a device register).
>>
>> I'm not entirely sure what you mean by this..
>>
>> The driver to operate the function in "vGPU" mode as indicated by the
>> register has to be in nova-core, since there is only one device ID.
>
> Yes, the PF driver on the host and the PF (from VM perspective) driver in the VM
> have to be that same. But the VF driver on the host can still be a seaparate
> one.
>
>>> > If you look at the VFIO driver RFC it basically does no mediation, it
>>> > isn't intercepting MMIO - the guest sees the BARs directly. Most of
>>> > the code is "profiling" from what I can tell. Some config space
>>> > meddling.
>>>
>>> Sure, there is no mediation in that sense, but it needs quite some setup
>>> regardless, no?
>>>
>>> I thought there is a significant amount of semantics that is different between
>>> booting the PF and the VF on the host.
>>
>> I think it would be good to have Zhi clarify more of this, but from
>> what I understand are at least three activites comingled all together:
>>
>> 1) Boot the PF in "vGPU" mode so it can enable SRIOV
>
> Ok, this might be where the confusion above comes from. When I talk about
> nova-core in vGPU mode I mean nova-core running in the VM on the (from VM
> perspective) PF.
>
> But you seem to mean nova-core running on the host PF with vGPU on top? That of
> course has to be in nova-core.
>
>> 2) Enable SRIOV and profile VFs to allocate HW resources to them
>
> I think that's partially in nova-core and partially in vGPU; nova-core providing
> the abstraction of the corresponding firmware / hardware interfaces and vGPU
> controlling the semantics of the resource handling?
>
> This is what I thought vGPU has a secondary part for where it binds to nova-core
> through the auxiliary bus, i.e. vGPU consisting out of two drivers actually; the
> VFIO parts and a "per VF resource controller".
Forgot to add: But I think Zhi explained that this is not necessary and can be
controlled by the VFIO driver, i.e. the PCI driver that binds to the VF itself.
>> 3) VFIO variant driver to convert the VF into a "VM PF" with whatever
>> mediation and enhancement needed
>
> That should be vGPU only land.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-02 17:40 ` Danilo Krummrich
@ 2025-10-02 17:49 ` John Hubbard
2025-10-02 18:05 ` Jason Gunthorpe
0 siblings, 1 reply; 32+ messages in thread
From: John Hubbard @ 2025-10-02 17:49 UTC (permalink / raw)
To: Danilo Krummrich, Jason Gunthorpe
Cc: Alexandre Courbot, Joel Fernandes, Timur Tabi, Alistair Popple,
Zhi Wang, Surath Mitra, David Airlie, Simona Vetter,
Alex Williamson, Bjorn Helgaas, Krzysztof Wilczyński,
Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, nouveau, linux-pci, rust-for-linux, LKML
On 10/2/25 10:40 AM, Danilo Krummrich wrote:
> On Thu Oct 2, 2025 at 7:37 PM CEST, Danilo Krummrich wrote:
>> On Thu Oct 2, 2025 at 7:05 PM CEST, Jason Gunthorpe wrote:
>>> On Thu, Oct 02, 2025 at 06:05:28PM +0200, Danilo Krummrich wrote:
>>>> On Thu Oct 2, 2025 at 5:23 PM CEST, Jason Gunthorpe wrote:
>>>>> This is not what I've been told, the VF driver has significant
>>>>> programming model differences in the NVIDIA model, and supports
>>>>> different commands.
>>>>
>>>> Ok, that means there are some more fundamental differences between the host PF
>>>> and the "VM PF" code that we have to deal with.
>>>
>>> That was my understanding.
>>>
>>>> But that doesn't necessarily require that the VF parts of the host have to be in
>>>> nova-core as well, i.e. with the information we have we can differentiate
>>>> between PF, VF and PF in the VM (indicated by a device register).
>>>
>>> I'm not entirely sure what you mean by this..
>>>
>>> The driver to operate the function in "vGPU" mode as indicated by the
>>> register has to be in nova-core, since there is only one device ID.
>>
>> Yes, the PF driver on the host and the PF (from VM perspective) driver in the VM
>> have to be that same. But the VF driver on the host can still be a seaparate
>> one.
>>
>>>>> If you look at the VFIO driver RFC it basically does no mediation, it
>>>>> isn't intercepting MMIO - the guest sees the BARs directly. Most of
>>>>> the code is "profiling" from what I can tell. Some config space
>>>>> meddling.
>>>>
>>>> Sure, there is no mediation in that sense, but it needs quite some setup
>>>> regardless, no?
>>>>
>>>> I thought there is a significant amount of semantics that is different between
>>>> booting the PF and the VF on the host.
>>>
>>> I think it would be good to have Zhi clarify more of this, but from
>>> what I understand are at least three activites comingled all together:
>>>
>>> 1) Boot the PF in "vGPU" mode so it can enable SRIOV
>>
For this, we could pass a kernel module parameter to nova-core.
>> Ok, this might be where the confusion above comes from. When I talk about
>> nova-core in vGPU mode I mean nova-core running in the VM on the (from VM
>> perspective) PF.
>>
>> But you seem to mean nova-core running on the host PF with vGPU on top? That of
>> course has to be in nova-core.
>>
>>> 2) Enable SRIOV and profile VFs to allocate HW resources to them
>>
>> I think that's partially in nova-core and partially in vGPU; nova-core providing
>> the abstraction of the corresponding firmware / hardware interfaces and vGPU
>> controlling the semantics of the resource handling?
>>
>> This is what I thought vGPU has a secondary part for where it binds to nova-core
>> through the auxiliary bus, i.e. vGPU consisting out of two drivers actually; the
>> VFIO parts and a "per VF resource controller".
>
> Forgot to add: But I think Zhi explained that this is not necessary and can be
> controlled by the VFIO driver, i.e. the PCI driver that binds to the VF itself.
Yes, this is the direction that I originally (3 whole days ago, haha) had in mind,
after talking with Zhi and a few others: nova-core handles PFs, and the VFIO driver
handles the VFs, and use the "is virtual" logic to sort them out.
Looking forward to Zhi's reaction to the other approach that you and Jason
have been debating. This is all very educational to me, as a VFIO newbie. :)
>
>>> 3) VFIO variant driver to convert the VF into a "VM PF" with whatever
>>> mediation and enhancement needed
>>
>> That should be vGPU only land.
>
thanks,
--
John Hubbard
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-02 17:37 ` Danilo Krummrich
2025-10-02 17:40 ` Danilo Krummrich
@ 2025-10-02 17:56 ` Jason Gunthorpe
2025-10-02 18:55 ` Danilo Krummrich
1 sibling, 1 reply; 32+ messages in thread
From: Jason Gunthorpe @ 2025-10-02 17:56 UTC (permalink / raw)
To: Danilo Krummrich
Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi,
Alistair Popple, Zhi Wang, Surath Mitra, David Airlie,
Simona Vetter, Alex Williamson, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML
On Thu, Oct 02, 2025 at 07:37:45PM +0200, Danilo Krummrich wrote:
> > The driver to operate the function in "vGPU" mode as indicated by the
> > register has to be in nova-core, since there is only one device ID.
>
> Yes, the PF driver on the host and the PF (from VM perspective) driver in the VM
> have to be that same. But the VF driver on the host can still be a seaparate
> one.
In most cases it is going to be bound to a vfio driver..
However, if you actually want a DRM subsystem device on the VF without
a VM I don't know why you'd use a different driver than the one used
by the VM on the very same VF, with the very same register programming
model..
> > I think it would be good to have Zhi clarify more of this, but from
> > what I understand are at least three activites comingled all together:
> >
> > 1) Boot the PF in "vGPU" mode so it can enable SRIOV
>
> Ok, this might be where the confusion above comes from. When I talk about
> nova-core in vGPU mode I mean nova-core running in the VM on the (from VM
> perspective) PF.
I would call this nova-core running on a VF (assigned to a VM)
Not sure "vgpu" is a helpful word here, lets try to talk about what
.ko's and struct device_drivers's the various codes should live in..
> But you seem to mean nova-core running on the host PF with vGPU on top? That of
> course has to be in nova-core.
Yes, #1 would be implemented as part of nova-core.ko and it's
pci_driver. As I understand it around firmware loading nova-core has
to tell the FW if it wants to enable "vGPU" mode or not.
If it doesn't then the sriov_configure op should be inhibited and #2
disabled. If it does then sriov_configure should work, #2 is enabled,
and DRM on the PF is disabled.
> > 2) Enable SRIOV and profile VFs to allocate HW resources to them
>
> I think that's partially in nova-core and partially in vGPU; nova-core providing
> the abstraction of the corresponding firmware / hardware interfaces and vGPU
> controlling the semantics of the resource handling?
> This is what I thought vGPU has a secondary part for where it binds to nova-core
> through the auxiliary bus, i.e. vGPU consisting out of two drivers actually; the
> VFIO parts and a "per VF resource controller".
This is certainly one option, you can put #2 in an aux driver of the
PF in a nova-sriov.ko module that is fully divorced from VFIO. It
might go along with a nova-fwctl.ko module too.
You could also just embed it in nova-core.ko and have it activate when
the PF is booted in "vGPU" mode.
Broadly I would suggest the latter. aux devices make most sense to
cross subsystems. Micro splitting a single driver with aux devices
will make more of a mess than required. Though a good motivating
reason would be if nova-srvio.ko is large.
> > 3) VFIO variant driver to convert the VF into a "VM PF" with whatever
> > mediation and enhancement needed
>
> That should be vGPU only land.
I think it is clear this part should be in a vfio-pci-nova.ko
Then you have two more:
4) A PCI driver in a VM that creates a DRM subsystem device
This is nova-core.ko + nova-drm.ko
5) A VF driver that creates a DRM subsystem device without a VM
Zhi says the device can't do this, but lets assume it could, then I
would expect this to be nova-core.ko + nova-drm.ko, same as #4.
Jason
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-02 17:49 ` John Hubbard
@ 2025-10-02 18:05 ` Jason Gunthorpe
2025-10-02 18:09 ` John Hubbard
2025-10-02 18:17 ` Danilo Krummrich
0 siblings, 2 replies; 32+ messages in thread
From: Jason Gunthorpe @ 2025-10-02 18:05 UTC (permalink / raw)
To: John Hubbard
Cc: Danilo Krummrich, Alexandre Courbot, Joel Fernandes, Timur Tabi,
Alistair Popple, Zhi Wang, Surath Mitra, David Airlie,
Simona Vetter, Alex Williamson, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML
On Thu, Oct 02, 2025 at 10:49:21AM -0700, John Hubbard wrote:
> > Forgot to add: But I think Zhi explained that this is not necessary and can be
> > controlled by the VFIO driver, i.e. the PCI driver that binds to the VF itself.
>
> Yes, this is the direction that I originally (3 whole days ago, haha) had in mind,
> after talking with Zhi and a few others: nova-core handles PFs, and the VFIO driver
> handles the VFs, and use the "is virtual" logic to sort them out.
To be clear, no matter what the VFIO driver bound to the VF should not
become entangled with any aux devices.
The VFIO VF driver uses pci_iov_get_pf_drvdata() to reach into the PF
to request the PF's help. Eg for live migration or things of that
nature.
My point here is that generally we don't put profiling code in the
VFIO driver and then use pci_iov_get_pf_drvdata() to access the PF do
actually do the profiling.
The VF cannot/should not control profiling of itself - that would be a
security problem once it is assigned to a VM.
So the profiling resides entirely inside the PF world and should
operate without VFIO. As I've said this design is compatible with VFs
for containers and so on. So it is the strongly preferred design
pattern.
Jason
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-02 18:05 ` Jason Gunthorpe
@ 2025-10-02 18:09 ` John Hubbard
2025-10-02 18:32 ` Jason Gunthorpe
2025-10-02 18:17 ` Danilo Krummrich
1 sibling, 1 reply; 32+ messages in thread
From: John Hubbard @ 2025-10-02 18:09 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Danilo Krummrich, Alexandre Courbot, Joel Fernandes, Timur Tabi,
Alistair Popple, Zhi Wang, Surath Mitra, David Airlie,
Simona Vetter, Alex Williamson, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML
On 10/2/25 11:05 AM, Jason Gunthorpe wrote:
> On Thu, Oct 02, 2025 at 10:49:21AM -0700, John Hubbard wrote:
>>> Forgot to add: But I think Zhi explained that this is not necessary and can be
>>> controlled by the VFIO driver, i.e. the PCI driver that binds to the VF itself.
>>
>> Yes, this is the direction that I originally (3 whole days ago, haha) had in mind,
>> after talking with Zhi and a few others: nova-core handles PFs, and the VFIO driver
>> handles the VFs, and use the "is virtual" logic to sort them out.
>
> To be clear, no matter what the VFIO driver bound to the VF should not
> become entangled with any aux devices.
I was fine until you said "aux devices". :) What does that mean in this
context?
>
> The VFIO VF driver uses pci_iov_get_pf_drvdata() to reach into the PF
> to request the PF's help. Eg for live migration or things of that
> nature.
>
> My point here is that generally we don't put profiling code in the
> VFIO driver and then use pci_iov_get_pf_drvdata() to access the PF do
> actually do the profiling.
>
> The VF cannot/should not control profiling of itself - that would be a
> security problem once it is assigned to a VM.
>
> So the profiling resides entirely inside the PF world and should
> operate without VFIO. As I've said this design is compatible with VFs
> for containers and so on. So it is the strongly preferred design
> pattern.
>
OK, that all makes sense.
thanks,
--
John Hubbard
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-02 18:05 ` Jason Gunthorpe
2025-10-02 18:09 ` John Hubbard
@ 2025-10-02 18:17 ` Danilo Krummrich
2025-10-02 18:31 ` Jason Gunthorpe
1 sibling, 1 reply; 32+ messages in thread
From: Danilo Krummrich @ 2025-10-02 18:17 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi,
Alistair Popple, Zhi Wang, Surath Mitra, David Airlie,
Simona Vetter, Alex Williamson, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML
On 10/2/25 8:05 PM, Jason Gunthorpe wrote:
> On Thu, Oct 02, 2025 at 10:49:21AM -0700, John Hubbard wrote:
>>> Forgot to add: But I think Zhi explained that this is not necessary and can be
>>> controlled by the VFIO driver, i.e. the PCI driver that binds to the VF itself.
>>
>> Yes, this is the direction that I originally (3 whole days ago, haha) had in mind,
>> after talking with Zhi and a few others: nova-core handles PFs, and the VFIO driver
>> handles the VFs, and use the "is virtual" logic to sort them out.
>
> To be clear, no matter what the VFIO driver bound to the VF should not
> become entangled with any aux devices.
>
> The VFIO VF driver uses pci_iov_get_pf_drvdata() to reach into the PF
> to request the PF's help. Eg for live migration or things of that
> nature.
Ick! The VF driver should never mess with the PF driver's private data.
Instead the PF driver should provide an API for the VF driver to get things done
on behalf.
It also has the implication that we need to guarantee that PF driver unbind will
also unbind all VFs, but we need this guarantee anyways. I.e. when the VFIO
driver calls into nova-core we want the guarantee that we're in a scope where
the PF driver is bound.
> My point here is that generally we don't put profiling code in the
> VFIO driver and then use pci_iov_get_pf_drvdata() to access the PF do
> actually do the profiling.
>
> The VF cannot/should not control profiling of itself - that would be a
> security problem once it is assigned to a VM.
As mentioned above, if at all I think the PF driver has to provide an API for that.
> So the profiling resides entirely inside the PF world and should
> operate without VFIO.
Perfectly fine with me, I'm open to both approaches.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-02 18:17 ` Danilo Krummrich
@ 2025-10-02 18:31 ` Jason Gunthorpe
2025-10-02 18:42 ` Danilo Krummrich
0 siblings, 1 reply; 32+ messages in thread
From: Jason Gunthorpe @ 2025-10-02 18:31 UTC (permalink / raw)
To: Danilo Krummrich
Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi,
Alistair Popple, Zhi Wang, Surath Mitra, David Airlie,
Simona Vetter, Alex Williamson, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML
On Thu, Oct 02, 2025 at 08:17:10PM +0200, Danilo Krummrich wrote:
> On 10/2/25 8:05 PM, Jason Gunthorpe wrote:
> > On Thu, Oct 02, 2025 at 10:49:21AM -0700, John Hubbard wrote:
> >>> Forgot to add: But I think Zhi explained that this is not necessary and can be
> >>> controlled by the VFIO driver, i.e. the PCI driver that binds to the VF itself.
> >>
> >> Yes, this is the direction that I originally (3 whole days ago, haha) had in mind,
> >> after talking with Zhi and a few others: nova-core handles PFs, and the VFIO driver
> >> handles the VFs, and use the "is virtual" logic to sort them out.
> >
> > To be clear, no matter what the VFIO driver bound to the VF should not
> > become entangled with any aux devices.
> >
> > The VFIO VF driver uses pci_iov_get_pf_drvdata() to reach into the PF
> > to request the PF's help. Eg for live migration or things of that
> > nature.
>
> Ick! The VF driver should never mess with the PF driver's private data.
> Instead the PF driver should provide an API for the VF driver to get things done
> on behalf.
This exactly how this function is used.
The core PF driver provides an API:
struct mlx5_core_dev *mlx5_vf_get_core_dev(struct pci_dev *pdev)
Which takes in the VF as pdev and internally it invokes:
mdev = pci_iov_get_pf_drvdata(pdev, &mlx5_core_driver);
Which asserts the PF is bound to the right driver, which asserts the
format of the drvdata is known, and now we have a proper handle to use
in the rest of API surface the VF driver will use.
By forcing the ops into the signature we strongly encourage people to
follow this design pattern and provide APIs, otherwise you have to
export the ops to call pci_iov_get_pf_drvdata()
There really isn't another good way to obtain the 'struct
mlx5_core_dev' from a simple VF pci_dev.
> It also has the implication that we need to guarantee that PF driver unbind will
> also unbind all VFs, but we need this guarantee anyways.
This is another reason why pci_iov_get_pf_drvdata() exists.
/**
* pci_iov_get_pf_drvdata - Return the drvdata of a PF
* @dev: VF pci_dev
* @pf_driver: Device driver required to own the PF
*
* This must be called from a context that ensures that a VF driver is attached.
* The value returned is invalid once the VF driver completes its remove()
* callback.
*
* Locking is achieved by the driver core. A VF driver cannot be probed until
* pci_enable_sriov() is called and pci_disable_sriov() does not return until
* all VF drivers have completed their remove().
*
* The PF driver must call pci_disable_sriov() before it begins to destroy the
* drvdata.
*/
Meaning nova-core has to guarentee to call pci_disable_sriov() before
remove completes or before a failing probe returns as part of
implementing SRIOV support.
Userspace cannot create VFs until those calls are done.
Jason
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-02 18:09 ` John Hubbard
@ 2025-10-02 18:32 ` Jason Gunthorpe
0 siblings, 0 replies; 32+ messages in thread
From: Jason Gunthorpe @ 2025-10-02 18:32 UTC (permalink / raw)
To: John Hubbard
Cc: Danilo Krummrich, Alexandre Courbot, Joel Fernandes, Timur Tabi,
Alistair Popple, Zhi Wang, Surath Mitra, David Airlie,
Simona Vetter, Alex Williamson, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML
On Thu, Oct 02, 2025 at 11:09:40AM -0700, John Hubbard wrote:
> On 10/2/25 11:05 AM, Jason Gunthorpe wrote:
> > On Thu, Oct 02, 2025 at 10:49:21AM -0700, John Hubbard wrote:
> >>> Forgot to add: But I think Zhi explained that this is not necessary and can be
> >>> controlled by the VFIO driver, i.e. the PCI driver that binds to the VF itself.
> >>
> >> Yes, this is the direction that I originally (3 whole days ago, haha) had in mind,
> >> after talking with Zhi and a few others: nova-core handles PFs, and the VFIO driver
> >> handles the VFs, and use the "is virtual" logic to sort them out.
> >
> > To be clear, no matter what the VFIO driver bound to the VF should not
> > become entangled with any aux devices.
>
> I was fine until you said "aux devices". :) What does that mean in this
> context?
Something using
struct auxiliary_device
Jason
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-02 18:31 ` Jason Gunthorpe
@ 2025-10-02 18:42 ` Danilo Krummrich
2025-10-02 18:56 ` Jason Gunthorpe
0 siblings, 1 reply; 32+ messages in thread
From: Danilo Krummrich @ 2025-10-02 18:42 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi,
Alistair Popple, Zhi Wang, Surath Mitra, David Airlie,
Simona Vetter, Alex Williamson, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML
On 10/2/25 8:31 PM, Jason Gunthorpe wrote:
> This exactly how this function is used.
>
> The core PF driver provides an API:
>
> struct mlx5_core_dev *mlx5_vf_get_core_dev(struct pci_dev *pdev)
>
> Which takes in the VF as pdev and internally it invokes:
>
> mdev = pci_iov_get_pf_drvdata(pdev, &mlx5_core_driver);
Oh, I see, that makes sense then. Thanks for clarifying. I think I already had
in mind how this would look like in the Rust abstraction, and there we don't
need pci_iov_get_pf_drvdata() to achieve the same thing.
> /**
> * pci_iov_get_pf_drvdata - Return the drvdata of a PF
> * @dev: VF pci_dev
> * @pf_driver: Device driver required to own the PF
> *
> * This must be called from a context that ensures that a VF driver is attached.
> * The value returned is invalid once the VF driver completes its remove()
> * callback.
> *
> * Locking is achieved by the driver core. A VF driver cannot be probed until
> * pci_enable_sriov() is called and pci_disable_sriov() does not return until
> * all VF drivers have completed their remove().
> *
> * The PF driver must call pci_disable_sriov() before it begins to destroy the
> * drvdata.
> */
>
> Meaning nova-core has to guarentee to call pci_disable_sriov() before
> remove completes or before a failing probe returns as part of
> implementing SRIOV support.
Yes, I already thought about this. In the context of adding support for SR-IOV
in the Rust abstractions I'm planning on sending an RFC to let the subsystem
provide this guarantee instead (at least under certain conditions).
This will allow us to assert the device to be bound by the type system in the
Rust PCI abstraction, rather than having the driver to provide a guarantee to
call pci_disable_sriov() manually. :)
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-02 17:56 ` Jason Gunthorpe
@ 2025-10-02 18:55 ` Danilo Krummrich
0 siblings, 0 replies; 32+ messages in thread
From: Danilo Krummrich @ 2025-10-02 18:55 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi,
Alistair Popple, Zhi Wang, Surath Mitra, David Airlie,
Simona Vetter, Alex Williamson, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML
On Thu Oct 2, 2025 at 7:56 PM CEST, Jason Gunthorpe wrote:
> This is certainly one option, you can put #2 in an aux driver of the
> PF in a nova-sriov.ko module that is fully divorced from VFIO. It
> might go along with a nova-fwctl.ko module too.
>
> You could also just embed it in nova-core.ko and have it activate when
> the PF is booted in "vGPU" mode.
>
> Broadly I would suggest the latter. aux devices make most sense to
> cross subsystems. Micro splitting a single driver with aux devices
> will make more of a mess than required. Though a good motivating
> reason would be if nova-srvio.ko is large.
As mentioned in the other sub-thread, I'm fine with either approach, but I think
I'd also prefer folding it into nova-core.
> Then you have two more:
>
> 4) A PCI driver in a VM that creates a DRM subsystem device
>
> This is nova-core.ko + nova-drm.ko
>
> 5) A VF driver that creates a DRM subsystem device without a VM
>
> Zhi says the device can't do this, but lets assume it could, then I
> would expect this to be nova-core.ko + nova-drm.ko, same as #4.
Indeed, but there'd probably be an overlap between the logic in the VFIO driver
and the logic required in nova-core to make this use-case happen I suspect.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-02 18:42 ` Danilo Krummrich
@ 2025-10-02 18:56 ` Jason Gunthorpe
2025-10-02 19:36 ` Danilo Krummrich
0 siblings, 1 reply; 32+ messages in thread
From: Jason Gunthorpe @ 2025-10-02 18:56 UTC (permalink / raw)
To: Danilo Krummrich
Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi,
Alistair Popple, Zhi Wang, Surath Mitra, David Airlie,
Simona Vetter, Alex Williamson, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML
On Thu, Oct 02, 2025 at 08:42:58PM +0200, Danilo Krummrich wrote:
> On 10/2/25 8:31 PM, Jason Gunthorpe wrote:
> > This exactly how this function is used.
> >
> > The core PF driver provides an API:
> >
> > struct mlx5_core_dev *mlx5_vf_get_core_dev(struct pci_dev *pdev)
> >
> > Which takes in the VF as pdev and internally it invokes:
> >
> > mdev = pci_iov_get_pf_drvdata(pdev, &mlx5_core_driver);
>
> Oh, I see, that makes sense then. Thanks for clarifying. I think I already had
> in mind how this would look like in the Rust abstraction, and there we don't
> need pci_iov_get_pf_drvdata() to achieve the same thing.
I'm skeptical, there is nothing about rust that should avoid having to
us pci_iov_get_pf_drvdata().. It does a number of safety checks
related to the linux driver model that are not optional.
Don't forget in linux you actually can bind VFIO to the PF, start
SRIOV and then bind VFs to other drivers which then fail
pci_iov_get_pf_drvdata(). Blindly converting a struct device to an
instance memory without this check will be buggy.
> Yes, I already thought about this. In the context of adding support for SR-IOV
> in the Rust abstractions I'm planning on sending an RFC to let the subsystem
> provide this guarantee instead (at least under certain conditions).
Certain conditions may be workable, some drivers seem to have
preferences not to call disable, though I think that is wrong :\
Jason
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-02 18:56 ` Jason Gunthorpe
@ 2025-10-02 19:36 ` Danilo Krummrich
2025-10-02 21:04 ` Jason Gunthorpe
0 siblings, 1 reply; 32+ messages in thread
From: Danilo Krummrich @ 2025-10-02 19:36 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi,
Alistair Popple, Zhi Wang, Surath Mitra, David Airlie,
Simona Vetter, Alex Williamson, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML
On Thu Oct 2, 2025 at 8:56 PM CEST, Jason Gunthorpe wrote:
> On Thu, Oct 02, 2025 at 08:42:58PM +0200, Danilo Krummrich wrote:
>> On 10/2/25 8:31 PM, Jason Gunthorpe wrote:
>> > This exactly how this function is used.
>> >
>> > The core PF driver provides an API:
>> >
>> > struct mlx5_core_dev *mlx5_vf_get_core_dev(struct pci_dev *pdev)
>> >
>> > Which takes in the VF as pdev and internally it invokes:
>> >
>> > mdev = pci_iov_get_pf_drvdata(pdev, &mlx5_core_driver);
>>
>> Oh, I see, that makes sense then. Thanks for clarifying. I think I already had
>> in mind how this would look like in the Rust abstraction, and there we don't
>> need pci_iov_get_pf_drvdata() to achieve the same thing.
>
> I'm skeptical, there is nothing about rust that should avoid having to
> us pci_iov_get_pf_drvdata().. It does a number of safety checks
> related to the linux driver model that are not optional.
The checks will be the same, but using pci_iov_get_pf_drvdata() directly is not
workable because of how the abstractions are layered.
If we want to obtain the driver's private data from a device outside the scope
of bus callbacks, we always need to ensure that the device is guaranteed to be
bound and we also need to prove the type of the private data, since a device
structure can't be generic over its bound driver.
Usually that's not an issue because other entry points into the driver, e.g.
subsystem callbacks have their own private data through the class device, IRQs
have their own private data in the IRQ registration, etc.
>> Yes, I already thought about this. In the context of adding support for SR-IOV
>> in the Rust abstractions I'm planning on sending an RFC to let the subsystem
>> provide this guarantee instead (at least under certain conditions).
>
> Certain conditions may be workable, some drivers seem to have
> preferences not to call disable, though I think that is wrong :\
I fully agree! I was told that this is because apparently some PF drivers are
only loaded to enable SR-IOV and then removed to shrink the potential attack
surface. Personally, I think that's slightly paranoid, if the driver would not
do anything else than enable / disable SR-IOV, but I think we can work around
this use-case if people really want it.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-02 19:36 ` Danilo Krummrich
@ 2025-10-02 21:04 ` Jason Gunthorpe
2025-10-02 21:14 ` Danilo Krummrich
0 siblings, 1 reply; 32+ messages in thread
From: Jason Gunthorpe @ 2025-10-02 21:04 UTC (permalink / raw)
To: Danilo Krummrich
Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi,
Alistair Popple, Zhi Wang, Surath Mitra, David Airlie,
Simona Vetter, Alex Williamson, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML
On Thu, Oct 02, 2025 at 09:36:17PM +0200, Danilo Krummrich wrote:
> If we want to obtain the driver's private data from a device outside the scope
> of bus callbacks, we always need to ensure that the device is guaranteed to be
> bound and we also need to prove the type of the private data, since a device
> structure can't be generic over its bound driver.
pci_iov_get_pf_drvdata() does both of these things - this is what it
is for. Please don't open code it :(
> > Certain conditions may be workable, some drivers seem to have
> > preferences not to call disable, though I think that is wrong :\
>
> I fully agree! I was told that this is because apparently some PF drivers are
> only loaded to enable SR-IOV and then removed to shrink the potential attack
> surface. Personally, I think that's slightly paranoid, if the driver would not
> do anything else than enable / disable SR-IOV, but I think we can work around
> this use-case if people really want it.
I've heard worse reasons than that. If that is the interest I'd
suggest they should just use VFIO and leave a userspace stub
process..
Jason
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-02 21:04 ` Jason Gunthorpe
@ 2025-10-02 21:14 ` Danilo Krummrich
2025-10-02 21:32 ` Danilo Krummrich
0 siblings, 1 reply; 32+ messages in thread
From: Danilo Krummrich @ 2025-10-02 21:14 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi,
Alistair Popple, Zhi Wang, Surath Mitra, David Airlie,
Simona Vetter, Alex Williamson, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML
On 10/2/25 11:04 PM, Jason Gunthorpe wrote:
> On Thu, Oct 02, 2025 at 09:36:17PM +0200, Danilo Krummrich wrote:
>> If we want to obtain the driver's private data from a device outside the scope
>> of bus callbacks, we always need to ensure that the device is guaranteed to be
>> bound and we also need to prove the type of the private data, since a device
>> structure can't be generic over its bound driver.
>
> pci_iov_get_pf_drvdata() does both of these things - this is what it
> is for. Please don't open code it :(
It makes no sense to use it, both of those things will be ensured in a more
generic way in the base device implementation already (which is what I meant
with layering).
Both requirements are not specific to PCI, or the specific VF -> PF use-case.
In order to guarantee soundness both of those things have to be guaranteed for
any access to the driver's private data.
I will send some patches soon, I think this will make it obvious. :)
>>> Certain conditions may be workable, some drivers seem to have
>>> preferences not to call disable, though I think that is wrong :\
>>
>> I fully agree! I was told that this is because apparently some PF drivers are
>> only loaded to enable SR-IOV and then removed to shrink the potential attack
>> surface. Personally, I think that's slightly paranoid, if the driver would not
>> do anything else than enable / disable SR-IOV, but I think we can work around
>> this use-case if people really want it.
>
> I've heard worse reasons than that. If that is the interest I'd
> suggest they should just use VFIO and leave a userspace stub
> process..
I'm not sure I follow your proposal, can you elaborate?
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-02 21:14 ` Danilo Krummrich
@ 2025-10-02 21:32 ` Danilo Krummrich
2025-10-02 23:40 ` Jason Gunthorpe
0 siblings, 1 reply; 32+ messages in thread
From: Danilo Krummrich @ 2025-10-02 21:32 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi,
Alistair Popple, Zhi Wang, Surath Mitra, David Airlie,
Simona Vetter, Alex Williamson, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML
On Thu Oct 2, 2025 at 11:14 PM CEST, Danilo Krummrich wrote:
> On 10/2/25 11:04 PM, Jason Gunthorpe wrote:
>> On Thu, Oct 02, 2025 at 09:36:17PM +0200, Danilo Krummrich wrote:
>>> If we want to obtain the driver's private data from a device outside the scope
>>> of bus callbacks, we always need to ensure that the device is guaranteed to be
>>> bound and we also need to prove the type of the private data, since a device
>>> structure can't be generic over its bound driver.
>>
>> pci_iov_get_pf_drvdata() does both of these things - this is what it
>> is for. Please don't open code it :(
>
> It makes no sense to use it, both of those things will be ensured in a more
> generic way in the base device implementation already (which is what I meant
> with layering).
>
> Both requirements are not specific to PCI, or the specific VF -> PF use-case.
>
> In order to guarantee soundness both of those things have to be guaranteed for
> any access to the driver's private data.
Actually, let me elaborate a bit more on this:
In C when a driver calls dev_get_drvdata() it asserts two things:
(1) The device is bound to the driver calling this function.
(2) It casts the returned void * to the correct type.
Obviously, relying on this in Rust would be fundamentally unsafe.
Hence, the idea is to implement Device::drvdata_borrow::<T>(), which takes a
&Device<Bound> as argument, which proves that the device must be bound.
The T must be the driver specific driver type, i.e. the type that implements
e.g. the pci::Driver trait.
With that, Device::drvdata_borrow() can internally check whether the asserted
type T matches the unique type ID that we store in the device in probe().
This way we can prove that the device is actually bound and that we cast to the
correct type.
Furthermore, the returned reference to the driver's private data can't out-live
the lifetime of the given &Device<Bound>, so we're also guaranteed that the
device won't be unbound while we still have a reference to the driver's private
data.
So, when we call pdev.physfn().drvdata_borrow::<NovaCore>() the checks are
included already.
> I will send some patches soon, I think this will make it obvious. :)
>>>> Certain conditions may be workable, some drivers seem to have
>>>> preferences not to call disable, though I think that is wrong :\
>>>
>>> I fully agree! I was told that this is because apparently some PF drivers are
>>> only loaded to enable SR-IOV and then removed to shrink the potential attack
>>> surface. Personally, I think that's slightly paranoid, if the driver would not
>>> do anything else than enable / disable SR-IOV, but I think we can work around
>>> this use-case if people really want it.
>>
>> I've heard worse reasons than that. If that is the interest I'd
>> suggest they should just use VFIO and leave a userspace stub
>> process..
>
> I'm not sure I follow your proposal, can you elaborate?
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-02 21:32 ` Danilo Krummrich
@ 2025-10-02 23:40 ` Jason Gunthorpe
2025-10-03 0:57 ` Danilo Krummrich
0 siblings, 1 reply; 32+ messages in thread
From: Jason Gunthorpe @ 2025-10-02 23:40 UTC (permalink / raw)
To: Danilo Krummrich
Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi,
Alistair Popple, Zhi Wang, Surath Mitra, David Airlie,
Simona Vetter, Alex Williamson, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML
On Thu, Oct 02, 2025 at 11:32:44PM +0200, Danilo Krummrich wrote:
> So, when we call pdev.physfn().drvdata_borrow::<NovaCore>() the checks are
> included already.
I'm not keen on hiding this reasoning inside an physfn() accessor like
this. ie one that returns a Device<Bound>. The reasoning for this is
tricky and special. We have enough cases where physfn won't be a bound
driver. I think it is big stretch just to declare that unconditionally
safe.
There is a reason pci_iov_get_pf_drvdata() has such a big comment..
So I'd rather see you follow the C design and have an explicit helper
function to convert a VF bound device to a PF bound device and check
the owner, basically split up pci_iov_get_pf_drvdata() into a part to
get the struct device and an inline to get the drvdata. Rust still has an
ops pointer it can pass in so it can be consistent with the C code
even if it does another check inside its drvdata_borrow.
This way we keep the reasoning and explanation in one place.
Jason
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-02 23:40 ` Jason Gunthorpe
@ 2025-10-03 0:57 ` Danilo Krummrich
2025-10-03 11:52 ` Jason Gunthorpe
0 siblings, 1 reply; 32+ messages in thread
From: Danilo Krummrich @ 2025-10-03 0:57 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi,
Alistair Popple, Zhi Wang, Surath Mitra, David Airlie,
Simona Vetter, Alex Williamson, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML
On Fri Oct 3, 2025 at 1:40 AM CEST, Jason Gunthorpe wrote:
> On Thu, Oct 02, 2025 at 11:32:44PM +0200, Danilo Krummrich wrote:
>
>> So, when we call pdev.physfn().drvdata_borrow::<NovaCore>() the checks are
>> included already.
>
> I'm not keen on hiding this reasoning inside an physfn() accessor like
> this. ie one that returns a Device<Bound>. The reasoning for this is
> tricky and special. We have enough cases where physfn won't be a bound
> driver. I think it is big stretch just to declare that unconditionally
> safe.
In this example physfn() would return a &pci::Device<Bound>.
This is what I mentioned previously; I want the subsystem to guarantee (at least
under certain circumstances) that if the VF device is bound that the PF device
must be bound as well.
> There is a reason pci_iov_get_pf_drvdata() has such a big comment..
>
> So I'd rather see you follow the C design and have an explicit helper
> function to convert a VF bound device to a PF bound device
Well, this is exactly what physfn() does (with the precondition that we can get
the guarantee from the subsystem).
> and check
> the owner, basically split up pci_iov_get_pf_drvdata() into a part to
> get the struct device and an inline to get the drvdata. Rust still has an
> ops pointer it can pass in so it can be consistent with the C code
Which ops pointer are you referring to? Do you mean the struct pci_driver
pointer? If so, no we can't access this one. We could make it accessible, but it
would result into horrible code, wouldn't make it possible to implement the
check generically for any device (which we need anyways) and would have some
other bad implications.
I try to be as consistent as possible with C code, but sometimes it just doesn't
fit at all and would even hurt. This is one of those cases.
To give at least some background: A driver structure (like struct pci_driver) is
embedded in a module structure, which is a global static and intentionally not
directly accessible for drivers.
Even if we'd make it accessible, the driver field within a module structure
depends on the exact implementation, i.e. it depends on whether a module is
declared "by hand", or whether it is generated by a module_driver!() macro (e.g.
module_pci_driver!().
It probably also helps to have a look at samples/rust/rust_driver_auxiliary.rs,
which open codes driver registrations, since it registers two drivers in the
same module for the purpose of illustrating an auxiliary connection, i.e. it
doesn't use a module_driver!() macro.
The struct struct auxiliary_driver resides within the
driver::Registration<auxiliary::Adapter<T>>, where driver::Registration is a
generic type for registering drivers, auxiliary::Adapter defines the auxiliary
specific bits for this registration and T is the driver specific type that
implements the auxiliary::Driver trait, containing the bus callbacks etc.
> even if it does another check inside its drvdata_borrow.
It's the exact same check; just a better fitting implementation.
I.e. instead of checking a pointer which is hard to access, we check if the type
ID we gave to the device matches the driver specific type (the T in
driver::Registration<auxiliary::Adapter<T>>).
Semantically, there is no difference, but it results in less cumbersome and more
flexible code.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs
2025-10-03 0:57 ` Danilo Krummrich
@ 2025-10-03 11:52 ` Jason Gunthorpe
0 siblings, 0 replies; 32+ messages in thread
From: Jason Gunthorpe @ 2025-10-03 11:52 UTC (permalink / raw)
To: Danilo Krummrich
Cc: John Hubbard, Alexandre Courbot, Joel Fernandes, Timur Tabi,
Alistair Popple, Zhi Wang, Surath Mitra, David Airlie,
Simona Vetter, Alex Williamson, Bjorn Helgaas,
Krzysztof Wilczyński, Miguel Ojeda, Alex Gaynor, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Alice Ryhl, Trevor Gross, nouveau, linux-pci, rust-for-linux,
LKML
On Fri, Oct 03, 2025 at 02:57:53AM +0200, Danilo Krummrich wrote:
> Which ops pointer are you referring to? Do you mean the struct pci_driver
> pointer? If so, no we can't access this one. We could make it accessible, but it
> would result into horrible code, wouldn't make it possible to implement the
> check generically for any device (which we need anyways) and would have some
> other bad implications.
Yes pci_driver. You must confirm the attached driver is following the
right SRIOV protocol otherwise you can't claim the result is
bound. This is where we are with PCI today at least.
It sounds like you plan to come with patches changing how SRIOV
enablement works in PCI susbsytem, if that also comes with a way to
detect that the driver is following the rules without using pci_driver
then great.
But that would make me feel more strongly that this should be a core
helper and the existing users should be converted to the new mechanism
so we don't have two approaches here.
> Even if we'd make it accessible, the driver field within a module structure
> depends on the exact implementation, i.e. it depends on whether a module is
> declared "by hand", or whether it is generated by a module_driver!() macro (e.g.
> module_pci_driver!().
It is kind of a bad place to end up, drivers do occasionally use their
driver pointers for little things, like this for example. It is not
common but it is a technique..
Jason
^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2025-10-03 11:52 UTC | newest]
Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-02 2:00 [PATCH v2 0/2] rust: pci: don't probe() VFs in nova-core John Hubbard
2025-10-02 2:00 ` [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs John Hubbard
2025-10-02 11:49 ` Danilo Krummrich
2025-10-02 12:11 ` Jason Gunthorpe
2025-10-02 12:18 ` Danilo Krummrich
2025-10-02 12:39 ` Jason Gunthorpe
2025-10-02 13:03 ` Danilo Krummrich
2025-10-02 13:56 ` Jason Gunthorpe
2025-10-02 15:11 ` Danilo Krummrich
2025-10-02 15:23 ` Jason Gunthorpe
2025-10-02 16:05 ` Danilo Krummrich
2025-10-02 17:05 ` Jason Gunthorpe
2025-10-02 17:37 ` Danilo Krummrich
2025-10-02 17:40 ` Danilo Krummrich
2025-10-02 17:49 ` John Hubbard
2025-10-02 18:05 ` Jason Gunthorpe
2025-10-02 18:09 ` John Hubbard
2025-10-02 18:32 ` Jason Gunthorpe
2025-10-02 18:17 ` Danilo Krummrich
2025-10-02 18:31 ` Jason Gunthorpe
2025-10-02 18:42 ` Danilo Krummrich
2025-10-02 18:56 ` Jason Gunthorpe
2025-10-02 19:36 ` Danilo Krummrich
2025-10-02 21:04 ` Jason Gunthorpe
2025-10-02 21:14 ` Danilo Krummrich
2025-10-02 21:32 ` Danilo Krummrich
2025-10-02 23:40 ` Jason Gunthorpe
2025-10-03 0:57 ` Danilo Krummrich
2025-10-03 11:52 ` Jason Gunthorpe
2025-10-02 17:56 ` Jason Gunthorpe
2025-10-02 18:55 ` Danilo Krummrich
2025-10-02 2:00 ` [PATCH v2 2/2] gpu: nova-core: declare that VFs are not (yet) supported John Hubbard
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).