rust-for-linux.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Leon Romanovsky <leon@kernel.org>
To: Danilo Krummrich <dakr@kernel.org>
Cc: "Jason Gunthorpe" <jgg@ziepe.ca>,
	"Peter Colberg" <pcolberg@redhat.com>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Krzysztof Wilczyński" <kwilczynski@kernel.org>,
	"Miguel Ojeda" <ojeda@kernel.org>,
	"Alex Gaynor" <alex.gaynor@gmail.com>,
	"Boqun Feng" <boqun.feng@gmail.com>,
	"Gary Guo" <gary@garyguo.net>,
	"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
	"Benno Lossin" <lossin@kernel.org>,
	"Andreas Hindborg" <a.hindborg@kernel.org>,
	"Alice Ryhl" <aliceryhl@google.com>,
	"Trevor Gross" <tmgross@umich.edu>,
	"Abdiel Janulgue" <abdiel.janulgue@gmail.com>,
	"Daniel Almeida" <daniel.almeida@collabora.com>,
	"Robin Murphy" <robin.murphy@arm.com>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Dave Ertman" <david.m.ertman@intel.com>,
	"Ira Weiny" <ira.weiny@intel.com>,
	linux-pci@vger.kernel.org, rust-for-linux@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	"Alexandre Courbot" <acourbot@nvidia.com>,
	"Alistair Popple" <apopple@nvidia.com>,
	"Joel Fernandes" <joelagnelf@nvidia.com>,
	"John Hubbard" <jhubbard@nvidia.com>,
	"Zhi Wang" <zhiw@nvidia.com>
Subject: Re: [PATCH 7/8] rust: pci: add physfn(), to return PF device for VF device
Date: Sun, 23 Nov 2025 13:18:23 +0200	[thread overview]
Message-ID: <20251123111823.GD16619@unreal> (raw)
In-Reply-To: <DEFZP148A2NK.29G6N1V1SXT8T@kernel.org>

On Sun, Nov 23, 2025 at 11:07:47PM +1300, Danilo Krummrich wrote:
> On Sun Nov 23, 2025 at 7:34 PM NZDT, Leon Romanovsky wrote:
> > On Sun, Nov 23, 2025 at 11:26:52AM +1300, Danilo Krummrich wrote:
> >> On Sun Nov 23, 2025 at 7:57 AM NZDT, Leon Romanovsky wrote:
> >> > On Sat, Nov 22, 2025 at 12:16:15PM -0400, Jason Gunthorpe wrote:
> >> >> On Sat, Nov 22, 2025 at 11:23:16PM +1300, Danilo Krummrich wrote:
> >> >> > So far I haven't heard a convincing reason for not providing this guarantee. The
> >> >> > only reason not to guarantee this I have heard is that some PF drivers only
> >> >> > enable SR-IOV and hence could be unloaded afterwards. However, I think there is
> >> >> > no strong reason to do so.
> >> >> 
> >> >> I know some people have work flows around this. I think they are
> >> >> wrong. When we "fixed" mlx5 a while back there was some pushback and
> >> >> some weird things did stop working.
> >> >
> >> > I don't think that this is correct. The main use case for keeping VFs,
> >> > while unloading PF driver is to allow reuse this PF for VFIO too. It is
> >> > very natural and common flow for "real" SR-IOV devices which present all
> >> > PF/VFs as truly separate devices.
> >> 
> >> That sounds a bit odd to me, what exactly do you mean with "reuse the PF for
> >> VFIO"? What do you do with the PF after driver unload instead? Load another
> >> driver? If so, why separate ones?
> >
> > One of the main use cases for SR-IOV is to provide to VM users/customers
> > devices with performance and security promises as physical ones. In this
> > case, the VMs are created through PF and not bound to any driver. Once
> > customer/user requests VM, that VF is bound to vfio-pci driver and
> > attached to that VM.
> >
> > In many cases, PF is unbound too from its original driver and attached
> > to some other VM. It allows for these VM providers to maximize
> > utilization of their SR-IOV devices.
> >
> > At least in PCI spec 6.0.1, it stays clearly that PF can be attached to SI (VM in spec language).
> > "Physical Function (PF) - A PF is a PCIe Function that supports the SR-IOV Extended Capability
> > and is accessible to an SR-PCIM, a VI, or an SI."
> 
> Hm, that's possible, but do we have cases of this in practice where we bind and
> unbind the same PF multiple times, pass it to different VMs, etc.?

It is very common case, when the goal is to maximize hardware utilization.

> 
> In any case, what we can always do is what I propose in [1], i.e. add a bool to
> struct pci_driver that indicates whether the VFs should be unbound when the PF
> is unbound, such that we can uphold the guarantee of VF bound implies PF bound
> at least conditionally.
> 
> [1] https://lore.kernel.org/all/DEFL4TG0WX1C.2GLH4417EPU3V@kernel.org/
> 
> >> >> So while I support this goal, I don't know if enough people will
> >> >> agree..
> >> >
> >> > I don't see that Bjorn is CCed here, so it is unclear to whom Danilo
> >> > talked to get rationale for this new behaviour.
> >> 
> >> Not sure what you mean, Bjorn is CC'd on the whole series and hence also this
> >> conversation.
> >> 
> >> Otherwise, my proposal to guarantee that the PF is bound as long as the VF(s)
> >> are bound stems from the corresponding beneficial lifecycle implications for the
> >> driver model (some more on this below).
> >> 
> >> >> > With this, the above code will be correct and a driver can use the generic
> >> >> > infrastructure to:
> >> >> > 
> >> >> >   1) Call pci::Device<Bound>::physfn() returning a Result<pci::Device<Bound>>
> >> >> >   2) Grab the driver's device private data from the returned Device<Bound>
> >> >> > 
> >> >> > Note that 2) (i.e. accessing the driver's device private data with
> >> >> > Device::drvdata() [1]) ensures that the device is actually bound (drvdata() is
> >> >> > only implemented for Device<Bound>) and that the returned type actually matches
> >> >> > the type of the object that has been stored.
> >> >> 
> >> >> This is what the core helper does, with the twist that it "validates"
> >> >> the PF driver is working right by checking its driver binding..
> >> >> 
> >> >> > I suggest to have a look at [2] for an example with how this turns out with the
> >> >> > auxiliary bus [2][3], since in the end it's the same problem, i.e. an auxiliary
> >> >> > driver calling into its parent, except that the auxiliary bus already guarantees
> >> >> > that the parent is bound when the child is bound.
> >> >> 
> >> >> Aux bus is a little different because it should be used in a way where
> >> >> there are stronger guarantees about what the parent is. Ie the aux bus
> >> >> names should be unique and limited to the type of parent.
> >> >
> >> > Right, aux bus is completely unrelated to real physical PCI bus.
> >> 
> >> Of course the auxiliary and the PCI bus are fundamentally different busses,
> >> *but* they also have commonalities: the driver lifecycle requirements drivers
> >> have to deal with are the same.
> >> 
> >> For instance, if you call from an auxiliary driver into the parent driver to let
> >> the parent driver do some work on behalf, the auxiliary driver has to guarantee
> >> that the parent device is guranteed to remain bound for the duration of the
> >> call, otherwise the parent driver can't call dev_get_drvdata() without risking a
> >> UAF, since a concurrent unbind might free the parent driver's device private
> >> data.
> >> 
> >> The same is true for PCI when a VF driver calls into the PF driver to do some
> >> work on behalf. The VF driver has to make sure that the PF driver remains bound
> >> for the whole duration of the call for the exact same reasons.
> >
> > And this section is not entirely correct. It is one of the possible
> > usages of PF. In many devices, PF and its VFs are independent and don't
> > communicate through kernel at all. There is no need for VF to have PF be
> > bounded to its original driver.
> 
> Of course -- similarly there might be cases where an auxiliary driver might not
> cummunicate with its parent, but that's not the case I'm referring to above.

Not really. Auxbus is a logical in-kernel separation of one physical device.
The promise is that parent auxdevice is tied to its children. It was done
intentionally to make sure that children don't outlive their parent.

It is not the case for SR-IOV.

> 
> >
> >> 
> >> I.e. it is a common driver lifecycle pattern for interactions between drivers in
> >> general.
> >
> > And it is not true as well. In case you need to interact between
> > devices/drivers, it is up to the caller to ensure that this driver isn't
> > unloaded.
> 
> You're mixing two things here. The driver model lifecycle requires that if
> driver A calls into driver B - where B accesses its device private data - that B
> is bound for the full duration of the call.

I'm aware of this, and we are not talking about driver model. Whole
discussion is "if PF can be unbound, while VFs exist". The answer is yes, it can,
both from PCI spec perspective and from operational one.

> 
> This is true for any such interaction, the question of who is responsible to
> uphold this guarantee, e.g. the bus layer or the driver itself is orthogonal.
> 
> Whenever possible, it's best to let the bus layer uphold the guarantee though,
> since it's one pitfall less a driver can fall into.
> 
> >> 
> >> As for Rust specifically, we can actually model such driver lifecycle patterns
> >> with the type system and with that do a lot of the driver lifecycle checks (that
> >> drivers love to ignore :) even at compile time, such that your code doesn't even
> >> compile when you violate the driver lifecycle rules.
> >> 
> >> For the auxiliary bus that's already the case and I'd love to get to this point
> >> with PF <-> VF interactions (in Rust) as well.
> >
> > It is absolutely correct thing to do for auxbus. It is very questionable for SR-IOV.
> 
> At least conditionally (as proposed above), it's an improvement for cases where
> there is PF <-> VF interactions, i.e. why let drivers take care if the bus can
> already do it for them.

I'm not against some opt-in/opt-out solution. I'm against "one size fits
all" proposal, where PF is tied to its original driver as long as VFs
exist.

Thanks

  reply	other threads:[~2025-11-23 11:18 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-19 22:19 [PATCH 0/8] rust: pci: add abstractions for SR-IOV capability Peter Colberg
2025-11-19 22:19 ` [PATCH 1/8] rust: pci: add is_virtfn(), to check for VFs Peter Colberg
2025-11-21  3:00   ` kernel test robot
2025-11-21 18:27     ` Peter Colberg
2025-11-19 22:19 ` [PATCH 2/8] rust: pci: add is_physfn(), to check for PFs Peter Colberg
2025-11-21  4:35   ` kernel test robot
2025-11-19 22:19 ` [PATCH 3/8] rust: pci: add {enable,disable}_sriov(), to control SR-IOV capability Peter Colberg
2025-11-21 23:28   ` Jason Gunthorpe
2025-11-19 22:19 ` [PATCH 4/8] rust: pci: add num_vf(), to return number of VFs Peter Colberg
2025-11-19 22:19 ` [PATCH 5/8] rust: pci: add vtable attribute to pci::Driver trait Peter Colberg
2025-11-19 22:19 ` [PATCH 6/8] rust: pci: add bus callback sriov_configure(), to control SR-IOV from sysfs Peter Colberg
2025-11-21  6:00   ` kernel test robot
2025-11-19 22:19 ` [PATCH 7/8] rust: pci: add physfn(), to return PF device for VF device Peter Colberg
2025-11-21  7:57   ` kernel test robot
2025-11-21 23:26   ` Jason Gunthorpe
2025-11-22 10:23     ` Danilo Krummrich
2025-11-22 16:16       ` Jason Gunthorpe
2025-11-22 18:57         ` Leon Romanovsky
2025-11-22 22:26           ` Danilo Krummrich
2025-11-23  6:34             ` Leon Romanovsky
2025-11-23 10:07               ` Danilo Krummrich
2025-11-23 11:18                 ` Leon Romanovsky [this message]
2025-11-22 22:43         ` Danilo Krummrich
2025-11-19 22:19 ` [PATCH 8/8] samples: rust: add SR-IOV driver sample Peter Colberg
2025-11-20  6:41   ` Zhi Wang
2025-11-20 15:49     ` Peter Colberg
2025-11-20  6:32 ` [PATCH 0/8] rust: pci: add abstractions for SR-IOV capability Zhi Wang
2025-11-20 15:03   ` Peter Colberg
2025-11-20 18:34     ` Zhi Wang
2025-11-20 21:16       ` Zhi Wang
2025-11-21 17:05       ` Peter Colberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251123111823.GD16619@unreal \
    --to=leon@kernel.org \
    --cc=a.hindborg@kernel.org \
    --cc=abdiel.janulgue@gmail.com \
    --cc=acourbot@nvidia.com \
    --cc=alex.gaynor@gmail.com \
    --cc=aliceryhl@google.com \
    --cc=apopple@nvidia.com \
    --cc=bhelgaas@google.com \
    --cc=bjorn3_gh@protonmail.com \
    --cc=boqun.feng@gmail.com \
    --cc=dakr@kernel.org \
    --cc=daniel.almeida@collabora.com \
    --cc=david.m.ertman@intel.com \
    --cc=gary@garyguo.net \
    --cc=gregkh@linuxfoundation.org \
    --cc=ira.weiny@intel.com \
    --cc=jgg@ziepe.ca \
    --cc=jhubbard@nvidia.com \
    --cc=joelagnelf@nvidia.com \
    --cc=kwilczynski@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lossin@kernel.org \
    --cc=ojeda@kernel.org \
    --cc=pcolberg@redhat.com \
    --cc=robin.murphy@arm.com \
    --cc=rust-for-linux@vger.kernel.org \
    --cc=tmgross@umich.edu \
    --cc=zhiw@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).