From: Jason Gunthorpe <jgg@nvidia.com>
To: Danilo Krummrich <dakr@kernel.org>
Cc: Zhi Wang <zhiw@nvidia.com>,
kvm@vger.kernel.org, nouveau@lists.freedesktop.org,
alex.williamson@redhat.com, kevin.tian@intel.com,
airlied@gmail.com, daniel@ffwll.ch, acurrid@nvidia.com,
cjia@nvidia.com, smitra@nvidia.com, ankita@nvidia.com,
aniketa@nvidia.com, kwankhede@nvidia.com, targupta@nvidia.com,
zhiwang@kernel.org
Subject: Re: [RFC 00/29] Introduce NVIDIA GPU Virtualization (vGPU) Support
Date: Tue, 24 Sep 2024 13:41:51 -0300
Message-ID: <20240924164151.GJ9417@nvidia.com>
In-Reply-To: <ZvHwzzp2F71W8TAs@pollux.localdomain>

On Tue, Sep 24, 2024 at 12:50:55AM +0200, Danilo Krummrich wrote:
> > From the VFIO side I would like to see something like this merged in
> > nearish future as it would bring a previously out of tree approach to
> > be fully intree using our modern infrastructure. This is a big win for
> > the VFIO world.
> >
> > As a commercial product this will be backported extensively to many
> > old kernels and that is harder/impossible if it isn't exclusively in
> > C. So, I think nova needs to co-exist in some way.
>
> We'll surely not support two drivers for the same thing in the long term,
> neither does it make sense, nor is it sustainable.

What is being done here is the normal, correct kernel thing to
do. Refactor the shared core code into a module and stick higher level
stuff on top of it. Ideally Nova/Nouveau would exist as peers
implementing the DRM subsystem on this shared core infrastructure. We've
done this sort of thing before in other places in the kernel. It has
been proven to work well.

So, I'm not sure why you think there should be two drivers in the long
term? Do you have some technical reason why Nova can't fit into this
modular architecture?

Regardless, assuming Nova will eventually propose merging duplicated
bootup code then I suggest it should be able to fully replace the C
code with a kconfig switch and provide C compatible interfaces for
VFIO. When Rust is sufficiently mature we can consider a deprecation
schedule for the C version.

I agree duplication doesn't make sense, but if it is essential to make
everyone happy then we should do it to accommodate the ongoing Rust
experiment.

> We have a lot of good reasons why we decided to move forward with Nova as a
> successor of Nouveau for GSP-based GPUs in the long term -- I also just held a
> talk about this at LPC.

I know, but this series is adding a VFIO driver to the kernel, and a
complete Nova driver doesn't even exist yet. It is fine to think about
future plans, but let's not get too far ahead of ourselves here.

> For the short/mid term I think it may be reasonable to start with
> Nouveau, but this must be based on some agreements, for instance:
>
> - take responsibility, e.g. commitment to help with maintenance with some of
> NVKM / NVIDIA GPU core (or whatever we want to call it) within Nouveau

I fully expect NVIDIA teams to own this core driver and VFIO parts. I
see there are no changes to the MAINTAINERS file in this RFC; that
will need to be corrected.

> - commitment to help with Nova in general and, once applicable, move the vGPU
> parts over to Nova

I think you will get help with Nova based on its own merit, but I
don't like where you are going with this. Linus has had negative
things to say about this sort of cross-linking and I agree with
him. We should not be trying to extract unrelated promises on Nova as
a condition for progressing a VFIO series. :\

> But I think the very last one naturally happens if we stop further support for
> new HW in Nouveau at some point.

I expect the core code would continue to support new HW going forward
to support the VFIO driver, even if nouveau doesn't use it, until Rust
reaches some full ecosystem readiness for the server space.

There are going to be a lot of users of this code, let's not rush to
harm them please.

Fortunately there is no use case for DRM and VFIO to coexist in a
hypervisor, so this does not turn into such a technical problem as
most other dual-driver situations.

Jason