public inbox for kvm@vger.kernel.org
From: Jason Gunthorpe <jgg@nvidia.com>
To: Dave Airlie <airlied@gmail.com>
Cc: Danilo Krummrich <dakr@kernel.org>, Zhi Wang <zhiw@nvidia.com>,
	kvm@vger.kernel.org, nouveau@lists.freedesktop.org,
	alex.williamson@redhat.com, kevin.tian@intel.com,
	daniel@ffwll.ch, acurrid@nvidia.com, cjia@nvidia.com,
	smitra@nvidia.com, ankita@nvidia.com, aniketa@nvidia.com,
	kwankhede@nvidia.com, targupta@nvidia.com, zhiwang@kernel.org
Subject: Re: [RFC 00/29] Introduce NVIDIA GPU Virtualization (vGPU) Support
Date: Wed, 25 Sep 2024 12:28:00 -0300	[thread overview]
Message-ID: <20240925152800.GT9417@nvidia.com> (raw)
In-Reply-To: <CAPM=9txix6tO7B+kRtsNXSVPfLGU4vbfga=pt9yqmszecuEbyw@mail.gmail.com>

On Wed, Sep 25, 2024 at 11:08:40AM +1000, Dave Airlie wrote:
> On Wed, 25 Sept 2024 at 10:53, Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > On Tue, Sep 24, 2024 at 09:56:58PM +0200, Danilo Krummrich wrote:
> >
> > > Currently - and please correct me if I'm wrong - you make it sound to me as if
> > > you're not willing to respect the decisions that have been taken by Nouveau and
> > > DRM maintainers.
> >
> > I've never said anything about your work, go do Nova, have fun.
> >
> > I'm just not agreeing to being forced into taking Rust dependencies in
> > VFIO because Nova is participating in the Rust Experiment.
> >
> > I think the reasonable answer is to accept some code duplication, or
> > try to consolidate around a small C core. I understand this is
> > different than you may have planned so far for Nova, but all projects
> > are subject to community feedback, especially when faced with new
> > requirements.
> >
> > I think this discussion is getting a little overheated, there is lots
> > of space here for everyone to do their things. Let's not get too
> > excited.
> 
> How do you intend to solve the stable ABI problem caused by the GSP firmware?
> 
> If you haven't got an answer to that, that is reasonable, you can talk
> about VFIO and DRM and who is in charge all you like, but it doesn't
> matter.

I suggest the same answer everyone else building HW in the kernel
operates under. You get to update your driver with your new HW once
per generation.

Not once per FW release, once per generation. That is a similar
maintenance burden to most drivers. It is not as good as the standard
Mellanox sets (no SW change needed for a new HW generation), but it is
still good.

I would apply this logic to Nova as well; there is no reason to
support random ABI changes coming out every few months.

> Fundamentally the problem is the unstable API exposure isn't something
> you can build a castle on top of, the nova idea is to use rust to
> solve a fundamental problem with the NVIDIA driver design process
> forces on us (vfio included), 

I firmly believe you can't solve a stable ABI problem with language
features in an OS. The ABI is totally unstable: it will change
semantically, and the order and nature of the functions you need will
change. New HW will need new behaviors and semantics.

Language support can certainly handle the mindless churn that ideally
shouldn't even be happening in the first place.

The way you solve this is at the root, in the FW. Don't churn
everything. I'm a big believer and supporter of the Mellanox
super-stable approach that has really proven how valuable this concept
is to everyone.

So I agree with you, this extreme instability is not OK upstream; it
needs to slow down a lot to be acceptable. I wouldn't necessarily set
the Mellanox-like gold standard as the bar, but it certainly must be
much better than it is now.

FWIW, when I discussed the VFIO patches I was given the impression
there would not be high levels of ABI churn on the VFIO side, and that
there was awareness and understanding of this issue on Zhi's side.

Jason

  reply	other threads:[~2024-09-25 15:28 UTC|newest]

Thread overview: 86+ messages
2024-09-22 12:49 [RFC 00/29] Introduce NVIDIA GPU Virtualization (vGPU) Support Zhi Wang
2024-09-22 12:49 ` [RFC 01/29] nvkm/vgpu: introduce NVIDIA vGPU support prelude Zhi Wang
2024-09-26  9:20   ` Greg KH
2024-10-14  9:59     ` Zhi Wang
2024-10-14 11:36       ` Greg KH
2024-09-22 12:49 ` [RFC 02/29] nvkm/vgpu: attach to nvkm as a nvkm client Zhi Wang
2024-09-26  9:21   ` Greg KH
2024-10-14 10:16     ` Zhi Wang
2024-10-14 11:33       ` Greg KH
2024-09-22 12:49 ` [RFC 03/29] nvkm/vgpu: reserve a larger GSP heap when NVIDIA vGPU is enabled Zhi Wang
2024-09-22 12:49 ` [RFC 04/29] nvkm/vgpu: set the VF partition count " Zhi Wang
2024-09-26 22:51   ` Jason Gunthorpe
2024-10-13 18:54     ` Zhi Wang
2024-10-15 12:20       ` Jason Gunthorpe
2024-10-15 15:19         ` Zhi Wang
2024-10-15 16:35           ` Jason Gunthorpe
2024-09-22 12:49 ` [RFC 05/29] nvkm/vgpu: populate GSP_VF_INFO " Zhi Wang
2024-09-26 22:52   ` Jason Gunthorpe
2024-09-22 12:49 ` [RFC 06/29] nvkm/vgpu: set RMSetSriovMode " Zhi Wang
2024-09-26 22:53   ` Jason Gunthorpe
2024-10-14  7:38     ` Zhi Wang
2024-10-15  3:49       ` Christoph Hellwig
2024-10-15 12:23       ` Jason Gunthorpe
2024-09-22 12:49 ` [RFC 07/29] nvkm/gsp: add a notify handler for GSP event GPUACCT_PERFMON_UTIL_SAMPLES Zhi Wang
2024-09-22 12:49 ` [RFC 08/29] nvkm/vgpu: get the size VMMU segment from GSP firmware Zhi Wang
2024-09-22 12:49 ` [RFC 09/29] nvkm/vgpu: introduce the reserved channel allocator Zhi Wang
2024-09-22 12:49 ` [RFC 10/29] nvkm/vgpu: introduce interfaces for NVIDIA vGPU VFIO module Zhi Wang
2024-09-22 12:49 ` [RFC 11/29] nvkm/vgpu: introduce GSP RM client alloc and free for vGPU Zhi Wang
2024-09-22 12:49 ` [RFC 12/29] nvkm/vgpu: introduce GSP RM control interface " Zhi Wang
2024-09-22 12:49 ` [RFC 13/29] nvkm: move chid.h to nvkm/engine Zhi Wang
2024-09-22 12:49 ` [RFC 14/29] nvkm/vgpu: introduce channel allocation for vGPU Zhi Wang
2024-09-22 12:49 ` [RFC 15/29] nvkm/vgpu: introduce FB memory " Zhi Wang
2024-09-22 12:49 ` [RFC 16/29] nvkm/vgpu: introduce BAR1 map routines for vGPUs Zhi Wang
2024-09-22 12:49 ` [RFC 17/29] nvkm/vgpu: introduce engine bitmap for vGPU Zhi Wang
2024-09-22 12:49 ` [RFC 18/29] nvkm/vgpu: introduce pci_driver.sriov_configure() in nvkm Zhi Wang
2024-09-26 22:56   ` Jason Gunthorpe
2024-10-14  8:32     ` Zhi Wang
2024-10-15 12:27       ` Jason Gunthorpe
2024-10-15 15:14         ` Zhi Wang
2024-10-14  8:36     ` Zhi Wang
2024-09-22 12:49 ` [RFC 19/29] vfio/vgpu_mgr: introduce vGPU lifecycle management prelude Zhi Wang
2024-09-22 12:49 ` [RFC 20/29] vfio/vgpu_mgr: allocate GSP RM client for NVIDIA vGPU manager Zhi Wang
2024-09-22 12:49 ` [RFC 21/29] vfio/vgpu_mgr: introduce vGPU type uploading Zhi Wang
2024-09-22 12:49 ` [RFC 22/29] vfio/vgpu_mgr: allocate vGPU FB memory when creating vGPUs Zhi Wang
2024-09-22 12:49 ` [RFC 23/29] vfio/vgpu_mgr: allocate vGPU channels " Zhi Wang
2024-09-22 12:49 ` [RFC 24/29] vfio/vgpu_mgr: allocate mgmt heap " Zhi Wang
2024-09-22 12:49 ` [RFC 25/29] vfio/vgpu_mgr: map mgmt heap when creating a vGPU Zhi Wang
2024-09-22 12:49 ` [RFC 26/29] vfio/vgpu_mgr: allocate GSP RM client when creating vGPUs Zhi Wang
2024-09-22 12:49 ` [RFC 27/29] vfio/vgpu_mgr: bootload the new vGPU Zhi Wang
2024-09-25  0:31   ` Dave Airlie
2024-09-22 12:49 ` [RFC 28/29] vfio/vgpu_mgr: introduce vGPU host RPC channel Zhi Wang
2024-09-22 12:49 ` [RFC 29/29] vfio/vgpu_mgr: introduce NVIDIA vGPU VFIO variant driver Zhi Wang
2024-09-22 13:11 ` [RFC 00/29] Introduce NVIDIA GPU Virtualization (vGPU) Support Zhi Wang
2024-09-23  8:38   ` Danilo Krummrich
2024-09-24 19:49     ` Zhi Wang
2024-09-23  6:22 ` Tian, Kevin
2024-09-23 15:02   ` Jason Gunthorpe
2024-09-26  6:43     ` Tian, Kevin
2024-09-26 12:55       ` Jason Gunthorpe
2024-09-26 22:57         ` Jason Gunthorpe
2024-09-27  0:13           ` Tian, Kevin
2024-09-23  8:49 ` Danilo Krummrich
2024-09-23 15:01   ` Jason Gunthorpe
2024-09-23 22:50     ` Danilo Krummrich
2024-09-24 16:41       ` Jason Gunthorpe
2024-09-24 19:56         ` Danilo Krummrich
2024-09-24 22:52           ` Dave Airlie
2024-09-24 23:47             ` Jason Gunthorpe
2024-09-25  0:18               ` Dave Airlie
2024-09-25  1:29                 ` Jason Gunthorpe
2024-09-25  0:53           ` Jason Gunthorpe
2024-09-25  1:08             ` Dave Airlie
2024-09-25 15:28               ` Jason Gunthorpe [this message]
2024-09-25 10:55             ` Danilo Krummrich
2024-09-26  9:14     ` Greg KH
2024-09-26 12:42       ` Jason Gunthorpe
2024-09-26 12:54         ` Greg KH
2024-09-26 13:07           ` Danilo Krummrich
2024-09-26 14:40           ` Jason Gunthorpe
2024-09-26 18:07             ` Andy Ritger
2024-09-26 22:23               ` Danilo Krummrich
2024-09-26 22:42             ` Danilo Krummrich
2024-09-27 12:51               ` Jason Gunthorpe
2024-09-27 14:22                 ` Danilo Krummrich
2024-09-27 15:27                   ` Jason Gunthorpe
2024-09-30 15:59                     ` Danilo Krummrich
