Re: [RFC v2 03/14] vfio/nvidia-vgpu: introduce vGPU type uploading

From: "Danilo Krummrich" <dakr@kernel.org>
To: "Zhi Wang" <zhiw@nvidia.com>
Cc: <kvm@vger.kernel.org>, <alex.williamson@redhat.com>,
	<kevin.tian@intel.com>, <jgg@nvidia.com>, <airlied@gmail.com>,
	<daniel@ffwll.ch>, <acurrid@nvidia.com>, <cjia@nvidia.com>,
	<smitra@nvidia.com>, <ankita@nvidia.com>, <aniketa@nvidia.com>,
	<kwankhede@nvidia.com>, <targupta@nvidia.com>,
	<zhiwang@kernel.org>
Subject: Re: [RFC v2 03/14] vfio/nvidia-vgpu: introduce vGPU type uploading
Date: Thu, 04 Sep 2025 11:37:00 +0200
Message-ID: <DCJWXVLI2GWB.3UBHWIZCZXKD2@kernel.org>
In-Reply-To: <20250903221111.3866249-4-zhiw@nvidia.com>

On Thu Sep 4, 2025 at 12:11 AM CEST, Zhi Wang wrote:
> diff --git a/drivers/vfio/pci/nvidia-vgpu/include/nvrm/gsp.h b/drivers/vfio/pci/nvidia-vgpu/include/nvrm/gsp.h
> new file mode 100644
> index 000000000000..c3fb7b299533
> --- /dev/null
> +++ b/drivers/vfio/pci/nvidia-vgpu/include/nvrm/gsp.h
> @@ -0,0 +1,18 @@
> +/* SPDX-License-Identifier: MIT */
> +#ifndef __NVRM_GSP_H__
> +#define __NVRM_GSP_H__
> +
> +#include <nvrm/nvtypes.h>
> +
> +/* Excerpt of RM headers from https://github.com/NVIDIA/open-gpu-kernel-modules/tree/570 */
> +
> +#define NV2080_CTRL_CMD_GSP_GET_FEATURES (0x20803601)
> +
> +typedef struct NV2080_CTRL_GSP_GET_FEATURES_PARAMS {
> +	NvU32  gspFeatures;
> +	NvBool bValid;
> +	NvBool bDefaultGspRmGpu;
> +	NvU8   firmwareVersion[GSP_MAX_BUILD_VERSION_LENGTH];
> +} NV2080_CTRL_GSP_GET_FEATURES_PARAMS;
> +
> +#endif

<snip>

> +static struct version supported_version_list[] = {
> +	{ 18, 1, "570.144" },
> +};

nova-core won't provide any firmware-specific APIs; it is meant to serve as a
hardware and firmware abstraction layer for higher-level drivers, such as vGPU
or nova-drm.

As a general rule, the interface between nova-core and higher-level drivers must
not leak any hardware- or firmware-specific details, but work on a higher-level
abstraction layer.

Now, I recognize that at some point it might be necessary to do some kind of
versioning in this API anyway, for instance when the semantics of the firmware
API change too significantly.

However, this would be a separate API where nova-core, at the initial handshake,
asks clients to use e.g. v2 of the nova-core API, still hiding any firmware
and hardware details from the client.
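
To make the handshake idea a bit more concrete, here is a minimal standalone
sketch. All names in it (ApiVersion, Firmware, Core, attach) are hypothetical
and not existing nova-core code; it only shows the shape: the core decides
which revision of its own API a client must speak, and the client never sees
a firmware version.

```rust
/// Hypothetical revisions of the core driver's client API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ApiVersion {
    V1,
    V2,
}

/// Hypothetical firmware description, private to the core driver.
struct Firmware {
    /// Assumption for this sketch: set when the firmware API semantics
    /// changed enough that clients need a new core API revision.
    changed_semantics: bool,
}

struct Core {
    fw: Firmware,
}

impl Core {
    /// The only place that looks at firmware details.
    fn required_api(&self) -> ApiVersion {
        if self.fw.changed_semantics {
            ApiVersion::V2
        } else {
            ApiVersion::V1
        }
    }

    /// Initial handshake: the client states which core API revisions it
    /// implements, the core answers with the one it has to use.
    fn attach(&self, supported: &[ApiVersion]) -> Result<ApiVersion, &'static str> {
        let wanted = self.required_api();
        if supported.contains(&wanted) {
            Ok(wanted)
        } else {
            Err("client does not implement the required API revision")
        }
    }
}
```

A client in this sketch carries a list of supported core API revisions, not
a list of supported firmware versions.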

Some more general notes, since I also had a look at the nova-core <-> vGPU
interface patches in your tree (even though I'm aware that they're not part of
the RFC of course):

The interface for the general lifecycle management for any clients attaching to
nova-core (vGPU, nova-drm) should be common and not specific to vGPU. (The same
goes for interfaces that will be used by both vGPU and nova-drm.)

The interface nova-core provides for that should be designed in Rust, so we can
take advantage of all the features the type system provides when connecting
to Rust clients (nova-drm).

For vGPU, we can then monomorphize those types into the corresponding C
structures and provide the corresponding functions very easily.
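
As a rough illustration of that direction, here is a standalone sketch. The
trait, the registration type and the C callback table are all hypothetical
names made up for this example, not the actual nova-core interface. The
lifecycle is defined once as a generic Rust type; the C side gets one
concrete instantiation of it through a small adapter.

```rust
use core::ffi::{c_int, c_void};

/// Hypothetical lifecycle interface shared by all clients.
trait Client {
    fn attach(&mut self) -> Result<(), i32>;
    fn detach(&mut self);
}

/// Generic registration: attaches on creation, detaches on drop.
struct Registration<T: Client> {
    client: T,
}

impl<T: Client> Registration<T> {
    fn new(mut client: T) -> Result<Self, i32> {
        client.attach()?;
        Ok(Self { client })
    }
}

impl<T: Client> Drop for Registration<T> {
    fn drop(&mut self) {
        self.client.detach();
    }
}

/// C-visible callback table (hypothetical layout).
#[repr(C)]
struct CClientOps {
    attach: extern "C" fn(data: *mut c_void) -> c_int,
    detach: extern "C" fn(data: *mut c_void),
}

/// Adapter that lets a C client implement the Rust trait.
struct CClient {
    ops: CClientOps,
    data: *mut c_void,
}

impl Client for CClient {
    fn attach(&mut self) -> Result<(), i32> {
        // Zero means success, anything else is passed on as an error code.
        match (self.ops.attach)(self.data) {
            0 => Ok(()),
            err => Err(i32::from(err)),
        }
    }

    fn detach(&mut self) {
        (self.ops.detach)(self.data)
    }
}

/// The concrete, monomorphized type a C client would get a handle to.
type CRegistration = Registration<CClient>;
```

A Rust client would implement Client directly and keep the full type
information; only the C path goes through the adapter.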

Doing it the other way around would be a very bad idea, since the Rust type
system is much more powerful and hence it'd be very hard to avoid introducing
limitations on the Rust side of things.

Hence, I recommend starting with some patches defining the API in nova-core for
the general lifecycle (in Rust), so we can take it from there.

Another note: I don't see any use of the auxiliary bus in vGPU. Any clients
should attach via the auxiliary bus API; it provides proper matching where
there's more than one compatible GPU in the system. nova-core already registers
an auxiliary device for each bound PCI device.

Please don't re-implement what the auxiliary bus already does for us.

- Danilo
