* [PATCH v4 0/3] vfio/nvgrace-gpu: Enable grace blackwell boards
@ 2025-01-17 23:37 ankita
From: ankita @ 2025-01-17 23:37 UTC (permalink / raw)
  To: ankita, jgg, alex.williamson, yishaih, shameerali.kolothum.thodi,
	kevin.tian, zhiw
  Cc: aniketa, cjia, kwankhede, targupta, vsethi, acurrid, apopple,
	jhubbard, danw, anuaggarwal, mochs, kvm, linux-kernel

From: Ankit Agrawal <ankita@nvidia.com>

NVIDIA has recently introduced the Grace Blackwell (GB) Superchip in
continuation of the Grace Hopper (GH) Superchip. Like GH, it gives the
CPU and GPU cache-coherent access to each other's memory over an
internal proprietary chip-to-chip (C2C) cache-coherent interconnect.
The in-tree nvgrace-gpu driver manages the GH devices. This series
extends that support to the new Grace Blackwell boards.

GH has a HW defect affecting the Multi-Instance GPU (MIG) feature [1]
that necessitated carving a 1G region out of the device memory and
mapping it uncached. The 1G region is exposed as a fake BAR
(comprising regions 2 and 3) to work around the issue.

The GB systems differ from GH systems in the following aspects:
1. The aforementioned HW defect is fixed on GB systems.
2. There is a usable BAR1 (regions 2 and 3) on GB systems for the
GPUDirect RDMA feature [2].

This patch series accommodates those GB changes by exposing the real
physical device BAR1 (regions 2 and 3) to the VM instead of the fake
one. This takes care of both differences.

The presence of the fix for the HW defect is communicated by the
firmware through a DVSEC PCI config register. The module reads
this to take a different codepath on GB vs GH.
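
As an illustration of the DVSEC lookup described above, here is a minimal
userspace sketch that walks the PCIe extended capability list in a raw
config-space buffer and tests one bit of a vendor-specific register. The
capability ID 0x23 and the header layout come from the PCIe specification.
The DVSEC ID, the register offset, the bit position and all SKETCH_* and
helper names are assumptions made for this example, not values taken from
the patches. In the kernel, a helper such as pci_find_dvsec_capability()
would do the walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_CFG_SPACE_EXP_SIZE	4096
#define PCI_EXT_CAP_START	0x100
#define PCI_EXT_CAP_ID_DVSEC	0x23	/* PCIe spec: Designated Vendor-Specific */
#define PCI_VENDOR_ID_NVIDIA	0x10de

/* Hypothetical values, chosen only for this sketch. */
#define SKETCH_DVSEC_ID			3
#define SKETCH_DVSEC_BITMAP_OFFSET	0xA
#define SKETCH_HW_FIX_PRESENT		(1u << 0)

/* Little-endian accessors for a config-space snapshot held in memory. */
static uint32_t cfg_read32(const uint8_t *cfg, size_t off)
{
	assert(off + 4 <= PCI_CFG_SPACE_EXP_SIZE);
	return (uint32_t)cfg[off] | (uint32_t)cfg[off + 1] << 8 |
	       (uint32_t)cfg[off + 2] << 16 | (uint32_t)cfg[off + 3] << 24;
}

static uint16_t cfg_read16(const uint8_t *cfg, size_t off)
{
	assert(off + 2 <= PCI_CFG_SPACE_EXP_SIZE);
	return (uint16_t)(cfg[off] | cfg[off + 1] << 8);
}

static void cfg_write32(uint8_t *cfg, size_t off, uint32_t val)
{
	assert(off + 4 <= PCI_CFG_SPACE_EXP_SIZE);
	for (int i = 0; i < 4; i++)
		cfg[off + i] = (uint8_t)(val >> (8 * i));
}

/* Return the offset of the matching DVSEC capability, or 0 if absent. */
static size_t find_dvsec(const uint8_t *cfg, uint16_t vendor, uint16_t id)
{
	size_t pos = PCI_EXT_CAP_START;
	/* Bound the walk so a malformed, looping list cannot hang us. */
	int ttl = (PCI_CFG_SPACE_EXP_SIZE - PCI_EXT_CAP_START) / 8;

	while (pos >= PCI_EXT_CAP_START &&
	       pos + 12 <= PCI_CFG_SPACE_EXP_SIZE && ttl-- > 0) {
		uint32_t hdr = cfg_read32(cfg, pos);

		if (hdr == 0 || hdr == 0xffffffffu)
			break;
		if ((hdr & 0xffff) == PCI_EXT_CAP_ID_DVSEC &&
		    (cfg_read32(cfg, pos + 4) & 0xffff) == vendor &&
		    (cfg_read32(cfg, pos + 8) & 0xffff) == id)
			return pos;
		pos = (hdr >> 20) & 0xffc;	/* next capability offset */
	}
	return 0;
}

/* True when the (assumed) "HW defect fixed" bit is set in the DVSEC. */
static bool gpu_has_mig_hw_fix(const uint8_t *cfg)
{
	size_t pos = find_dvsec(cfg, PCI_VENDOR_ID_NVIDIA, SKETCH_DVSEC_ID);

	if (!pos)
		return false;	/* no DVSEC: take the GH codepath */
	return cfg_read16(cfg, pos + SKETCH_DVSEC_BITMAP_OFFSET) &
	       SKETCH_HW_FIX_PRESENT;
}
```

A caller would fill a 4096-byte buffer from the device's config space
(or build one with cfg_write32() for testing) and branch on the result
of gpu_has_mig_hw_fix() to pick the GB or the GH path.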

To improve system boot time, HBM training is moved out of UEFI on
GB systems. The driver therefore polls the register indicating the
training state and also checks that the C2C link is ready. The probe
fails if either check fails.
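
The poll-with-timeout pattern described above can be sketched in
userspace C as follows. This is not the driver code: the status
callbacks, the simulated tick counter, the fake_gpu device and the
-ETIMEDOUT return value are all assumptions for illustration.
tick_after() mirrors the wraparound-safe comparison that the kernel's
time_after() performs on jiffies.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is later than b", in the style of time_after(). */
static bool tick_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Hypothetical hooks standing in for register reads and msleep(). */
struct gpu_status_ops {
	uint32_t (*now)(void *ctx);
	void (*sleep_step)(void *ctx);
	bool (*hbm_trained)(void *ctx);
	bool (*c2c_link_up)(void *ctx);
};

/* Return 0 once both conditions hold, -ETIMEDOUT if the deadline passes. */
static int wait_device_ready(const struct gpu_status_ops *ops, void *ctx,
			     uint32_t timeout_ticks)
{
	uint32_t deadline;

	assert(ops && ops->now && ops->sleep_step &&
	       ops->hbm_trained && ops->c2c_link_up);
	deadline = ops->now(ctx) + timeout_ticks;

	for (;;) {
		if (ops->hbm_trained(ctx) && ops->c2c_link_up(ctx))
			return 0;
		if (tick_after(ops->now(ctx), deadline))
			return -ETIMEDOUT;	/* caller fails the probe */
		ops->sleep_step(ctx);
	}
}

/* Simulated device: each status becomes ready at a given tick. */
struct fake_gpu {
	uint32_t tick;
	uint32_t hbm_ready_at;
	uint32_t c2c_ready_at;
};

static uint32_t fake_now(void *ctx)
{
	return ((struct fake_gpu *)ctx)->tick;
}

static void fake_sleep(void *ctx)
{
	((struct fake_gpu *)ctx)->tick++;
}

static bool fake_hbm_trained(void *ctx)
{
	struct fake_gpu *g = ctx;

	return !tick_after(g->hbm_ready_at, g->tick);
}

static bool fake_c2c_link_up(void *ctx)
{
	struct fake_gpu *g = ctx;

	return !tick_after(g->c2c_ready_at, g->tick);
}

static const struct gpu_status_ops fake_ops = {
	.now = fake_now,
	.sleep_step = fake_sleep,
	.hbm_trained = fake_hbm_trained,
	.c2c_link_up = fake_c2c_link_up,
};
```

Calling wait_device_ready(&fake_ops, &gpu, timeout) on a struct fake_gpu
exercises both outcomes. The signed-difference comparison keeps the
deadline check correct even when the tick counter wraps around during
the wait.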

Applied over next-20241220 and the required KVM patch (under review
on the mailing list) to map the GPU device memory as cacheable [3].
Tested on the Grace Blackwell platform by booting a VM, loading the
NVIDIA module [4] and running nvidia-smi in the VM.

To run CUDA workloads, there is a dependency on the IOMMUFD and the
Nested Page Table patches being worked on separately by Nicolin Chen
(nicolinc@nvidia.com). NVIDIA has provided git repositories which
include all the requisite kernel [5] and QEMU [6] patches in case
one wants to try them.

Link: https://www.nvidia.com/en-in/technologies/multi-instance-gpu/ [1]
Link: https://docs.nvidia.com/cuda/gpudirect-rdma/ [2]
Link: https://lore.kernel.org/all/20241118131958.4609-2-ankita@nvidia.com/ [3]
Link: https://github.com/NVIDIA/open-gpu-kernel-modules [4]
Link: https://github.com/NVIDIA/NV-Kernels/tree/6.8_ghvirt [5]
Link: https://github.com/NVIDIA/QEMU/tree/6.8_ghvirt_iommufd_vcmdq [6]

v3 -> v4
* Added code to enable and restore device memory regions before reading
  BAR0 registers as per Alex Williamson's suggestion.

v2 -> v3
* Incorporated Alex Williamson's suggestion to simplify patch 2/3.
* Updated the code in 3/3 to use time_after() and other miscellaneous
  suggestions from Alex Williamson.

v1 -> v2
* Rebased to next-20241220.

v3:
Link: https://lore.kernel.org/all/20250117152334.2786-1-ankita@nvidia.com/

Signed-off-by: Ankit Agrawal <ankita@nvidia.com>

Ankit Agrawal (3):
  vfio/nvgrace-gpu: Read dvsec register to determine need for uncached
    resmem
  vfio/nvgrace-gpu: Expose the blackwell device PF BAR1 to the VM
  vfio/nvgrace-gpu: Check the HBM training and C2C link status

 drivers/vfio/pci/nvgrace-gpu/main.c | 160 +++++++++++++++++++++++-----
 drivers/vfio/pci/vfio_pci_core.c    |   2 +
 2 files changed, 138 insertions(+), 24 deletions(-)

-- 
2.34.1

