public inbox for kvm@vger.kernel.org
From: Alex Williamson <alex.williamson@redhat.com>
To: <ankita@nvidia.com>
Cc: <jgg@nvidia.com>, <yishaih@nvidia.com>,
	<shameerali.kolothum.thodi@huawei.com>, <kevin.tian@intel.com>,
	<zhiw@nvidia.com>, <aniketa@nvidia.com>, <cjia@nvidia.com>,
	<kwankhede@nvidia.com>, <targupta@nvidia.com>,
	<vsethi@nvidia.com>, <acurrid@nvidia.com>, <apopple@nvidia.com>,
	<jhubbard@nvidia.com>, <danw@nvidia.com>, <kjaju@nvidia.com>,
	<udhoke@nvidia.com>, <dnigam@nvidia.com>, <nandinid@nvidia.com>,
	<anuaggarwal@nvidia.com>, <mochs@nvidia.com>,
	<kvm@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v6 0/4] vfio/nvgrace-gpu: Enable grace blackwell boards
Date: Fri, 24 Jan 2025 15:05:03 -0700
Message-ID: <20250124150503.24a39cea.alex.williamson@redhat.com>
In-Reply-To: <20250124183102.3976-1-ankita@nvidia.com>

On Fri, 24 Jan 2025 18:30:58 +0000
<ankita@nvidia.com> wrote:

> From: Ankit Agrawal <ankita@nvidia.com>
> 
> NVIDIA recently introduced the Grace Blackwell (GB) Superchip as a
> continuation of the Grace Hopper (GH) Superchip. It gives the CPU
> and GPU cache-coherent access to each other's memory over an
> internal proprietary chip-to-chip (C2C) cache-coherent interconnect.
> The in-tree nvgrace-gpu driver manages the GH devices. The intention
> is to extend the support to the new Grace Blackwell boards.
> 
> A HW defect on GH affecting the Multi-Instance GPU (MIG) feature [1]
> necessitated carving a 1G region out of the device memory and
> mapping it uncached. The 1G region is shown as a fake BAR
> (comprising regions 2 and 3) to work around the issue.
> 
> The GB systems differ from GH systems in the following aspects.
> 1. The aforementioned HW defect is fixed on GB systems.
> 2. There is a usable BAR1 (regions 2 and 3) on GB systems for the
> GPUDirect RDMA feature [2].
> 
> This patch series accommodates those GB changes by showing the real
> physical device BAR1 (regions 2 and 3) to the VM instead of the fake
> one. This takes care of both differences.
> 
> The presence of the fix for the HW defect is communicated by the
> firmware through a DVSEC PCI config register. The module reads
> this to take a different codepath on GB vs GH.
> 
> To improve system bootup time, HBM training is moved out of UEFI
> on GB systems. Poll the register indicating the training state,
> and also check that the C2C link is ready. Fail the probe if
> either check fails.
> 
> Applied over next-20241220 and the required KVM patch (under review
> on the mailing list) to map the GPU device memory as cacheable [3].
> Tested on the Grace Blackwell platform by booting up VM, loading
> NVIDIA module [4] and running nvidia-smi in the VM.
> 
> To run CUDA workloads, there is a dependency on the IOMMUFD and the
> Nested Page Table patches being worked on separately by Nicolin Chen
> (nicolinc@nvidia.com). NVIDIA has provided git repositories that
> include all the requisite kernel [5] and QEMU [6] patches in case
> one wants to try them.
> 
> Link: https://www.nvidia.com/en-in/technologies/multi-instance-gpu/ [1]
> Link: https://docs.nvidia.com/cuda/gpudirect-rdma/ [2]
> Link: https://lore.kernel.org/all/20241118131958.4609-2-ankita@nvidia.com/ [3]
> Link: https://github.com/NVIDIA/open-gpu-kernel-modules [4]
> Link: https://github.com/NVIDIA/NV-Kernels/tree/6.8_ghvirt [5]
> Link: https://github.com/NVIDIA/QEMU/tree/6.8_ghvirt_iommufd_vcmdq [6]
> 
> v5 -> v6

LGTM.  I'll give others who have reviewed this a short opportunity to
take a final look.  We're already in the merge window but I think we're
just wrapping up some loose ends and I don't see any benefit to holding
it back, so pending comments from others, I'll plan to include it early
next week.  Thanks,

Alex

> * Updated the code based on Alex Williamson's suggestion to move the
>   device id enablement to a new patch and using KBUILD_MODNAME
>   in place of "vfio-pci"
> 
> v4 -> v5
> * Added code to enable the BAR0 region as per Alex Williamson's suggestion.
> * Updated code based on Kevin Tian's suggestion to replace the variable
>   with one whose name represents the presence of the MIG bug. Also
>   reorganized the code to return early for Blackwell without any resmem
>   processing.
> * Code comments updates.
> 
> v3 -> v4
> * Added code to enable and restore device memory regions before reading
>   BAR0 registers as per Alex Williamson's suggestion.
> 
> v2 -> v3
> * Incorporated Alex Williamson's suggestion to simplify patch 2/3.
> * Updated the code in 3/3 to use time_after() and other miscellaneous
>   suggestions from Alex Williamson.
> 
> v1 -> v2
> * Rebased to next-20241220.
> 
> v5:
> Link: https://lore.kernel.org/all/20250123174854.3338-1-ankita@nvidia.com/
> 
> Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
> 
> Ankit Agrawal (4):
>   vfio/nvgrace-gpu: Read dvsec register to determine need for uncached
>     resmem
>   vfio/nvgrace-gpu: Expose the blackwell device PF BAR1 to the VM
>   vfio/nvgrace-gpu: Check the HBM training and C2C link status
>   vfio/nvgrace-gpu: Add GB200 SKU to the devid table
> 
>  drivers/vfio/pci/nvgrace-gpu/main.c | 169 ++++++++++++++++++++++++----
>  1 file changed, 147 insertions(+), 22 deletions(-)
> 
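
For readers unfamiliar with DVSEC-based feature detection as described in
the cover letter, here is a minimal userspace sketch of the idea. It walks
a copy of PCIe extended config space looking for a vendor DVSEC and tests
a flag bit. The extended capability ID (0x23) and the NVIDIA vendor ID
(0x10de) are standard values; the DVSEC ID, the flag offset, and the flag
bit are hypothetical placeholders and are not taken from the patches. In
the kernel this lookup would presumably go through
pci_find_dvsec_capability() and pci_read_config_word() rather than a
hand-rolled walk.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_SPACE_SIZE   4096
#define EXT_CAP_START    0x100
#define EXT_CAP_ID_DVSEC 0x23    /* PCIe DVSEC extended capability ID */
#define VENDOR_ID_NVIDIA 0x10de

/* Hypothetical values, for illustration only. */
#define EXAMPLE_DVSEC_ID      3
#define EXAMPLE_FLAGS_OFFSET  0xA
#define EXAMPLE_MIG_FIXED_BIT 0x1

/* Config space is little-endian. */
static uint16_t cfg_read16(const uint8_t *cfg, size_t off)
{
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

static uint32_t cfg_read32(const uint8_t *cfg, size_t off)
{
	return (uint32_t)cfg_read16(cfg, off) |
	       ((uint32_t)cfg_read16(cfg, off + 2) << 16);
}

/* Return the offset of the matching DVSEC, or 0 if there is none. */
static size_t find_dvsec(const uint8_t *cfg, uint16_t vendor, uint16_t id)
{
	size_t pos = EXT_CAP_START;
	int guard = (CFG_SPACE_SIZE - EXT_CAP_START) / 8; /* bound the walk */

	while (pos >= EXT_CAP_START && pos + 12 <= CFG_SPACE_SIZE &&
	       guard-- > 0) {
		uint32_t hdr = cfg_read32(cfg, pos);

		if (hdr == 0)
			break; /* no capability at this offset */
		if ((hdr & 0xffff) == EXT_CAP_ID_DVSEC &&
		    cfg_read16(cfg, pos + 4) == vendor &&
		    cfg_read16(cfg, pos + 8) == id)
			return pos;
		pos = hdr >> 20; /* next capability offset, bits 31:20 */
	}
	return 0;
}

/* Assume the defect is present unless the flag says it is fixed. */
static bool has_mig_hw_bug(const uint8_t *cfg)
{
	size_t pos = find_dvsec(cfg, VENDOR_ID_NVIDIA, EXAMPLE_DVSEC_ID);

	if (pos && (cfg_read16(cfg, pos + EXAMPLE_FLAGS_OFFSET) &
		    EXAMPLE_MIG_FIXED_BIT))
		return false;
	return true;
}
```

Defaulting to "bug present" means a device without the DVSEC, or with the
flag clear, keeps the GH-style uncached carve-out path in this sketch.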

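The probe-time readiness check described in the cover letter (poll the HBM
training state, check the C2C link, fail if either is not ready in time)
can be sketched as a bounded poll loop. This is a userspace illustration,
not the driver code: the ready value, the timeout and poll interval, the
error code, and every name below are hypothetical. The wrap-safe deadline
comparison is written in the spirit of the kernel's time_after(), which
the v2 -> v3 changelog mentions.

```c
#include <stdbool.h>
#include <stdint.h>

#define STATUS_READY 0xFFu /* hypothetical "ready" register value */
#define ERR_TIMEOUT  (-1)  /* hypothetical error code */

struct poll_ops {
	uint32_t (*read_hbm_status)(void *ctx);
	uint32_t (*read_c2c_status)(void *ctx);
	uint32_t (*now_ms)(void *ctx);            /* monotonic, may wrap */
	void (*sleep_ms)(void *ctx, uint32_t ms);
};

/* Wrap-safe "a is after b", modeled on the kernel's time_after(). */
static bool ms_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Return 0 once both status reads report ready, ERR_TIMEOUT otherwise. */
static int wait_device_ready(const struct poll_ops *ops, void *ctx,
			     uint32_t timeout_ms, uint32_t poll_ms)
{
	uint32_t deadline = ops->now_ms(ctx) + timeout_ms;

	for (;;) {
		if (ops->read_hbm_status(ctx) == STATUS_READY &&
		    ops->read_c2c_status(ctx) == STATUS_READY)
			return 0;
		if (ms_after(ops->now_ms(ctx), deadline))
			return ERR_TIMEOUT;
		ops->sleep_ms(ctx, poll_ms);
	}
}

/* Simulated device: each status turns ready at a fixed fake time. */
struct fake_dev {
	uint32_t now_ms;
	uint32_t hbm_ready_at_ms;
	uint32_t c2c_ready_at_ms;
};

static uint32_t fake_hbm(void *ctx)
{
	struct fake_dev *d = ctx;

	return d->now_ms >= d->hbm_ready_at_ms ? STATUS_READY : 0;
}

static uint32_t fake_c2c(void *ctx)
{
	struct fake_dev *d = ctx;

	return d->now_ms >= d->c2c_ready_at_ms ? STATUS_READY : 0;
}

static uint32_t fake_now(void *ctx)
{
	return ((struct fake_dev *)ctx)->now_ms;
}

/* "Sleeping" just advances the fake clock. */
static void fake_sleep(void *ctx, uint32_t ms)
{
	((struct fake_dev *)ctx)->now_ms += ms;
}

static const struct poll_ops fake_ops = {
	fake_hbm, fake_c2c, fake_now, fake_sleep
};
```

Passing the clock and sleep functions in as callbacks keeps the loop
deterministic under test; a real driver would read the device registers
and use jiffies with msleep() or a similar helper instead.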

Thread overview: 9+ messages
2025-01-24 18:30 [PATCH v6 0/4] vfio/nvgrace-gpu: Enable grace blackwell boards ankita
2025-01-24 18:30 ` [PATCH v6 1/4] vfio/nvgrace-gpu: Read dvsec register to determine need for uncached resmem ankita
2025-01-24 18:31 ` [PATCH v6 2/4] vfio/nvgrace-gpu: Expose the blackwell device PF BAR1 to the VM ankita
2025-01-24 18:31 ` [PATCH v6 3/4] vfio/nvgrace-gpu: Check the HBM training and C2C link status ankita
2025-01-24 18:31 ` [PATCH v6 4/4] vfio/nvgrace-gpu: Add GB200 SKU to the devid table ankita
2025-01-24 22:05 ` Alex Williamson [this message]
2025-01-29  2:18   ` [PATCH v6 0/4] vfio/nvgrace-gpu: Enable grace blackwell boards Ankit Agrawal
2025-01-28  2:03 ` Matt Ochs
2025-01-28  5:11   ` Ankit Agrawal
