From: <ankita@nvidia.com>
To: <ankita@nvidia.com>, <jgg@nvidia.com>,
<alex.williamson@redhat.com>, <yishaih@nvidia.com>,
<shameerali.kolothum.thodi@huawei.com>, <kevin.tian@intel.com>,
<zhiw@nvidia.com>
Cc: <aniketa@nvidia.com>, <cjia@nvidia.com>, <kwankhede@nvidia.com>,
<targupta@nvidia.com>, <vsethi@nvidia.com>, <acurrid@nvidia.com>,
<apopple@nvidia.com>, <jhubbard@nvidia.com>, <danw@nvidia.com>,
<anuaggarwal@nvidia.com>, <mochs@nvidia.com>,
<kvm@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: [PATCH v1 2/3] vfio/nvgrace-gpu: Expose the Blackwell device PF BAR1 to the VM
Date: Sun, 6 Oct 2024 10:27:21 +0000
Message-ID: <20241006102722.3991-3-ankita@nvidia.com>
In-Reply-To: <20241006102722.3991-1-ankita@nvidia.com>
From: Ankit Agrawal <ankita@nvidia.com>
Grace Hopper (GH) has a hardware defect affecting the Multi-Instance
GPU (MIG) feature [1] that necessitated carving out a 1G region from
the device memory and mapping it as uncached. This 1G region is
exposed to the VM as a fake BAR (comprising regions 2 and 3) to work
around the issue.
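For illustration only, a minimal C sketch of the GH carveout described
above, assuming the 1G uncached region is taken from the end of device
memory and ignoring any alignment the driver may apply. The names
gh_split(), struct mem_region and roundup_pow2() are hypothetical and
are not the driver's code. BAR sizes are rounded up to a power of two
because PCI BARs must be power-of-two sized.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1G (1ULL << 30)

/* Hypothetical stand-in for one memory region exposed to the VM. */
struct mem_region {
	uint64_t memphys;   /* host physical base address */
	uint64_t memlength; /* bytes actually backed by device memory */
	uint64_t bar_size;  /* power-of-two BAR size advertised to the VM */
};

/* Smallest power of two >= v, like the kernel's roundup_pow_of_two(). */
static uint64_t roundup_pow2(uint64_t v)
{
	uint64_t p = 1;

	assert(v > 0 && v <= (1ULL << 63)); /* avoid overflow of p */
	while (p < v)
		p <<= 1;
	return p;
}

/*
 * Split device memory into a cacheable usable part and a 1G uncached
 * carveout (assumed to sit at the end). Returns 0 on success, -1 if
 * no usable memory would remain.
 */
static int gh_split(uint64_t memphys, uint64_t memlength,
		    struct mem_region *usemem, struct mem_region *resmem)
{
	if (memlength <= SZ_1G)
		return -1;

	usemem->memphys = memphys;
	usemem->memlength = memlength - SZ_1G;
	usemem->bar_size = roundup_pow2(usemem->memlength);

	resmem->memphys = memphys + usemem->memlength;
	resmem->memlength = SZ_1G;
	resmem->bar_size = roundup_pow2(resmem->memlength);
	return 0;
}
```

With 96G of device memory, the usable part is 95G behind a 128G BAR,
and the carveout is 1G behind a 1G BAR.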
Grace Blackwell (GB) systems differ from GH systems in the following
aspects:
1. The aforementioned HW defect is fixed on GB systems.
2. GB systems have a usable BAR1 (regions 2 and 3) for the
GPUDirect RDMA feature [2].
This patch accommodates those GB changes by exposing the physical
64-bit device BAR1 (regions 2 and 3) to the VM instead of the fake
one, which addresses both differences.
Moreover, since no carveout is required on GB, the entire device
memory is exposed to the VM as cacheable.
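A hypothetical, standalone model of the resulting region selection,
mirroring the nvgrace_gpu_memregion() change in this patch: when
has_mig_hw_bug_fix is set (GB), region 2 no longer resolves to an
emulated carveout, so the lookup returns NULL and the physical BAR1
is presumably handled by the generic vfio-pci path, while the usable
memory region spans all of device memory. The index values (2 and 4),
struct gpu_dev, gb_init() and roundup_pow2() are assumptions for
illustration, not the driver's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values: carveout/BAR1 at region 2, usable memory at region 4. */
#define RESMEM_REGION_INDEX 2
#define USEMEM_REGION_INDEX 4

struct mem_region {
	uint64_t memphys;
	uint64_t memlength;
	uint64_t bar_size;
};

struct gpu_dev {
	bool has_mig_hw_bug_fix; /* true on GB, false on GH */
	struct mem_region usemem;
	struct mem_region resmem;
};

/* Smallest power of two >= v, like the kernel's roundup_pow_of_two(). */
static uint64_t roundup_pow2(uint64_t v)
{
	uint64_t p = 1;

	assert(v > 0 && v <= (1ULL << 63));
	while (p < v)
		p <<= 1;
	return p;
}

/* Emulated region for an index, or NULL to use the physical BAR. */
static struct mem_region *memregion(int index, struct gpu_dev *dev)
{
	if (index == USEMEM_REGION_INDEX)
		return &dev->usemem;
	/* Only GH (no HW fix) emulates region 2 with the carveout. */
	if (!dev->has_mig_hw_bug_fix && index == RESMEM_REGION_INDEX)
		return &dev->resmem;
	return NULL;
}

/* GB: no carveout, so all device memory becomes cacheable usemem. */
static void gb_init(struct gpu_dev *dev, uint64_t memphys,
		    uint64_t memlength)
{
	dev->usemem.memphys = memphys;
	dev->usemem.memlength = memlength;
	dev->usemem.bar_size = roundup_pow2(memlength);
}
```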
Link: https://www.nvidia.com/en-in/technologies/multi-instance-gpu/ [1]
Link: https://docs.nvidia.com/cuda/gpudirect-rdma/ [2]
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
---
drivers/vfio/pci/nvgrace-gpu/main.c | 32 +++++++++++++++++++++--------
1 file changed, 24 insertions(+), 8 deletions(-)
diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
index c23db6eaf979..e3a7eceb6228 100644
--- a/drivers/vfio/pci/nvgrace-gpu/main.c
+++ b/drivers/vfio/pci/nvgrace-gpu/main.c
@@ -72,7 +72,7 @@ nvgrace_gpu_memregion(int index,
if (index == USEMEM_REGION_INDEX)
return &nvdev->usemem;
- if (index == RESMEM_REGION_INDEX)
+ if (!nvdev->has_mig_hw_bug_fix && index == RESMEM_REGION_INDEX)
return &nvdev->resmem;
return NULL;
@@ -715,6 +715,16 @@ static const struct vfio_device_ops nvgrace_gpu_pci_core_ops = {
.detach_ioas = vfio_iommufd_physical_detach_ioas,
};
+static void
+nvgrace_gpu_init_nvdev_struct(struct pci_dev *pdev,
+ struct nvgrace_gpu_pci_core_device *nvdev,
+ u64 memphys, u64 memlength)
+{
+ nvdev->usemem.memphys = memphys;
+ nvdev->usemem.memlength = memlength;
+ nvdev->usemem.bar_size = roundup_pow_of_two(nvdev->usemem.memlength);
+}
+
static int
nvgrace_gpu_fetch_memory_property(struct pci_dev *pdev,
u64 *pmemphys, u64 *pmemlength)
@@ -752,9 +762,9 @@ nvgrace_gpu_fetch_memory_property(struct pci_dev *pdev,
}
static int
-nvgrace_gpu_init_nvdev_struct(struct pci_dev *pdev,
- struct nvgrace_gpu_pci_core_device *nvdev,
- u64 memphys, u64 memlength)
+nvgrace_gpu_nvdev_struct_workaround(struct pci_dev *pdev,
+ struct nvgrace_gpu_pci_core_device *nvdev,
+ u64 memphys, u64 memlength)
{
int ret = 0;
@@ -864,10 +874,16 @@ static int nvgrace_gpu_probe(struct pci_dev *pdev,
* Device memory properties are identified in the host ACPI
* table. Set the nvgrace_gpu_pci_core_device structure.
*/
- ret = nvgrace_gpu_init_nvdev_struct(pdev, nvdev,
- memphys, memlength);
- if (ret)
- goto out_put_vdev;
+ if (nvdev->has_mig_hw_bug_fix) {
+ nvgrace_gpu_init_nvdev_struct(pdev, nvdev,
+ memphys, memlength);
+ } else {
+ ret = nvgrace_gpu_nvdev_struct_workaround(pdev, nvdev,
+ memphys,
+ memlength);
+ if (ret)
+ goto out_put_vdev;
+ }
}
ret = vfio_pci_core_register_device(&nvdev->core_device);
--
2.34.1