From: Alex Williamson <alex@shazbot.org>
To: <ankita@nvidia.com>
Cc: <vsethi@nvidia.com>, <jgg@nvidia.com>, <mochs@nvidia.com>,
<jgg@ziepe.ca>, <skolothumtho@nvidia.com>, <cjia@nvidia.com>,
<zhiw@nvidia.com>, <kjaju@nvidia.com>, <yishaih@nvidia.com>,
<kevin.tian@intel.com>, <kvm@vger.kernel.org>,
<linux-kernel@vger.kernel.org>,
alex@shazbot.org
Subject: Re: [PATCH RFC v2 05/15] vfio/nvgrace-egm: Introduce module to manage EGM
Date: Wed, 4 Mar 2026 11:09:00 -0700
Message-ID: <20260304110900.47151cc8@shazbot.org>
In-Reply-To: <20260223155514.152435-6-ankita@nvidia.com>
On Mon, 23 Feb 2026 15:55:04 +0000
<ankita@nvidia.com> wrote:
> From: Ankit Agrawal <ankita@nvidia.com>
>
> The Extended GPU Memory (EGM) feature that enables the GPU to access
> the system memory allocations within and across nodes through high
> bandwidth path on Grace Based systems. The GPU can utilize the
> system memory located on the same socket or from a different socket
> or even on a different node in a multi-node system [1].
>
> When the EGM mode is enabled through SBIOS, the host system memory is
> partitioned into 2 parts: One partition for the Host OS usage
> called Hypervisor region, and a second Hypervisor-Invisible (HI) region
> for the VM. Only the hypervisor region is part of the host EFI map
> and is thus visible to the host OS on bootup. Since the entire VM
> sysmem is eligible for EGM allocations within the VM, the HI partition
> is interchangeably called as EGM region in the series. This HI/EGM region
> range base SPA and size is exposed through the ACPI DSDT properties.
>
> Whilst the EGM region is accessible on the host, it is not added to
> the kernel. The HI region is assigned to a VM by mapping the QEMU VMA
> to the SPA using remap_pfn_range().
>
> The following figure shows the memory map in the virtualization
> environment.
>
> |---- Sysmem ----| |--- GPU mem ---| VM Memory
> | | | |
> |IPA <-> SPA map | |IPA <-> SPA map|
> | | | |
> |--- HI / EGM ---|-- Host Mem --| |--- GPU mem ---| Host Memory
>
> Introduce a new nvgrace-egm auxiliary driver module to manage and
> map the HI/EGM region in the Grace Blackwell systems. This binds to
> the auxiliary device created by the parent nvgrace-gpu (in-tree
> module for device assignment) / nvidia-vgpu-vfio (out-of-tree open
> source module for SRIOV vGPU) to manage the EGM region for the VM.
> Note that there is a unique EGM region per socket and the auxiliary
> device gets created for every region. The parent module fetches the
> EGM region information from the ACPI tables and populate to the data
> structures shared with the auxiliary nvgrace-egm module.
>
> nvgrace-egm module handles the following:
Or it will eventually; none of this is implemented in this commit.
> 1. Fetch the EGM memory properties (base HPA, length, proximity domain)
> from the parent device shared EGM region structure.
> 2. Create a char device that can be used as memory-backend-file by Qemu
> for the VM and implement file operations. The char device is /dev/egmX,
> where X is the PXM node ID of the EGM being mapped fetched in 1.
> 3. Zero the EGM memory on first device open().
> 4. Map the QEMU VMA to the EGM region using remap_pfn_range.
> 5. Cleaning up state and destroying the chardev on device unbind.
> 6. Handle presence of retired ECC pages on the EGM region.
>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
> ---
> MAINTAINERS | 6 ++++++
> drivers/vfio/pci/nvgrace-gpu/Kconfig | 12 ++++++++++++
> drivers/vfio/pci/nvgrace-gpu/Makefile | 3 +++
> drivers/vfio/pci/nvgrace-gpu/egm.c | 22 ++++++++++++++++++++++
> drivers/vfio/pci/nvgrace-gpu/main.c | 1 +
> 5 files changed, 44 insertions(+)
> create mode 100644 drivers/vfio/pci/nvgrace-gpu/egm.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5b3d86de9ec0..1fc551d7d667 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -27384,6 +27384,12 @@ F: drivers/vfio/pci/nvgrace-gpu/egm_dev.h
> F: drivers/vfio/pci/nvgrace-gpu/main.c
> F: include/linux/nvgrace-egm.h
>
> +VFIO NVIDIA GRACE EGM DRIVER
> +M: Ankit Agrawal <ankita@nvidia.com>
> +L: kvm@vger.kernel.org
> +S: Supported
> +F: drivers/vfio/pci/nvgrace-gpu/egm.c
I'm not sure a separate MAINTAINERS entry is warranted here. These are
intertwined, even if constructed to allow this EGM driver to be used by
an out-of-tree driver. It's also an unclean split, with Makefile and
Kconfig dependencies under the nvgrace-gpu heading. It should probably
be self-contained in a separate sub-dir to justify a new MAINTAINERS
entry.
> +
> VFIO PCI DEVICE SPECIFIC DRIVERS
> R: Jason Gunthorpe <jgg@nvidia.com>
> R: Yishai Hadas <yishaih@nvidia.com>
> diff --git a/drivers/vfio/pci/nvgrace-gpu/Kconfig b/drivers/vfio/pci/nvgrace-gpu/Kconfig
> index a7f624b37e41..7989d8d1c377 100644
> --- a/drivers/vfio/pci/nvgrace-gpu/Kconfig
> +++ b/drivers/vfio/pci/nvgrace-gpu/Kconfig
> @@ -1,8 +1,20 @@
> # SPDX-License-Identifier: GPL-2.0-only
> +config NVGRACE_EGM
> + tristate "EGM driver for NVIDIA Grace Hopper and Blackwell Superchip"
> + depends on ARM64 || (COMPILE_TEST && 64BIT)
> + depends on NVGRACE_GPU_VFIO_PCI
> + help
> + Extended GPU Memory (EGM) support for the GPU in the NVIDIA Grace
> + based chips required to avail the CPU memory as additional
> + cross-node/cross-socket memory for GPU using KVM/qemu.
> +
> + If you don't know what to do here, say N.
> +
> config NVGRACE_GPU_VFIO_PCI
> tristate "VFIO support for the GPU in the NVIDIA Grace Hopper Superchip"
> depends on ARM64 || (COMPILE_TEST && 64BIT)
> select VFIO_PCI_CORE
> + select NVGRACE_EGM
This should be dropped. NVGRACE_EGM already depends on
NVGRACE_GPU_VFIO_PCI, so the select creates a circular dependency where
we cannot actually unselect NVGRACE_EGM with NVGRACE_GPU_VFIO_PCI
selected.
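For illustration, a sketch of the two entries with the select dropped,
so that NVGRACE_EGM stays an optional add-on to NVGRACE_GPU_VFIO_PCI.
Everything except the removed line is copied from the patch as posted
(help text omitted); treat it as a suggestion rather than the required
form:

```kconfig
config NVGRACE_EGM
	tristate "EGM driver for NVIDIA Grace Hopper and Blackwell Superchip"
	depends on ARM64 || (COMPILE_TEST && 64BIT)
	depends on NVGRACE_GPU_VFIO_PCI

config NVGRACE_GPU_VFIO_PCI
	tristate "VFIO support for the GPU in the NVIDIA Grace Hopper Superchip"
	depends on ARM64 || (COMPILE_TEST && 64BIT)
	select VFIO_PCI_CORE
```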
> help
> VFIO support for the GPU in the NVIDIA Grace Hopper Superchip is
> required to assign the GPU device to userspace using KVM/qemu/etc.
> diff --git a/drivers/vfio/pci/nvgrace-gpu/Makefile b/drivers/vfio/pci/nvgrace-gpu/Makefile
> index e72cc6739ef8..d0d191be56b9 100644
> --- a/drivers/vfio/pci/nvgrace-gpu/Makefile
> +++ b/drivers/vfio/pci/nvgrace-gpu/Makefile
> @@ -1,3 +1,6 @@
> # SPDX-License-Identifier: GPL-2.0-only
> obj-$(CONFIG_NVGRACE_GPU_VFIO_PCI) += nvgrace-gpu-vfio-pci.o
> nvgrace-gpu-vfio-pci-y := main.o egm_dev.o
> +
> +obj-$(CONFIG_NVGRACE_EGM) += nvgrace-egm.o
> +nvgrace-egm-y := egm.o
> diff --git a/drivers/vfio/pci/nvgrace-gpu/egm.c b/drivers/vfio/pci/nvgrace-gpu/egm.c
> new file mode 100644
> index 000000000000..999808807019
> --- /dev/null
> +++ b/drivers/vfio/pci/nvgrace-gpu/egm.c
> @@ -0,0 +1,22 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved
This should be 2026.
> + */
> +
> +#include <linux/vfio_pci_core.h>
Premature? Nothing in this file uses vfio-pci-core yet; the module
macros below only need <linux/module.h>.
> +
> +static int __init nvgrace_egm_init(void)
> +{
> + return 0;
> +}
> +
> +static void __exit nvgrace_egm_cleanup(void)
> +{
> +}
> +
> +module_init(nvgrace_egm_init);
> +module_exit(nvgrace_egm_cleanup);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Ankit Agrawal <ankita@nvidia.com>");
> +MODULE_DESCRIPTION("NVGRACE EGM - Module to support Extended GPU Memory on NVIDIA Grace Based systems");
> diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
> index b356e941340a..0bb427cca31f 100644
> --- a/drivers/vfio/pci/nvgrace-gpu/main.c
> +++ b/drivers/vfio/pci/nvgrace-gpu/main.c
> @@ -1410,3 +1410,4 @@ MODULE_LICENSE("GPL");
> MODULE_AUTHOR("Ankit Agrawal <ankita@nvidia.com>");
> MODULE_AUTHOR("Aniket Agashe <aniketa@nvidia.com>");
> MODULE_DESCRIPTION("VFIO NVGRACE GPU PF - User Level driver for NVIDIA devices with CPU coherently accessible device memory");
> +MODULE_SOFTDEP("pre: nvgrace-egm");
Premature, and wrong even if it were necessary. AIUI the aux device
created should generate uevents, and the module should then be loaded
automatically via its device table.
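To make the device table point concrete, here is a minimal user-space
sketch of the name matching the auxiliary bus performs. It is a model,
not kernel code: struct aux_id and aux_match() are stand-ins I made up
for illustration, and the "egm" device name is an assumption, not taken
from the series. In a real driver the table would be an array of
struct auxiliary_device_id exported with MODULE_DEVICE_TABLE(auxiliary,
...), which is what lets userspace resolve the modalias from the uevent
to the module without any softdep.

```c
#include <stddef.h>
#include <string.h>

/* User-space stand-in for the kernel's struct auxiliary_device_id. */
struct aux_id {
	const char *name;	/* "<parent modname>.<device name>" */
};

/*
 * Hypothetical match table. The "egm" device name is assumed for
 * illustration; the parent module name follows from the Makefile
 * (nvgrace-gpu-vfio-pci.o, with dashes turned into underscores).
 */
static const struct aux_id egm_id_table[] = {
	{ "nvgrace_gpu_vfio_pci.egm" },
	{ NULL }
};

/*
 * Compare the device name, minus its trailing ".<id>" suffix, against
 * each table entry. Returns the matching entry, or NULL if none match.
 */
static const struct aux_id *aux_match(const struct aux_id *id,
				      const char *dev_name)
{
	const char *dot = strrchr(dev_name, '.');
	size_t len;

	if (!dot)
		return NULL;
	len = (size_t)(dot - dev_name);

	for (; id->name; id++) {
		if (strlen(id->name) == len &&
		    !strncmp(dev_name, id->name, len))
			return id;
	}
	return NULL;
}
```

With this table, a device named "nvgrace_gpu_vfio_pci.egm.0" matches,
while a device created by a different parent module does not; an
out-of-tree parent would need its own table entry.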
Thanks,
Alex