From: Alex Williamson <alex@shazbot.org>
To: Ankit Agrawal <ankita@nvidia.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>,
Vikram Sethi <vsethi@nvidia.com>, Matt Ochs <mochs@nvidia.com>,
"jgg@ziepe.ca" <jgg@ziepe.ca>,
Shameer Kolothum Thodi <skolothumtho@nvidia.com>,
Neo Jia <cjia@nvidia.com>, Zhi Wang <zhiw@nvidia.com>,
Krishnakant Jaju <kjaju@nvidia.com>,
Yishai Hadas <yishaih@nvidia.com>,
"kevin.tian@intel.com" <kevin.tian@intel.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
alex@shazbot.org
Subject: Re: [PATCH RFC v2 00/15] Add virtualization support for EGM
Date: Thu, 12 Mar 2026 08:59:04 -0600
Message-ID: <20260312085904.42a98f16@shazbot.org>
In-Reply-To: <SA1PR12MB7199673D9DE79C11DB6B9E33B044A@SA1PR12MB7199.namprd12.prod.outlook.com>

On Thu, 12 Mar 2026 13:51:20 +0000
Ankit Agrawal <ankita@nvidia.com> wrote:
> >> > nvgrace-gpu is manipulating sysfs
> >> > on devices owned by nvgrace-egm, we don't have mechanisms to manage the
> >> > aux device relative to the state of the GPU, we're trying to add a
> >> > driver that can bind to a device created by an out-of-tree driver, and
> >> > we're inventing new uAPIs on the chardev for things that already exist
> >> > for vfio regions.
> >>
> >> Sorry for the confusion. The nvgrace-egm would not bind to the device
> >> created by the out-of-tree driver. We would have a separate out-of-tree
> >> equivalent of nvgrace-egm to bind to the device created by the out-of-tree
> >> vfio driver. Maybe we can consider exposing register/unregister APIs from
> >> nvgrace-egm where a module (in-tree nvgrace / out-of-tree) can register
> >> a pdev that nvgrace-egm can use to fetch the region info.
> >
> > Ok, this wasn't clear to me, but does that also mean that if some GPUs
> > are managed by nvgrace-gpu and others by out-of-tree drivers that the
> > in-kernel and out-of-tree equivalent drivers are both installing
> > chardevs as /dev/egmXX? Playing in the same space is ugly, but what
> > happens when the 2 GPUs per socket are split between drivers and they
> > both try to add the same chardev?
>
> But that would be an unsupported configuration. It is expected that all the
> GPUs on the system and the EGM char devices are attached to the same
> VM for full functionality. So either all the devices (GPU and EGM chardev)
> would be bound to nvgrace or to the out-of-tree module. Please refer to sec 8.1:
> https://docs.nvidia.com/multi-node-nvlink-systems/partition-guide-v1-2.pdf
> Perhaps I should add this information to the commit message.

Just because it can be documented as a policy doesn't make it an
agreeable architecture.
> > However, I'd then ask the question why we're associating EGM to the GPU
> > PCI driver at all. For instance, why should nvgrace-gpu spawn aux
> > devices to feed into an nvgrace-egm driver, and duplicate that whole
> > thing in an out-of-tree driver, when we could just have one in-kernel
> > platform(?) driver walk ACPI, find these ranges, and expose them as
> > chardev entirely independent of the PCI driver bound to the GPU?
>
> So a new platform driver to walk the ACPI tables, look for EGM properties,
> and create EGM char devs?
>
> Maybe it is okay, but given that all four EGM properties are under the GPU's
> ACPI node and there is no independent ACPI _HID device identity, it sounds
> a bit off to me. Do we have a precedent like that?
>
> But as I mentioned above, the expectation is that the EGM devices and the GPU
> devices are assigned to the same VM. So would it not make sense to
> keep the association between the EGM devices and the GPU devices?

You're telling me that the EGM access is 100% independent of any state
related to the GPU, so why would we tie the lifecycle of these aux
devices to any particular driver for the GPU or re-implement it across
multiple drivers? That doesn't make sense to me. Thanks,

Alex
Thread overview: 42+ messages
2026-02-23 15:54 [PATCH RFC v2 00/15] Add virtualization support for EGM ankita
2026-02-23 15:55 ` [PATCH RFC v2 01/15] vfio/nvgrace-gpu: Expand module_pci_driver to allow custom module init ankita
2026-02-23 15:55 ` [PATCH RFC v2 02/15] vfio/nvgrace-gpu: Create auxiliary device for EGM ankita
2026-02-26 14:28 ` Shameer Kolothum Thodi
2026-03-04 0:13 ` Alex Williamson
2026-02-23 15:55 ` [PATCH RFC v2 03/15] vfio/nvgrace-gpu: track GPUs associated with the EGM regions ankita
2026-02-26 14:55 ` Shameer Kolothum Thodi
2026-03-04 17:14 ` Alex Williamson
2026-02-23 15:55 ` [PATCH RFC v2 04/15] vfio/nvgrace-gpu: Introduce functions to fetch and save EGM info ankita
2026-02-26 15:12 ` Shameer Kolothum Thodi
2026-03-04 17:37 ` Alex Williamson
2026-02-23 15:55 ` [PATCH RFC v2 05/15] vfio/nvgrace-egm: Introduce module to manage EGM ankita
2026-03-04 18:09 ` Alex Williamson
2026-02-23 15:55 ` [PATCH RFC v2 06/15] vfio/nvgrace-egm: Introduce egm class and register char device numbers ankita
2026-03-04 18:56 ` Alex Williamson
2026-02-23 15:55 ` [PATCH RFC v2 07/15] vfio/nvgrace-egm: Register auxiliary driver ops ankita
2026-03-04 19:06 ` Alex Williamson
2026-02-23 15:55 ` [PATCH RFC v2 08/15] vfio/nvgrace-egm: Expose EGM region as char device ankita
2026-02-26 17:08 ` Shameer Kolothum Thodi
2026-03-04 20:16 ` Alex Williamson
2026-02-23 15:55 ` [PATCH RFC v2 09/15] vfio/nvgrace-egm: Add chardev ops for EGM management ankita
2026-03-04 22:04 ` Alex Williamson
2026-02-23 15:55 ` [PATCH RFC v2 10/15] vfio/nvgrace-egm: Clear Memory before handing out to VM ankita
2026-02-26 18:15 ` Shameer Kolothum Thodi
2026-02-26 18:56 ` Jason Gunthorpe
2026-02-26 19:29 ` Shameer Kolothum Thodi
2026-03-04 22:14 ` Alex Williamson
2026-02-23 15:55 ` [PATCH RFC v2 11/15] vfio/nvgrace-egm: Fetch EGM region retired pages list ankita
2026-03-04 22:37 ` Alex Williamson
2026-02-23 15:55 ` [PATCH RFC v2 12/15] vfio/nvgrace-egm: Introduce ioctl to share retired pages ankita
2026-03-04 23:00 ` Alex Williamson
2026-02-23 15:55 ` [PATCH RFC v2 13/15] vfio/nvgrace-egm: expose the egm size through sysfs ankita
2026-03-04 23:22 ` Alex Williamson
2026-02-23 15:55 ` [PATCH RFC v2 14/15] vfio/nvgrace-gpu: Add link from pci to EGM ankita
2026-03-04 23:37 ` Alex Williamson
2026-02-23 15:55 ` [PATCH RFC v2 15/15] vfio/nvgrace-egm: register EGM PFNMAP range with memory_failure ankita
2026-03-04 23:48 ` Alex Williamson
2026-03-05 17:33 ` [PATCH RFC v2 00/15] Add virtualization support for EGM Alex Williamson
2026-03-11 6:47 ` Ankit Agrawal
2026-03-11 20:37 ` Alex Williamson
2026-03-12 13:51 ` Ankit Agrawal
2026-03-12 14:59 ` Alex Williamson [this message]