public inbox for kvm@vger.kernel.org
From: Alex Williamson <alex@shazbot.org>
To: Ankit Agrawal <ankita@nvidia.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>,
	Vikram Sethi <vsethi@nvidia.com>, Matt Ochs <mochs@nvidia.com>,
	"jgg@ziepe.ca" <jgg@ziepe.ca>,
	Shameer Kolothum Thodi <skolothumtho@nvidia.com>,
	Neo Jia <cjia@nvidia.com>, Zhi Wang <zhiw@nvidia.com>,
	Krishnakant Jaju <kjaju@nvidia.com>,
	Yishai Hadas <yishaih@nvidia.com>,
	"kevin.tian@intel.com" <kevin.tian@intel.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	alex@shazbot.org
Subject: Re: [PATCH RFC v2 00/15] Add virtualization support for EGM
Date: Thu, 12 Mar 2026 08:59:04 -0600	[thread overview]
Message-ID: <20260312085904.42a98f16@shazbot.org> (raw)
In-Reply-To: <SA1PR12MB7199673D9DE79C11DB6B9E33B044A@SA1PR12MB7199.namprd12.prod.outlook.com>

On Thu, 12 Mar 2026 13:51:20 +0000
Ankit Agrawal <ankita@nvidia.com> wrote:

> >> > nvgrace-gpu is manipulating sysfs
> >> > on devices owned by nvgrace-egm, we don't have mechanisms to manage the
> >> > aux device relative to the state of the GPU, we're trying to add a
> >> > driver that can bind to a device created by an out-of-tree driver, and
> >> > we're inventing new uAPIs on the chardev for things that already exist
> >> > for vfio regions.  
> >>
> >> Sorry for the confusion. The nvgrace-egm would not bind to the device
> >> created by the out-of-tree driver. We would have a separate out-of-tree
> >> equivalent of nvgrace-egm to bind to the device created by the out-of-tree
> >> vfio driver. Maybe we can consider exposing register/unregister APIs from
> >> nvgrace-egm where a module (in-tree nvgrace / out-of-tree) can register
> >> a pdev, which nvgrace-egm can use to fetch the region info.  
> >
> > Ok, this wasn't clear to me, but does that also mean that if some GPUs
> > are managed by nvgrace-gpu and others by out-of-tree drivers that the
> > in-kernel and out-of-tree equivalent drivers are both installing
> > chardevs as /dev/egmXX?  Playing in the same space is ugly, but what
> > happens when the 2 GPUs per socket are split between drivers and they
> > both try to add the same chardev?  
> 
> But that would be an unsupported configuration. It is expected that all the
> GPUs on the system and the EGM char devices are attached to the same
> VM for full functionality. So either all the devices (GPU and EGM chardev)
> would be bound to nvgrace or to the out-of-tree module. Please refer to sec 8.1:
> https://docs.nvidia.com/multi-node-nvlink-systems/partition-guide-v1-2.pdf
> Perhaps I should add this information to the commit message.

Just because it can be documented as a policy doesn't make it an
agreeable architecture.

> > However, I'd then ask the question why we're associating EGM to the GPU
> > PCI driver at all.  For instance, why should nvgrace-gpu spawn aux
> > devices to feed into an nvgrace-egm driver, and duplicate that whole
> > thing in an out-of-tree driver, when we could just have one in-kernel
> > platform(?) driver walk ACPI, find these ranges, and expose them as
> > chardev entirely independent of the PCI driver bound to the GPU?  
> 
> So a new platform driver to walk the ACPI tables, look for EGM properties,
> and create EGM char devs? 
> 
> Maybe it is okay, but given that all 4 EGM properties are under the GPU's
> ACPI node and there is no independent ACPI _HID device identity, it sounds
> a bit off to me. Do we have a precedent like that?
> 
> But as I mentioned above, the expectation is that the EGM devices and the GPU
> devices are assigned to the same VM. So would it not make sense to keep
> the association between the EGM devices and the GPU devices?

You're telling me that the EGM access is 100% independent of any state
related to the GPU, so why would we tie the lifecycle of these aux
devices to any particular driver for the GPU or re-implement it across
multiple drivers?  That doesn't make sense to me.  Thanks,

Alex


Thread overview: 42+ messages
2026-02-23 15:54 [PATCH RFC v2 00/15] Add virtualization support for EGM ankita
2026-02-23 15:55 ` [PATCH RFC v2 01/15] vfio/nvgrace-gpu: Expand module_pci_driver to allow custom module init ankita
2026-02-23 15:55 ` [PATCH RFC v2 02/15] vfio/nvgrace-gpu: Create auxiliary device for EGM ankita
2026-02-26 14:28   ` Shameer Kolothum Thodi
2026-03-04  0:13   ` Alex Williamson
2026-02-23 15:55 ` [PATCH RFC v2 03/15] vfio/nvgrace-gpu: track GPUs associated with the EGM regions ankita
2026-02-26 14:55   ` Shameer Kolothum Thodi
2026-03-04 17:14     ` Alex Williamson
2026-02-23 15:55 ` [PATCH RFC v2 04/15] vfio/nvgrace-gpu: Introduce functions to fetch and save EGM info ankita
2026-02-26 15:12   ` Shameer Kolothum Thodi
2026-03-04 17:37   ` Alex Williamson
2026-02-23 15:55 ` [PATCH RFC v2 05/15] vfio/nvgrace-egm: Introduce module to manage EGM ankita
2026-03-04 18:09   ` Alex Williamson
2026-02-23 15:55 ` [PATCH RFC v2 06/15] vfio/nvgrace-egm: Introduce egm class and register char device numbers ankita
2026-03-04 18:56   ` Alex Williamson
2026-02-23 15:55 ` [PATCH RFC v2 07/15] vfio/nvgrace-egm: Register auxiliary driver ops ankita
2026-03-04 19:06   ` Alex Williamson
2026-02-23 15:55 ` [PATCH RFC v2 08/15] vfio/nvgrace-egm: Expose EGM region as char device ankita
2026-02-26 17:08   ` Shameer Kolothum Thodi
2026-03-04 20:16   ` Alex Williamson
2026-02-23 15:55 ` [PATCH RFC v2 09/15] vfio/nvgrace-egm: Add chardev ops for EGM management ankita
2026-03-04 22:04   ` Alex Williamson
2026-02-23 15:55 ` [PATCH RFC v2 10/15] vfio/nvgrace-egm: Clear Memory before handing out to VM ankita
2026-02-26 18:15   ` Shameer Kolothum Thodi
2026-02-26 18:56     ` Jason Gunthorpe
2026-02-26 19:29       ` Shameer Kolothum Thodi
2026-03-04 22:14   ` Alex Williamson
2026-02-23 15:55 ` [PATCH RFC v2 11/15] vfio/nvgrace-egm: Fetch EGM region retired pages list ankita
2026-03-04 22:37   ` Alex Williamson
2026-02-23 15:55 ` [PATCH RFC v2 12/15] vfio/nvgrace-egm: Introduce ioctl to share retired pages ankita
2026-03-04 23:00   ` Alex Williamson
2026-02-23 15:55 ` [PATCH RFC v2 13/15] vfio/nvgrace-egm: expose the egm size through sysfs ankita
2026-03-04 23:22   ` Alex Williamson
2026-02-23 15:55 ` [PATCH RFC v2 14/15] vfio/nvgrace-gpu: Add link from pci to EGM ankita
2026-03-04 23:37   ` Alex Williamson
2026-02-23 15:55 ` [PATCH RFC v2 15/15] vfio/nvgrace-egm: register EGM PFNMAP range with memory_failure ankita
2026-03-04 23:48   ` Alex Williamson
2026-03-05 17:33 ` [PATCH RFC v2 00/15] Add virtualization support for EGM Alex Williamson
2026-03-11  6:47   ` Ankit Agrawal
2026-03-11 20:37     ` Alex Williamson
2026-03-12 13:51       ` Ankit Agrawal
2026-03-12 14:59         ` Alex Williamson [this message]
