From: Alex Williamson <alex@shazbot.org>
To: Shameer Kolothum Thodi <skolothumtho@nvidia.com>
Cc: Ankit Agrawal <ankita@nvidia.com>,
	Vikram Sethi <vsethi@nvidia.com>,
	Jason Gunthorpe <jgg@nvidia.com>, Matt Ochs <mochs@nvidia.com>,
	"jgg@ziepe.ca" <jgg@ziepe.ca>, Neo Jia <cjia@nvidia.com>,
	Zhi Wang <zhiw@nvidia.com>, Krishnakant Jaju <kjaju@nvidia.com>,
	Yishai Hadas <yishaih@nvidia.com>,
	"kevin.tian@intel.com" <kevin.tian@intel.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	alex@shazbot.org
Subject: Re: [PATCH RFC v2 03/15] vfio/nvgrace-gpu: track GPUs associated with the EGM regions
Date: Wed, 4 Mar 2026 10:14:09 -0700	[thread overview]
Message-ID: <20260304101409.3c069046@shazbot.org> (raw)
In-Reply-To: <CH3PR12MB75483F6074324471868CC05EAB72A@CH3PR12MB7548.namprd12.prod.outlook.com>

On Thu, 26 Feb 2026 14:55:37 +0000
Shameer Kolothum Thodi <skolothumtho@nvidia.com> wrote:

> > -----Original Message-----
> > From: Ankit Agrawal <ankita@nvidia.com>
> > Sent: 23 February 2026 15:55
> > To: Ankit Agrawal <ankita@nvidia.com>; Vikram Sethi <vsethi@nvidia.com>;
> > Jason Gunthorpe <jgg@nvidia.com>; Matt Ochs <mochs@nvidia.com>;
> > jgg@ziepe.ca; Shameer Kolothum Thodi <skolothumtho@nvidia.com>;
> > alex@shazbot.org
> > Cc: Neo Jia <cjia@nvidia.com>; Zhi Wang <zhiw@nvidia.com>; Krishnakant
> > Jaju <kjaju@nvidia.com>; Yishai Hadas <yishaih@nvidia.com>;
> > kevin.tian@intel.com; kvm@vger.kernel.org; linux-kernel@vger.kernel.org
> > Subject: [PATCH RFC v2 03/15] vfio/nvgrace-gpu: track GPUs associated with
> > the EGM regions
> > 
> > From: Ankit Agrawal <ankita@nvidia.com>
> > 
> > Grace Blackwell systems can have multiple GPUs on a socket, all
> > associated with that socket's EGM region. Track the GPUs as a
> > list.
> > 
> > On device probe, the device's pci_dev struct is added to the
> > linked list of the appropriate EGM region.
> > 
> > Similarly, on device remove, the GPU's pci_dev struct is
> > removed from the EGM region's list.
> > 
> > Since the GPUs on a socket share the same EGM region, they have
> > the same set of EGM region information. Skip the EGM region
> > information fetch if already done through a different GPU on
> > the same socket.
> > 
> > Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
> > ---
> >  drivers/vfio/pci/nvgrace-gpu/egm_dev.c | 29 ++++++++++++++++++++
> >  drivers/vfio/pci/nvgrace-gpu/egm_dev.h |  4 +++
> >  drivers/vfio/pci/nvgrace-gpu/main.c    | 37 +++++++++++++++++++++++---
> >  include/linux/nvgrace-egm.h            |  6 +++++
> >  4 files changed, 72 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/vfio/pci/nvgrace-gpu/egm_dev.c
> > b/drivers/vfio/pci/nvgrace-gpu/egm_dev.c
> > index faf658723f7a..0bf95688a486 100644
> > --- a/drivers/vfio/pci/nvgrace-gpu/egm_dev.c
> > +++ b/drivers/vfio/pci/nvgrace-gpu/egm_dev.c
> > @@ -17,6 +17,33 @@ int nvgrace_gpu_has_egm_property(struct pci_dev
> > *pdev, u64 *pegmpxm)
> >  					pegmpxm);
> >  }
> > 
> > +int add_gpu(struct nvgrace_egm_dev *egm_dev, struct pci_dev *pdev)
> > +{
> > +	struct gpu_node *node;
> > +
> > +	node = kzalloc(sizeof(*node), GFP_KERNEL);
> > +	if (!node)
> > +		return -ENOMEM;
> > +
> > +	node->pdev = pdev;
> > +
> > +	list_add_tail(&node->list, &egm_dev->gpus);
> > +
> > +	return 0;
> > +}
> > +
> > +void remove_gpu(struct nvgrace_egm_dev *egm_dev, struct pci_dev *pdev)
> > +{
> > +	struct gpu_node *node, *tmp;
> > +
> > +	list_for_each_entry_safe(node, tmp, &egm_dev->gpus, list) {  
> 
> Looks like this gpu list also will require a lock.

+1

> Can we get rid of this gpu list by having a refcount_t in struct nvgrace_egm_dev?

+1

In this implementation, a reference count seems sufficient and the
egm_dev list could be moved to egm_dev.c, where a get_or_create
function could handle the de-dupe and refcounting, and a put function
could drop the reference and free.

We'd only need a reference to the GPU pci_dev if we needed to
invalidate mappings across a GPU reset, or perhaps if we were exposing
multiple EGM devices per socket, one for each GPU route.
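
To make that concrete, here's a rough userspace sketch of the
get_or_create/put shape (untested; the kernel version would use a kref
or refcount_t, take a mutex around the list, and the helper names here
are only illustrative):

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace model of the suggested pattern; the driver would key the
 * region by its EGM proximity domain (egmpxm) as the patch does. */
struct egm_region {
	unsigned long long egmpxm;	/* proximity domain key */
	int refcount;			/* kref/refcount_t in the kernel */
	struct egm_region *next;
};

struct egm_region *egm_list;

/* Look up an existing region by egmpxm or allocate a new one; either
 * way the caller holds one reference on return. */
struct egm_region *egm_get_or_create(unsigned long long egmpxm)
{
	struct egm_region *r;

	for (r = egm_list; r; r = r->next) {
		if (r->egmpxm == egmpxm) {
			r->refcount++;
			return r;
		}
	}

	r = calloc(1, sizeof(*r));
	if (!r)
		return NULL;
	r->egmpxm = egmpxm;
	r->refcount = 1;
	r->next = egm_list;
	egm_list = r;
	return r;
}

/* Drop one reference; unlink and free on the final put. */
void egm_put(struct egm_region *dead)
{
	struct egm_region **pp;

	if (--dead->refcount > 0)
		return;
	for (pp = &egm_list; *pp; pp = &(*pp)->next) {
		if (*pp == dead) {
			*pp = dead->next;
			free(dead);
			return;
		}
	}
}
```

With this shape the per-region GPU list goes away entirely; lifetime
is carried by the count alone.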

> > +		if (node->pdev == pdev) {
> > +			list_del(&node->list);
> > +			kfree(node);
> > +		}

Also, why do we continue searching the list after a match is found?
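
For illustration, a single-match removal can unlink the entry and
stop (userspace sketch; in the driver this is just a break after the
kfree() inside list_for_each_entry_safe()):

```c
#include <assert.h>
#include <stdlib.h>

struct gpu_node { int id; struct gpu_node *next; };

/* Remove only the first matching node and stop: each pdev is added at
 * most once, so continuing the walk after the unlink is wasted work. */
void remove_gpu_once(struct gpu_node **head, int id)
{
	struct gpu_node **pp;

	for (pp = head; *pp; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			struct gpu_node *dead = *pp;

			*pp = dead->next;
			free(dead);
			return;		/* the missing early exit */
		}
	}
}
```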
Thanks,

Alex

> > +	}
> > +}
> > +
> >  static void nvgrace_gpu_release_aux_device(struct device *device)
> >  {
> >  	struct auxiliary_device *aux_dev = container_of(device, struct
> > auxiliary_device, dev);
> > @@ -37,6 +64,8 @@ nvgrace_gpu_create_aux_device(struct pci_dev *pdev,
> > const char *name,
> >  		goto create_err;
> > 
> >  	egm_dev->egmpxm = egmpxm;
> > +	INIT_LIST_HEAD(&egm_dev->gpus);
> > +
> >  	egm_dev->aux_dev.id = egmpxm;
> >  	egm_dev->aux_dev.name = name;
> >  	egm_dev->aux_dev.dev.release = nvgrace_gpu_release_aux_device;
> > diff --git a/drivers/vfio/pci/nvgrace-gpu/egm_dev.h
> > b/drivers/vfio/pci/nvgrace-gpu/egm_dev.h
> > index c00f5288f4e7..1635753c9e50 100644
> > --- a/drivers/vfio/pci/nvgrace-gpu/egm_dev.h
> > +++ b/drivers/vfio/pci/nvgrace-gpu/egm_dev.h
> > @@ -10,6 +10,10 @@
> > 
> >  int nvgrace_gpu_has_egm_property(struct pci_dev *pdev, u64 *pegmpxm);
> > 
> > +int add_gpu(struct nvgrace_egm_dev *egm_dev, struct pci_dev *pdev);
> > +
> > +void remove_gpu(struct nvgrace_egm_dev *egm_dev, struct pci_dev *pdev);
> > +
> >  struct nvgrace_egm_dev *
> >  nvgrace_gpu_create_aux_device(struct pci_dev *pdev, const char *name,
> >  			      u64 egmphys);
> > diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-
> > gpu/main.c
> > index 23028e6e7192..3dd0c57e5789 100644
> > --- a/drivers/vfio/pci/nvgrace-gpu/main.c
> > +++ b/drivers/vfio/pci/nvgrace-gpu/main.c
> > @@ -77,9 +77,10 @@ static struct list_head egm_dev_list;
> > 
> >  static int nvgrace_gpu_create_egm_aux_device(struct pci_dev *pdev)
> >  {
> > -	struct nvgrace_egm_dev_entry *egm_entry;
> > +	struct nvgrace_egm_dev_entry *egm_entry = NULL;
> >  	u64 egmpxm;
> >  	int ret = 0;
> > +	bool is_new_region = false;
> > 
> >  	/*
> >  	 * EGM is an optional feature enabled in SBIOS. If disabled, there
> > @@ -90,6 +91,19 @@ static int nvgrace_gpu_create_egm_aux_device(struct
> > pci_dev *pdev)
> >  	if (nvgrace_gpu_has_egm_property(pdev, &egmpxm))
> >  		goto exit;
> > 
> > +	list_for_each_entry(egm_entry, &egm_dev_list, list) {
> > +		/*
> > +		 * A system could have multiple GPUs associated with an
> > +		 * EGM region and will have the same set of EGM region
> > +		 * information. Skip the EGM region information fetch if
> > +		 * already done through a different GPU on the same socket.
> > +		 */
> > +		if (egm_entry->egm_dev->egmpxm == egmpxm)
> > +			goto add_gpu;
> > +	}
> > +
> > +	is_new_region = true;
> > +
> >  	egm_entry = kzalloc(sizeof(*egm_entry), GFP_KERNEL);
> >  	if (!egm_entry)
> >  		return -ENOMEM;
> > @@ -98,13 +112,24 @@ static int
> > nvgrace_gpu_create_egm_aux_device(struct pci_dev *pdev)
> >  		nvgrace_gpu_create_aux_device(pdev,
> > NVGRACE_EGM_DEV_NAME,
> >  					      egmpxm);
> >  	if (!egm_entry->egm_dev) {
> > -		kvfree(egm_entry);
> >  		ret = -EINVAL;
> > -		goto exit;
> > +		goto free_egm_entry;
> >  	}
> > 
> > -	list_add_tail(&egm_entry->list, &egm_dev_list);
> > +add_gpu:
> > +	ret = add_gpu(egm_entry->egm_dev, pdev);
> > +	if (ret)
> > +		goto free_dev;
> > 
> > +	if (is_new_region)
> > +		list_add_tail(&egm_entry->list, &egm_dev_list);  
> 
> So this is where you address the previous patch comment I suppose...
> If so, need to change the commit description there.
> 
> > +	return 0;
> > +
> > +free_dev:
> > +	if (is_new_region)
> > +		auxiliary_device_destroy(&egm_entry->egm_dev->aux_dev);
> > +free_egm_entry:
> > +	kvfree(egm_entry);  
> 
> Suppose the add_gpu() above fails; then you will end up here with an
> existing egm_entry which might be in use.
> 
> Thanks,
> Shameer
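
For what it's worth, the invariant Shameer is pointing at can be
modeled directly: a failed add_gpu() may only unwind state that this
probe itself created, never a shared pre-existing region. A userspace
sketch (the names mirror the patch, but the bodies are stand-ins, not
the actual driver code):

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the probe error path: region_exists stands in for the
 * egm_entry being on egm_dev_list, region_created for is_new_region. */
struct model {
	bool region_exists;	/* egm_entry already on egm_dev_list */
	bool region_created;	/* this probe allocated it */
};

int probe(struct model *m, bool add_gpu_fails)
{
	bool is_new_region = !m->region_exists;

	if (is_new_region) {
		/* create aux device + list entry */
		m->region_exists = true;
		m->region_created = true;
	}

	if (add_gpu_fails) {
		/* Unwind only what we created; a region shared with
		 * other GPUs on the socket must be left alone. */
		if (is_new_region) {
			m->region_exists = false;
			m->region_created = false;
		}
		return -1;
	}
	return 0;
}
```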
> 
> 

