From: Alex Williamson <alex@shazbot.org>
To: <ankita@nvidia.com>
Cc: <vsethi@nvidia.com>, <jgg@nvidia.com>, <mochs@nvidia.com>,
<jgg@ziepe.ca>, <skolothumtho@nvidia.com>, <cjia@nvidia.com>,
<zhiw@nvidia.com>, <kjaju@nvidia.com>, <yishaih@nvidia.com>,
<kevin.tian@intel.com>, <kvm@vger.kernel.org>,
<linux-kernel@vger.kernel.org>,
alex@shazbot.org
Subject: Re: [PATCH RFC v2 12/15] vfio/nvgrace-egm: Introduce ioctl to share retired pages
Date: Wed, 4 Mar 2026 16:00:55 -0700
Message-ID: <20260304160055.38ea91be@shazbot.org>
In-Reply-To: <20260223155514.152435-13-ankita@nvidia.com>
On Mon, 23 Feb 2026 15:55:11 +0000
<ankita@nvidia.com> wrote:
> From: Ankit Agrawal <ankita@nvidia.com>
>
> nvgrace-egm module stores the list of retired page offsets to be made
> available for usermode processes. Introduce an ioctl to share the
> information with the userspace.
>
> The ioctl is called by usermode apps such as QEMU to get the retired
> page offsets. The usermode apps are expected to take appropriate action
> to communicate the list to the VM.
>
> Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
> ---
> MAINTAINERS | 1 +
> drivers/vfio/pci/nvgrace-gpu/egm.c | 67 ++++++++++++++++++++++++++++++
> include/uapi/linux/egm.h | 28 +++++++++++++
> 3 files changed, 96 insertions(+)
> create mode 100644 include/uapi/linux/egm.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 1fc551d7d667..94cf15a1e82c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -27389,6 +27389,7 @@ M: Ankit Agrawal <ankita@nvidia.com>
> L: kvm@vger.kernel.org
> S: Supported
> F: drivers/vfio/pci/nvgrace-gpu/egm.c
> +F: include/uapi/linux/egm.h
>
> VFIO PCI DEVICE SPECIFIC DRIVERS
> R: Jason Gunthorpe <jgg@nvidia.com>
> diff --git a/drivers/vfio/pci/nvgrace-gpu/egm.c b/drivers/vfio/pci/nvgrace-gpu/egm.c
> index 077de3833046..918979d8fcd4 100644
> --- a/drivers/vfio/pci/nvgrace-gpu/egm.c
> +++ b/drivers/vfio/pci/nvgrace-gpu/egm.c
> @@ -5,6 +5,7 @@
>
> #include <linux/vfio_pci_core.h>
> #include <linux/nvgrace-egm.h>
> +#include <linux/egm.h>
>
> #define MAX_EGM_NODES 4
>
> @@ -119,11 +120,77 @@ static int nvgrace_egm_mmap(struct file *file, struct vm_area_struct *vma)
> vma->vm_page_prot);
> }
>
> +static long nvgrace_egm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> +{
> + unsigned long minsz = offsetofend(struct egm_retired_pages_list, count);
> + struct egm_retired_pages_list info;
> + void __user *uarg = (void __user *)arg;
> + struct chardev *egm_chardev = file->private_data;
> +
> + if (copy_from_user(&info, uarg, minsz))
> + return -EFAULT;
> +
> + if (info.argsz < minsz || !egm_chardev)
> + return -EINVAL;
How could we get here with !egm_chardev?
> +
> + switch (cmd) {
> + case EGM_RETIRED_PAGES_LIST:
> + int ret;
> + unsigned long retired_page_struct_size = sizeof(struct egm_retired_pages_info);
> + struct egm_retired_pages_info tmp;
> + struct h_node *cur_page;
> + struct hlist_node *tmp_node;
> + unsigned long bkt;
> + int count = 0, index = 0;
Declarations right after a case label need braces around the case body.
The ordering of the declarations could also be improved.
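Here is a minimal sketch of the brace fix. It uses a made-up command number (DEMO_CMD_COUNT) and data array rather than the driver's real table. Before C23, a declaration cannot directly follow a case label, so the case body gets its own braced block. Moving the body into a helper function would avoid the problem equally well.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical command number and data, for illustration only. */
#define DEMO_CMD_COUNT 100

static const unsigned long demo_offsets[] = { 0x1000, 0x5000, 0x9000 };

static long demo_ioctl(unsigned int cmd)
{
	switch (cmd) {
	case DEMO_CMD_COUNT: {
		/* The braces open a block, so a declaration is valid here. */
		size_t n = sizeof(demo_offsets) / sizeof(demo_offsets[0]);

		return (long)n;
	}
	default:
		return -EINVAL;
	}
}
```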
> +
> + hash_for_each_safe(egm_chardev->htbl, bkt, tmp_node, cur_page, node)
> + count++;
Why not keep track of the count as entries are added?
Neither loop needs the _safe variant, since neither removes entries.
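The userspace sketch below illustrates both points using hypothetical names. The table keeps a running count that is updated on insert, and the read-only copy uses a plain walk. Only teardown, which frees nodes, needs the saved-next pattern that the _safe iterators provide. In the driver, this would presumably mean incrementing a counter alongside hash_add() and using hash_for_each() for the read-only walks.

```c
#include <stdlib.h>

/*
 * Hypothetical userspace stand-in for the driver's retired-page table:
 * a singly linked list replaces the kernel hashtable, and 'count' is
 * maintained at insertion time instead of being recomputed by a walk.
 */
struct retired_node {
	unsigned long long mem_offset;
	struct retired_node *next;
};

struct retired_table {
	struct retired_node *head;
	unsigned int count;
};

static int retired_table_add(struct retired_table *t, unsigned long long offset)
{
	struct retired_node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->mem_offset = offset;
	n->next = t->head;
	t->head = n;
	t->count++;		/* counted once, here */
	return 0;
}

/* Read-only walk: nothing is freed, so a plain traversal is enough. */
static unsigned int retired_table_copy(const struct retired_table *t,
				       unsigned long long *out, unsigned int max)
{
	const struct retired_node *n;
	unsigned int i = 0;

	for (n = t->head; n && i < max; n = n->next)
		out[i++] = n->mem_offset;
	return i;
}

/* Teardown frees each node, so 'next' must be saved first (the _safe case). */
static void retired_table_free(struct retired_table *t)
{
	struct retired_node *n = t->head, *next;

	while (n) {
		next = n->next;
		free(n);
		n = next;
	}
	t->head = NULL;
	t->count = 0;
}
```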
> +
> + if (info.argsz < (minsz + count * retired_page_struct_size)) {
> + info.argsz = minsz + count * retired_page_struct_size;
> + info.count = 0;
vfio returns success when there's not enough space for new capabilities,
for compatibility with existing userspace. For a new ioctl, just set argsz
and count and return -ENOSPC.
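Below is a userspace sketch of the suggested -ENOSPC convention. fill_retired_list() is a hypothetical stand-in for the ioctl body and is called directly rather than through ioctl(2). The structs mirror the proposed uapi layout but use stdint types. When argsz is too small, the handler reports the required argsz and count, then fails. The caller probes with just the header and retries with a buffer of the reported size.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Userspace mirrors of the proposed uapi structs, using stdint types. */
struct egm_retired_pages_info {
	uint64_t offset;
	uint64_t size;
};

struct egm_retired_pages_list {
	uint32_t argsz;
	uint32_t count;
	struct egm_retired_pages_info retired_pages[];
};

/*
 * Hypothetical handler body: 'pages'/'npages' stand in for the driver's
 * table and its running count. Returns 0 or a negative errno.
 */
static int fill_retired_list(struct egm_retired_pages_list *info,
			     const struct egm_retired_pages_info *pages,
			     uint32_t npages)
{
	size_t minsz = offsetof(struct egm_retired_pages_list, retired_pages);
	size_t needed = minsz + (size_t)npages * sizeof(*pages);

	if (info->argsz < minsz)
		return -EINVAL;

	/* Buffer too small: report what is needed and fail with -ENOSPC. */
	if (info->argsz < needed) {
		info->argsz = (uint32_t)needed;
		info->count = npages;
		return -ENOSPC;
	}

	memcpy(info->retired_pages, pages, (size_t)npages * sizeof(*pages));
	info->count = npages;
	return 0;
}

/* Caller side: probe with just the header, then retry with the reported size. */
static struct egm_retired_pages_list *
get_retired_list(const struct egm_retired_pages_info *pages, uint32_t npages)
{
	struct egm_retired_pages_list hdr = { .argsz = sizeof(hdr) };
	struct egm_retired_pages_list *list;
	int ret = fill_retired_list(&hdr, pages, npages);

	if (ret && ret != -ENOSPC)
		return NULL;

	list = calloc(1, hdr.argsz);
	if (!list)
		return NULL;
	list->argsz = hdr.argsz;
	if (fill_retired_list(list, pages, npages)) {
		free(list);
		return NULL;
	}
	return list;
}
```

Against a real device, both calls would go through ioctl(fd, EGM_RETIRED_PAGES_LIST, ...). Because the list could grow between the probe and the second call, a real caller would also need to retry on a second -ENOSPC.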
> + goto done;
> + } else {
We don't need an else if the previous branch unconditionally goes
somewhere else.
> + hash_for_each_safe(egm_chardev->htbl, bkt, tmp_node, cur_page, node) {
> + /*
> + * This check fails if there was an ECC error
> + * after the usermode app read the count of
> + * bad pages through this ioctl.
> + */
> + if (minsz + index * retired_page_struct_size >= info.argsz) {
> + info.argsz = minsz + index * retired_page_struct_size;
> + info.count = index;
If only we had locking to prevent such races...
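Here is a sketch of the locking point. It assumes a single lock covers both the writer that records new retired pages and the reader that copies them out. The names and the fixed-size array are hypothetical, and pthread_mutex_t stands in for a kernel mutex. The driver would need a sleeping lock because copy_to_user() may fault. With the lock held across both the size check and the copy, the count cannot change in between, so the mid-walk "list grew" branch goes away.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

#define MAX_RETIRED 64	/* hypothetical fixed capacity for the sketch */

struct locked_table {
	pthread_mutex_t lock;
	uint32_t count;
	uint64_t offsets[MAX_RETIRED];
};

/* Writer (e.g. an ECC event path): updates count and entries under the lock. */
static int locked_table_add(struct locked_table *t, uint64_t offset)
{
	int ret = 0;

	pthread_mutex_lock(&t->lock);
	if (t->count == MAX_RETIRED)
		ret = -ENOSPC;
	else
		t->offsets[t->count++] = offset;
	pthread_mutex_unlock(&t->lock);
	return ret;
}

/* Reader: the size check and the copy see the same count under one lock hold. */
static int locked_table_read(struct locked_table *t, uint64_t *out,
			     uint32_t cap, uint32_t *count)
{
	int ret = 0;

	pthread_mutex_lock(&t->lock);
	*count = t->count;
	if (cap < t->count) {
		ret = -ENOSPC;
	} else {
		for (uint32_t i = 0; i < t->count; i++)
			out[i] = t->offsets[i];
	}
	pthread_mutex_unlock(&t->lock);
	return ret;
}
```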
> + goto done;
> + }
> +
> + tmp.offset = cur_page->mem_offset;
> + tmp.size = PAGE_SIZE;
Is firmware recording 4K or 64K pages in this table?
The above comment alludes to runtime ECC faults; are those reported at a
different page size than the granularity firmware uses in the table?
> +
> + ret = copy_to_user(uarg + minsz +
> + index * retired_page_struct_size,
> + &tmp, retired_page_struct_size);
> + if (ret)
> + return -EFAULT;
> + index++;
> + }
> +
> + info.count = index;
> + }
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> +done:
> + return copy_to_user(uarg, &info, minsz) ? -EFAULT : 0;
> +}
> +
> static const struct file_operations file_ops = {
> .owner = THIS_MODULE,
> .open = nvgrace_egm_open,
> .release = nvgrace_egm_release,
> .mmap = nvgrace_egm_mmap,
> + .unlocked_ioctl = nvgrace_egm_ioctl,
> };
>
> static void egm_chardev_release(struct device *dev)
> diff --git a/include/uapi/linux/egm.h b/include/uapi/linux/egm.h
> new file mode 100644
> index 000000000000..4d3a2304d4f0
> --- /dev/null
> +++ b/include/uapi/linux/egm.h
> @@ -0,0 +1,28 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/*
> + * Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved
2026
> + */
> +
> +#ifndef _UAPI_LINUX_EGM_H
> +#define _UAPI_LINUX_EGM_H
> +
> +#include <linux/types.h>
> +
> +#define EGM_TYPE ('E')
Arbitrarily chosen? Update Documentation/userspace-api/ioctl/ioctl-number.rst?
> +
> +struct egm_retired_pages_info {
> + __aligned_u64 offset;
> + __aligned_u64 size;
> +};
> +
> +struct egm_retired_pages_list {
> + __u32 argsz;
> + /* out */
> + __u32 count;
> + /* out */
> + struct egm_retired_pages_info retired_pages[];
> +};
I imagine you want some uapi description of this ioctl. Thanks,
Alex
> +
> +#define EGM_RETIRED_PAGES_LIST _IO(EGM_TYPE, 100)
> +
> +#endif /* _UAPI_LINUX_EGM_H */
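For the uapi description, one possible shape is kernel-doc comments on the structs plus a block comment on the ioctl. This is only a sketch. The return-value semantics assume the -ENOSPC convention suggested above, and the meaning of @offset is inferred from mem_offset in the driver. The sketch also includes <linux/ioctl.h>, which provides _IO(); the posted header does not include it.

```c
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
#ifndef _UAPI_LINUX_EGM_H
#define _UAPI_LINUX_EGM_H

#include <linux/ioctl.h>
#include <linux/types.h>

#define EGM_TYPE ('E')

/**
 * struct egm_retired_pages_info - one retired range in the EGM region
 * @offset: byte offset of the range from the start of the EGM region
 * @size: length of the range in bytes
 */
struct egm_retired_pages_info {
	__aligned_u64 offset;
	__aligned_u64 size;
};

/**
 * struct egm_retired_pages_list - argument for EGM_RETIRED_PAGES_LIST
 * @argsz: (in/out) total size of the buffer, including @retired_pages
 * @count: (out) number of entries reported
 * @retired_pages: (out) @count entries
 */
struct egm_retired_pages_list {
	__u32 argsz;
	__u32 count;
	struct egm_retired_pages_info retired_pages[];
};

/*
 * EGM_RETIRED_PAGES_LIST - _IO(EGM_TYPE, 100), arg: struct egm_retired_pages_list
 *
 * Report the retired pages of the EGM region. If @argsz is too small to hold
 * every entry, the call fails with -ENOSPC and sets @argsz and @count to the
 * required values so the caller can retry with a larger buffer.
 *
 * Return: 0 on success, -ENOSPC as above, -EINVAL or -EFAULT on bad input.
 */
#define EGM_RETIRED_PAGES_LIST _IO(EGM_TYPE, 100)

#endif /* _UAPI_LINUX_EGM_H */
```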