public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: <ankita@nvidia.com>
To: <ankita@nvidia.com>, <jgg@ziepe.ca>, <yishaih@nvidia.com>,
	<skolothumtho@nvidia.com>, <kevin.tian@intel.com>,
	<alex@shazbot.org>, <aniketa@nvidia.com>, <vsethi@nvidia.com>,
	<mochs@nvidia.com>
Cc: <Yunxiang.Li@amd.com>, <yi.l.liu@intel.com>,
	<zhangdongdong@eswincomputing.com>, <avihaih@nvidia.com>,
	<bhelgaas@google.com>, <peterx@redhat.com>, <pstanner@redhat.com>,
	<apopple@nvidia.com>, <kvm@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <cjia@nvidia.com>,
	<kwankhede@nvidia.com>, <targupta@nvidia.com>, <zhiw@nvidia.com>,
	<danw@nvidia.com>, <dnigam@nvidia.com>, <kjaju@nvidia.com>
Subject: [PATCH v5 3/7] vfio/nvgrace-gpu: Add support for huge pfnmap
Date: Mon, 24 Nov 2025 11:59:22 +0000	[thread overview]
Message-ID: <20251124115926.119027-4-ankita@nvidia.com> (raw)
In-Reply-To: <20251124115926.119027-1-ankita@nvidia.com>

From: Ankit Agrawal <ankita@nvidia.com>

NVIDIA's Grace based systems have large device memory. The device
memory is mapped as VM_PFNMAP in the VMM VMA. The nvgrace-gpu
module could make use of the huge PFNMAP support added in mm [1].

To achieve this, nvgrace-gpu module is updated to implement huge_fault ops.
The implementation establishes mapping according to the order request.
Note that if the PFN or the VMA address is unaligned to the order, the
mapping fallbacks to the PTE level.

Link: https://lore.kernel.org/all/20240826204353.2228736-1-peterx@redhat.com/ [1]

cc: Alex Williamson <alex@shazbot.org>
cc: Jason Gunthorpe <jgg@ziepe.ca>
cc: Vikram Sethi <vsethi@nvidia.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
---
 drivers/vfio/pci/nvgrace-gpu/main.c | 43 +++++++++++++++++++++++------
 1 file changed, 35 insertions(+), 8 deletions(-)

diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
index f74f3d8e1ebe..c84c01954c9e 100644
--- a/drivers/vfio/pci/nvgrace-gpu/main.c
+++ b/drivers/vfio/pci/nvgrace-gpu/main.c
@@ -130,32 +130,58 @@ static void nvgrace_gpu_close_device(struct vfio_device *core_vdev)
 	vfio_pci_core_close_device(core_vdev);
 }
 
-static vm_fault_t nvgrace_gpu_vfio_pci_fault(struct vm_fault *vmf)
+static vm_fault_t nvgrace_gpu_vfio_pci_huge_fault(struct vm_fault *vmf,
+						  unsigned int order)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct nvgrace_gpu_pci_core_device *nvdev = vma->vm_private_data;
 	int index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
 	vm_fault_t ret = VM_FAULT_SIGBUS;
 	struct mem_region *memregion;
-	unsigned long pgoff, pfn;
+	unsigned long pgoff, pfn, addr;
 
 	memregion = nvgrace_gpu_memregion(index, nvdev);
 	if (!memregion)
 		return ret;
 
-	pgoff = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
+	addr = vmf->address & ~((PAGE_SIZE << order) - 1);
+	pgoff = (addr - vma->vm_start) >> PAGE_SHIFT;
 	pfn = PHYS_PFN(memregion->memphys) + pgoff;
 
+	if (order && (addr < vma->vm_start ||
+		      addr + (PAGE_SIZE << order) > vma->vm_end ||
+		      pfn & ((1 << order) - 1)))
+		return VM_FAULT_FALLBACK;
+
 	scoped_guard(rwsem_read, &nvdev->core_device.memory_lock)
-		ret = vmf_insert_pfn(vmf->vma, vmf->address, pfn);
+		ret = vfio_pci_vmf_insert_pfn(vmf, pfn, order);
 
 	return ret;
 }
 
+static vm_fault_t nvgrace_gpu_vfio_pci_fault(struct vm_fault *vmf)
+{
+	return nvgrace_gpu_vfio_pci_huge_fault(vmf, 0);
+}
+
 static const struct vm_operations_struct nvgrace_gpu_vfio_pci_mmap_ops = {
 	.fault = nvgrace_gpu_vfio_pci_fault,
+#ifdef CONFIG_ARCH_SUPPORTS_HUGE_PFNMAP
+	.huge_fault = nvgrace_gpu_vfio_pci_huge_fault,
+#endif
 };
 
+static size_t nvgrace_gpu_aligned_devmem_size(size_t memlength)
+{
+#ifdef CONFIG_ARCH_SUPPORTS_PMD_PFNMAP
+	return ALIGN(memlength, PMD_SIZE);
+#endif
+#ifdef CONFIG_ARCH_SUPPORTS_PUD_PFNMAP
+	return ALIGN(memlength, PUD_SIZE);
+#endif
+	return memlength;
+}
+
 static int nvgrace_gpu_mmap(struct vfio_device *core_vdev,
 			    struct vm_area_struct *vma)
 {
@@ -185,10 +211,10 @@ static int nvgrace_gpu_mmap(struct vfio_device *core_vdev,
 		return -EOVERFLOW;
 
 	/*
-	 * Check that the mapping request does not go beyond available device
-	 * memory size
+	 * Check that the mapping request does not go beyond the exposed
+	 * device memory size.
 	 */
-	if (end > memregion->memlength)
+	if (end > nvgrace_gpu_aligned_devmem_size(memregion->memlength))
 		return -EINVAL;
 
 	vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
@@ -258,7 +284,8 @@ nvgrace_gpu_ioctl_get_region_info(struct vfio_device *core_vdev,
 
 	sparse->nr_areas = 1;
 	sparse->areas[0].offset = 0;
-	sparse->areas[0].size = memregion->memlength;
+	sparse->areas[0].size =
+		nvgrace_gpu_aligned_devmem_size(memregion->memlength);
 	sparse->header.id = VFIO_REGION_INFO_CAP_SPARSE_MMAP;
 	sparse->header.version = 1;
 
-- 
2.34.1


  parent reply	other threads:[~2025-11-24 11:59 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-24 11:59 [PATCH v5 0/7] vfio/nvgrace-gpu: Support huge PFNMAP and wait for GPU ready post reset ankita
2025-11-24 11:59 ` [PATCH v5 1/7] vfio/nvgrace-gpu: Use faults to map device memory ankita
2025-11-24 17:09   ` Shameer Kolothum
2025-11-24 17:16     ` Jason Gunthorpe
2025-11-24 17:16   ` Jason Gunthorpe
2025-11-24 11:59 ` [PATCH v5 2/7] vfio: export function to map the VMA ankita
2025-11-24 17:22   ` Shameer Kolothum
2025-11-24 11:59 ` ankita [this message]
2025-11-24 15:32   ` [PATCH v5 3/7] vfio/nvgrace-gpu: Add support for huge pfnmap Alex Williamson
2025-11-24 15:40     ` Ankit Agrawal
2025-11-24 18:08   ` Shameer Kolothum
2025-11-24 18:15   ` Jason Gunthorpe
2025-11-24 11:59 ` [PATCH v5 4/7] vfio: use vfio_pci_core_setup_barmap to map bar in mmap ankita
2025-11-24 18:10   ` Shameer Kolothum
2025-11-24 11:59 ` [PATCH v5 5/7] vfio/nvgrace-gpu: split the code to wait for GPU ready ankita
2025-11-24 15:32   ` Alex Williamson
2025-11-24 15:39     ` Ankit Agrawal
2025-11-24 18:22   ` Shameer Kolothum
2025-11-25  2:53     ` Ankit Agrawal
2025-11-24 11:59 ` [PATCH v5 6/7] vfio/nvgrace-gpu: Inform devmem unmapped after reset ankita
2025-11-24 18:16   ` Jason Gunthorpe
2025-11-25  2:52     ` Ankit Agrawal
2025-11-24 11:59 ` [PATCH v5 7/7] vfio/nvgrace-gpu: wait for the GPU mem to be ready ankita
2025-11-24 18:41   ` Shameer Kolothum

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251124115926.119027-4-ankita@nvidia.com \
    --to=ankita@nvidia.com \
    --cc=Yunxiang.Li@amd.com \
    --cc=alex@shazbot.org \
    --cc=aniketa@nvidia.com \
    --cc=apopple@nvidia.com \
    --cc=avihaih@nvidia.com \
    --cc=bhelgaas@google.com \
    --cc=cjia@nvidia.com \
    --cc=danw@nvidia.com \
    --cc=dnigam@nvidia.com \
    --cc=jgg@ziepe.ca \
    --cc=kevin.tian@intel.com \
    --cc=kjaju@nvidia.com \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mochs@nvidia.com \
    --cc=peterx@redhat.com \
    --cc=pstanner@redhat.com \
    --cc=skolothumtho@nvidia.com \
    --cc=targupta@nvidia.com \
    --cc=vsethi@nvidia.com \
    --cc=yi.l.liu@intel.com \
    --cc=yishaih@nvidia.com \
    --cc=zhangdongdong@eswincomputing.com \
    --cc=zhiw@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox