patches.lists.linux.dev archive mirror
* [PATCH v2] vfio/type1: Remove Fine Grained Superpages detection
@ 2025-04-14 13:46 Jason Gunthorpe
  2025-04-15  7:41 ` Tian, Kevin
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Jason Gunthorpe @ 2025-04-14 13:46 UTC (permalink / raw)
  To: Alex Williamson, iommu, kvm; +Cc: patches

VFIO is looking to enable an optimization where it can rely on a fast
unmap operation that returns the size of a larger IOPTE.

Due to how the test was constructed, this would only ever succeed on the
AMDv1 page table, which supports an 8k contiguous size. Nothing else
supports this.

Alex says the performance win was fairly minor, so let's remove this
code. Always use iommu_iova_to_phys() to extend contiguous pages.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio_iommu_type1.c | 49 +--------------------------------
 1 file changed, 1 insertion(+), 48 deletions(-)

v2:
 - Just remove all the fgsp support
v1: https://patch.msgid.link/r/0-v1-0eed68063e59+93d-vfio_fgsp_jgg@nvidia.com

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 0ac56072af9f23..afc1449335c308 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -80,7 +80,6 @@ struct vfio_domain {
 	struct iommu_domain	*domain;
 	struct list_head	next;
 	struct list_head	group_list;
-	bool			fgsp : 1;	/* Fine-grained super pages */
 	bool			enforce_cache_coherency : 1;
 };
 
@@ -1095,8 +1094,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 		 * may require hardware cache flushing, try to find the
 		 * largest contiguous physical memory chunk to unmap.
 		 */
-		for (len = PAGE_SIZE;
-		     !domain->fgsp && iova + len < end; len += PAGE_SIZE) {
+		for (len = PAGE_SIZE; iova + len < end; len += PAGE_SIZE) {
 			next = iommu_iova_to_phys(domain->domain, iova + len);
 			if (next != phys + len)
 				break;
@@ -1833,49 +1831,6 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
 	return ret;
 }
 
-/*
- * We change our unmap behavior slightly depending on whether the IOMMU
- * supports fine-grained superpages.  IOMMUs like AMD-Vi will use a superpage
- * for practically any contiguous power-of-two mapping we give it.  This means
- * we don't need to look for contiguous chunks ourselves to make unmapping
- * more efficient.  On IOMMUs with coarse-grained super pages, like Intel VT-d
- * with discrete 2M/1G/512G/1T superpages, identifying contiguous chunks
- * significantly boosts non-hugetlbfs mappings and doesn't seem to hurt when
- * hugetlbfs is in use.
- */
-static void vfio_test_domain_fgsp(struct vfio_domain *domain, struct list_head *regions)
-{
-	int ret, order = get_order(PAGE_SIZE * 2);
-	struct vfio_iova *region;
-	struct page *pages;
-	dma_addr_t start;
-
-	pages = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);
-	if (!pages)
-		return;
-
-	list_for_each_entry(region, regions, list) {
-		start = ALIGN(region->start, PAGE_SIZE * 2);
-		if (start >= region->end || (region->end - start < PAGE_SIZE * 2))
-			continue;
-
-		ret = iommu_map(domain->domain, start, page_to_phys(pages), PAGE_SIZE * 2,
-				IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE,
-				GFP_KERNEL_ACCOUNT);
-		if (!ret) {
-			size_t unmapped = iommu_unmap(domain->domain, start, PAGE_SIZE);
-
-			if (unmapped == PAGE_SIZE)
-				iommu_unmap(domain->domain, start + PAGE_SIZE, PAGE_SIZE);
-			else
-				domain->fgsp = true;
-		}
-		break;
-	}
-
-	__free_pages(pages, order);
-}
-
 static struct vfio_iommu_group *find_iommu_group(struct vfio_domain *domain,
 						 struct iommu_group *iommu_group)
 {
@@ -2314,8 +2269,6 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 		}
 	}
 
-	vfio_test_domain_fgsp(domain, &iova_copy);
-
 	/* replay mappings on new domains */
 	ret = vfio_iommu_replay(iommu, domain);
 	if (ret)

base-commit: c4a104a53e4f7ba76f6372c5034125a24fd7f137
-- 
2.43.0



* RE: [PATCH v2] vfio/type1: Remove Fine Grained Superpages detection
  2025-04-14 13:46 [PATCH v2] vfio/type1: Remove Fine Grained Superpages detection Jason Gunthorpe
@ 2025-04-15  7:41 ` Tian, Kevin
  2025-04-16  2:25 ` Alejandro Jimenez
  2025-05-20 15:39 ` Alex Williamson
  2 siblings, 0 replies; 4+ messages in thread
From: Tian, Kevin @ 2025-04-15  7:41 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson, iommu@lists.linux.dev,
	kvm@vger.kernel.org
  Cc: patches@lists.linux.dev

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Monday, April 14, 2025 9:47 PM
> 
> VFIO is looking to enable an optimization where it can rely on a fast
> unmap operation that returns the size of a larger IOPTE.
> 
> Due to how the test was constructed, this would only ever succeed on the
> AMDv1 page table, which supports an 8k contiguous size. Nothing else
> supports this.
> 
> Alex says the performance win was fairly minor, so let's remove this
> code. Always use iommu_iova_to_phys() to extend contiguous pages.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>


* Re: [PATCH v2] vfio/type1: Remove Fine Grained Superpages detection
  2025-04-14 13:46 [PATCH v2] vfio/type1: Remove Fine Grained Superpages detection Jason Gunthorpe
  2025-04-15  7:41 ` Tian, Kevin
@ 2025-04-16  2:25 ` Alejandro Jimenez
  2025-05-20 15:39 ` Alex Williamson
  2 siblings, 0 replies; 4+ messages in thread
From: Alejandro Jimenez @ 2025-04-16  2:25 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson, iommu, kvm; +Cc: patches



On 4/14/25 9:46 AM, Jason Gunthorpe wrote:
> VFIO is looking to enable an optimization where it can rely on a fast
> unmap operation that returns the size of a larger IOPTE.
> 
> Due to how the test was constructed, this would only ever succeed on the
> AMDv1 page table, which supports an 8k contiguous size. Nothing else
> supports this.
> 
> Alex says the performance win was fairly minor, so let's remove this
> code. Always use iommu_iova_to_phys() to extend contiguous pages.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>

Verified on AMD Zen4 host, basic sanity test booting KVM guest with 16 VFs.

Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>


* Re: [PATCH v2] vfio/type1: Remove Fine Grained Superpages detection
  2025-04-14 13:46 [PATCH v2] vfio/type1: Remove Fine Grained Superpages detection Jason Gunthorpe
  2025-04-15  7:41 ` Tian, Kevin
  2025-04-16  2:25 ` Alejandro Jimenez
@ 2025-05-20 15:39 ` Alex Williamson
  2 siblings, 0 replies; 4+ messages in thread
From: Alex Williamson @ 2025-05-20 15:39 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: iommu, kvm, patches

On Mon, 14 Apr 2025 10:46:39 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> VFIO is looking to enable an optimization where it can rely on a fast
> unmap operation that returns the size of a larger IOPTE.
> 
> Due to how the test was constructed, this would only ever succeed on the
> AMDv1 page table, which supports an 8k contiguous size. Nothing else
> supports this.
> 
> Alex says the performance win was fairly minor, so let's remove this
> code. Always use iommu_iova_to_phys() to extend contiguous pages.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 49 +--------------------------------
>  1 file changed, 1 insertion(+), 48 deletions(-)

Applied to vfio next branch for v6.16.  Thanks,

Alex


