* [PATCH v3 00/23] iommu: Further abstract iommu-pages
@ 2025-02-25 19:39 Jason Gunthorpe
  2025-02-25 19:39 ` [PATCH v3 01/23] iommu/tegra: Do not use struct page as the handle for as->pd memory Jason Gunthorpe
                   ` (24 more replies)
  0 siblings, 25 replies; 55+ messages in thread
From: Jason Gunthorpe @ 2025-02-25 19:39 UTC (permalink / raw)
  To: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

This is part of the consolidated iommu page table work; it brings the
allocator I previously sketched to all drivers.

iommu-pages is a small abstraction for allocating page table pages that
iommu drivers use. It has a few properties that distinguish it from the
other allocators in the kernel:

 - Allocations are always a power-of-two size, and always physically
   aligned to their size
 - Allocations can be threaded on a list, in atomic contexts, without
   memory allocations. The list is only used for freeing batches of pages
   (i.e. after an IOTLB flush, as the mm does); see the sketch after this
   list
 - Allocations are accounted for in the secondary page table counters
 - Allocations can sometimes be less than a full CPU page; 1/4 and 1/16 of
   a page are common sub-page allocation sizes
 - Allocations can sometimes be multiple CPU pages
 - In the future I'd like atomic-safe RCU freeing of the page lists, as the
   mm does
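
As a rough illustration of the list property (a sketch, not taken from any
single patch; it follows the struct page lru threading the pre-conversion
riscv driver uses, and "freelist" is a hypothetical local):

  /* Atomic context: thread a retired table page onto a local list; no
   * memory allocation is needed since the list_head lives in struct page */
  list_add_tail(&virt_to_page(pt)->lru, &freelist);

  /* Once the IOTLB flush has completed, free the whole batch */
  iommu_put_pages_list(&freelist);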

Make the API tighter and leak fewer internal details to the callers. In
particular, this series aims to remove all struct page usage related to
iommu-pages memory from all drivers. This means removing uses of things
such as (a before/after sketch follows the list):

 struct page
 virt_to_page()
 page_to_virt()
 page_to_pfn()
 pfn_to_page()
 dma_map_page()
 page_address()
 page->lru
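
A short before/after sketch of this conversion (condensed from the
tegra-smmu patches; the handle becomes a KVA and struct page stays inside
the API):

  /* Before: struct page is the handle */
  struct page *page = __iommu_alloc_pages(gfp | __GFP_DMA, 0);
  u32 *pt = page_address(page);

  /* After: the virtual address is the handle */
  u32 *pt = iommu_alloc_page(gfp | __GFP_DMA);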

Once drivers no longer use struct page, convert iommu-pages to folios and
use a private memory descriptor. This should help prepare iommu for
Matthew's memdesc project and clear the way to using more space in the
struct page for iommu-pages features in the future.

Improve the API to work directly on sizes instead of orders; the drivers
generally have HW specs and code paths that already work in specific
sizes. Pass those sizes down into the allocator to remove some boilerplate
get_order() calls in drivers. This is cleanup to be ready for a possible
sub-page allocator some day.
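
For example, the kind of boilerplate this removes (condensed from the
sun50i hunk in patch 5):

  /* Today the caller must recompute the order it allocated with */
  iommu_free_pages(sun50i_domain->dt, get_order(DT_SIZE));

  /* After patch 5 the allocation size is recovered internally */
  iommu_free_pages(sun50i_domain->dt);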

This is on github: https://github.com/jgunthorpe/linux/commits/iommu_pages

v3:
 - Fix comments
 - Rename __iommu_free_page() to __iommu_free_desc()
 - Retain the max IMR table size comment in vt-d
v2: https://patch.msgid.link/r/0-v2-545d29711869+a76b5-iommu_pages_jgg@nvidia.com
 - Use struct tegra_pd instead of u32 *
 - Use dma_unmap_single() instead of dma_unmap_page() in tegra
 - Fix Tegra SIZE_PT typo
 - Use iommu_free_pages() as the free function name instead of
   iommu_free_page()
 - Remove numa_node_id() and use the NUMA_NO_NODE path with numa_mem_id()
 - Make all the allocation APIs use size only, fully remove order and lg2
   versions
 - Round up the riscv queue size, use SZ_4K in riscv not PAGE_SIZE
 - Convert AMD to use kvalloc for CPU memory and fix the device table to
   round up to 4K
 - Use PAGE_ALIGN instead of get_order in AMD iommu_alloc_4k_pages()
v1: https://patch.msgid.link/r/0-v1-416f64558c7c+2a5-iommu_pages_jgg@nvidia.com

Jason Gunthorpe (23):
  iommu/tegra: Do not use struct page as the handle for as->pd memory
  iommu/tegra: Do not use struct page as the handle for pts
  iommu/pages: Remove __iommu_alloc_pages()/__iommu_free_pages()
  iommu/pages: Make iommu_put_pages_list() work with high order
    allocations
  iommu/pages: Remove the order argument to iommu_free_pages()
  iommu/pages: Remove iommu_free_page()
  iommu/pages: De-inline the substantial functions
  iommu/vtd: Use virt_to_phys()
  iommu/pages: Formalize the freelist API
  iommu/riscv: Convert to use struct iommu_pages_list
  iommu/amd: Convert to use struct iommu_pages_list
  iommu: Change iommu_iotlb_gather to use iommu_pages_list
  iommu/pages: Remove iommu_put_pages_list_old and the _Generic
  iommu/pages: Move from struct page to struct ioptdesc and folio
  iommu/pages: Move the __GFP_HIGHMEM checks into the common code
  iommu/pages: Allow sub page sizes to be passed into the allocator
  iommu/amd: Change rlookup, irq_lookup, and alias to use kvalloc()
  iommu/amd: Use roundup_pow_of_two() instead of get_order()
  iommu/riscv: Update to use iommu_alloc_pages_node_lg2()
  iommu: Update various drivers to pass in lg2sz instead of order to
    iommu pages
  iommu/pages: Remove iommu_alloc_page/pages()
  iommu/pages: Remove iommu_alloc_page_node()
  iommu/pages: Remove iommu_alloc_pages_node()

 drivers/iommu/Makefile              |   1 +
 drivers/iommu/amd/amd_iommu_types.h |   8 --
 drivers/iommu/amd/init.c            |  91 +++++-------
 drivers/iommu/amd/io_pgtable.c      |  38 +++--
 drivers/iommu/amd/io_pgtable_v2.c   |  12 +-
 drivers/iommu/amd/iommu.c           |   6 +-
 drivers/iommu/amd/ppr.c             |   2 +-
 drivers/iommu/dma-iommu.c           |   9 +-
 drivers/iommu/exynos-iommu.c        |  12 +-
 drivers/iommu/intel/dmar.c          |  10 +-
 drivers/iommu/intel/iommu.c         |  52 +++----
 drivers/iommu/intel/iommu.h         |  26 +---
 drivers/iommu/intel/irq_remapping.c |  12 +-
 drivers/iommu/intel/pasid.c         |  13 +-
 drivers/iommu/intel/pasid.h         |   1 -
 drivers/iommu/intel/prq.c           |   7 +-
 drivers/iommu/io-pgtable-arm.c      |   9 +-
 drivers/iommu/io-pgtable-dart.c     |  23 +--
 drivers/iommu/iommu-pages.c         | 117 +++++++++++++++
 drivers/iommu/iommu-pages.h         | 211 +++++++++-------------------
 drivers/iommu/riscv/iommu.c         |  43 +++---
 drivers/iommu/rockchip-iommu.c      |  14 +-
 drivers/iommu/sun50i-iommu.c        |   6 +-
 drivers/iommu/tegra-smmu.c          | 111 ++++++++-------
 include/linux/iommu.h               |  16 ++-
 25 files changed, 426 insertions(+), 424 deletions(-)
 create mode 100644 drivers/iommu/iommu-pages.c


base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
-- 
2.43.0




* [PATCH v3 01/23] iommu/tegra: Do not use struct page as the handle for as->pd memory
  2025-02-25 19:39 [PATCH v3 00/23] iommu: Further abstract iommu-pages Jason Gunthorpe
@ 2025-02-25 19:39 ` Jason Gunthorpe
  2025-02-25 19:39 ` [PATCH v3 02/23] iommu/tegra: Do not use struct page as the handle for pts Jason Gunthorpe
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 55+ messages in thread
From: Jason Gunthorpe @ 2025-02-25 19:39 UTC (permalink / raw)
  To: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

Instead use the virtual address. Change from dma_map_page() to
dma_map_single(), which works directly on a KVA. Add a type for the pd
table level for clarity.
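
Distilled from the hunks below, the pd mapping changes like so:

  /* Before: map via the struct page handle */
  as->pd_dma = dma_map_page(smmu->dev, as->pd, 0, SMMU_SIZE_PD,
                            DMA_TO_DEVICE);

  /* After: map the KVA directly */
  as->pd_dma = dma_map_single(smmu->dev, as->pd, SMMU_SIZE_PD,
                              DMA_TO_DEVICE);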

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/tegra-smmu.c | 37 +++++++++++++++++++------------------
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 7f633bb5efef16..b6e61f5c0861b0 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -51,6 +51,8 @@ struct tegra_smmu {
 	struct iommu_device iommu;	/* IOMMU Core code handle */
 };
 
+struct tegra_pd;
+
 struct tegra_smmu_as {
 	struct iommu_domain domain;
 	struct tegra_smmu *smmu;
@@ -58,7 +60,7 @@ struct tegra_smmu_as {
 	spinlock_t lock;
 	u32 *count;
 	struct page **pts;
-	struct page *pd;
+	struct tegra_pd *pd;
 	dma_addr_t pd_dma;
 	unsigned id;
 	u32 attr;
@@ -155,6 +157,10 @@ static inline u32 smmu_readl(struct tegra_smmu *smmu, unsigned long offset)
 #define SMMU_PDE_ATTR		(SMMU_PDE_READABLE | SMMU_PDE_WRITABLE | \
 				 SMMU_PDE_NONSECURE)
 
+struct tegra_pd {
+	u32 val[SMMU_NUM_PDE];
+};
+
 static unsigned int iova_pd_index(unsigned long iova)
 {
 	return (iova >> SMMU_PDE_SHIFT) & (SMMU_NUM_PDE - 1);
@@ -284,7 +290,7 @@ static struct iommu_domain *tegra_smmu_domain_alloc_paging(struct device *dev)
 
 	as->attr = SMMU_PD_READABLE | SMMU_PD_WRITABLE | SMMU_PD_NONSECURE;
 
-	as->pd = __iommu_alloc_pages(GFP_KERNEL | __GFP_DMA, 0);
+	as->pd = iommu_alloc_page(GFP_KERNEL | __GFP_DMA);
 	if (!as->pd) {
 		kfree(as);
 		return NULL;
@@ -292,7 +298,7 @@ static struct iommu_domain *tegra_smmu_domain_alloc_paging(struct device *dev)
 
 	as->count = kcalloc(SMMU_NUM_PDE, sizeof(u32), GFP_KERNEL);
 	if (!as->count) {
-		__iommu_free_pages(as->pd, 0);
+		iommu_free_page(as->pd);
 		kfree(as);
 		return NULL;
 	}
@@ -300,7 +306,7 @@ static struct iommu_domain *tegra_smmu_domain_alloc_paging(struct device *dev)
 	as->pts = kcalloc(SMMU_NUM_PDE, sizeof(*as->pts), GFP_KERNEL);
 	if (!as->pts) {
 		kfree(as->count);
-		__iommu_free_pages(as->pd, 0);
+		iommu_free_page(as->pd);
 		kfree(as);
 		return NULL;
 	}
@@ -417,8 +423,8 @@ static int tegra_smmu_as_prepare(struct tegra_smmu *smmu,
 		goto unlock;
 	}
 
-	as->pd_dma = dma_map_page(smmu->dev, as->pd, 0, SMMU_SIZE_PD,
-				  DMA_TO_DEVICE);
+	as->pd_dma =
+		dma_map_single(smmu->dev, as->pd, SMMU_SIZE_PD, DMA_TO_DEVICE);
 	if (dma_mapping_error(smmu->dev, as->pd_dma)) {
 		err = -ENOMEM;
 		goto unlock;
@@ -450,7 +456,7 @@ static int tegra_smmu_as_prepare(struct tegra_smmu *smmu,
 	return 0;
 
 err_unmap:
-	dma_unmap_page(smmu->dev, as->pd_dma, SMMU_SIZE_PD, DMA_TO_DEVICE);
+	dma_unmap_single(smmu->dev, as->pd_dma, SMMU_SIZE_PD, DMA_TO_DEVICE);
 unlock:
 	mutex_unlock(&smmu->lock);
 
@@ -469,7 +475,7 @@ static void tegra_smmu_as_unprepare(struct tegra_smmu *smmu,
 
 	tegra_smmu_free_asid(smmu, as->id);
 
-	dma_unmap_page(smmu->dev, as->pd_dma, SMMU_SIZE_PD, DMA_TO_DEVICE);
+	dma_unmap_single(smmu->dev, as->pd_dma, SMMU_SIZE_PD, DMA_TO_DEVICE);
 
 	as->smmu = NULL;
 
@@ -548,11 +554,11 @@ static void tegra_smmu_set_pde(struct tegra_smmu_as *as, unsigned long iova,
 {
 	unsigned int pd_index = iova_pd_index(iova);
 	struct tegra_smmu *smmu = as->smmu;
-	u32 *pd = page_address(as->pd);
+	struct tegra_pd *pd = as->pd;
 	unsigned long offset = pd_index * sizeof(*pd);
 
 	/* Set the page directory entry first */
-	pd[pd_index] = value;
+	pd->val[pd_index] = value;
 
 	/* The flush the page directory entry from caches */
 	dma_sync_single_range_for_device(smmu->dev, as->pd_dma, offset,
@@ -577,14 +583,12 @@ static u32 *tegra_smmu_pte_lookup(struct tegra_smmu_as *as, unsigned long iova,
 	unsigned int pd_index = iova_pd_index(iova);
 	struct tegra_smmu *smmu = as->smmu;
 	struct page *pt_page;
-	u32 *pd;
 
 	pt_page = as->pts[pd_index];
 	if (!pt_page)
 		return NULL;
 
-	pd = page_address(as->pd);
-	*dmap = smmu_pde_to_dma(smmu, pd[pd_index]);
+	*dmap = smmu_pde_to_dma(smmu, as->pd->val[pd_index]);
 
 	return tegra_smmu_pte_offset(pt_page, iova);
 }
@@ -619,9 +623,7 @@ static u32 *as_get_pte(struct tegra_smmu_as *as, dma_addr_t iova,
 
 		*dmap = dma;
 	} else {
-		u32 *pd = page_address(as->pd);
-
-		*dmap = smmu_pde_to_dma(smmu, pd[pde]);
+		*dmap = smmu_pde_to_dma(smmu, as->pd->val[pde]);
 	}
 
 	return tegra_smmu_pte_offset(as->pts[pde], iova);
@@ -645,8 +647,7 @@ static void tegra_smmu_pte_put_use(struct tegra_smmu_as *as, unsigned long iova)
 	 */
 	if (--as->count[pde] == 0) {
 		struct tegra_smmu *smmu = as->smmu;
-		u32 *pd = page_address(as->pd);
-		dma_addr_t pte_dma = smmu_pde_to_dma(smmu, pd[pde]);
+		dma_addr_t pte_dma = smmu_pde_to_dma(smmu, as->pd->val[pde]);
 
 		tegra_smmu_set_pde(as, iova, 0);
 
-- 
2.43.0




* [PATCH v3 02/23] iommu/tegra: Do not use struct page as the handle for pts
  2025-02-25 19:39 [PATCH v3 00/23] iommu: Further abstract iommu-pages Jason Gunthorpe
  2025-02-25 19:39 ` [PATCH v3 01/23] iommu/tegra: Do not use struct page as the handle for as->pd memory Jason Gunthorpe
@ 2025-02-25 19:39 ` Jason Gunthorpe
  2025-02-25 19:39 ` [PATCH v3 03/23] iommu/pages: Remove __iommu_alloc_pages()/__iommu_free_pages() Jason Gunthorpe
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 55+ messages in thread
From: Jason Gunthorpe @ 2025-02-25 19:39 UTC (permalink / raw)
  To: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

Instead use the virtual address and dma_map_single(), like as->pd now
uses. Introduce a small struct tegra_pt instead of void * to give some
clarity about what is using this API and to add compile safety during the
conversion.
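
Distilled from the hunks below, the typed handle and its accessor:

  struct tegra_pt {
          u32 val[SMMU_NUM_PTE];
  };

  /* PTE lookup becomes plain member access instead of page_address() */
  static u32 *tegra_smmu_pte_offset(struct tegra_pt *pt, unsigned long iova)
  {
          return &pt->val[iova_pt_index(iova)];
  }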

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/tegra-smmu.c | 74 ++++++++++++++++++++------------------
 1 file changed, 39 insertions(+), 35 deletions(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index b6e61f5c0861b0..c134647292fb22 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -52,6 +52,7 @@ struct tegra_smmu {
 };
 
 struct tegra_pd;
+struct tegra_pt;
 
 struct tegra_smmu_as {
 	struct iommu_domain domain;
@@ -59,7 +60,7 @@ struct tegra_smmu_as {
 	unsigned int use_count;
 	spinlock_t lock;
 	u32 *count;
-	struct page **pts;
+	struct tegra_pt **pts;
 	struct tegra_pd *pd;
 	dma_addr_t pd_dma;
 	unsigned id;
@@ -161,6 +162,10 @@ struct tegra_pd {
 	u32 val[SMMU_NUM_PDE];
 };
 
+struct tegra_pt {
+	u32 val[SMMU_NUM_PTE];
+};
+
 static unsigned int iova_pd_index(unsigned long iova)
 {
 	return (iova >> SMMU_PDE_SHIFT) & (SMMU_NUM_PDE - 1);
@@ -570,11 +575,9 @@ static void tegra_smmu_set_pde(struct tegra_smmu_as *as, unsigned long iova,
 	smmu_flush(smmu);
 }
 
-static u32 *tegra_smmu_pte_offset(struct page *pt_page, unsigned long iova)
+static u32 *tegra_smmu_pte_offset(struct tegra_pt *pt, unsigned long iova)
 {
-	u32 *pt = page_address(pt_page);
-
-	return pt + iova_pt_index(iova);
+	return &pt->val[iova_pt_index(iova)];
 }
 
 static u32 *tegra_smmu_pte_lookup(struct tegra_smmu_as *as, unsigned long iova,
@@ -582,19 +585,19 @@ static u32 *tegra_smmu_pte_lookup(struct tegra_smmu_as *as, unsigned long iova,
 {
 	unsigned int pd_index = iova_pd_index(iova);
 	struct tegra_smmu *smmu = as->smmu;
-	struct page *pt_page;
+	struct tegra_pt *pt;
 
-	pt_page = as->pts[pd_index];
-	if (!pt_page)
+	pt = as->pts[pd_index];
+	if (!pt)
 		return NULL;
 
 	*dmap = smmu_pde_to_dma(smmu, as->pd->val[pd_index]);
 
-	return tegra_smmu_pte_offset(pt_page, iova);
+	return tegra_smmu_pte_offset(pt, iova);
 }
 
 static u32 *as_get_pte(struct tegra_smmu_as *as, dma_addr_t iova,
-		       dma_addr_t *dmap, struct page *page)
+		       dma_addr_t *dmap, struct tegra_pt *pt)
 {
 	unsigned int pde = iova_pd_index(iova);
 	struct tegra_smmu *smmu = as->smmu;
@@ -602,21 +605,21 @@ static u32 *as_get_pte(struct tegra_smmu_as *as, dma_addr_t iova,
 	if (!as->pts[pde]) {
 		dma_addr_t dma;
 
-		dma = dma_map_page(smmu->dev, page, 0, SMMU_SIZE_PT,
-				   DMA_TO_DEVICE);
+		dma = dma_map_single(smmu->dev, pt, SMMU_SIZE_PT,
+				     DMA_TO_DEVICE);
 		if (dma_mapping_error(smmu->dev, dma)) {
-			__iommu_free_pages(page, 0);
+			iommu_free_page(pt);
 			return NULL;
 		}
 
 		if (!smmu_dma_addr_valid(smmu, dma)) {
-			dma_unmap_page(smmu->dev, dma, SMMU_SIZE_PT,
-				       DMA_TO_DEVICE);
-			__iommu_free_pages(page, 0);
+			dma_unmap_single(smmu->dev, dma, SMMU_SIZE_PT,
+					 DMA_TO_DEVICE);
+			iommu_free_page(pt);
 			return NULL;
 		}
 
-		as->pts[pde] = page;
+		as->pts[pde] = pt;
 
 		tegra_smmu_set_pde(as, iova, SMMU_MK_PDE(dma, SMMU_PDE_ATTR |
 							      SMMU_PDE_NEXT));
@@ -639,7 +642,7 @@ static void tegra_smmu_pte_get_use(struct tegra_smmu_as *as, unsigned long iova)
 static void tegra_smmu_pte_put_use(struct tegra_smmu_as *as, unsigned long iova)
 {
 	unsigned int pde = iova_pd_index(iova);
-	struct page *page = as->pts[pde];
+	struct tegra_pt *pt = as->pts[pde];
 
 	/*
 	 * When no entries in this page table are used anymore, return the
@@ -651,8 +654,9 @@ static void tegra_smmu_pte_put_use(struct tegra_smmu_as *as, unsigned long iova)
 
 		tegra_smmu_set_pde(as, iova, 0);
 
-		dma_unmap_page(smmu->dev, pte_dma, SMMU_SIZE_PT, DMA_TO_DEVICE);
-		__iommu_free_pages(page, 0);
+		dma_unmap_single(smmu->dev, pte_dma, SMMU_SIZE_PT,
+				 DMA_TO_DEVICE);
+		iommu_free_page(pt);
 		as->pts[pde] = NULL;
 	}
 }
@@ -672,16 +676,16 @@ static void tegra_smmu_set_pte(struct tegra_smmu_as *as, unsigned long iova,
 	smmu_flush(smmu);
 }
 
-static struct page *as_get_pde_page(struct tegra_smmu_as *as,
-				    unsigned long iova, gfp_t gfp,
-				    unsigned long *flags)
+static struct tegra_pt *as_get_pde_page(struct tegra_smmu_as *as,
+					unsigned long iova, gfp_t gfp,
+					unsigned long *flags)
 {
 	unsigned int pde = iova_pd_index(iova);
-	struct page *page = as->pts[pde];
+	struct tegra_pt *pt = as->pts[pde];
 
 	/* at first check whether allocation needs to be done at all */
-	if (page)
-		return page;
+	if (pt)
+		return pt;
 
 	/*
 	 * In order to prevent exhaustion of the atomic memory pool, we
@@ -691,7 +695,7 @@ static struct page *as_get_pde_page(struct tegra_smmu_as *as,
 	if (gfpflags_allow_blocking(gfp))
 		spin_unlock_irqrestore(&as->lock, *flags);
 
-	page = __iommu_alloc_pages(gfp | __GFP_DMA, 0);
+	pt = iommu_alloc_page(gfp | __GFP_DMA);
 
 	if (gfpflags_allow_blocking(gfp))
 		spin_lock_irqsave(&as->lock, *flags);
@@ -702,13 +706,13 @@ static struct page *as_get_pde_page(struct tegra_smmu_as *as,
 	 * if allocation succeeded and the allocation failure isn't fatal.
 	 */
 	if (as->pts[pde]) {
-		if (page)
-			__iommu_free_pages(page, 0);
+		if (pt)
+			iommu_free_page(pt);
 
-		page = as->pts[pde];
+		pt = as->pts[pde];
 	}
 
-	return page;
+	return pt;
 }
 
 static int
@@ -718,15 +722,15 @@ __tegra_smmu_map(struct iommu_domain *domain, unsigned long iova,
 {
 	struct tegra_smmu_as *as = to_smmu_as(domain);
 	dma_addr_t pte_dma;
-	struct page *page;
+	struct tegra_pt *pt;
 	u32 pte_attrs;
 	u32 *pte;
 
-	page = as_get_pde_page(as, iova, gfp, flags);
-	if (!page)
+	pt = as_get_pde_page(as, iova, gfp, flags);
+	if (!pt)
 		return -ENOMEM;
 
-	pte = as_get_pte(as, iova, &pte_dma, page);
+	pte = as_get_pte(as, iova, &pte_dma, pt);
 	if (!pte)
 		return -ENOMEM;
 
-- 
2.43.0




* [PATCH v3 03/23] iommu/pages: Remove __iommu_alloc_pages()/__iommu_free_pages()
  2025-02-25 19:39 [PATCH v3 00/23] iommu: Further abstract iommu-pages Jason Gunthorpe
  2025-02-25 19:39 ` [PATCH v3 01/23] iommu/tegra: Do not use struct page as the handle for as->pd memory Jason Gunthorpe
  2025-02-25 19:39 ` [PATCH v3 02/23] iommu/tegra: Do not use struct page as the handle for pts Jason Gunthorpe
@ 2025-02-25 19:39 ` Jason Gunthorpe
  2025-02-26  6:25   ` Baolu Lu
  2025-03-12 11:43   ` Mostafa Saleh
  2025-02-25 19:39 ` [PATCH v3 04/23] iommu/pages: Make iommu_put_pages_list() work with high order allocations Jason Gunthorpe
                   ` (21 subsequent siblings)
  24 siblings, 2 replies; 55+ messages in thread
From: Jason Gunthorpe @ 2025-02-25 19:39 UTC (permalink / raw)
  To: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

These were only used by tegra-smmu and leaked the struct page out of the
API. Delete them since tegra-smmu has been converted to the other APIs.

In the process flatten the call tree so we have fewer one-line functions
calling other one-line functions: iommu_alloc_pages_node() is the real
allocator and everything else can just call it directly.
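
The resulting call tree, sketched (everything funnels into the one real
allocator):

  iommu_alloc_page(gfp)
      -> iommu_alloc_pages_node(numa_node_id(), gfp, 0)
  iommu_alloc_pages(gfp, order)
      -> iommu_alloc_pages_node(numa_node_id(), gfp, order)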

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommu-pages.h | 49 ++++++-------------------------------
 1 file changed, 7 insertions(+), 42 deletions(-)

diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h
index 82ebf00330811c..0ca2437989a0e1 100644
--- a/drivers/iommu/iommu-pages.h
+++ b/drivers/iommu/iommu-pages.h
@@ -46,40 +46,6 @@ static inline void __iommu_free_account(struct page *page, int order)
 	mod_lruvec_page_state(page, NR_SECONDARY_PAGETABLE, -pgcnt);
 }
 
-/**
- * __iommu_alloc_pages - allocate a zeroed page of a given order.
- * @gfp: buddy allocator flags
- * @order: page order
- *
- * returns the head struct page of the allocated page.
- */
-static inline struct page *__iommu_alloc_pages(gfp_t gfp, int order)
-{
-	struct page *page;
-
-	page = alloc_pages(gfp | __GFP_ZERO, order);
-	if (unlikely(!page))
-		return NULL;
-
-	__iommu_alloc_account(page, order);
-
-	return page;
-}
-
-/**
- * __iommu_free_pages - free page of a given order
- * @page: head struct page of the page
- * @order: page order
- */
-static inline void __iommu_free_pages(struct page *page, int order)
-{
-	if (!page)
-		return;
-
-	__iommu_free_account(page, order);
-	__free_pages(page, order);
-}
-
 /**
  * iommu_alloc_pages_node - allocate a zeroed page of a given order from
  * specific NUMA node.
@@ -110,12 +76,7 @@ static inline void *iommu_alloc_pages_node(int nid, gfp_t gfp, int order)
  */
 static inline void *iommu_alloc_pages(gfp_t gfp, int order)
 {
-	struct page *page = __iommu_alloc_pages(gfp, order);
-
-	if (unlikely(!page))
-		return NULL;
-
-	return page_address(page);
+	return iommu_alloc_pages_node(numa_node_id(), gfp, order);
 }
 
 /**
@@ -138,7 +99,7 @@ static inline void *iommu_alloc_page_node(int nid, gfp_t gfp)
  */
 static inline void *iommu_alloc_page(gfp_t gfp)
 {
-	return iommu_alloc_pages(gfp, 0);
+	return iommu_alloc_pages_node(numa_node_id(), gfp, 0);
 }
 
 /**
@@ -148,10 +109,14 @@ static inline void *iommu_alloc_page(gfp_t gfp)
  */
 static inline void iommu_free_pages(void *virt, int order)
 {
+	struct page *page;
+
 	if (!virt)
 		return;
 
-	__iommu_free_pages(virt_to_page(virt), order);
+	page = virt_to_page(virt);
+	__iommu_free_account(page, order);
+	__free_pages(page, order);
 }
 
 /**
-- 
2.43.0




* [PATCH v3 04/23] iommu/pages: Make iommu_put_pages_list() work with high order allocations
  2025-02-25 19:39 [PATCH v3 00/23] iommu: Further abstract iommu-pages Jason Gunthorpe
                   ` (2 preceding siblings ...)
  2025-02-25 19:39 ` [PATCH v3 03/23] iommu/pages: Remove __iommu_alloc_pages()/__iommu_free_pages() Jason Gunthorpe
@ 2025-02-25 19:39 ` Jason Gunthorpe
  2025-02-26  6:28   ` Baolu Lu
  2025-02-25 19:39 ` [PATCH v3 05/23] iommu/pages: Remove the order argument to iommu_free_pages() Jason Gunthorpe
                   ` (20 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Jason Gunthorpe @ 2025-02-25 19:39 UTC (permalink / raw)
  To: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

alloc_pages_node(..., order) needs to be paired with __free_pages(...,
order) to free all the allocated pages. For order != 0 the return from
alloc_pages_node() is just a list of pages; it hasn't been formed into a
folio.

However, iommu_put_pages_list() just calls put_page() on the head page of
an allocation, which will end up leaking the tail pages if order != 0.

Fix this by using __GFP_COMP to create a high order folio and then always
use put_page() to free the full high order folio.

__iommu_free_account() can get the order of the allocation via
folio_order(), which corrects the accounting of high order allocations in
iommu_put_pages_list(). This is the same technique slub uses.

As far as I can tell, none of the places using high order allocations are
also using the free list, so this is not a current bug.
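
A sketch of the difference, using a hypothetical order-2 allocation:

  /* Without __GFP_COMP: put_page() frees only the head page and the
   * three tail pages leak; only __free_pages(page, 2) frees them all */
  page = alloc_pages_node(nid, gfp | __GFP_ZERO, 2);

  /* With __GFP_COMP the four pages form one folio, so put_page() on the
   * head frees everything and folio_order() recovers the order of 2 for
   * the accounting */
  page = alloc_pages_node(nid, gfp | __GFP_ZERO | __GFP_COMP, 2);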

Fixes: 06c375053cef ("iommu/vt-d: add wrapper functions for page allocations")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommu-pages.h | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h
index 0ca2437989a0e1..26b91940bdc146 100644
--- a/drivers/iommu/iommu-pages.h
+++ b/drivers/iommu/iommu-pages.h
@@ -38,8 +38,9 @@ static inline void __iommu_alloc_account(struct page *page, int order)
  * @page: head struct page of the page.
  * @order: order of the page
  */
-static inline void __iommu_free_account(struct page *page, int order)
+static inline void __iommu_free_account(struct page *page)
 {
+	unsigned int order = folio_order(page_folio(page));
 	const long pgcnt = 1l << order;
 
 	mod_node_page_state(page_pgdat(page), NR_IOMMU_PAGES, -pgcnt);
@@ -57,7 +58,8 @@ static inline void __iommu_free_account(struct page *page, int order)
  */
 static inline void *iommu_alloc_pages_node(int nid, gfp_t gfp, int order)
 {
-	struct page *page = alloc_pages_node(nid, gfp | __GFP_ZERO, order);
+	struct page *page =
+		alloc_pages_node(nid, gfp | __GFP_ZERO | __GFP_COMP, order);
 
 	if (unlikely(!page))
 		return NULL;
@@ -115,8 +117,8 @@ static inline void iommu_free_pages(void *virt, int order)
 		return;
 
 	page = virt_to_page(virt);
-	__iommu_free_account(page, order);
-	__free_pages(page, order);
+	__iommu_free_account(page);
+	put_page(page);
 }
 
 /**
@@ -143,7 +145,7 @@ static inline void iommu_put_pages_list(struct list_head *page)
 		struct page *p = list_entry(page->prev, struct page, lru);
 
 		list_del(&p->lru);
-		__iommu_free_account(p, 0);
+		__iommu_free_account(p);
 		put_page(p);
 	}
 }
-- 
2.43.0




* [PATCH v3 05/23] iommu/pages: Remove the order argument to iommu_free_pages()
  2025-02-25 19:39 [PATCH v3 00/23] iommu: Further abstract iommu-pages Jason Gunthorpe
                   ` (3 preceding siblings ...)
  2025-02-25 19:39 ` [PATCH v3 04/23] iommu/pages: Make iommu_put_pages_list() work with high order allocations Jason Gunthorpe
@ 2025-02-25 19:39 ` Jason Gunthorpe
  2025-02-26  6:32   ` Baolu Lu
  2025-03-12 11:43   ` Mostafa Saleh
  2025-02-25 19:39 ` [PATCH v3 06/23] iommu/pages: Remove iommu_free_page() Jason Gunthorpe
                   ` (19 subsequent siblings)
  24 siblings, 2 replies; 55+ messages in thread
From: Jason Gunthorpe @ 2025-02-25 19:39 UTC (permalink / raw)
  To: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

Now that we have a folio under the allocation, iommu_free_pages() can know
the order of the original allocation and do the correct thing to free it.

The next patch will rename iommu_free_page() to iommu_free_pages() so we
have naming consistency with iommu_alloc_pages_node().
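
Callers shrink accordingly, e.g. (condensed from the AMD hunks below):

  /* Before */
  iommu_free_pages(iommu->cmd_buf, get_order(CMD_BUFFER_SIZE));

  /* After */
  iommu_free_pages(iommu->cmd_buf);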

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/amd/init.c            | 28 +++++++++++-----------------
 drivers/iommu/amd/ppr.c             |  2 +-
 drivers/iommu/exynos-iommu.c        |  8 ++++----
 drivers/iommu/intel/irq_remapping.c |  4 ++--
 drivers/iommu/intel/pasid.c         |  3 +--
 drivers/iommu/intel/pasid.h         |  1 -
 drivers/iommu/intel/prq.c           |  4 ++--
 drivers/iommu/io-pgtable-arm.c      |  4 ++--
 drivers/iommu/io-pgtable-dart.c     | 10 ++++------
 drivers/iommu/iommu-pages.h         |  9 +++++----
 drivers/iommu/riscv/iommu.c         |  6 ++----
 drivers/iommu/sun50i-iommu.c        |  2 +-
 12 files changed, 35 insertions(+), 46 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index c5cd92edada061..f47ff0e0c75f4e 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -653,8 +653,7 @@ static inline int __init alloc_dev_table(struct amd_iommu_pci_seg *pci_seg)
 
 static inline void free_dev_table(struct amd_iommu_pci_seg *pci_seg)
 {
-	iommu_free_pages(pci_seg->dev_table,
-			 get_order(pci_seg->dev_table_size));
+	iommu_free_pages(pci_seg->dev_table);
 	pci_seg->dev_table = NULL;
 }
 
@@ -671,8 +670,7 @@ static inline int __init alloc_rlookup_table(struct amd_iommu_pci_seg *pci_seg)
 
 static inline void free_rlookup_table(struct amd_iommu_pci_seg *pci_seg)
 {
-	iommu_free_pages(pci_seg->rlookup_table,
-			 get_order(pci_seg->rlookup_table_size));
+	iommu_free_pages(pci_seg->rlookup_table);
 	pci_seg->rlookup_table = NULL;
 }
 
@@ -691,8 +689,7 @@ static inline int __init alloc_irq_lookup_table(struct amd_iommu_pci_seg *pci_se
 static inline void free_irq_lookup_table(struct amd_iommu_pci_seg *pci_seg)
 {
 	kmemleak_free(pci_seg->irq_lookup_table);
-	iommu_free_pages(pci_seg->irq_lookup_table,
-			 get_order(pci_seg->rlookup_table_size));
+	iommu_free_pages(pci_seg->irq_lookup_table);
 	pci_seg->irq_lookup_table = NULL;
 }
 
@@ -716,8 +713,7 @@ static int __init alloc_alias_table(struct amd_iommu_pci_seg *pci_seg)
 
 static void __init free_alias_table(struct amd_iommu_pci_seg *pci_seg)
 {
-	iommu_free_pages(pci_seg->alias_table,
-			 get_order(pci_seg->alias_table_size));
+	iommu_free_pages(pci_seg->alias_table);
 	pci_seg->alias_table = NULL;
 }
 
@@ -826,7 +822,7 @@ static void iommu_disable_command_buffer(struct amd_iommu *iommu)
 
 static void __init free_command_buffer(struct amd_iommu *iommu)
 {
-	iommu_free_pages(iommu->cmd_buf, get_order(CMD_BUFFER_SIZE));
+	iommu_free_pages(iommu->cmd_buf);
 }
 
 void *__init iommu_alloc_4k_pages(struct amd_iommu *iommu, gfp_t gfp,
@@ -838,7 +834,7 @@ void *__init iommu_alloc_4k_pages(struct amd_iommu *iommu, gfp_t gfp,
 	if (buf &&
 	    check_feature(FEATURE_SNP) &&
 	    set_memory_4k((unsigned long)buf, (1 << order))) {
-		iommu_free_pages(buf, order);
+		iommu_free_pages(buf);
 		buf = NULL;
 	}
 
@@ -882,14 +878,14 @@ static void iommu_disable_event_buffer(struct amd_iommu *iommu)
 
 static void __init free_event_buffer(struct amd_iommu *iommu)
 {
-	iommu_free_pages(iommu->evt_buf, get_order(EVT_BUFFER_SIZE));
+	iommu_free_pages(iommu->evt_buf);
 }
 
 static void free_ga_log(struct amd_iommu *iommu)
 {
 #ifdef CONFIG_IRQ_REMAP
-	iommu_free_pages(iommu->ga_log, get_order(GA_LOG_SIZE));
-	iommu_free_pages(iommu->ga_log_tail, get_order(8));
+	iommu_free_pages(iommu->ga_log);
+	iommu_free_pages(iommu->ga_log_tail);
 #endif
 }
 
@@ -2781,8 +2777,7 @@ static void early_enable_iommus(void)
 
 		for_each_pci_segment(pci_seg) {
 			if (pci_seg->old_dev_tbl_cpy != NULL) {
-				iommu_free_pages(pci_seg->old_dev_tbl_cpy,
-						 get_order(pci_seg->dev_table_size));
+				iommu_free_pages(pci_seg->old_dev_tbl_cpy);
 				pci_seg->old_dev_tbl_cpy = NULL;
 			}
 		}
@@ -2795,8 +2790,7 @@ static void early_enable_iommus(void)
 		pr_info("Copied DEV table from previous kernel.\n");
 
 		for_each_pci_segment(pci_seg) {
-			iommu_free_pages(pci_seg->dev_table,
-					 get_order(pci_seg->dev_table_size));
+			iommu_free_pages(pci_seg->dev_table);
 			pci_seg->dev_table = pci_seg->old_dev_tbl_cpy;
 		}
 
diff --git a/drivers/iommu/amd/ppr.c b/drivers/iommu/amd/ppr.c
index 7c67d69f0b8cad..e6767c057d01fa 100644
--- a/drivers/iommu/amd/ppr.c
+++ b/drivers/iommu/amd/ppr.c
@@ -48,7 +48,7 @@ void amd_iommu_enable_ppr_log(struct amd_iommu *iommu)
 
 void __init amd_iommu_free_ppr_log(struct amd_iommu *iommu)
 {
-	iommu_free_pages(iommu->ppr_log, get_order(PPR_LOG_SIZE));
+	iommu_free_pages(iommu->ppr_log);
 }
 
 /*
diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index c666ecab955d21..1019e08b43b71c 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -932,9 +932,9 @@ static struct iommu_domain *exynos_iommu_domain_alloc_paging(struct device *dev)
 	return &domain->domain;
 
 err_lv2ent:
-	iommu_free_pages(domain->lv2entcnt, 1);
+	iommu_free_pages(domain->lv2entcnt);
 err_counter:
-	iommu_free_pages(domain->pgtable, 2);
+	iommu_free_pages(domain->pgtable);
 err_pgtable:
 	kfree(domain);
 	return NULL;
@@ -975,8 +975,8 @@ static void exynos_iommu_domain_free(struct iommu_domain *iommu_domain)
 					phys_to_virt(base));
 		}
 
-	iommu_free_pages(domain->pgtable, 2);
-	iommu_free_pages(domain->lv2entcnt, 1);
+	iommu_free_pages(domain->pgtable);
+	iommu_free_pages(domain->lv2entcnt);
 	kfree(domain);
 }
 
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index ad795c772f21b5..d6b796f8f100cd 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -620,7 +620,7 @@ static int intel_setup_irq_remapping(struct intel_iommu *iommu)
 out_free_bitmap:
 	bitmap_free(bitmap);
 out_free_pages:
-	iommu_free_pages(ir_table_base, INTR_REMAP_PAGE_ORDER);
+	iommu_free_pages(ir_table_base);
 out_free_table:
 	kfree(ir_table);
 
@@ -641,7 +641,7 @@ static void intel_teardown_irq_remapping(struct intel_iommu *iommu)
 			irq_domain_free_fwnode(fn);
 			iommu->ir_domain = NULL;
 		}
-		iommu_free_pages(iommu->ir_table->base, INTR_REMAP_PAGE_ORDER);
+		iommu_free_pages(iommu->ir_table->base);
 		bitmap_free(iommu->ir_table->bitmap);
 		kfree(iommu->ir_table);
 		iommu->ir_table = NULL;
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index fb59a7d35958f5..00da94b1c4c907 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -67,7 +67,6 @@ int intel_pasid_alloc_table(struct device *dev)
 	}
 
 	pasid_table->table = dir;
-	pasid_table->order = order;
 	pasid_table->max_pasid = 1 << (order + PAGE_SHIFT + 3);
 	info->pasid_table = pasid_table;
 
@@ -100,7 +99,7 @@ void intel_pasid_free_table(struct device *dev)
 		iommu_free_page(table);
 	}
 
-	iommu_free_pages(pasid_table->table, pasid_table->order);
+	iommu_free_pages(pasid_table->table);
 	kfree(pasid_table);
 }
 
diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index 668d8ece6b143c..fd0fd1a0df84cc 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -47,7 +47,6 @@ struct pasid_entry {
 /* The representative of a PASID table */
 struct pasid_table {
 	void			*table;		/* pasid table pointer */
-	int			order;		/* page order of pasid table */
 	u32			max_pasid;	/* max pasid */
 };
 
diff --git a/drivers/iommu/intel/prq.c b/drivers/iommu/intel/prq.c
index c2d792db52c3e2..01ecafed31453c 100644
--- a/drivers/iommu/intel/prq.c
+++ b/drivers/iommu/intel/prq.c
@@ -338,7 +338,7 @@ int intel_iommu_enable_prq(struct intel_iommu *iommu)
 	dmar_free_hwirq(irq);
 	iommu->pr_irq = 0;
 free_prq:
-	iommu_free_pages(iommu->prq, PRQ_ORDER);
+	iommu_free_pages(iommu->prq);
 	iommu->prq = NULL;
 
 	return ret;
@@ -361,7 +361,7 @@ int intel_iommu_finish_prq(struct intel_iommu *iommu)
 		iommu->iopf_queue = NULL;
 	}
 
-	iommu_free_pages(iommu->prq, PRQ_ORDER);
+	iommu_free_pages(iommu->prq);
 	iommu->prq = NULL;
 
 	return 0;
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 7632c80edea63a..62df2528d020b2 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -300,7 +300,7 @@ static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
 	if (cfg->free)
 		cfg->free(cookie, pages, size);
 	else
-		iommu_free_pages(pages, order);
+		iommu_free_pages(pages);
 
 	return NULL;
 }
@@ -316,7 +316,7 @@ static void __arm_lpae_free_pages(void *pages, size_t size,
 	if (cfg->free)
 		cfg->free(cookie, pages, size);
 	else
-		iommu_free_pages(pages, get_order(size));
+		iommu_free_pages(pages);
 }
 
 static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
diff --git a/drivers/iommu/io-pgtable-dart.c b/drivers/iommu/io-pgtable-dart.c
index c004640640ee50..7efcaea0bd5c86 100644
--- a/drivers/iommu/io-pgtable-dart.c
+++ b/drivers/iommu/io-pgtable-dart.c
@@ -262,7 +262,7 @@ static int dart_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
 
 		pte = dart_install_table(cptep, ptep, 0, data);
 		if (pte)
-			iommu_free_pages(cptep, get_order(tblsz));
+			iommu_free_pages(cptep);
 
 		/* L2 table is present (now) */
 		pte = READ_ONCE(*ptep);
@@ -423,8 +423,7 @@ apple_dart_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
 
 out_free_data:
 	while (--i >= 0) {
-		iommu_free_pages(data->pgd[i],
-				 get_order(DART_GRANULE(data)));
+		iommu_free_pages(data->pgd[i]);
 	}
 	kfree(data);
 	return NULL;
@@ -433,7 +432,6 @@ apple_dart_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
 static void apple_dart_free_pgtable(struct io_pgtable *iop)
 {
 	struct dart_io_pgtable *data = io_pgtable_to_data(iop);
-	int order = get_order(DART_GRANULE(data));
 	dart_iopte *ptep, *end;
 	int i;
 
@@ -445,9 +443,9 @@ static void apple_dart_free_pgtable(struct io_pgtable *iop)
 			dart_iopte pte = *ptep++;
 
 			if (pte)
-				iommu_free_pages(iopte_deref(pte, data), order);
+				iommu_free_pages(iopte_deref(pte, data));
 		}
-		iommu_free_pages(data->pgd[i], order);
+		iommu_free_pages(data->pgd[i]);
 	}
 
 	kfree(data);
diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h
index 26b91940bdc146..88587da1782b94 100644
--- a/drivers/iommu/iommu-pages.h
+++ b/drivers/iommu/iommu-pages.h
@@ -105,11 +105,12 @@ static inline void *iommu_alloc_page(gfp_t gfp)
 }
 
 /**
- * iommu_free_pages - free page of a given order
+ * iommu_free_pages - free pages
  * @virt: virtual address of the page to be freed.
- * @order: page order
+ *
+ * The page must have been allocated by iommu_alloc_pages_node()
  */
-static inline void iommu_free_pages(void *virt, int order)
+static inline void iommu_free_pages(void *virt)
 {
 	struct page *page;
 
@@ -127,7 +128,7 @@ static inline void iommu_free_pages(void *virt, int order)
  */
 static inline void iommu_free_page(void *virt)
 {
-	iommu_free_pages(virt, 0);
+	iommu_free_pages(virt);
 }
 
 /**
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 8f049d4a0e2cb8..1868468d018a28 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -48,14 +48,13 @@ static DEFINE_IDA(riscv_iommu_pscids);
 /* Device resource-managed allocations */
 struct riscv_iommu_devres {
 	void *addr;
-	int order;
 };
 
 static void riscv_iommu_devres_pages_release(struct device *dev, void *res)
 {
 	struct riscv_iommu_devres *devres = res;
 
-	iommu_free_pages(devres->addr, devres->order);
+	iommu_free_pages(devres->addr);
 }
 
 static int riscv_iommu_devres_pages_match(struct device *dev, void *res, void *p)
@@ -80,12 +79,11 @@ static void *riscv_iommu_get_pages(struct riscv_iommu_device *iommu, int order)
 			      sizeof(struct riscv_iommu_devres), GFP_KERNEL);
 
 	if (unlikely(!devres)) {
-		iommu_free_pages(addr, order);
+		iommu_free_pages(addr);
 		return NULL;
 	}
 
 	devres->addr = addr;
-	devres->order = order;
 
 	devres_add(iommu->dev, devres);
 
diff --git a/drivers/iommu/sun50i-iommu.c b/drivers/iommu/sun50i-iommu.c
index 8d8f11854676c0..6385560dbc3fb0 100644
--- a/drivers/iommu/sun50i-iommu.c
+++ b/drivers/iommu/sun50i-iommu.c
@@ -713,7 +713,7 @@ static void sun50i_iommu_domain_free(struct iommu_domain *domain)
 {
 	struct sun50i_iommu_domain *sun50i_domain = to_sun50i_domain(domain);
 
-	iommu_free_pages(sun50i_domain->dt, get_order(DT_SIZE));
+	iommu_free_pages(sun50i_domain->dt);
 	sun50i_domain->dt = NULL;
 
 	kfree(sun50i_domain);
-- 
2.43.0




* [PATCH v3 06/23] iommu/pages: Remove iommu_free_page()
  2025-02-25 19:39 [PATCH v3 00/23] iommu: Further abstract iommu-pages Jason Gunthorpe
                   ` (4 preceding siblings ...)
  2025-02-25 19:39 ` [PATCH v3 05/23] iommu/pages: Remove the order argument to iommu_free_pages() Jason Gunthorpe
@ 2025-02-25 19:39 ` Jason Gunthorpe
  2025-02-26  6:34   ` Baolu Lu
  2025-03-12 11:44   ` Mostafa Saleh
  2025-02-25 19:39 ` [PATCH v3 07/23] iommu/pages: De-inline the substantial functions Jason Gunthorpe
                   ` (18 subsequent siblings)
  24 siblings, 2 replies; 55+ messages in thread
From: Jason Gunthorpe @ 2025-02-25 19:39 UTC (permalink / raw)
  To: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

Use iommu_free_pages() instead.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/amd/init.c          |  2 +-
 drivers/iommu/amd/io_pgtable.c    |  4 ++--
 drivers/iommu/amd/io_pgtable_v2.c |  8 ++++----
 drivers/iommu/amd/iommu.c         |  4 ++--
 drivers/iommu/intel/dmar.c        |  4 ++--
 drivers/iommu/intel/iommu.c       | 12 ++++++------
 drivers/iommu/intel/pasid.c       |  4 ++--
 drivers/iommu/iommu-pages.h       |  9 ---------
 drivers/iommu/riscv/iommu.c       |  6 +++---
 drivers/iommu/rockchip-iommu.c    |  8 ++++----
 drivers/iommu/tegra-smmu.c        | 12 ++++++------
 11 files changed, 32 insertions(+), 41 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index f47ff0e0c75f4e..73ebcb958ad864 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -955,7 +955,7 @@ static int __init alloc_cwwb_sem(struct amd_iommu *iommu)
 static void __init free_cwwb_sem(struct amd_iommu *iommu)
 {
 	if (iommu->cmd_sem)
-		iommu_free_page((void *)iommu->cmd_sem);
+		iommu_free_pages((void *)iommu->cmd_sem);
 }
 
 static void iommu_enable_xt(struct amd_iommu *iommu)
diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
index f3399087859fd1..025d8a3fe9cb78 100644
--- a/drivers/iommu/amd/io_pgtable.c
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -153,7 +153,7 @@ static bool increase_address_space(struct amd_io_pgtable *pgtable,
 
 out:
 	spin_unlock_irqrestore(&domain->lock, flags);
-	iommu_free_page(pte);
+	iommu_free_pages(pte);
 
 	return ret;
 }
@@ -229,7 +229,7 @@ static u64 *alloc_pte(struct amd_io_pgtable *pgtable,
 
 			/* pte could have been changed somewhere. */
 			if (!try_cmpxchg64(pte, &__pte, __npte))
-				iommu_free_page(page);
+				iommu_free_pages(page);
 			else if (IOMMU_PTE_PRESENT(__pte))
 				*updated = true;
 
diff --git a/drivers/iommu/amd/io_pgtable_v2.c b/drivers/iommu/amd/io_pgtable_v2.c
index c616de2c5926ec..cce3fc9861ef77 100644
--- a/drivers/iommu/amd/io_pgtable_v2.c
+++ b/drivers/iommu/amd/io_pgtable_v2.c
@@ -121,10 +121,10 @@ static void free_pgtable(u64 *pt, int level)
 		if (level > 2)
 			free_pgtable(p, level - 1);
 		else
-			iommu_free_page(p);
+			iommu_free_pages(p);
 	}
 
-	iommu_free_page(pt);
+	iommu_free_pages(pt);
 }
 
 /* Allocate page table */
@@ -159,7 +159,7 @@ static u64 *v2_alloc_pte(int nid, u64 *pgd, unsigned long iova,
 			__npte = set_pgtable_attr(page);
 			/* pte could have been changed somewhere. */
 			if (!try_cmpxchg64(pte, &__pte, __npte))
-				iommu_free_page(page);
+				iommu_free_pages(page);
 			else if (IOMMU_PTE_PRESENT(__pte))
 				*updated = true;
 
@@ -181,7 +181,7 @@ static u64 *v2_alloc_pte(int nid, u64 *pgd, unsigned long iova,
 		if (pg_size == IOMMU_PAGE_SIZE_1G)
 			free_pgtable(__pte, end_level - 1);
 		else if (pg_size == IOMMU_PAGE_SIZE_2M)
-			iommu_free_page(__pte);
+			iommu_free_pages(__pte);
 	}
 
 	return pte;
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index b48a72bd7b23df..e23d104d177ad9 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1812,7 +1812,7 @@ static void free_gcr3_tbl_level1(u64 *tbl)
 
 		ptr = iommu_phys_to_virt(tbl[i] & PAGE_MASK);
 
-		iommu_free_page(ptr);
+		iommu_free_pages(ptr);
 	}
 }
 
@@ -1845,7 +1845,7 @@ static void free_gcr3_table(struct gcr3_tbl_info *gcr3_info)
 	/* Free per device domain ID */
 	pdom_id_free(gcr3_info->domid);
 
-	iommu_free_page(gcr3_info->gcr3_tbl);
+	iommu_free_pages(gcr3_info->gcr3_tbl);
 	gcr3_info->gcr3_tbl = NULL;
 }
 
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 9f424acf474e94..c812c83d77da10 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1187,7 +1187,7 @@ static void free_iommu(struct intel_iommu *iommu)
 	}
 
 	if (iommu->qi) {
-		iommu_free_page(iommu->qi->desc);
+		iommu_free_pages(iommu->qi->desc);
 		kfree(iommu->qi->desc_status);
 		kfree(iommu->qi);
 	}
@@ -1714,7 +1714,7 @@ int dmar_enable_qi(struct intel_iommu *iommu)
 
 	qi->desc_status = kcalloc(QI_LENGTH, sizeof(int), GFP_ATOMIC);
 	if (!qi->desc_status) {
-		iommu_free_page(qi->desc);
+		iommu_free_pages(qi->desc);
 		kfree(qi);
 		iommu->qi = NULL;
 		return -ENOMEM;
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index cc46098f875b16..1e73bfa00329ae 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -571,17 +571,17 @@ static void free_context_table(struct intel_iommu *iommu)
 	for (i = 0; i < ROOT_ENTRY_NR; i++) {
 		context = iommu_context_addr(iommu, i, 0, 0);
 		if (context)
-			iommu_free_page(context);
+			iommu_free_pages(context);
 
 		if (!sm_supported(iommu))
 			continue;
 
 		context = iommu_context_addr(iommu, i, 0x80, 0);
 		if (context)
-			iommu_free_page(context);
+			iommu_free_pages(context);
 	}
 
-	iommu_free_page(iommu->root_entry);
+	iommu_free_pages(iommu->root_entry);
 	iommu->root_entry = NULL;
 }
 
@@ -744,7 +744,7 @@ static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain,
 			tmp = 0ULL;
 			if (!try_cmpxchg64(&pte->val, &tmp, pteval))
 				/* Someone else set it while we were thinking; use theirs. */
-				iommu_free_page(tmp_page);
+				iommu_free_pages(tmp_page);
 			else
 				domain_flush_cache(domain, pte, sizeof(*pte));
 		}
@@ -857,7 +857,7 @@ static void dma_pte_free_level(struct dmar_domain *domain, int level,
 		      last_pfn < level_pfn + level_size(level) - 1)) {
 			dma_clear_pte(pte);
 			domain_flush_cache(domain, pte, sizeof(*pte));
-			iommu_free_page(level_pte);
+			iommu_free_pages(level_pte);
 		}
 next:
 		pfn += level_size(level);
@@ -881,7 +881,7 @@ static void dma_pte_free_pagetable(struct dmar_domain *domain,
 
 	/* free pgd */
 	if (start_pfn == 0 && last_pfn == DOMAIN_MAX_PFN(domain->gaw)) {
-		iommu_free_page(domain->pgd);
+		iommu_free_pages(domain->pgd);
 		domain->pgd = NULL;
 	}
 }
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 00da94b1c4c907..4249f12db7fc43 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -96,7 +96,7 @@ void intel_pasid_free_table(struct device *dev)
 	max_pde = pasid_table->max_pasid >> PASID_PDE_SHIFT;
 	for (i = 0; i < max_pde; i++) {
 		table = get_pasid_table_from_pde(&dir[i]);
-		iommu_free_page(table);
+		iommu_free_pages(table);
 	}
 
 	iommu_free_pages(pasid_table->table);
@@ -160,7 +160,7 @@ static struct pasid_entry *intel_pasid_get_entry(struct device *dev, u32 pasid)
 		tmp = 0ULL;
 		if (!try_cmpxchg64(&dir[dir_index].val, &tmp,
 				   (u64)virt_to_phys(entries) | PASID_PTE_PRESENT)) {
-			iommu_free_page(entries);
+			iommu_free_pages(entries);
 			goto retry;
 		}
 		if (!ecap_coherent(info->iommu->ecap)) {
diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h
index 88587da1782b94..fcd17b94f7b830 100644
--- a/drivers/iommu/iommu-pages.h
+++ b/drivers/iommu/iommu-pages.h
@@ -122,15 +122,6 @@ static inline void iommu_free_pages(void *virt)
 	put_page(page);
 }
 
-/**
- * iommu_free_page - free page
- * @virt: virtual address of the page to be freed.
- */
-static inline void iommu_free_page(void *virt)
-{
-	iommu_free_pages(virt);
-}
-
 /**
  * iommu_put_pages_list - free a list of pages.
  * @page: the head of the lru list to be freed.
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 1868468d018a28..4fe07343d84e61 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -1105,7 +1105,7 @@ static void riscv_iommu_pte_free(struct riscv_iommu_domain *domain,
 	if (freelist)
 		list_add_tail(&virt_to_page(ptr)->lru, freelist);
 	else
-		iommu_free_page(ptr);
+		iommu_free_pages(ptr);
 }
 
 static unsigned long *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
@@ -1148,7 +1148,7 @@ static unsigned long *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
 			old = pte;
 			pte = _io_pte_entry(virt_to_pfn(addr), _PAGE_TABLE);
 			if (cmpxchg_relaxed(ptr, old, pte) != old) {
-				iommu_free_page(addr);
+				iommu_free_pages(addr);
 				goto pte_retry;
 			}
 		}
@@ -1393,7 +1393,7 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
 	domain->pscid = ida_alloc_range(&riscv_iommu_pscids, 1,
 					RISCV_IOMMU_MAX_PSCID, GFP_KERNEL);
 	if (domain->pscid < 0) {
-		iommu_free_page(domain->pgd_root);
+		iommu_free_pages(domain->pgd_root);
 		kfree(domain);
 		return ERR_PTR(-ENOMEM);
 	}
diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index 323cc665c35703..798e85bd994d56 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -737,7 +737,7 @@ static u32 *rk_dte_get_page_table(struct rk_iommu_domain *rk_domain,
 	pt_dma = dma_map_single(dma_dev, page_table, SPAGE_SIZE, DMA_TO_DEVICE);
 	if (dma_mapping_error(dma_dev, pt_dma)) {
 		dev_err(dma_dev, "DMA mapping error while allocating page table\n");
-		iommu_free_page(page_table);
+		iommu_free_pages(page_table);
 		return ERR_PTR(-ENOMEM);
 	}
 
@@ -1086,7 +1086,7 @@ static struct iommu_domain *rk_iommu_domain_alloc_paging(struct device *dev)
 	return &rk_domain->domain;
 
 err_free_dt:
-	iommu_free_page(rk_domain->dt);
+	iommu_free_pages(rk_domain->dt);
 err_free_domain:
 	kfree(rk_domain);
 
@@ -1107,13 +1107,13 @@ static void rk_iommu_domain_free(struct iommu_domain *domain)
 			u32 *page_table = phys_to_virt(pt_phys);
 			dma_unmap_single(dma_dev, pt_phys,
 					 SPAGE_SIZE, DMA_TO_DEVICE);
-			iommu_free_page(page_table);
+			iommu_free_pages(page_table);
 		}
 	}
 
 	dma_unmap_single(dma_dev, rk_domain->dt_dma,
 			 SPAGE_SIZE, DMA_TO_DEVICE);
-	iommu_free_page(rk_domain->dt);
+	iommu_free_pages(rk_domain->dt);
 
 	kfree(rk_domain);
 }
diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index c134647292fb22..844682a41afa66 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -303,7 +303,7 @@ static struct iommu_domain *tegra_smmu_domain_alloc_paging(struct device *dev)
 
 	as->count = kcalloc(SMMU_NUM_PDE, sizeof(u32), GFP_KERNEL);
 	if (!as->count) {
-		iommu_free_page(as->pd);
+		iommu_free_pages(as->pd);
 		kfree(as);
 		return NULL;
 	}
@@ -311,7 +311,7 @@ static struct iommu_domain *tegra_smmu_domain_alloc_paging(struct device *dev)
 	as->pts = kcalloc(SMMU_NUM_PDE, sizeof(*as->pts), GFP_KERNEL);
 	if (!as->pts) {
 		kfree(as->count);
-		iommu_free_page(as->pd);
+		iommu_free_pages(as->pd);
 		kfree(as);
 		return NULL;
 	}
@@ -608,14 +608,14 @@ static u32 *as_get_pte(struct tegra_smmu_as *as, dma_addr_t iova,
 		dma = dma_map_single(smmu->dev, pt, SMMU_SIZE_PT,
 				     DMA_TO_DEVICE);
 		if (dma_mapping_error(smmu->dev, dma)) {
-			iommu_free_page(pt);
+			iommu_free_pages(pt);
 			return NULL;
 		}
 
 		if (!smmu_dma_addr_valid(smmu, dma)) {
 			dma_unmap_single(smmu->dev, dma, SMMU_SIZE_PT,
 					 DMA_TO_DEVICE);
-			iommu_free_page(pt);
+			iommu_free_pages(pt);
 			return NULL;
 		}
 
@@ -656,7 +656,7 @@ static void tegra_smmu_pte_put_use(struct tegra_smmu_as *as, unsigned long iova)
 
 		dma_unmap_single(smmu->dev, pte_dma, SMMU_SIZE_PT,
 				 DMA_TO_DEVICE);
-		iommu_free_page(pt);
+		iommu_free_pages(pt);
 		as->pts[pde] = NULL;
 	}
 }
@@ -707,7 +707,7 @@ static struct tegra_pt *as_get_pde_page(struct tegra_smmu_as *as,
 	 */
 	if (as->pts[pde]) {
 		if (pt)
-			iommu_free_page(pt);
+			iommu_free_pages(pt);
 
 		pt = as->pts[pde];
 	}
-- 
2.43.0




* [PATCH v3 07/23] iommu/pages: De-inline the substantial functions
  2025-02-25 19:39 [PATCH v3 00/23] iommu: Further abstract iommu-pages Jason Gunthorpe
                   ` (5 preceding siblings ...)
  2025-02-25 19:39 ` [PATCH v3 06/23] iommu/pages: Remove iommu_free_page() Jason Gunthorpe
@ 2025-02-25 19:39 ` Jason Gunthorpe
  2025-02-26  6:43   ` Baolu Lu
  2025-03-12 12:45   ` Mostafa Saleh
  2025-02-25 19:39 ` [PATCH v3 08/23] iommu/vtd: Use virt_to_phys() Jason Gunthorpe
                   ` (17 subsequent siblings)
  24 siblings, 2 replies; 55+ messages in thread
From: Jason Gunthorpe @ 2025-02-25 19:39 UTC (permalink / raw)
  To: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

These are called in a lot of places and are not trivial. Move them to the
core module.

Tidy some of the comments and function arguments, fold
__iommu_alloc_account() into its only caller, and change
__iommu_free_account() into __iommu_free_page() to remove some
duplication.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/Makefile      |   1 +
 drivers/iommu/iommu-pages.c |  84 +++++++++++++++++++++++++++++
 drivers/iommu/iommu-pages.h | 103 ++----------------------------------
 3 files changed, 90 insertions(+), 98 deletions(-)
 create mode 100644 drivers/iommu/iommu-pages.c

diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 5e5a83c6c2aae2..fe91d770abe16c 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-y += amd/ intel/ arm/ iommufd/ riscv/
 obj-$(CONFIG_IOMMU_API) += iommu.o
+obj-$(CONFIG_IOMMU_SUPPORT) += iommu-pages.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
 obj-$(CONFIG_IOMMU_DEBUGFS) += iommu-debugfs.o
diff --git a/drivers/iommu/iommu-pages.c b/drivers/iommu/iommu-pages.c
new file mode 100644
index 00000000000000..31ff83ffaf0106
--- /dev/null
+++ b/drivers/iommu/iommu-pages.c
@@ -0,0 +1,84 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2024, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+#include "iommu-pages.h"
+#include <linux/gfp.h>
+#include <linux/mm.h>
+
+/**
+ * iommu_alloc_pages_node - Allocate a zeroed page of a given order from
+ *                          specific NUMA node
+ * @nid: memory NUMA node id
+ * @gfp: buddy allocator flags
+ * @order: page order
+ *
+ * Returns the virtual address of the allocated page. The page must be
+ * freed either by calling iommu_free_pages() or via iommu_put_pages_list().
+ */
+void *iommu_alloc_pages_node(int nid, gfp_t gfp, unsigned int order)
+{
+	const unsigned long pgcnt = 1UL << order;
+	struct page *page;
+
+	page = alloc_pages_node(nid, gfp | __GFP_ZERO | __GFP_COMP, order);
+	if (unlikely(!page))
+		return NULL;
+
+	/*
+	 * All page allocations that should be reported as "iommu-pagetables"
+	 * to userspace must use one of the functions below. This includes
+	 * allocations of page-tables and other per-iommu_domain configuration
+	 * structures.
+	 *
+	 * This is necessary for the proper accounting as IOMMU state can be
+	 * rather large, i.e. multiple gigabytes in size.
+	 */
+	mod_node_page_state(page_pgdat(page), NR_IOMMU_PAGES, pgcnt);
+	mod_lruvec_page_state(page, NR_SECONDARY_PAGETABLE, pgcnt);
+
+	return page_address(page);
+}
+EXPORT_SYMBOL_GPL(iommu_alloc_pages_node);
+
+static void __iommu_free_page(struct page *page)
+{
+	unsigned int order = folio_order(page_folio(page));
+	const unsigned long pgcnt = 1UL << order;
+
+	mod_node_page_state(page_pgdat(page), NR_IOMMU_PAGES, -pgcnt);
+	mod_lruvec_page_state(page, NR_SECONDARY_PAGETABLE, -pgcnt);
+	put_page(page);
+}
+
+/**
+ * iommu_free_pages - free pages
+ * @virt: virtual address of the page to be freed.
+ *
+ * The page must have been allocated by iommu_alloc_pages_node()
+ */
+void iommu_free_pages(void *virt)
+{
+	if (!virt)
+		return;
+	__iommu_free_page(virt_to_page(virt));
+}
+EXPORT_SYMBOL_GPL(iommu_free_pages);
+
+/**
+ * iommu_put_pages_list - free a list of pages.
+ * @head: the head of the lru list to be freed.
+ *
+ * Frees a list of pages allocated by iommu_alloc_pages_node().
+ */
+void iommu_put_pages_list(struct list_head *head)
+{
+	while (!list_empty(head)) {
+		struct page *p = list_entry(head->prev, struct page, lru);
+
+		list_del(&p->lru);
+		__iommu_free_page(p);
+	}
+}
+EXPORT_SYMBOL_GPL(iommu_put_pages_list);
diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h
index fcd17b94f7b830..e3c35aa14ad716 100644
--- a/drivers/iommu/iommu-pages.h
+++ b/drivers/iommu/iommu-pages.h
@@ -7,67 +7,12 @@
 #ifndef __IOMMU_PAGES_H
 #define __IOMMU_PAGES_H
 
-#include <linux/vmstat.h>
-#include <linux/gfp.h>
-#include <linux/mm.h>
+#include <linux/types.h>
+#include <linux/topology.h>
 
-/*
- * All page allocations that should be reported to as "iommu-pagetables" to
- * userspace must use one of the functions below.  This includes allocations of
- * page-tables and other per-iommu_domain configuration structures.
- *
- * This is necessary for the proper accounting as IOMMU state can be rather
- * large, i.e. multiple gigabytes in size.
- */
-
-/**
- * __iommu_alloc_account - account for newly allocated page.
- * @page: head struct page of the page.
- * @order: order of the page
- */
-static inline void __iommu_alloc_account(struct page *page, int order)
-{
-	const long pgcnt = 1l << order;
-
-	mod_node_page_state(page_pgdat(page), NR_IOMMU_PAGES, pgcnt);
-	mod_lruvec_page_state(page, NR_SECONDARY_PAGETABLE, pgcnt);
-}
-
-/**
- * __iommu_free_account - account a page that is about to be freed.
- * @page: head struct page of the page.
- * @order: order of the page
- */
-static inline void __iommu_free_account(struct page *page)
-{
-	unsigned int order = folio_order(page_folio(page));
-	const long pgcnt = 1l << order;
-
-	mod_node_page_state(page_pgdat(page), NR_IOMMU_PAGES, -pgcnt);
-	mod_lruvec_page_state(page, NR_SECONDARY_PAGETABLE, -pgcnt);
-}
-
-/**
- * iommu_alloc_pages_node - allocate a zeroed page of a given order from
- * specific NUMA node.
- * @nid: memory NUMA node id
- * @gfp: buddy allocator flags
- * @order: page order
- *
- * returns the virtual address of the allocated page
- */
-static inline void *iommu_alloc_pages_node(int nid, gfp_t gfp, int order)
-{
-	struct page *page =
-		alloc_pages_node(nid, gfp | __GFP_ZERO | __GFP_COMP, order);
-
-	if (unlikely(!page))
-		return NULL;
-
-	__iommu_alloc_account(page, order);
-
-	return page_address(page);
-}
+void *iommu_alloc_pages_node(int nid, gfp_t gfp, unsigned int order);
+void iommu_free_pages(void *virt);
+void iommu_put_pages_list(struct list_head *head);
 
 /**
  * iommu_alloc_pages - allocate a zeroed page of a given order
@@ -104,42 +49,4 @@ static inline void *iommu_alloc_page(gfp_t gfp)
 	return iommu_alloc_pages_node(numa_node_id(), gfp, 0);
 }
 
-/**
- * iommu_free_pages - free pages
- * @virt: virtual address of the page to be freed.
- *
- * The page must have been allocated by iommu_alloc_pages_node()
- */
-static inline void iommu_free_pages(void *virt)
-{
-	struct page *page;
-
-	if (!virt)
-		return;
-
-	page = virt_to_page(virt);
-	__iommu_free_account(page);
-	put_page(page);
-}
-
-/**
- * iommu_put_pages_list - free a list of pages.
- * @page: the head of the lru list to be freed.
- *
- * There are no locking requirement for these pages, as they are going to be
- * put on a free list as soon as refcount reaches 0. Pages are put on this LRU
- * list once they are removed from the IOMMU page tables. However, they can
- * still be access through debugfs.
- */
-static inline void iommu_put_pages_list(struct list_head *page)
-{
-	while (!list_empty(page)) {
-		struct page *p = list_entry(page->prev, struct page, lru);
-
-		list_del(&p->lru);
-		__iommu_free_account(p);
-		put_page(p);
-	}
-}
-
 #endif	/* __IOMMU_PAGES_H */
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 08/23] iommu/vtd: Use virt_to_phys()
  2025-02-25 19:39 [PATCH v3 00/23] iommu: Further abstract iommu-pages Jason Gunthorpe
                   ` (6 preceding siblings ...)
  2025-02-25 19:39 ` [PATCH v3 07/23] iommu/pages: De-inline the substantial functions Jason Gunthorpe
@ 2025-02-25 19:39 ` Jason Gunthorpe
  2025-03-10  2:21   ` Baolu Lu
  2025-02-25 19:39 ` [PATCH v3 09/23] iommu/pages: Formalize the freelist API Jason Gunthorpe
                   ` (16 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Jason Gunthorpe @ 2025-02-25 19:39 UTC (permalink / raw)
  To: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

If all the inlines are unwound, virt_to_dma_pfn() is simply:
   return page_to_pfn(virt_to_page(p)) << (PAGE_SHIFT - VTD_PAGE_SHIFT);

Which can be re-arranged to:
   (page_to_pfn(virt_to_page(p)) << PAGE_SHIFT) >> VTD_PAGE_SHIFT

The only caller is:
   ((uint64_t)virt_to_dma_pfn(tmp_page) << VTD_PAGE_SHIFT)

re-arranged to:
   ((page_to_pfn(virt_to_page(tmp_page)) << PAGE_SHIFT) >> VTD_PAGE_SHIFT) << VTD_PAGE_SHIFT

Which simplifies to:
   page_to_pfn(virt_to_page(tmp_page)) << PAGE_SHIFT

That is the same as virt_to_phys(tmp_page), since for lowmem addresses
virt_to_phys(p) == page_to_pfn(virt_to_page(p)) << PAGE_SHIFT. So just
remove all of this.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/intel/iommu.c |  3 ++-
 drivers/iommu/intel/iommu.h | 19 -------------------
 2 files changed, 2 insertions(+), 20 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 1e73bfa00329ae..d864eb180642f2 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -737,7 +737,8 @@ static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain,
 				return NULL;
 
 			domain_flush_cache(domain, tmp_page, VTD_PAGE_SIZE);
-			pteval = ((uint64_t)virt_to_dma_pfn(tmp_page) << VTD_PAGE_SHIFT) | DMA_PTE_READ | DMA_PTE_WRITE;
+			pteval = virt_to_phys(tmp_page) | DMA_PTE_READ |
+				 DMA_PTE_WRITE;
 			if (domain->use_first_level)
 				pteval |= DMA_FL_PTE_US | DMA_FL_PTE_ACCESS;
 
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 6ea7bbe26b19d5..dd980808998da9 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -953,25 +953,6 @@ static inline unsigned long lvl_to_nr_pages(unsigned int lvl)
 	return 1UL << min_t(int, (lvl - 1) * LEVEL_STRIDE, MAX_AGAW_PFN_WIDTH);
 }
 
-/* VT-d pages must always be _smaller_ than MM pages. Otherwise things
-   are never going to work. */
-static inline unsigned long mm_to_dma_pfn_start(unsigned long mm_pfn)
-{
-	return mm_pfn << (PAGE_SHIFT - VTD_PAGE_SHIFT);
-}
-static inline unsigned long mm_to_dma_pfn_end(unsigned long mm_pfn)
-{
-	return ((mm_pfn + 1) << (PAGE_SHIFT - VTD_PAGE_SHIFT)) - 1;
-}
-static inline unsigned long page_to_dma_pfn(struct page *pg)
-{
-	return mm_to_dma_pfn_start(page_to_pfn(pg));
-}
-static inline unsigned long virt_to_dma_pfn(void *p)
-{
-	return page_to_dma_pfn(virt_to_page(p));
-}
-
 static inline void context_set_present(struct context_entry *context)
 {
 	context->lo |= 1;
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 09/23] iommu/pages: Formalize the freelist API
  2025-02-25 19:39 [PATCH v3 00/23] iommu: Further abstract iommu-pages Jason Gunthorpe
                   ` (7 preceding siblings ...)
  2025-02-25 19:39 ` [PATCH v3 08/23] iommu/vtd: Use virt_to_phys() Jason Gunthorpe
@ 2025-02-25 19:39 ` Jason Gunthorpe
  2025-02-26  6:56   ` Baolu Lu
  2025-02-25 19:39 ` [PATCH v3 10/23] iommu/riscv: Convert to use struct iommu_pages_list Jason Gunthorpe
                   ` (15 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Jason Gunthorpe @ 2025-02-25 19:39 UTC (permalink / raw)
  To: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

We want to get rid of struct page references outside the internal
allocator implementation. The free list has the driver open code something
like:

   list_add_tail(&virt_to_page(ptr)->lru, freelist);

Move the above into a small inline and make the freelist into a wrapper
type 'struct iommu_pages_list' so that the compiler can help check all the
conversions.

This struct has also proven helpful for some future ideas: converting to a
singly linked list to gain an extra pointer in the struct page, and
signalling that the pages should be freed with RCU.

Use a temporary _Generic so we don't need to rename the free function as
the patches progress.
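
For illustration, a minimal sketch of the conversion pattern, using only
helpers this patch introduces:

   /* before: open coded struct page usage in the driver */
   LIST_HEAD(freelist);
   list_add_tail(&virt_to_page(ptr)->lru, &freelist);
   iommu_put_pages_list(&freelist);

   /* after: typed wrapper, no struct page in the driver */
   struct iommu_pages_list freelist = IOMMU_PAGES_LIST_INIT(freelist);
   iommu_pages_list_add(&freelist, ptr);
   iommu_put_pages_list(&freelist);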

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommu-pages.c | 23 ++++++++++++-------
 drivers/iommu/iommu-pages.h | 45 ++++++++++++++++++++++++++++++++++---
 include/linux/iommu.h       | 12 ++++++++++
 3 files changed, 69 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/iommu-pages.c b/drivers/iommu/iommu-pages.c
index 31ff83ffaf0106..af8694b46417fa 100644
--- a/drivers/iommu/iommu-pages.c
+++ b/drivers/iommu/iommu-pages.c
@@ -67,18 +67,25 @@ void iommu_free_pages(void *virt)
 EXPORT_SYMBOL_GPL(iommu_free_pages);
 
 /**
- * iommu_put_pages_list - free a list of pages.
- * @head: the head of the lru list to be freed.
+ * iommu_put_pages_list_new - free a list of pages.
+ * @list: The list of pages to be freed
  *
  * Frees a list of pages allocated by iommu_alloc_pages_node().
  */
-void iommu_put_pages_list(struct list_head *head)
+void iommu_put_pages_list_new(struct iommu_pages_list *list)
 {
-	while (!list_empty(head)) {
-		struct page *p = list_entry(head->prev, struct page, lru);
+	struct page *p, *tmp;
 
-		list_del(&p->lru);
+	list_for_each_entry_safe(p, tmp, &list->pages, lru)
 		__iommu_free_page(p);
-	}
 }
-EXPORT_SYMBOL_GPL(iommu_put_pages_list);
+EXPORT_SYMBOL_GPL(iommu_put_pages_list_new);
+
+void iommu_put_pages_list_old(struct list_head *head)
+{
+	struct page *p, *tmp;
+
+	list_for_each_entry_safe(p, tmp, head, lru)
+		__iommu_free_page(p);
+}
+EXPORT_SYMBOL_GPL(iommu_put_pages_list_old);
diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h
index e3c35aa14ad716..0acc26af7202df 100644
--- a/drivers/iommu/iommu-pages.h
+++ b/drivers/iommu/iommu-pages.h
@@ -7,12 +7,51 @@
 #ifndef __IOMMU_PAGES_H
 #define __IOMMU_PAGES_H
 
-#include <linux/types.h>
-#include <linux/topology.h>
+#include <linux/iommu.h>
 
 void *iommu_alloc_pages_node(int nid, gfp_t gfp, unsigned int order);
 void iommu_free_pages(void *virt);
-void iommu_put_pages_list(struct list_head *head);
+void iommu_put_pages_list_new(struct iommu_pages_list *list);
+void iommu_put_pages_list_old(struct list_head *head);
+
+#define iommu_put_pages_list(head)                                   \
+	_Generic(head,                                               \
+		struct iommu_pages_list *: iommu_put_pages_list_new, \
+		struct list_head *: iommu_put_pages_list_old)(head)
+
+/**
+ * iommu_pages_list_add - add the page to an iommu_pages_list
+ * @list: List to add the page to
+ * @virt: Address returned from iommu_alloc_pages_node()
+ */
+static inline void iommu_pages_list_add(struct iommu_pages_list *list,
+					void *virt)
+{
+	list_add_tail(&virt_to_page(virt)->lru, &list->pages);
+}
+
+/**
+ * iommu_pages_list_splice - Put all the pages in list @from into list @to
+ * @from: Source list of pages
+ * @to: Destination list of pages
+ *
+ * @from must be re-initialized after calling this function if it is to be
+ * used again.
+ */
+static inline void iommu_pages_list_splice(struct iommu_pages_list *from,
+					   struct iommu_pages_list *to)
+{
+	list_splice(&from->pages, &to->pages);
+}
+
+/**
+ * iommu_pages_list_empty - True if the list is empty
+ * @list: List to check
+ */
+static inline bool iommu_pages_list_empty(struct iommu_pages_list *list)
+{
+	return list_empty(&list->pages);
+}
 
 /**
  * iommu_alloc_pages - allocate a zeroed page of a given order
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 38c65e92ecd091..e414951c0af83f 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -326,6 +326,18 @@ typedef unsigned int ioasid_t;
 /* Read but do not clear any dirty bits */
 #define IOMMU_DIRTY_NO_CLEAR (1 << 0)
 
+/*
+ * Pages allocated through iommu_alloc_pages_node() can be placed on this list
+ * using iommu_pages_list_add(). Note: ONLY pages from iommu_alloc_pages_node()
+ * can be used this way!
+ */
+struct iommu_pages_list {
+	struct list_head pages;
+};
+
+#define IOMMU_PAGES_LIST_INIT(name) \
+	((struct iommu_pages_list){ .pages = LIST_HEAD_INIT(name.pages) })
+
 #ifdef CONFIG_IOMMU_API
 
 /**
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 10/23] iommu/riscv: Convert to use struct iommu_pages_list
  2025-02-25 19:39 [PATCH v3 00/23] iommu: Further abstract iommu-pages Jason Gunthorpe
                   ` (8 preceding siblings ...)
  2025-02-25 19:39 ` [PATCH v3 09/23] iommu/pages: Formalize the freelist API Jason Gunthorpe
@ 2025-02-25 19:39 ` Jason Gunthorpe
  2025-02-25 19:39 ` [PATCH v3 11/23] iommu/amd: " Jason Gunthorpe
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 55+ messages in thread
From: Jason Gunthorpe @ 2025-02-25 19:39 UTC (permalink / raw)
  To: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

Change the internal freelist to use struct iommu_pages_list.

riscv uses this page list to free page table levels that are replaced
with leaf ptes.

Reviewed-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/riscv/iommu.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 4fe07343d84e61..2750f2e6e01a2b 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -1085,7 +1085,8 @@ static void riscv_iommu_iotlb_sync(struct iommu_domain *iommu_domain,
 #define _io_pte_entry(pn, prot)	((_PAGE_PFN_MASK & ((pn) << _PAGE_PFN_SHIFT)) | (prot))
 
 static void riscv_iommu_pte_free(struct riscv_iommu_domain *domain,
-				 unsigned long pte, struct list_head *freelist)
+				 unsigned long pte,
+				 struct iommu_pages_list *freelist)
 {
 	unsigned long *ptr;
 	int i;
@@ -1103,7 +1104,7 @@ static void riscv_iommu_pte_free(struct riscv_iommu_domain *domain,
 	}
 
 	if (freelist)
-		list_add_tail(&virt_to_page(ptr)->lru, freelist);
+		iommu_pages_list_add(freelist, ptr);
 	else
 		iommu_free_pages(ptr);
 }
@@ -1192,7 +1193,7 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
 	unsigned long *ptr;
 	unsigned long pte, old, pte_prot;
 	int rc = 0;
-	LIST_HEAD(freelist);
+	struct iommu_pages_list freelist = IOMMU_PAGES_LIST_INIT(freelist);
 
 	if (!(prot & IOMMU_WRITE))
 		pte_prot = _PAGE_BASE | _PAGE_READ;
@@ -1223,7 +1224,7 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
 
 	*mapped = size;
 
-	if (!list_empty(&freelist)) {
+	if (!iommu_pages_list_empty(&freelist)) {
 		/*
 		 * In 1.0 spec version, the smallest scope we can use to
 		 * invalidate all levels of page table (i.e. leaf and non-leaf)
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 11/23] iommu/amd: Convert to use struct iommu_pages_list
  2025-02-25 19:39 [PATCH v3 00/23] iommu: Further abstract iommu-pages Jason Gunthorpe
                   ` (9 preceding siblings ...)
  2025-02-25 19:39 ` [PATCH v3 10/23] iommu/riscv: Convert to use struct iommu_pages_list Jason Gunthorpe
@ 2025-02-25 19:39 ` Jason Gunthorpe
  2025-02-25 19:39 ` [PATCH v3 12/23] iommu: Change iommu_iotlb_gather to use iommu_pages_list Jason Gunthorpe
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 55+ messages in thread
From: Jason Gunthorpe @ 2025-02-25 19:39 UTC (permalink / raw)
  To: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

Change the internal freelist to use struct iommu_pages_list.

AMD uses the freelist to batch free the entire table during domain
destruction, and to replace table levels with leaf entries during map.
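
As a rough sketch of the destruction path (free_sub_pt() and the list
helpers are from this series; the pgtable->root and pgtable->mode
arguments are assumed from the existing driver code):

   struct iommu_pages_list freelist = IOMMU_PAGES_LIST_INIT(freelist);

   /* collect every table level onto the list, then free in one batch */
   free_sub_pt(pgtable->root, pgtable->mode, &freelist);
   iommu_put_pages_list(&freelist);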

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/amd/io_pgtable.c | 26 ++++++++++----------------
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
index 025d8a3fe9cb78..04d2b0883c3e32 100644
--- a/drivers/iommu/amd/io_pgtable.c
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -54,14 +54,7 @@ static u64 *first_pte_l7(u64 *pte, unsigned long *page_size,
  *
  ****************************************************************************/
 
-static void free_pt_page(u64 *pt, struct list_head *freelist)
-{
-	struct page *p = virt_to_page(pt);
-
-	list_add_tail(&p->lru, freelist);
-}
-
-static void free_pt_lvl(u64 *pt, struct list_head *freelist, int lvl)
+static void free_pt_lvl(u64 *pt, struct iommu_pages_list *freelist, int lvl)
 {
 	u64 *p;
 	int i;
@@ -84,20 +77,20 @@ static void free_pt_lvl(u64 *pt, struct list_head *freelist, int lvl)
 		if (lvl > 2)
 			free_pt_lvl(p, freelist, lvl - 1);
 		else
-			free_pt_page(p, freelist);
+			iommu_pages_list_add(freelist, p);
 	}
 
-	free_pt_page(pt, freelist);
+	iommu_pages_list_add(freelist, pt);
 }
 
-static void free_sub_pt(u64 *root, int mode, struct list_head *freelist)
+static void free_sub_pt(u64 *root, int mode, struct iommu_pages_list *freelist)
 {
 	switch (mode) {
 	case PAGE_MODE_NONE:
 	case PAGE_MODE_7_LEVEL:
 		break;
 	case PAGE_MODE_1_LEVEL:
-		free_pt_page(root, freelist);
+		iommu_pages_list_add(freelist, root);
 		break;
 	case PAGE_MODE_2_LEVEL:
 	case PAGE_MODE_3_LEVEL:
@@ -306,7 +299,8 @@ static u64 *fetch_pte(struct amd_io_pgtable *pgtable,
 	return pte;
 }
 
-static void free_clear_pte(u64 *pte, u64 pteval, struct list_head *freelist)
+static void free_clear_pte(u64 *pte, u64 pteval,
+			   struct iommu_pages_list *freelist)
 {
 	u64 *pt;
 	int mode;
@@ -335,7 +329,7 @@ static int iommu_v1_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
 			      int prot, gfp_t gfp, size_t *mapped)
 {
 	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
-	LIST_HEAD(freelist);
+	struct iommu_pages_list freelist = IOMMU_PAGES_LIST_INIT(freelist);
 	bool updated = false;
 	u64 __pte, *pte;
 	int ret, i, count;
@@ -360,7 +354,7 @@ static int iommu_v1_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
 		for (i = 0; i < count; ++i)
 			free_clear_pte(&pte[i], pte[i], &freelist);
 
-		if (!list_empty(&freelist))
+		if (!iommu_pages_list_empty(&freelist))
 			updated = true;
 
 		if (count > 1) {
@@ -531,7 +525,7 @@ static int iommu_v1_read_and_clear_dirty(struct io_pgtable_ops *ops,
 static void v1_free_pgtable(struct io_pgtable *iop)
 {
 	struct amd_io_pgtable *pgtable = container_of(iop, struct amd_io_pgtable, pgtbl);
-	LIST_HEAD(freelist);
+	struct iommu_pages_list freelist = IOMMU_PAGES_LIST_INIT(freelist);
 
 	if (pgtable->mode == PAGE_MODE_NONE)
 		return;
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 12/23] iommu: Change iommu_iotlb_gather to use iommu_pages_list
  2025-02-25 19:39 [PATCH v3 00/23] iommu: Further abstract iommu-pages Jason Gunthorpe
                   ` (10 preceding siblings ...)
  2025-02-25 19:39 ` [PATCH v3 11/23] iommu/amd: " Jason Gunthorpe
@ 2025-02-25 19:39 ` Jason Gunthorpe
  2025-02-26  7:02   ` Baolu Lu
  2025-02-25 19:39 ` [PATCH v3 13/23] iommu/pages: Remove iommu_put_pages_list_old and the _Generic Jason Gunthorpe
                   ` (12 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Jason Gunthorpe @ 2025-02-25 19:39 UTC (permalink / raw)
  To: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

This converts the remaining places using a list of pages to the new API.

The Intel free path was shared with its gather path, so it is converted at
the same time.
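
A sketch of the gather flow after this change (pte_table is illustrative;
the freelist member and the free function are from the patch):

   /* unmap: stash replaced tables, defer freeing past the IOTLB flush */
   iommu_pages_list_add(&gather->freelist, pte_table);

   /* iotlb_sync: flush the IOTLB, then batch free */
   iommu_put_pages_list(&gather->freelist);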

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/dma-iommu.c   |  9 +++++----
 drivers/iommu/intel/iommu.c | 24 ++++++++++++------------
 include/linux/iommu.h       |  4 ++--
 3 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 2a9fa0c8cc00fe..3d5a2ed2e337be 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -114,7 +114,7 @@ early_param("iommu.forcedac", iommu_dma_forcedac_setup);
 struct iova_fq_entry {
 	unsigned long iova_pfn;
 	unsigned long pages;
-	struct list_head freelist;
+	struct iommu_pages_list freelist;
 	u64 counter; /* Flush counter when this entry was added */
 };
 
@@ -201,7 +201,7 @@ static void fq_flush_timeout(struct timer_list *t)
 
 static void queue_iova(struct iommu_dma_cookie *cookie,
 		unsigned long pfn, unsigned long pages,
-		struct list_head *freelist)
+		struct iommu_pages_list *freelist)
 {
 	struct iova_fq *fq;
 	unsigned long flags;
@@ -240,7 +240,7 @@ static void queue_iova(struct iommu_dma_cookie *cookie,
 	fq->entries[idx].iova_pfn = pfn;
 	fq->entries[idx].pages    = pages;
 	fq->entries[idx].counter  = atomic64_read(&cookie->fq_flush_start_cnt);
-	list_splice(freelist, &fq->entries[idx].freelist);
+	iommu_pages_list_splice(freelist, &fq->entries[idx].freelist);
 
 	spin_unlock_irqrestore(&fq->lock, flags);
 
@@ -298,7 +298,8 @@ static void iommu_dma_init_one_fq(struct iova_fq *fq, size_t fq_size)
 	spin_lock_init(&fq->lock);
 
 	for (i = 0; i < fq_size; i++)
-		INIT_LIST_HEAD(&fq->entries[i].freelist);
+		fq->entries[i].freelist =
+			IOMMU_PAGES_LIST_INIT(fq->entries[i].freelist);
 }
 
 static int iommu_dma_init_fq_single(struct iommu_dma_cookie *cookie)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d864eb180642f2..6df5c202fbeba6 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -894,18 +894,16 @@ static void dma_pte_free_pagetable(struct dmar_domain *domain,
    The 'pte' argument is the *parent* PTE, pointing to the page that is to
    be freed. */
 static void dma_pte_list_pagetables(struct dmar_domain *domain,
-				    int level, struct dma_pte *pte,
-				    struct list_head *freelist)
+				    int level, struct dma_pte *parent_pte,
+				    struct iommu_pages_list *freelist)
 {
-	struct page *pg;
+	struct dma_pte *pte = phys_to_virt(dma_pte_addr(parent_pte));
 
-	pg = pfn_to_page(dma_pte_addr(pte) >> PAGE_SHIFT);
-	list_add_tail(&pg->lru, freelist);
+	iommu_pages_list_add(freelist, pte);
 
 	if (level == 1)
 		return;
 
-	pte = page_address(pg);
 	do {
 		if (dma_pte_present(pte) && !dma_pte_superpage(pte))
 			dma_pte_list_pagetables(domain, level - 1, pte, freelist);
@@ -916,7 +914,7 @@ static void dma_pte_list_pagetables(struct dmar_domain *domain,
 static void dma_pte_clear_level(struct dmar_domain *domain, int level,
 				struct dma_pte *pte, unsigned long pfn,
 				unsigned long start_pfn, unsigned long last_pfn,
-				struct list_head *freelist)
+				struct iommu_pages_list *freelist)
 {
 	struct dma_pte *first_pte = NULL, *last_pte = NULL;
 
@@ -961,7 +959,8 @@ static void dma_pte_clear_level(struct dmar_domain *domain, int level,
    the page tables, and may have cached the intermediate levels. The
    pages can only be freed after the IOTLB flush has been done. */
 static void domain_unmap(struct dmar_domain *domain, unsigned long start_pfn,
-			 unsigned long last_pfn, struct list_head *freelist)
+			 unsigned long last_pfn,
+			 struct iommu_pages_list *freelist)
 {
 	if (WARN_ON(!domain_pfn_supported(domain, last_pfn)) ||
 	    WARN_ON(start_pfn > last_pfn))
@@ -973,8 +972,7 @@ static void domain_unmap(struct dmar_domain *domain, unsigned long start_pfn,
 
 	/* free pgd */
 	if (start_pfn == 0 && last_pfn == DOMAIN_MAX_PFN(domain->gaw)) {
-		struct page *pgd_page = virt_to_page(domain->pgd);
-		list_add_tail(&pgd_page->lru, freelist);
+		iommu_pages_list_add(freelist, domain->pgd);
 		domain->pgd = NULL;
 	}
 }
@@ -1422,7 +1420,8 @@ void domain_detach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
 static void domain_exit(struct dmar_domain *domain)
 {
 	if (domain->pgd) {
-		LIST_HEAD(freelist);
+		struct iommu_pages_list freelist =
+			IOMMU_PAGES_LIST_INIT(freelist);
 
 		domain_unmap(domain, 0, DOMAIN_MAX_PFN(domain->gaw), &freelist);
 		iommu_put_pages_list(&freelist);
@@ -3558,7 +3557,8 @@ static void intel_iommu_tlb_sync(struct iommu_domain *domain,
 				 struct iommu_iotlb_gather *gather)
 {
 	cache_tag_flush_range(to_dmar_domain(domain), gather->start,
-			      gather->end, list_empty(&gather->freelist));
+			      gather->end,
+			      iommu_pages_list_empty(&gather->freelist));
 	iommu_put_pages_list(&gather->freelist);
 }
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index e414951c0af83f..166d8e1bcb100d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -360,7 +360,7 @@ struct iommu_iotlb_gather {
 	unsigned long		start;
 	unsigned long		end;
 	size_t			pgsize;
-	struct list_head	freelist;
+	struct iommu_pages_list	freelist;
 	bool			queued;
 };
 
@@ -849,7 +849,7 @@ static inline void iommu_iotlb_gather_init(struct iommu_iotlb_gather *gather)
 {
 	*gather = (struct iommu_iotlb_gather) {
 		.start	= ULONG_MAX,
-		.freelist = LIST_HEAD_INIT(gather->freelist),
+		.freelist = IOMMU_PAGES_LIST_INIT(gather->freelist),
 	};
 }
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 13/23] iommu/pages: Remove iommu_put_pages_list_old and the _Generic
  2025-02-25 19:39 [PATCH v3 00/23] iommu: Further abstract iommu-pages Jason Gunthorpe
                   ` (11 preceding siblings ...)
  2025-02-25 19:39 ` [PATCH v3 12/23] iommu: Change iommu_iotlb_gather to use iommu_pages_list Jason Gunthorpe
@ 2025-02-25 19:39 ` Jason Gunthorpe
  2025-02-26  7:04   ` Baolu Lu
  2025-02-25 19:39 ` [PATCH v3 14/23] iommu/pages: Move from struct page to struct ioptdesc and folio Jason Gunthorpe
                   ` (11 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Jason Gunthorpe @ 2025-02-25 19:39 UTC (permalink / raw)
  To: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

Nothing uses the old list_head path now, remove it.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommu-pages.c | 15 +++------------
 drivers/iommu/iommu-pages.h |  8 +-------
 2 files changed, 4 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/iommu-pages.c b/drivers/iommu/iommu-pages.c
index af8694b46417fa..6eacb6a34586a6 100644
--- a/drivers/iommu/iommu-pages.c
+++ b/drivers/iommu/iommu-pages.c
@@ -67,25 +67,16 @@ void iommu_free_pages(void *virt)
 EXPORT_SYMBOL_GPL(iommu_free_pages);
 
 /**
- * iommu_put_pages_list_new - free a list of pages.
+ * iommu_put_pages_list - free a list of pages.
  * @list: The list of pages to be freed
  *
  * Frees a list of pages allocated by iommu_alloc_pages_node().
  */
-void iommu_put_pages_list_new(struct iommu_pages_list *list)
+void iommu_put_pages_list(struct iommu_pages_list *list)
 {
 	struct page *p, *tmp;
 
 	list_for_each_entry_safe(p, tmp, &list->pages, lru)
 		__iommu_free_page(p);
 }
-EXPORT_SYMBOL_GPL(iommu_put_pages_list_new);
-
-void iommu_put_pages_list_old(struct list_head *head)
-{
-	struct page *p, *tmp;
-
-	list_for_each_entry_safe(p, tmp, head, lru)
-		__iommu_free_page(p);
-}
-EXPORT_SYMBOL_GPL(iommu_put_pages_list_old);
+EXPORT_SYMBOL_GPL(iommu_put_pages_list);
diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h
index 0acc26af7202df..8dc0202bf108e4 100644
--- a/drivers/iommu/iommu-pages.h
+++ b/drivers/iommu/iommu-pages.h
@@ -11,13 +11,7 @@
 
 void *iommu_alloc_pages_node(int nid, gfp_t gfp, unsigned int order);
 void iommu_free_pages(void *virt);
-void iommu_put_pages_list_new(struct iommu_pages_list *list);
-void iommu_put_pages_list_old(struct list_head *head);
-
-#define iommu_put_pages_list(head)                                   \
-	_Generic(head,                                               \
-		struct iommu_pages_list *: iommu_put_pages_list_new, \
-		struct list_head *: iommu_put_pages_list_old)(head)
+void iommu_put_pages_list(struct iommu_pages_list *list);
 
 /**
  * iommu_pages_list_add - add the page to a iommu_pages_list
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 14/23] iommu/pages: Move from struct page to struct ioptdesc and folio
  2025-02-25 19:39 [PATCH v3 00/23] iommu: Further abstract iommu-pages Jason Gunthorpe
                   ` (12 preceding siblings ...)
  2025-02-25 19:39 ` [PATCH v3 13/23] iommu/pages: Remove iommu_put_pages_list_old and the _Generic Jason Gunthorpe
@ 2025-02-25 19:39 ` Jason Gunthorpe
  2025-02-26 12:42   ` Baolu Lu
  2025-02-27  5:17   ` Baolu Lu
  2025-02-25 19:39 ` [PATCH v3 15/23] iommu/pages: Move the __GFP_HIGHMEM checks into the common code Jason Gunthorpe
                   ` (10 subsequent siblings)
  24 siblings, 2 replies; 55+ messages in thread
From: Jason Gunthorpe @ 2025-02-25 19:39 UTC (permalink / raw)
  To: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

This brings the iommu page table allocator into the modern world of having
its own private page descriptor and not re-using fields from struct page
for its own purposes. It follows the basic pattern of struct ptdesc, which
did this transformation for the CPU page table allocator.

Currently iommu-pages is pretty basic, so this isn't a huge benefit;
however, I see a coming need for features the CPU allocator has, like sub
PAGE_SIZE allocations and RCU freeing. This provides the base
infrastructure to implement those cleanly.

Remove the numa_node_id() calls from the inlines and instead use
NUMA_NO_NODE, which gets switched to numa_mem_id() in the core code; that
is the right ID to use for memory allocations.
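
A minimal sketch of the descriptor round trip, using only helpers added
below (the nid variable is illustrative):

   void *table = iommu_alloc_pages_node(nid, GFP_KERNEL, 0);
   struct ioptdesc *iopt = virt_to_ioptdesc(table);

   /* ioptdesc overlays the folio, so the conversion is free */
   struct folio *folio = ioptdesc_folio(iopt);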

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommu-pages.c | 54 ++++++++++++++++++++++++++-----------
 drivers/iommu/iommu-pages.h | 43 ++++++++++++++++++++++++++---
 2 files changed, 78 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/iommu-pages.c b/drivers/iommu/iommu-pages.c
index 6eacb6a34586a6..3077df642adb1f 100644
--- a/drivers/iommu/iommu-pages.c
+++ b/drivers/iommu/iommu-pages.c
@@ -7,6 +7,21 @@
 #include <linux/gfp.h>
 #include <linux/mm.h>
 
+#define IOPTDESC_MATCH(pg_elm, elm)                    \
+	static_assert(offsetof(struct page, pg_elm) == \
+		      offsetof(struct ioptdesc, elm))
+IOPTDESC_MATCH(flags, __page_flags);
+IOPTDESC_MATCH(lru, iopt_freelist_elm); /* Ensure bit 0 is clear */
+IOPTDESC_MATCH(mapping, __page_mapping);
+IOPTDESC_MATCH(private, _private);
+IOPTDESC_MATCH(page_type, __page_type);
+IOPTDESC_MATCH(_refcount, __page_refcount);
+#ifdef CONFIG_MEMCG
+IOPTDESC_MATCH(memcg_data, memcg_data);
+#endif
+#undef IOPTDESC_MATCH
+static_assert(sizeof(struct ioptdesc) <= sizeof(struct page));
+
 /**
  * iommu_alloc_pages_node - Allocate a zeroed page of a given order from
  *                          specific NUMA node
@@ -20,10 +35,17 @@
 void *iommu_alloc_pages_node(int nid, gfp_t gfp, unsigned int order)
 {
 	const unsigned long pgcnt = 1UL << order;
-	struct page *page;
+	struct folio *folio;
 
-	page = alloc_pages_node(nid, gfp | __GFP_ZERO | __GFP_COMP, order);
-	if (unlikely(!page))
+	/*
+	 * __folio_alloc_node() does not handle NUMA_NO_NODE like
+	 * alloc_pages_node() did.
+	 */
+	if (nid == NUMA_NO_NODE)
+		nid = numa_mem_id();
+
+	folio = __folio_alloc_node(gfp | __GFP_ZERO, order, nid);
+	if (unlikely(!folio))
 		return NULL;
 
 	/*
@@ -35,21 +57,21 @@ void *iommu_alloc_pages_node(int nid, gfp_t gfp, unsigned int order)
 	 * This is necessary for the proper accounting as IOMMU state can be
 	 * rather large, i.e. multiple gigabytes in size.
 	 */
-	mod_node_page_state(page_pgdat(page), NR_IOMMU_PAGES, pgcnt);
-	mod_lruvec_page_state(page, NR_SECONDARY_PAGETABLE, pgcnt);
+	mod_node_page_state(folio_pgdat(folio), NR_IOMMU_PAGES, pgcnt);
+	lruvec_stat_mod_folio(folio, NR_SECONDARY_PAGETABLE, pgcnt);
 
-	return page_address(page);
+	return folio_address(folio);
 }
 EXPORT_SYMBOL_GPL(iommu_alloc_pages_node);
 
-static void __iommu_free_page(struct page *page)
+static void __iommu_free_desc(struct ioptdesc *iopt)
 {
-	unsigned int order = folio_order(page_folio(page));
-	const unsigned long pgcnt = 1UL << order;
+	struct folio *folio = ioptdesc_folio(iopt);
+	const unsigned long pgcnt = 1UL << folio_order(folio);
 
-	mod_node_page_state(page_pgdat(page), NR_IOMMU_PAGES, -pgcnt);
-	mod_lruvec_page_state(page, NR_SECONDARY_PAGETABLE, -pgcnt);
-	put_page(page);
+	mod_node_page_state(folio_pgdat(folio), NR_IOMMU_PAGES, -pgcnt);
+	lruvec_stat_mod_folio(folio, NR_SECONDARY_PAGETABLE, -pgcnt);
+	folio_put(folio);
 }
 
 /**
@@ -62,7 +84,7 @@ void iommu_free_pages(void *virt)
 {
 	if (!virt)
 		return;
-	__iommu_free_page(virt_to_page(virt));
+	__iommu_free_desc(virt_to_ioptdesc(virt));
 }
 EXPORT_SYMBOL_GPL(iommu_free_pages);
 
@@ -74,9 +96,9 @@ EXPORT_SYMBOL_GPL(iommu_free_pages);
  */
 void iommu_put_pages_list(struct iommu_pages_list *list)
 {
-	struct page *p, *tmp;
+	struct ioptdesc *iopt, *tmp;
 
-	list_for_each_entry_safe(p, tmp, &list->pages, lru)
-		__iommu_free_page(p);
+	list_for_each_entry_safe(iopt, tmp, &list->pages, iopt_freelist_elm)
+		__iommu_free_desc(iopt);
 }
 EXPORT_SYMBOL_GPL(iommu_put_pages_list);
diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h
index 8dc0202bf108e4..f4578f252e2580 100644
--- a/drivers/iommu/iommu-pages.h
+++ b/drivers/iommu/iommu-pages.h
@@ -9,6 +9,43 @@
 
 #include <linux/iommu.h>
 
+/**
+ * struct ioptdesc - Memory descriptor for IOMMU page tables
+ * @iopt_freelist_elm: List element for a struct iommu_pages_list
+ *
+ * This struct overlays struct page for now. Do not modify without a good
+ * understanding of the issues.
+ */
+struct ioptdesc {
+	unsigned long __page_flags;
+
+	struct list_head iopt_freelist_elm;
+	unsigned long __page_mapping;
+	pgoff_t __index;
+	void *_private;
+
+	unsigned int __page_type;
+	atomic_t __page_refcount;
+#ifdef CONFIG_MEMCG
+	unsigned long memcg_data;
+#endif
+};
+
+static inline struct ioptdesc *folio_ioptdesc(struct folio *folio)
+{
+	return (struct ioptdesc *)folio;
+}
+
+static inline struct folio *ioptdesc_folio(struct ioptdesc *iopt)
+{
+	return (struct folio *)iopt;
+}
+
+static inline struct ioptdesc *virt_to_ioptdesc(void *virt)
+{
+	return folio_ioptdesc(virt_to_folio(virt));
+}
+
 void *iommu_alloc_pages_node(int nid, gfp_t gfp, unsigned int order);
 void iommu_free_pages(void *virt);
 void iommu_put_pages_list(struct iommu_pages_list *list);
@@ -21,7 +58,7 @@ void iommu_put_pages_list(struct iommu_pages_list *list);
 static inline void iommu_pages_list_add(struct iommu_pages_list *list,
 					void *virt)
 {
-	list_add_tail(&virt_to_page(virt)->lru, &list->pages);
+	list_add_tail(&virt_to_ioptdesc(virt)->iopt_freelist_elm, &list->pages);
 }
 
 /**
@@ -56,7 +93,7 @@ static inline bool iommu_pages_list_empty(struct iommu_pages_list *list)
  */
 static inline void *iommu_alloc_pages(gfp_t gfp, int order)
 {
-	return iommu_alloc_pages_node(numa_node_id(), gfp, order);
+	return iommu_alloc_pages_node(NUMA_NO_NODE, gfp, order);
 }
 
 /**
@@ -79,7 +116,7 @@ static inline void *iommu_alloc_page_node(int nid, gfp_t gfp)
  */
 static inline void *iommu_alloc_page(gfp_t gfp)
 {
-	return iommu_alloc_pages_node(numa_node_id(), gfp, 0);
+	return iommu_alloc_pages_node(NUMA_NO_NODE, gfp, 0);
 }
 
 #endif	/* __IOMMU_PAGES_H */
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 15/23] iommu/pages: Move the __GFP_HIGHMEM checks into the common code
  2025-02-25 19:39 [PATCH v3 00/23] iommu: Further abstract iommu-pages Jason Gunthorpe
                   ` (13 preceding siblings ...)
  2025-02-25 19:39 ` [PATCH v3 14/23] iommu/pages: Move from struct page to struct ioptdesc and folio Jason Gunthorpe
@ 2025-02-25 19:39 ` Jason Gunthorpe
  2025-03-12 12:45   ` Mostafa Saleh
  2025-02-25 19:39 ` [PATCH v3 16/23] iommu/pages: Allow sub page sizes to be passed into the allocator Jason Gunthorpe
                   ` (9 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Jason Gunthorpe @ 2025-02-25 19:39 UTC (permalink / raw)
  To: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

The entire allocator API is built around using the kernel virtual
address, so it is illegal to pass __GFP_HIGHMEM in as a GFP flag. Block
it in the common code. Remove the duplicated checks from drivers.
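
A sketch of why __GFP_HIGHMEM can never work here, assuming a 32-bit
configuration with highmem enabled:

   /* page_address() is only valid for pages with a permanent kernel
    * mapping; an unmapped highmem page yields NULL */
   struct page *page = alloc_pages(GFP_KERNEL | __GFP_HIGHMEM, 0);
   void *va = page_address(page); /* NULL unless the page was kmap'd */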

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/io-pgtable-arm.c  | 2 --
 drivers/iommu/io-pgtable-dart.c | 1 -
 drivers/iommu/iommu-pages.c     | 4 ++++
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 62df2528d020b2..08d0f62abe8a09 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -267,8 +267,6 @@ static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
 	dma_addr_t dma;
 	void *pages;
 
-	VM_BUG_ON((gfp & __GFP_HIGHMEM));
-
 	if (cfg->alloc)
 		pages = cfg->alloc(cookie, size, gfp);
 	else
diff --git a/drivers/iommu/io-pgtable-dart.c b/drivers/iommu/io-pgtable-dart.c
index 7efcaea0bd5c86..ebf330e67bfa30 100644
--- a/drivers/iommu/io-pgtable-dart.c
+++ b/drivers/iommu/io-pgtable-dart.c
@@ -111,7 +111,6 @@ static void *__dart_alloc_pages(size_t size, gfp_t gfp)
 {
 	int order = get_order(size);
 
-	VM_BUG_ON((gfp & __GFP_HIGHMEM));
 	return iommu_alloc_pages(gfp, order);
 }
 
diff --git a/drivers/iommu/iommu-pages.c b/drivers/iommu/iommu-pages.c
index 3077df642adb1f..a7eed09420a231 100644
--- a/drivers/iommu/iommu-pages.c
+++ b/drivers/iommu/iommu-pages.c
@@ -37,6 +37,10 @@ void *iommu_alloc_pages_node(int nid, gfp_t gfp, unsigned int order)
 	const unsigned long pgcnt = 1UL << order;
 	struct folio *folio;
 
+	/* This uses page_address() on the memory. */
+	if (WARN_ON(gfp & __GFP_HIGHMEM))
+		return NULL;
+
 	/*
 	 * __folio_alloc_node() does not handle NUMA_NO_NODE like
 	 * alloc_pages_node() did.
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 16/23] iommu/pages: Allow sub page sizes to be passed into the allocator
  2025-02-25 19:39 [PATCH v3 00/23] iommu: Further abstract iommu-pages Jason Gunthorpe
                   ` (14 preceding siblings ...)
  2025-02-25 19:39 ` [PATCH v3 15/23] iommu/pages: Move the __GFP_HIGHMEM checks into the common code Jason Gunthorpe
@ 2025-02-25 19:39 ` Jason Gunthorpe
  2025-02-26 12:22   ` Baolu Lu
  2025-02-25 19:39 ` [PATCH v3 17/23] iommu/amd: Change rlookup, irq_lookup, and alias to use kvalloc() Jason Gunthorpe
                   ` (8 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Jason Gunthorpe @ 2025-02-25 19:39 UTC (permalink / raw)
  To: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

Generally drivers have a specific idea of what their HW structure size should
be. In a lot of cases this is related to PAGE_SIZE, but not always. ARM64,
for example, allows a 4K IO page table size on a 64K CPU page table
system.

Currently we don't have any good support for sub page allocations, but
make the API accommodate this by accepting a sub page size from the caller
and rounding up internally.

This is done by moving away from order as the size input and using size:
  size == 1 << (order + PAGE_SHIFT)

Following patches convert drivers away from using order and try to specify
allocation sizes independent of PAGE_SIZE.
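
For illustration, the driver-side change the following patches make
(SZ_4K stands in for whatever size the HW spec names; nid, gfp and pt are
illustrative):

   /* before: the driver converts its HW size to an order */
   pt = iommu_alloc_pages_node(nid, gfp, get_order(SZ_4K));

   /* after: the HW size is passed straight through */
   pt = iommu_alloc_pages_node_sz(nid, gfp, SZ_4K);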

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommu-pages.c | 29 +++++++++++++++---------
 drivers/iommu/iommu-pages.h | 44 ++++++++++++++++++++++++++++++++-----
 include/linux/iommu.h       |  6 ++---
 3 files changed, 61 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/iommu-pages.c b/drivers/iommu/iommu-pages.c
index a7eed09420a231..4cc77fddfeeb47 100644
--- a/drivers/iommu/iommu-pages.c
+++ b/drivers/iommu/iommu-pages.c
@@ -23,24 +23,32 @@ IOPTDESC_MATCH(memcg_data, memcg_data);
 static_assert(sizeof(struct ioptdesc) <= sizeof(struct page));
 
 /**
- * iommu_alloc_pages_node - Allocate a zeroed page of a given order from
- *                          specific NUMA node
+ * iommu_alloc_pages_node_sz - Allocate a zeroed page of a given size from
+ *                             specific NUMA node
  * @nid: memory NUMA node id
  * @gfp: buddy allocator flags
- * @order: page order
+ * @size: Memory size to allocate, rounded up to a power of 2
  *
- * Returns the virtual address of the allocated page. The page must be
- * freed either by calling iommu_free_pages() or via iommu_put_pages_list().
+ * Returns the virtual address of the allocated page. The page must be freed
+ * either by calling iommu_free_pages() or via iommu_put_pages_list(). The
+ * returned allocation is round_up_pow_two(size) big, and is physically aligned
+ * to its size.
  */
-void *iommu_alloc_pages_node(int nid, gfp_t gfp, unsigned int order)
+void *iommu_alloc_pages_node_sz(int nid, gfp_t gfp, size_t size)
 {
-	const unsigned long pgcnt = 1UL << order;
+	unsigned long pgcnt;
 	struct folio *folio;
+	unsigned int order;
 
 	/* This uses page_address() on the memory. */
 	if (WARN_ON(gfp & __GFP_HIGHMEM))
 		return NULL;
 
+	/*
+	 * Currently sub page allocations result in a full page being returned.
+	 */
+	order = get_order(size);
+
 	/*
 	 * __folio_alloc_node() does not handle NUMA_NO_NODE like
 	 * alloc_pages_node() did.
@@ -61,12 +69,13 @@ void *iommu_alloc_pages_node(int nid, gfp_t gfp, unsigned int order)
 	 * This is necessary for the proper accounting as IOMMU state can be
 	 * rather large, i.e. multiple gigabytes in size.
 	 */
+	pgcnt = 1UL << order;
 	mod_node_page_state(folio_pgdat(folio), NR_IOMMU_PAGES, pgcnt);
 	lruvec_stat_mod_folio(folio, NR_SECONDARY_PAGETABLE, pgcnt);
 
 	return folio_address(folio);
 }
-EXPORT_SYMBOL_GPL(iommu_alloc_pages_node);
+EXPORT_SYMBOL_GPL(iommu_alloc_pages_node_sz);
 
 static void __iommu_free_desc(struct ioptdesc *iopt)
 {
@@ -82,7 +91,7 @@ static void __iommu_free_desc(struct ioptdesc *iopt)
  * iommu_free_pages - free pages
  * @virt: virtual address of the page to be freed.
  *
- * The page must have been allocated by iommu_alloc_pages_node()
+ * The page must have been allocated by iommu_alloc_pages_node_sz()
  */
 void iommu_free_pages(void *virt)
 {
@@ -96,7 +105,7 @@ EXPORT_SYMBOL_GPL(iommu_free_pages);
  * iommu_put_pages_list - free a list of pages.
  * @list: The list of pages to be freed
  *
- * Frees a list of pages allocated by iommu_alloc_pages_node().
+ * Frees a list of pages allocated by iommu_alloc_pages_node_sz().
  */
 void iommu_put_pages_list(struct iommu_pages_list *list)
 {
diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h
index f4578f252e2580..3c4575d637da6d 100644
--- a/drivers/iommu/iommu-pages.h
+++ b/drivers/iommu/iommu-pages.h
@@ -46,14 +46,14 @@ static inline struct ioptdesc *virt_to_ioptdesc(void *virt)
 	return folio_ioptdesc(virt_to_folio(virt));
 }
 
-void *iommu_alloc_pages_node(int nid, gfp_t gfp, unsigned int order);
+void *iommu_alloc_pages_node_sz(int nid, gfp_t gfp, size_t size);
 void iommu_free_pages(void *virt);
 void iommu_put_pages_list(struct iommu_pages_list *list);
 
 /**
  * iommu_pages_list_add - add the page to an iommu_pages_list
  * @list: List to add the page to
- * @virt: Address returned from iommu_alloc_pages_node()
+ * @virt: Address returned from iommu_alloc_pages_node_sz()
  */
 static inline void iommu_pages_list_add(struct iommu_pages_list *list,
 					void *virt)
@@ -84,16 +84,48 @@ static inline bool iommu_pages_list_empty(struct iommu_pages_list *list)
 	return list_empty(&list->pages);
 }
 
+/**
+ * iommu_alloc_pages_node - Allocate a zeroed page of a given order from
+ *                          specific NUMA node
+ * @nid: memory NUMA node id
+ * @gfp: buddy allocator flags
+ * @order: page order
+ *
+ * Returns the virtual address of the allocated page.
+ * Prefer to use iommu_alloc_pages_node_sz()
+ */
+static inline void *iommu_alloc_pages_node(int nid, gfp_t gfp,
+					   unsigned int order)
+{
+	return iommu_alloc_pages_node_sz(nid, gfp, 1 << (order + PAGE_SHIFT));
+}
+
 /**
  * iommu_alloc_pages - allocate a zeroed page of a given order
  * @gfp: buddy allocator flags
  * @order: page order
  *
  * returns the virtual address of the allocated page
+ * Prefer to use iommu_alloc_pages_sz()
  */
 static inline void *iommu_alloc_pages(gfp_t gfp, int order)
 {
-	return iommu_alloc_pages_node(NUMA_NO_NODE, gfp, order);
+	return iommu_alloc_pages_node_sz(NUMA_NO_NODE, gfp,
+					 1 << (order + PAGE_SHIFT));
+}
+
+/**
+ * iommu_alloc_pages_sz - Allocate a zeroed page of a given size
+ * @gfp: buddy allocator flags
+ * @size: Memory size to allocate, this is rounded up to a power of 2
+ *
+ * Returns the virtual address of the allocated page.
+ */
+static inline void *iommu_alloc_pages_sz(gfp_t gfp, size_t size)
+{
+	return iommu_alloc_pages_node_sz(NUMA_NO_NODE, gfp, size);
 }
 
 /**
@@ -102,10 +134,11 @@ static inline void *iommu_alloc_pages(gfp_t gfp, int order)
  * @gfp: buddy allocator flags
  *
  * returns the virtual address of the allocated page
+ * Prefer to use iommu_alloc_pages_node_sz()
  */
 static inline void *iommu_alloc_page_node(int nid, gfp_t gfp)
 {
-	return iommu_alloc_pages_node(nid, gfp, 0);
+	return iommu_alloc_pages_node_sz(nid, gfp, PAGE_SIZE);
 }
 
 /**
@@ -113,10 +146,11 @@ static inline void *iommu_alloc_page_node(int nid, gfp_t gfp)
  * @gfp: buddy allocator flags
  *
  * returns the virtual address of the allocated page
+ * Prefer to use iommu_alloc_pages_sz()
  */
 static inline void *iommu_alloc_page(gfp_t gfp)
 {
-	return iommu_alloc_pages_node(NUMA_NO_NODE, gfp, 0);
+	return iommu_alloc_pages_node_sz(NUMA_NO_NODE, gfp, PAGE_SIZE);
 }
 
 #endif	/* __IOMMU_PAGES_H */
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 166d8e1bcb100d..b74c9f3dbcce1d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -327,9 +327,9 @@ typedef unsigned int ioasid_t;
 #define IOMMU_DIRTY_NO_CLEAR (1 << 0)
 
 /*
- * Pages allocated through iommu_alloc_pages_node() can be placed on this list
- * using iommu_pages_list_add(). Note: ONLY pages from iommu_alloc_pages_node()
- * can be used this way!
+ * Pages allocated through iommu_alloc_pages_node_sz() can be placed on this
+ * list using iommu_pages_list_add(). Note: ONLY pages from
+ * iommu_alloc_pages_node_sz() can be used this way!
  */
 struct iommu_pages_list {
 	struct list_head pages;
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 17/23] iommu/amd: Change rlookup, irq_lookup, and alias to use kvalloc()
  2025-02-25 19:39 [PATCH v3 00/23] iommu: Further abstract iommu-pages Jason Gunthorpe
                   ` (15 preceding siblings ...)
  2025-02-25 19:39 ` [PATCH v3 16/23] iommu/pages: Allow sub page sizes to be passed into the allocator Jason Gunthorpe
@ 2025-02-25 19:39 ` Jason Gunthorpe
  2025-02-25 19:39 ` [PATCH v3 18/23] iommu/amd: Use roundup_pow_of_two() instead of get_order() Jason Gunthorpe
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 55+ messages in thread
From: Jason Gunthorpe @ 2025-02-25 19:39 UTC (permalink / raw)
  To: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

This is just CPU memory used by the driver to track things; it doesn't
need to use iommu-pages. All of these tables are indexed by devid, and
devid is bounded by pci_seg->last_bdf, or we are already out of bounds on
the page allocation.

Switch them to use a variant of kvmalloc_array(), drop the now unused
constants, and remove the tbl_size() logic that rounded allocations up to
PAGE_SIZE multiples.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/amd/amd_iommu_types.h |  8 --------
 drivers/iommu/amd/init.c            | 26 ++++++++++++--------------
 2 files changed, 12 insertions(+), 22 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 0bbda60d3cdc7d..9704edf5cc26e0 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -29,8 +29,6 @@
  * some size calculation constants
  */
 #define DEV_TABLE_ENTRY_SIZE		32
-#define ALIAS_TABLE_ENTRY_SIZE		2
-#define RLOOKUP_TABLE_ENTRY_SIZE	(sizeof(void *))
 
 /* Capability offsets used by the driver */
 #define MMIO_CAP_HDR_OFFSET	0x00
@@ -613,12 +611,6 @@ struct amd_iommu_pci_seg {
 	/* Size of the device table */
 	u32 dev_table_size;
 
-	/* Size of the alias table */
-	u32 alias_table_size;
-
-	/* Size of the rlookup table */
-	u32 rlookup_table_size;
-
 	/*
 	 * device table virtual address
 	 *
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 73ebcb958ad864..fb3c3d17efc167 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -660,8 +660,9 @@ static inline void free_dev_table(struct amd_iommu_pci_seg *pci_seg)
 /* Allocate per PCI segment IOMMU rlookup table. */
 static inline int __init alloc_rlookup_table(struct amd_iommu_pci_seg *pci_seg)
 {
-	pci_seg->rlookup_table = iommu_alloc_pages(GFP_KERNEL,
-						   get_order(pci_seg->rlookup_table_size));
+	pci_seg->rlookup_table = kvcalloc(pci_seg->last_bdf + 1,
+					  sizeof(*pci_seg->rlookup_table),
+					  GFP_KERNEL);
 	if (pci_seg->rlookup_table == NULL)
 		return -ENOMEM;
 
@@ -670,16 +671,15 @@ static inline int __init alloc_rlookup_table(struct amd_iommu_pci_seg *pci_seg)
 
 static inline void free_rlookup_table(struct amd_iommu_pci_seg *pci_seg)
 {
-	iommu_free_pages(pci_seg->rlookup_table);
+	kvfree(pci_seg->rlookup_table);
 	pci_seg->rlookup_table = NULL;
 }
 
 static inline int __init alloc_irq_lookup_table(struct amd_iommu_pci_seg *pci_seg)
 {
-	pci_seg->irq_lookup_table = iommu_alloc_pages(GFP_KERNEL,
-						      get_order(pci_seg->rlookup_table_size));
-	kmemleak_alloc(pci_seg->irq_lookup_table,
-		       pci_seg->rlookup_table_size, 1, GFP_KERNEL);
+	pci_seg->irq_lookup_table = kvcalloc(pci_seg->last_bdf + 1,
+					     sizeof(*pci_seg->irq_lookup_table),
+					     GFP_KERNEL);
 	if (pci_seg->irq_lookup_table == NULL)
 		return -ENOMEM;
 
@@ -688,8 +688,7 @@ static inline int __init alloc_irq_lookup_table(struct amd_iommu_pci_seg *pci_se
 
 static inline void free_irq_lookup_table(struct amd_iommu_pci_seg *pci_seg)
 {
-	kmemleak_free(pci_seg->irq_lookup_table);
-	iommu_free_pages(pci_seg->irq_lookup_table);
+	kvfree(pci_seg->irq_lookup_table);
 	pci_seg->irq_lookup_table = NULL;
 }
 
@@ -697,8 +696,9 @@ static int __init alloc_alias_table(struct amd_iommu_pci_seg *pci_seg)
 {
 	int i;
 
-	pci_seg->alias_table = iommu_alloc_pages(GFP_KERNEL,
-						 get_order(pci_seg->alias_table_size));
+	pci_seg->alias_table = kvmalloc_array(pci_seg->last_bdf + 1,
+					      sizeof(*pci_seg->alias_table),
+					      GFP_KERNEL);
 	if (!pci_seg->alias_table)
 		return -ENOMEM;
 
@@ -713,7 +713,7 @@ static int __init alloc_alias_table(struct amd_iommu_pci_seg *pci_seg)
 
 static void __init free_alias_table(struct amd_iommu_pci_seg *pci_seg)
 {
-	iommu_free_pages(pci_seg->alias_table);
+	kvfree(pci_seg->alias_table);
 	pci_seg->alias_table = NULL;
 }
 
@@ -1604,8 +1604,6 @@ static struct amd_iommu_pci_seg *__init alloc_pci_segment(u16 id,
 	pci_seg->last_bdf = last_bdf;
 	DUMP_printk("PCI segment : 0x%0x, last bdf : 0x%04x\n", id, last_bdf);
 	pci_seg->dev_table_size     = tbl_size(DEV_TABLE_ENTRY_SIZE, last_bdf);
-	pci_seg->alias_table_size   = tbl_size(ALIAS_TABLE_ENTRY_SIZE, last_bdf);
-	pci_seg->rlookup_table_size = tbl_size(RLOOKUP_TABLE_ENTRY_SIZE, last_bdf);
 
 	pci_seg->id = id;
 	init_llist_head(&pci_seg->dev_data_list);
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 18/23] iommu/amd: Use roundup_pow_of_two() instead of get_order()
  2025-02-25 19:39 [PATCH v3 00/23] iommu: Further abstract iommu-pages Jason Gunthorpe
                   ` (16 preceding siblings ...)
  2025-02-25 19:39 ` [PATCH v3 17/23] iommu/amd: Change rlookup, irq_lookup, and alias to use kvalloc() Jason Gunthorpe
@ 2025-02-25 19:39 ` Jason Gunthorpe
  2025-02-25 19:39 ` [PATCH v3 19/23] iommu/riscv: Update to use iommu_alloc_pages_node_lg2() Jason Gunthorpe
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 55+ messages in thread
From: Jason Gunthorpe @ 2025-02-25 19:39 UTC (permalink / raw)
  To: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

If x >= PAGE_SIZE then:

  1 << (get_order(x) + PAGE_SHIFT) == roundup_pow_of_two(x)

Inline this into the only caller and compute the size of the HW device
table in terms of 4K pages, which matches the HW definition.
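
A quick worked check of the identity, assuming 4K pages (PAGE_SHIFT == 12;
the numbers are purely illustrative):

  x = 40960 (ten 4K pages):
    1 << (get_order(40960) + 12) == 1 << (4 + 12) == 65536
    roundup_pow_of_two(40960)                     == 65536

The max(..., SZ_4K) in the hunk below covers the small-last_bdf case,
where (last_bdf + 1) * DEV_TABLE_ENTRY_SIZE would round up to less than
one page.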

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/amd/init.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index fb3c3d17efc167..e3f4283ebbc201 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -245,14 +245,6 @@ static void init_translation_status(struct amd_iommu *iommu)
 		iommu->flags |= AMD_IOMMU_FLAG_TRANS_PRE_ENABLED;
 }
 
-static inline unsigned long tbl_size(int entry_size, int last_bdf)
-{
-	unsigned shift = PAGE_SHIFT +
-			 get_order((last_bdf + 1) * entry_size);
-
-	return 1UL << shift;
-}
-
 int amd_iommu_get_num_iommus(void)
 {
 	return amd_iommus_present;
@@ -1603,7 +1595,9 @@ static struct amd_iommu_pci_seg *__init alloc_pci_segment(u16 id,
 
 	pci_seg->last_bdf = last_bdf;
 	DUMP_printk("PCI segment : 0x%0x, last bdf : 0x%04x\n", id, last_bdf);
-	pci_seg->dev_table_size     = tbl_size(DEV_TABLE_ENTRY_SIZE, last_bdf);
+	pci_seg->dev_table_size =
+		max(roundup_pow_of_two((last_bdf + 1) * DEV_TABLE_ENTRY_SIZE),
+		    SZ_4K);
 
 	pci_seg->id = id;
 	init_llist_head(&pci_seg->dev_data_list);
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 19/23] iommu/riscv: Update to use iommu_alloc_pages_node_lg2()
  2025-02-25 19:39 [PATCH v3 00/23] iommu: Further abstract iommu-pages Jason Gunthorpe
                   ` (17 preceding siblings ...)
  2025-02-25 19:39 ` [PATCH v3 18/23] iommu/amd: Use roundup_pow_of_two() instead of get_order() Jason Gunthorpe
@ 2025-02-25 19:39 ` Jason Gunthorpe
  2025-02-25 19:39 ` [PATCH v3 20/23] iommu: Update various drivers to pass in lg2sz instead of order to iommu pages Jason Gunthorpe
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 55+ messages in thread
From: Jason Gunthorpe @ 2025-02-25 19:39 UTC (permalink / raw)
  To: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

One part of RISC-V already has a computed size; however, the queue
allocation must be at least 4K in size. The other objects are 4K by spec.
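
As a sketch of what the conversion looks like at the queue allocation site
(queue_size is the computed size from the logsz loop in the hunk below):

	/* before: size -> order, and the allocator converts back to a size */
	queue->base = riscv_iommu_get_pages(iommu, get_order(queue_size));
	/* after: pass the size through, clamped to the 4K minimum */
	queue->base = riscv_iommu_get_pages(iommu, max(queue_size, SZ_4K));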

Reviewed-by: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/riscv/iommu.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 2750f2e6e01a2b..8835c82f118db4 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -65,13 +65,14 @@ static int riscv_iommu_devres_pages_match(struct device *dev, void *res, void *p
 	return devres->addr == target->addr;
 }
 
-static void *riscv_iommu_get_pages(struct riscv_iommu_device *iommu, int order)
+static void *riscv_iommu_get_pages(struct riscv_iommu_device *iommu,
+				   unsigned int size)
 {
 	struct riscv_iommu_devres *devres;
 	void *addr;
 
-	addr = iommu_alloc_pages_node(dev_to_node(iommu->dev),
-				      GFP_KERNEL_ACCOUNT, order);
+	addr = iommu_alloc_pages_node_sz(dev_to_node(iommu->dev),
+					 GFP_KERNEL_ACCOUNT, size);
 	if (unlikely(!addr))
 		return NULL;
 
@@ -161,9 +162,9 @@ static int riscv_iommu_queue_alloc(struct riscv_iommu_device *iommu,
 	} else {
 		do {
 			const size_t queue_size = entry_size << (logsz + 1);
-			const int order = get_order(queue_size);
 
-			queue->base = riscv_iommu_get_pages(iommu, order);
+			queue->base = riscv_iommu_get_pages(
+				iommu, max(queue_size, SZ_4K));
 			queue->phys = __pa(queue->base);
 		} while (!queue->base && logsz-- > 0);
 	}
@@ -618,7 +619,7 @@ static struct riscv_iommu_dc *riscv_iommu_get_dc(struct riscv_iommu_device *iomm
 				break;
 			}
 
-			ptr = riscv_iommu_get_pages(iommu, 0);
+			ptr = riscv_iommu_get_pages(iommu, SZ_4K);
 			if (!ptr)
 				return NULL;
 
@@ -698,7 +699,7 @@ static int riscv_iommu_iodir_alloc(struct riscv_iommu_device *iommu)
 	}
 
 	if (!iommu->ddt_root) {
-		iommu->ddt_root = riscv_iommu_get_pages(iommu, 0);
+		iommu->ddt_root = riscv_iommu_get_pages(iommu, SZ_4K);
 		iommu->ddt_phys = __pa(iommu->ddt_root);
 	}
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 20/23] iommu: Update various drivers to pass in lg2sz instead of order to iommu pages
  2025-02-25 19:39 [PATCH v3 00/23] iommu: Further abstract iommu-pages Jason Gunthorpe
                   ` (18 preceding siblings ...)
  2025-02-25 19:39 ` [PATCH v3 19/23] iommu/riscv: Update to use iommu_alloc_pages_node_lg2() Jason Gunthorpe
@ 2025-02-25 19:39 ` Jason Gunthorpe
  2025-02-26 12:24   ` Baolu Lu
  2025-03-12 12:59   ` Mostafa Saleh
  2025-02-25 19:39 ` [PATCH v3 21/23] iommu/pages: Remove iommu_alloc_page/pages() Jason Gunthorpe
                   ` (4 subsequent siblings)
  24 siblings, 2 replies; 55+ messages in thread
From: Jason Gunthorpe @ 2025-02-25 19:39 UTC (permalink / raw)
  To: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

Convert most of the places that pass get_order() as an argument to the
iommu-pages allocator into order_base_2() or the _sz flavour
instead. These places already have an exact size; there is no particular
reason to use order here.
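
The mechanical shape of the conversion, as a sketch (CMD_BUFFER_SIZE is
just an example constant taken from the AMD hunk below):

	/* before: an exact size is turned into an order and back again */
	iommu->cmd_buf = iommu_alloc_pages(GFP_KERNEL,
					   get_order(CMD_BUFFER_SIZE));
	/* after: pass the exact size straight through */
	iommu->cmd_buf = iommu_alloc_pages_sz(GFP_KERNEL, CMD_BUFFER_SIZE);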

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/amd/init.c        | 29 +++++++++++++++--------------
 drivers/iommu/intel/dmar.c      |  6 +++---
 drivers/iommu/io-pgtable-arm.c  |  3 +--
 drivers/iommu/io-pgtable-dart.c | 12 +++---------
 drivers/iommu/sun50i-iommu.c    |  4 ++--
 5 files changed, 24 insertions(+), 30 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index e3f4283ebbc201..a5720df7b22397 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -635,8 +635,8 @@ static int __init find_last_devid_acpi(struct acpi_table_header *table, u16 pci_
 /* Allocate per PCI segment device table */
 static inline int __init alloc_dev_table(struct amd_iommu_pci_seg *pci_seg)
 {
-	pci_seg->dev_table = iommu_alloc_pages(GFP_KERNEL | GFP_DMA32,
-					       get_order(pci_seg->dev_table_size));
+	pci_seg->dev_table = iommu_alloc_pages_sz(GFP_KERNEL | GFP_DMA32,
+						  pci_seg->dev_table_size);
 	if (!pci_seg->dev_table)
 		return -ENOMEM;
 
@@ -716,8 +716,7 @@ static void __init free_alias_table(struct amd_iommu_pci_seg *pci_seg)
  */
 static int __init alloc_command_buffer(struct amd_iommu *iommu)
 {
-	iommu->cmd_buf = iommu_alloc_pages(GFP_KERNEL,
-					   get_order(CMD_BUFFER_SIZE));
+	iommu->cmd_buf = iommu_alloc_pages_sz(GFP_KERNEL, CMD_BUFFER_SIZE);
 
 	return iommu->cmd_buf ? 0 : -ENOMEM;
 }
@@ -820,14 +819,16 @@ static void __init free_command_buffer(struct amd_iommu *iommu)
 void *__init iommu_alloc_4k_pages(struct amd_iommu *iommu, gfp_t gfp,
 				  size_t size)
 {
-	int order = get_order(size);
-	void *buf = iommu_alloc_pages(gfp, order);
+	void *buf;
 
-	if (buf &&
-	    check_feature(FEATURE_SNP) &&
-	    set_memory_4k((unsigned long)buf, (1 << order))) {
+	size = PAGE_ALIGN(size);
+	buf = iommu_alloc_pages_sz(gfp, size);
+	if (!buf)
+		return NULL;
+	if (check_feature(FEATURE_SNP) &&
+	    set_memory_4k((unsigned long)buf, size / PAGE_SIZE)) {
 		iommu_free_pages(buf);
-		buf = NULL;
+		return NULL;
 	}
 
 	return buf;
@@ -922,11 +923,11 @@ static int iommu_init_ga_log(struct amd_iommu *iommu)
 	if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir))
 		return 0;
 
-	iommu->ga_log = iommu_alloc_pages(GFP_KERNEL, get_order(GA_LOG_SIZE));
+	iommu->ga_log = iommu_alloc_pages_sz(GFP_KERNEL, GA_LOG_SIZE);
 	if (!iommu->ga_log)
 		goto err_out;
 
-	iommu->ga_log_tail = iommu_alloc_pages(GFP_KERNEL, get_order(8));
+	iommu->ga_log_tail = iommu_alloc_pages_sz(GFP_KERNEL, 8);
 	if (!iommu->ga_log_tail)
 		goto err_out;
 
@@ -1021,8 +1022,8 @@ static bool __copy_device_table(struct amd_iommu *iommu)
 	if (!old_devtb)
 		return false;
 
-	pci_seg->old_dev_tbl_cpy = iommu_alloc_pages(GFP_KERNEL | GFP_DMA32,
-						     get_order(pci_seg->dev_table_size));
+	pci_seg->old_dev_tbl_cpy = iommu_alloc_pages_sz(
+		GFP_KERNEL | GFP_DMA32, pci_seg->dev_table_size);
 	if (pci_seg->old_dev_tbl_cpy == NULL) {
 		pr_err("Failed to allocate memory for copying old device table!\n");
 		memunmap(old_devtb);
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index c812c83d77da10..4c7ce92acf6976 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1681,7 +1681,6 @@ int dmar_enable_qi(struct intel_iommu *iommu)
 {
 	struct q_inval *qi;
 	void *desc;
-	int order;
 
 	if (!ecap_qis(iommu->ecap))
 		return -ENOENT;
@@ -1702,8 +1701,9 @@ int dmar_enable_qi(struct intel_iommu *iommu)
 	 * Need two pages to accommodate 256 descriptors of 256 bits each
 	 * if the remapping hardware supports scalable mode translation.
 	 */
-	order = ecap_smts(iommu->ecap) ? 1 : 0;
-	desc = iommu_alloc_pages_node(iommu->node, GFP_ATOMIC, order);
+	desc = iommu_alloc_pages_node_sz(iommu->node, GFP_ATOMIC,
+					 ecap_smts(iommu->ecap) ? SZ_8K :
+								  SZ_4K);
 	if (!desc) {
 		kfree(qi);
 		iommu->qi = NULL;
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 08d0f62abe8a09..d13149ec5be77e 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -263,14 +263,13 @@ static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
 				    void *cookie)
 {
 	struct device *dev = cfg->iommu_dev;
-	int order = get_order(size);
 	dma_addr_t dma;
 	void *pages;
 
 	if (cfg->alloc)
 		pages = cfg->alloc(cookie, size, gfp);
 	else
-		pages = iommu_alloc_pages_node(dev_to_node(dev), gfp, order);
+		pages = iommu_alloc_pages_node_sz(dev_to_node(dev), gfp, size);
 
 	if (!pages)
 		return NULL;
diff --git a/drivers/iommu/io-pgtable-dart.c b/drivers/iommu/io-pgtable-dart.c
index ebf330e67bfa30..a0988669bb951a 100644
--- a/drivers/iommu/io-pgtable-dart.c
+++ b/drivers/iommu/io-pgtable-dart.c
@@ -107,13 +107,6 @@ static phys_addr_t iopte_to_paddr(dart_iopte pte,
 	return paddr;
 }
 
-static void *__dart_alloc_pages(size_t size, gfp_t gfp)
-{
-	int order = get_order(size);
-
-	return iommu_alloc_pages(gfp, order);
-}
-
 static int dart_init_pte(struct dart_io_pgtable *data,
 			     unsigned long iova, phys_addr_t paddr,
 			     dart_iopte prot, int num_entries,
@@ -255,7 +248,7 @@ static int dart_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
 
 	/* no L2 table present */
 	if (!pte) {
-		cptep = __dart_alloc_pages(tblsz, gfp);
+		cptep = iommu_alloc_pages_sz(gfp, tblsz);
 		if (!cptep)
 			return -ENOMEM;
 
@@ -412,7 +405,8 @@ apple_dart_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
 	cfg->apple_dart_cfg.n_ttbrs = 1 << data->tbl_bits;
 
 	for (i = 0; i < cfg->apple_dart_cfg.n_ttbrs; ++i) {
-		data->pgd[i] = __dart_alloc_pages(DART_GRANULE(data), GFP_KERNEL);
+		data->pgd[i] =
+			iommu_alloc_pages_sz(GFP_KERNEL, DART_GRANULE(data));
 		if (!data->pgd[i])
 			goto out_free_data;
 		cfg->apple_dart_cfg.ttbr[i] = virt_to_phys(data->pgd[i]);
diff --git a/drivers/iommu/sun50i-iommu.c b/drivers/iommu/sun50i-iommu.c
index 6385560dbc3fb0..76c9620af4bba8 100644
--- a/drivers/iommu/sun50i-iommu.c
+++ b/drivers/iommu/sun50i-iommu.c
@@ -690,8 +690,8 @@ sun50i_iommu_domain_alloc_paging(struct device *dev)
 	if (!sun50i_domain)
 		return NULL;
 
-	sun50i_domain->dt = iommu_alloc_pages(GFP_KERNEL | GFP_DMA32,
-					      get_order(DT_SIZE));
+	sun50i_domain->dt =
+		iommu_alloc_pages_sz(GFP_KERNEL | GFP_DMA32, DT_SIZE);
 	if (!sun50i_domain->dt)
 		goto err_free_domain;
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 21/23] iommu/pages: Remove iommu_alloc_page/pages()
  2025-02-25 19:39 [PATCH v3 00/23] iommu: Further abstract iommu-pages Jason Gunthorpe
                   ` (19 preceding siblings ...)
  2025-02-25 19:39 ` [PATCH v3 20/23] iommu: Update various drivers to pass in lg2sz instead of order to iommu pages Jason Gunthorpe
@ 2025-02-25 19:39 ` Jason Gunthorpe
  2025-02-26  9:15   ` Marek Szyprowski
  2025-02-25 19:39 ` [PATCH v3 22/23] iommu/pages: Remove iommu_alloc_page_node() Jason Gunthorpe
                   ` (3 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Jason Gunthorpe @ 2025-02-25 19:39 UTC (permalink / raw)
  To: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

A few small changes to the remaining drivers using these will allow
them to be removed:

- Exynos wants fixed 16K/8K allocations (the size math is worked out
  below)
- Rockchip already has a SPAGE_SIZE define, which is used by the dma_map
  call immediately following; SPAGE_ORDER is the corresponding lg2 size
- Tegra already has size constants for its two allocations
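
For the Exynos bullet, assuming 4K CPU pages (PAGE_SHIFT == 12), the old
order-based calls map exactly onto the new constants, so the conversion
is behavior-preserving:

	1 << (2 + 12) == SZ_16K		/* first-level page table */
	1 << (1 + 12) == SZ_8K		/* lv2 entry counters */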

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/exynos-iommu.c   |  4 ++--
 drivers/iommu/iommu-pages.h    | 26 --------------------------
 drivers/iommu/rockchip-iommu.c |  6 ++++--
 drivers/iommu/tegra-smmu.c     |  4 ++--
 4 files changed, 8 insertions(+), 32 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 1019e08b43b71c..74337081278551 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -902,11 +902,11 @@ static struct iommu_domain *exynos_iommu_domain_alloc_paging(struct device *dev)
 	if (!domain)
 		return NULL;
 
-	domain->pgtable = iommu_alloc_pages(GFP_KERNEL, 2);
+	domain->pgtable = iommu_alloc_pages_sz(GFP_KERNEL, SZ_16K);
 	if (!domain->pgtable)
 		goto err_pgtable;
 
-	domain->lv2entcnt = iommu_alloc_pages(GFP_KERNEL, 1);
+	domain->lv2entcnt = iommu_alloc_pages_sz(GFP_KERNEL, SZ_8K);
 	if (!domain->lv2entcnt)
 		goto err_counter;
 
diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h
index 3c4575d637da6d..4513fbc76260cd 100644
--- a/drivers/iommu/iommu-pages.h
+++ b/drivers/iommu/iommu-pages.h
@@ -100,20 +100,6 @@ static inline void *iommu_alloc_pages_node(int nid, gfp_t gfp,
 	return iommu_alloc_pages_node_sz(nid, gfp, 1 << (order + PAGE_SHIFT));
 }
 
-/**
- * iommu_alloc_pages - allocate a zeroed page of a given order
- * @gfp: buddy allocator flags
- * @order: page order
- *
- * returns the virtual address of the allocated page
- * Prefer to use iommu_alloc_pages_lg2()
- */
-static inline void *iommu_alloc_pages(gfp_t gfp, int order)
-{
-	return iommu_alloc_pages_node_sz(NUMA_NO_NODE, gfp,
-					 1 << (order + PAGE_SHIFT));
-}
-
 /**
  * iommu_alloc_pages_sz - Allocate a zeroed page of a given size from
  *                          specific NUMA node
@@ -141,16 +127,4 @@ static inline void *iommu_alloc_page_node(int nid, gfp_t gfp)
 	return iommu_alloc_pages_node_sz(nid, gfp, PAGE_SIZE);
 }
 
-/**
- * iommu_alloc_page - allocate a zeroed page
- * @gfp: buddy allocator flags
- *
- * returns the virtual address of the allocated page
- * Prefer to use iommu_alloc_pages_lg2()
- */
-static inline void *iommu_alloc_page(gfp_t gfp)
-{
-	return iommu_alloc_pages_node_sz(NUMA_NO_NODE, gfp, PAGE_SIZE);
-}
-
 #endif	/* __IOMMU_PAGES_H */
diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index 798e85bd994d56..5af82072b03a17 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -730,7 +730,8 @@ static u32 *rk_dte_get_page_table(struct rk_iommu_domain *rk_domain,
 	if (rk_dte_is_pt_valid(dte))
 		goto done;
 
-	page_table = iommu_alloc_page(GFP_ATOMIC | rk_ops->gfp_flags);
+	page_table = iommu_alloc_pages_sz(GFP_ATOMIC | rk_ops->gfp_flags,
+					  SPAGE_SIZE);
 	if (!page_table)
 		return ERR_PTR(-ENOMEM);
 
@@ -1064,7 +1065,8 @@ static struct iommu_domain *rk_iommu_domain_alloc_paging(struct device *dev)
 	 * Each level1 (dt) and level2 (pt) table has 1024 4-byte entries.
 	 * Allocate one 4 KiB page for each table.
 	 */
-	rk_domain->dt = iommu_alloc_page(GFP_KERNEL | rk_ops->gfp_flags);
+	rk_domain->dt = iommu_alloc_pages_sz(GFP_KERNEL | rk_ops->gfp_flags,
+					     SPAGE_SIZE);
 	if (!rk_domain->dt)
 		goto err_free_domain;
 
diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 844682a41afa66..a9c35efde56969 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -295,7 +295,7 @@ static struct iommu_domain *tegra_smmu_domain_alloc_paging(struct device *dev)
 
 	as->attr = SMMU_PD_READABLE | SMMU_PD_WRITABLE | SMMU_PD_NONSECURE;
 
-	as->pd = iommu_alloc_page(GFP_KERNEL | __GFP_DMA);
+	as->pd = iommu_alloc_pages_sz(GFP_KERNEL | __GFP_DMA, SMMU_SIZE_PD);
 	if (!as->pd) {
 		kfree(as);
 		return NULL;
@@ -695,7 +695,7 @@ static struct tegra_pt *as_get_pde_page(struct tegra_smmu_as *as,
 	if (gfpflags_allow_blocking(gfp))
 		spin_unlock_irqrestore(&as->lock, *flags);
 
-	pt = iommu_alloc_page(gfp | __GFP_DMA);
+	pt = iommu_alloc_pages_sz(gfp | __GFP_DMA, SMMU_SIZE_PT);
 
 	if (gfpflags_allow_blocking(gfp))
 		spin_lock_irqsave(&as->lock, *flags);
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 22/23] iommu/pages: Remove iommu_alloc_page_node()
  2025-02-25 19:39 [PATCH v3 00/23] iommu: Further abstract iommu-pages Jason Gunthorpe
                   ` (20 preceding siblings ...)
  2025-02-25 19:39 ` [PATCH v3 21/23] iommu/pages: Remove iommu_alloc_page/pages() Jason Gunthorpe
@ 2025-02-25 19:39 ` Jason Gunthorpe
  2025-02-26 12:26   ` Baolu Lu
  2025-02-25 19:39 ` [PATCH v3 23/23] iommu/pages: Remove iommu_alloc_pages_node() Jason Gunthorpe
                   ` (2 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Jason Gunthorpe @ 2025-02-25 19:39 UTC (permalink / raw)
  To: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

Use iommu_alloc_pages_node_sz() instead.

AMD and Intel both use 4K pages for these structures, since those drivers
only work with a 4K PAGE_SIZE.

RISC-V is also spec'd to use SZ_4K.
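
The change is mechanical at every call site; as a sketch (names taken
from the AMD io_pgtable hunk below):

	/* before: implicit PAGE_SIZE allocation */
	pte = iommu_alloc_page_node(cfg->amd.nid, gfp);
	/* after: the 4K size the page table format actually requires */
	pte = iommu_alloc_pages_node_sz(cfg->amd.nid, gfp, SZ_4K);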

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/amd/io_pgtable.c    |  8 +++++---
 drivers/iommu/amd/io_pgtable_v2.c |  4 ++--
 drivers/iommu/amd/iommu.c         |  2 +-
 drivers/iommu/intel/iommu.c       | 13 ++++++++-----
 drivers/iommu/intel/pasid.c       |  3 ++-
 drivers/iommu/iommu-pages.h       | 13 -------------
 drivers/iommu/riscv/iommu.c       |  7 ++++---
 7 files changed, 22 insertions(+), 28 deletions(-)

diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
index 04d2b0883c3e32..2eb8a351ca91e4 100644
--- a/drivers/iommu/amd/io_pgtable.c
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -121,7 +121,7 @@ static bool increase_address_space(struct amd_io_pgtable *pgtable,
 	bool ret = true;
 	u64 *pte;
 
-	pte = iommu_alloc_page_node(cfg->amd.nid, gfp);
+	pte = iommu_alloc_pages_node_sz(cfg->amd.nid, gfp, SZ_4K);
 	if (!pte)
 		return false;
 
@@ -213,7 +213,8 @@ static u64 *alloc_pte(struct amd_io_pgtable *pgtable,
 
 		if (!IOMMU_PTE_PRESENT(__pte) ||
 		    pte_level == PAGE_MODE_NONE) {
-			page = iommu_alloc_page_node(cfg->amd.nid, gfp);
+			page = iommu_alloc_pages_node_sz(cfg->amd.nid, gfp,
+							 SZ_4K);
 
 			if (!page)
 				return NULL;
@@ -542,7 +543,8 @@ static struct io_pgtable *v1_alloc_pgtable(struct io_pgtable_cfg *cfg, void *coo
 {
 	struct amd_io_pgtable *pgtable = io_pgtable_cfg_to_data(cfg);
 
-	pgtable->root = iommu_alloc_page_node(cfg->amd.nid, GFP_KERNEL);
+	pgtable->root =
+		iommu_alloc_pages_node_sz(cfg->amd.nid, GFP_KERNEL, SZ_4K);
 	if (!pgtable->root)
 		return NULL;
 	pgtable->mode = PAGE_MODE_3_LEVEL;
diff --git a/drivers/iommu/amd/io_pgtable_v2.c b/drivers/iommu/amd/io_pgtable_v2.c
index cce3fc9861ef77..a07c22707037eb 100644
--- a/drivers/iommu/amd/io_pgtable_v2.c
+++ b/drivers/iommu/amd/io_pgtable_v2.c
@@ -152,7 +152,7 @@ static u64 *v2_alloc_pte(int nid, u64 *pgd, unsigned long iova,
 		}
 
 		if (!IOMMU_PTE_PRESENT(__pte)) {
-			page = iommu_alloc_page_node(nid, gfp);
+			page = iommu_alloc_pages_node_sz(nid, gfp, SZ_4K);
 			if (!page)
 				return NULL;
 
@@ -346,7 +346,7 @@ static struct io_pgtable *v2_alloc_pgtable(struct io_pgtable_cfg *cfg, void *coo
 	struct amd_io_pgtable *pgtable = io_pgtable_cfg_to_data(cfg);
 	int ias = IOMMU_IN_ADDR_BIT_SIZE;
 
-	pgtable->pgd = iommu_alloc_page_node(cfg->amd.nid, GFP_KERNEL);
+	pgtable->pgd = iommu_alloc_pages_node_sz(cfg->amd.nid, GFP_KERNEL, SZ_4K);
 	if (!pgtable->pgd)
 		return NULL;
 
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index e23d104d177ad9..d465cf2e635413 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1884,7 +1884,7 @@ static int setup_gcr3_table(struct gcr3_tbl_info *gcr3_info,
 		return -ENOSPC;
 	gcr3_info->domid = domid;
 
-	gcr3_info->gcr3_tbl = iommu_alloc_page_node(nid, GFP_ATOMIC);
+	gcr3_info->gcr3_tbl = iommu_alloc_pages_node_sz(nid, GFP_ATOMIC, SZ_4K);
 	if (gcr3_info->gcr3_tbl == NULL) {
 		pdom_id_free(domid);
 		return -ENOMEM;
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 6df5c202fbeba6..f72de7519d840c 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -397,7 +397,8 @@ struct context_entry *iommu_context_addr(struct intel_iommu *iommu, u8 bus,
 		if (!alloc)
 			return NULL;
 
-		context = iommu_alloc_page_node(iommu->node, GFP_ATOMIC);
+		context = iommu_alloc_pages_node_sz(iommu->node, GFP_ATOMIC,
+						    SZ_4K);
 		if (!context)
 			return NULL;
 
@@ -731,7 +732,8 @@ static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain,
 		if (!dma_pte_present(pte)) {
 			uint64_t pteval, tmp;
 
-			tmp_page = iommu_alloc_page_node(domain->nid, gfp);
+			tmp_page = iommu_alloc_pages_node_sz(domain->nid, gfp,
+							     SZ_4K);
 
 			if (!tmp_page)
 				return NULL;
@@ -982,7 +984,7 @@ static int iommu_alloc_root_entry(struct intel_iommu *iommu)
 {
 	struct root_entry *root;
 
-	root = iommu_alloc_page_node(iommu->node, GFP_ATOMIC);
+	root = iommu_alloc_pages_node_sz(iommu->node, GFP_ATOMIC, SZ_4K);
 	if (!root) {
 		pr_err("Allocating root entry for %s failed\n",
 			iommu->name);
@@ -1994,7 +1996,8 @@ static int copy_context_table(struct intel_iommu *iommu,
 			if (!old_ce)
 				goto out;
 
-			new_ce = iommu_alloc_page_node(iommu->node, GFP_KERNEL);
+			new_ce = iommu_alloc_pages_node_sz(iommu->node,
+							   GFP_KERNEL, SZ_4K);
 			if (!new_ce)
 				goto out_unmap;
 
@@ -3315,7 +3318,7 @@ static struct dmar_domain *paging_domain_alloc(struct device *dev, bool first_st
 		domain->domain.geometry.aperture_end = __DOMAIN_MAX_ADDR(domain->gaw);
 
 	/* always allocate the top pgd */
-	domain->pgd = iommu_alloc_page_node(domain->nid, GFP_KERNEL);
+	domain->pgd = iommu_alloc_pages_node_sz(domain->nid, GFP_KERNEL, SZ_4K);
 	if (!domain->pgd) {
 		kfree(domain);
 		return ERR_PTR(-ENOMEM);
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 4249f12db7fc43..2b6e0706d76d62 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -147,7 +147,8 @@ static struct pasid_entry *intel_pasid_get_entry(struct device *dev, u32 pasid)
 	if (!entries) {
 		u64 tmp;
 
-		entries = iommu_alloc_page_node(info->iommu->node, GFP_ATOMIC);
+		entries = iommu_alloc_pages_node_sz(info->iommu->node,
+						    GFP_ATOMIC, SZ_4K);
 		if (!entries)
 			return NULL;
 
diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h
index 4513fbc76260cd..7ece83bb0f54bb 100644
--- a/drivers/iommu/iommu-pages.h
+++ b/drivers/iommu/iommu-pages.h
@@ -114,17 +114,4 @@ static inline void *iommu_alloc_pages_sz(gfp_t gfp, size_t size)
 	return iommu_alloc_pages_node_sz(NUMA_NO_NODE, gfp, size);
 }
 
-/**
- * iommu_alloc_page_node - allocate a zeroed page at specific NUMA node.
- * @nid: memory NUMA node id
- * @gfp: buddy allocator flags
- *
- * returns the virtual address of the allocated page
- * Prefer to use iommu_alloc_pages_node_lg2()
- */
-static inline void *iommu_alloc_page_node(int nid, gfp_t gfp)
-{
-	return iommu_alloc_pages_node_sz(nid, gfp, PAGE_SIZE);
-}
-
 #endif	/* __IOMMU_PAGES_H */
diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 8835c82f118db4..bb57092ca90110 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -1144,7 +1144,8 @@ static unsigned long *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
 		 * page table. This might race with other mappings, retry.
 		 */
 		if (_io_pte_none(pte)) {
-			addr = iommu_alloc_page_node(domain->numa_node, gfp);
+			addr = iommu_alloc_pages_node_sz(domain->numa_node, gfp,
+							 SZ_4K);
 			if (!addr)
 				return NULL;
 			old = pte;
@@ -1385,8 +1386,8 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
 	domain->numa_node = dev_to_node(iommu->dev);
 	domain->amo_enabled = !!(iommu->caps & RISCV_IOMMU_CAPABILITIES_AMO_HWAD);
 	domain->pgd_mode = pgd_mode;
-	domain->pgd_root = iommu_alloc_page_node(domain->numa_node,
-						 GFP_KERNEL_ACCOUNT);
+	domain->pgd_root = iommu_alloc_pages_node_sz(domain->numa_node,
+						     GFP_KERNEL_ACCOUNT, SZ_4K);
 	if (!domain->pgd_root) {
 		kfree(domain);
 		return ERR_PTR(-ENOMEM);
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 23/23] iommu/pages: Remove iommu_alloc_pages_node()
  2025-02-25 19:39 [PATCH v3 00/23] iommu: Further abstract iommu-pages Jason Gunthorpe
                   ` (21 preceding siblings ...)
  2025-02-25 19:39 ` [PATCH v3 22/23] iommu/pages: Remove iommu_alloc_page_node() Jason Gunthorpe
@ 2025-02-25 19:39 ` Jason Gunthorpe
  2025-02-26 12:30   ` Baolu Lu
  2025-02-25 20:18 ` [PATCH v3 00/23] iommu: Further abstract iommu-pages Nicolin Chen
  2025-02-25 23:17 ` Alejandro Jimenez
  24 siblings, 1 reply; 55+ messages in thread
From: Jason Gunthorpe @ 2025-02-25 19:39 UTC (permalink / raw)
  To: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

Intel is the only driver that uses this now. Convert it to the size
versions, trying to avoid PAGE_SHIFT.
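
As a worked check of the new PRQ constants in the hunk below (PRQ_ORDER
is 4 and each descriptor is 32 bytes):

	PRQ_SIZE      = SZ_4K << 4    = 64K
	PRQ_DEPTH     = PRQ_SIZE >> 5 = 2048 descriptors
	PRQ_RING_MASK = PRQ_SIZE - 0x20, the offset of the last descriptor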

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/intel/iommu.h         |  7 +++----
 drivers/iommu/intel/irq_remapping.c |  8 ++++----
 drivers/iommu/intel/pasid.c         |  3 ++-
 drivers/iommu/intel/prq.c           |  3 ++-
 drivers/iommu/iommu-pages.h         | 16 ----------------
 5 files changed, 11 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index dd980808998da9..1036ed0d899472 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -493,14 +493,13 @@ struct q_inval {
 
 /* Page Request Queue depth */
 #define PRQ_ORDER	4
-#define PRQ_RING_MASK	((0x1000 << PRQ_ORDER) - 0x20)
-#define PRQ_DEPTH	((0x1000 << PRQ_ORDER) >> 5)
+#define PRQ_SIZE	(SZ_4K << PRQ_ORDER)
+#define PRQ_RING_MASK	(PRQ_SIZE - 0x20)
+#define PRQ_DEPTH	(PRQ_SIZE >> 5)
 
 struct dmar_pci_notify_info;
 
 #ifdef CONFIG_IRQ_REMAP
-/* 1MB - maximum possible interrupt remapping table size */
-#define INTR_REMAP_PAGE_ORDER	8
 #define INTR_REMAP_TABLE_REG_SIZE	0xf
 #define INTR_REMAP_TABLE_REG_SIZE_MASK  0xf
 
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index d6b796f8f100cd..fdaa508c17d140 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -538,11 +538,11 @@ static int intel_setup_irq_remapping(struct intel_iommu *iommu)
 	if (!ir_table)
 		return -ENOMEM;
 
-	ir_table_base = iommu_alloc_pages_node(iommu->node, GFP_KERNEL,
-					       INTR_REMAP_PAGE_ORDER);
+	/* 1MB - maximum possible interrupt remapping table size */
+	ir_table_base =
+		iommu_alloc_pages_node_sz(iommu->node, GFP_KERNEL, SZ_1M);
 	if (!ir_table_base) {
-		pr_err("IR%d: failed to allocate pages of order %d\n",
-		       iommu->seq_id, INTR_REMAP_PAGE_ORDER);
+		pr_err("IR%d: failed to allocate 1M of pages\n", iommu->seq_id);
 		goto out_free_table;
 	}
 
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 2b6e0706d76d62..3afbad4eb46303 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -60,7 +60,8 @@ int intel_pasid_alloc_table(struct device *dev)
 
 	size = max_pasid >> (PASID_PDE_SHIFT - 3);
 	order = size ? get_order(size) : 0;
-	dir = iommu_alloc_pages_node(info->iommu->node, GFP_KERNEL, order);
+	dir = iommu_alloc_pages_node_sz(info->iommu->node, GFP_KERNEL,
+					1 << (order + PAGE_SHIFT));
 	if (!dir) {
 		kfree(pasid_table);
 		return -ENOMEM;
diff --git a/drivers/iommu/intel/prq.c b/drivers/iommu/intel/prq.c
index 01ecafed31453c..0f8c121a8b3f9d 100644
--- a/drivers/iommu/intel/prq.c
+++ b/drivers/iommu/intel/prq.c
@@ -288,7 +288,8 @@ int intel_iommu_enable_prq(struct intel_iommu *iommu)
 	struct iopf_queue *iopfq;
 	int irq, ret;
 
-	iommu->prq = iommu_alloc_pages_node(iommu->node, GFP_KERNEL, PRQ_ORDER);
+	iommu->prq =
+		iommu_alloc_pages_node_sz(iommu->node, GFP_KERNEL, PRQ_SIZE);
 	if (!iommu->prq) {
 		pr_warn("IOMMU: %s: Failed to allocate page request queue\n",
 			iommu->name);
diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h
index 7ece83bb0f54bb..b3af2813ed0ced 100644
--- a/drivers/iommu/iommu-pages.h
+++ b/drivers/iommu/iommu-pages.h
@@ -84,22 +84,6 @@ static inline bool iommu_pages_list_empty(struct iommu_pages_list *list)
 	return list_empty(&list->pages);
 }
 
-/**
- * iommu_alloc_pages_node - Allocate a zeroed page of a given order from
- *                          specific NUMA node
- * @nid: memory NUMA node id
- * @gfp: buddy allocator flags
- * @order: page order
- *
- * Returns the virtual address of the allocated page.
- * Prefer to use iommu_alloc_pages_node_lg2()
- */
-static inline void *iommu_alloc_pages_node(int nid, gfp_t gfp,
-					   unsigned int order)
-{
-	return iommu_alloc_pages_node_sz(nid, gfp, 1 << (order + PAGE_SHIFT));
-}
-
 /**
  * iommu_alloc_pages_sz - Allocate a zeroed page of a given size from
  *                          specific NUMA node
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 00/23] iommu: Further abstract iommu-pages
  2025-02-25 19:39 [PATCH v3 00/23] iommu: Further abstract iommu-pages Jason Gunthorpe
                   ` (22 preceding siblings ...)
  2025-02-25 19:39 ` [PATCH v3 23/23] iommu/pages: Remove iommu_alloc_pages_node() Jason Gunthorpe
@ 2025-02-25 20:18 ` Nicolin Chen
  2025-02-25 23:17 ` Alejandro Jimenez
  24 siblings, 0 replies; 55+ messages in thread
From: Nicolin Chen @ 2025-02-25 20:18 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon, Bagas Sanjaya,
	Joerg Roedel, Pasha Tatashin, patches, David Rientjes,
	Matthew Wilcox

On Tue, Feb 25, 2025 at 03:39:17PM -0400, Jason Gunthorpe wrote:
> This is on github: https://github.com/jgunthorpe/linux/commits/iommu_pages
> 
> v3:
>  - Fix comments
>  - Rename __iommu_free_page() to __iommu_free_desc()
>  - Retain the max IMR table size comment in vt-d

Ran some sanity tests on bare metal with DMA/ATS flows on ARM64. Also
tested the kernel with a stage-2 setup via iommufd running a
vSMMU-enabled VM, though the guest kernel doesn't have these changes.

Tested-by: Nicolin Chen <nicolinc@nvidia.com>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 00/23] iommu: Further abstract iommu-pages
  2025-02-25 19:39 [PATCH v3 00/23] iommu: Further abstract iommu-pages Jason Gunthorpe
                   ` (23 preceding siblings ...)
  2025-02-25 20:18 ` [PATCH v3 00/23] iommu: Further abstract iommu-pages Nicolin Chen
@ 2025-02-25 23:17 ` Alejandro Jimenez
  24 siblings, 0 replies; 55+ messages in thread
From: Alejandro Jimenez @ 2025-02-25 23:17 UTC (permalink / raw)
  To: jgg
  Cc: alim.akhtar, alyssa, aou, asahi, bagasdotme, baolu.lu, dwmw2,
	heiko, iommu, jernej.skrabec, jonathanh, joro, jroedel, krzk,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, m.szyprowski, marcan, palmer,
	pasha.tatashin, patches, paul.walmsley, rientjes, robin.murphy,
	samuel, suravee.suthikulpanit, sven, thierry.reding, tjeznach,
	vdumpa, wens, will, willy

> Improve the API to work directly on sizes instead of order, the drivers
> generally have HW specs and code paths that already have specific sizes.
> Pass those sizes down into the allocator to remove some boiler plate
> get_order() in drivers. This is cleanup to be ready for a possible sub
> page allocator some day.
>
> This is on github: https://github.com/jgunthorpe/linux/commits/iommu_pages
>
> v3:

Tested v3 on an AMD Zen4 bare-metal host, using v1/v2 page table formats. Ran the
host kernel with iommu.passthrough=0. Launched a KVM guest with large memory
(512+ GB) and 16 VFs. No errors or warnings found.

Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 03/23] iommu/pages: Remove __iommu_alloc_pages()/__iommu_free_pages()
  2025-02-25 19:39 ` [PATCH v3 03/23] iommu/pages: Remove __iommu_alloc_pages()/__iommu_free_pages() Jason Gunthorpe
@ 2025-02-26  6:25   ` Baolu Lu
  2025-03-12 11:43   ` Mostafa Saleh
  1 sibling, 0 replies; 55+ messages in thread
From: Baolu Lu @ 2025-02-26  6:25 UTC (permalink / raw)
  To: Jason Gunthorpe, Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

On 2/26/25 03:39, Jason Gunthorpe wrote:
> These were only used by tegra-smmu and leaked the struct page out of the
> API. Delete them since tegra-smmu has been converted to the other APIs.
> 
> In the process, flatten the call tree so we have fewer one-line functions
> calling other one-line functions. iommu_alloc_pages_node() is the real
> allocator and everything else can just call it directly.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 04/23] iommu/pages: Make iommu_put_pages_list() work with high order allocations
  2025-02-25 19:39 ` [PATCH v3 04/23] iommu/pages: Make iommu_put_pages_list() work with high order allocations Jason Gunthorpe
@ 2025-02-26  6:28   ` Baolu Lu
  0 siblings, 0 replies; 55+ messages in thread
From: Baolu Lu @ 2025-02-26  6:28 UTC (permalink / raw)
  To: Jason Gunthorpe, Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

On 2/26/25 03:39, Jason Gunthorpe wrote:
> alloc_pages_node(, order) needs to be paired with __free_pages(, order) to
> free all the allocated pages. For order != 0 the return from
> alloc_pages_node() is just a page list; it hasn't been formed into a
> folio.
> 
> However, iommu_put_pages_list() just calls put_page() on the head page of
> an allocation, which will end up leaking the tail pages if order != 0.
> 
> Fix this by using __GFP_COMP to create a high order folio and then always
> use put_page() to free the full high order folio.
> 
> __iommu_free_account() can get the order of the allocation via
> folio_order(), which corrects the accounting of high order allocations in
> iommu_put_pages_list(). This is the same technique slub uses.
> 
> As far as I can tell, none of the places using high order allocations are
> also using the free list, so this is not a current bug.
> 
> Fixes: 06c375053cef ("iommu/vt-d: add wrapper functions for page allocations")
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 05/23] iommu/pages: Remove the order argument to iommu_free_pages()
  2025-02-25 19:39 ` [PATCH v3 05/23] iommu/pages: Remove the order argument to iommu_free_pages() Jason Gunthorpe
@ 2025-02-26  6:32   ` Baolu Lu
  2025-03-12 11:43   ` Mostafa Saleh
  1 sibling, 0 replies; 55+ messages in thread
From: Baolu Lu @ 2025-02-26  6:32 UTC (permalink / raw)
  To: Jason Gunthorpe, Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

On 2/26/25 03:39, Jason Gunthorpe wrote:
> Now that we have a folio under the allocation, iommu_free_pages() can know
> the order of the original allocation and do the correct thing to free it.
> 
> The next patch will rename iommu_free_page() to iommu_free_pages() so we
> have naming consistency with iommu_alloc_pages_node().
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

For changes in intel iommu driver,

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 06/23] iommu/pages: Remove iommu_free_page()
  2025-02-25 19:39 ` [PATCH v3 06/23] iommu/pages: Remove iommu_free_page() Jason Gunthorpe
@ 2025-02-26  6:34   ` Baolu Lu
  2025-03-12 11:44   ` Mostafa Saleh
  1 sibling, 0 replies; 55+ messages in thread
From: Baolu Lu @ 2025-02-26  6:34 UTC (permalink / raw)
  To: Jason Gunthorpe, Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

On 2/26/25 03:39, Jason Gunthorpe wrote:
> Use iommu_free_pages() instead.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 07/23] iommu/pages: De-inline the substantial functions
  2025-02-25 19:39 ` [PATCH v3 07/23] iommu/pages: De-inline the substantial functions Jason Gunthorpe
@ 2025-02-26  6:43   ` Baolu Lu
  2025-03-12 12:45   ` Mostafa Saleh
  1 sibling, 0 replies; 55+ messages in thread
From: Baolu Lu @ 2025-02-26  6:43 UTC (permalink / raw)
  To: Jason Gunthorpe, Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

On 2/26/25 03:39, Jason Gunthorpe wrote:
> These are called in a lot of places and are not trivial. Move them to the
> core module.
> 
> Tidy some of the comments and function arguments, fold
> __iommu_alloc_account() into its only caller, and change
> __iommu_free_account() into __iommu_free_page() to remove some
> duplication.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 09/23] iommu/pages: Formalize the freelist API
  2025-02-25 19:39 ` [PATCH v3 09/23] iommu/pages: Formalize the freelist API Jason Gunthorpe
@ 2025-02-26  6:56   ` Baolu Lu
  2025-02-26 17:31     ` Jason Gunthorpe
  0 siblings, 1 reply; 55+ messages in thread
From: Baolu Lu @ 2025-02-26  6:56 UTC (permalink / raw)
  To: Jason Gunthorpe, Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

On 2/26/25 03:39, Jason Gunthorpe wrote:
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 38c65e92ecd091..e414951c0af83f 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -326,6 +326,18 @@ typedef unsigned int ioasid_t;
>   /* Read but do not clear any dirty bits */
>   #define IOMMU_DIRTY_NO_CLEAR (1 << 0)
>   
> +/*
> + * Pages allocated through iommu_alloc_pages_node() can be placed on this list
> + * using iommu_pages_list_add(). Note: ONLY pages from iommu_alloc_pages_node()
> + * can be used this way!
> + */
> +struct iommu_pages_list {
> +	struct list_head pages;
> +};
> +
> +#define IOMMU_PAGES_LIST_INIT(name) \
> +	((struct iommu_pages_list){ .pages = LIST_HEAD_INIT(name.pages) })
> +
>   #ifdef CONFIG_IOMMU_API

Any reason why the above cannot be placed in the iommu-pages.h header
file? My understanding is that iommu-pages is only for the iommu drivers
and should not be accessible to external subsystems.

Thanks,
baolu


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 12/23] iommu: Change iommu_iotlb_gather to use iommu_page_list
  2025-02-25 19:39 ` [PATCH v3 12/23] iommu: Change iommu_iotlb_gather to use iommu_page_list Jason Gunthorpe
@ 2025-02-26  7:02   ` Baolu Lu
  0 siblings, 0 replies; 55+ messages in thread
From: Baolu Lu @ 2025-02-26  7:02 UTC (permalink / raw)
  To: Jason Gunthorpe, Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

On 2/26/25 03:39, Jason Gunthorpe wrote:
> This converts the remaining places using a list of pages to the new API.
> 
> The Intel free path was shared with its gather path, so it is converted at
> the same time.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 13/23] iommu/pages: Remove iommu_put_pages_list_old and the _Generic
  2025-02-25 19:39 ` [PATCH v3 13/23] iommu/pages: Remove iommu_put_pages_list_old and the _Generic Jason Gunthorpe
@ 2025-02-26  7:04   ` Baolu Lu
  0 siblings, 0 replies; 55+ messages in thread
From: Baolu Lu @ 2025-02-26  7:04 UTC (permalink / raw)
  To: Jason Gunthorpe, Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

On 2/26/25 03:39, Jason Gunthorpe wrote:
> Nothing uses the old list_head path now, remove it.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 21/23] iommu/pages: Remove iommu_alloc_page/pages()
  2025-02-25 19:39 ` [PATCH v3 21/23] iommu/pages: Remove iommu_alloc_page/pages() Jason Gunthorpe
@ 2025-02-26  9:15   ` Marek Szyprowski
  0 siblings, 0 replies; 55+ messages in thread
From: Marek Szyprowski @ 2025-02-26  9:15 UTC (permalink / raw)
  To: Jason Gunthorpe, Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi,
	Lu Baolu, David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Hector Martin, Palmer Dabbelt,
	Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

On 25.02.2025 20:39, Jason Gunthorpe wrote:
> A few small changes to the remaining drivers using these will allow
> them to be removed:
>
> - Exynos wants fixed 16K/8K allocations
> - Rockchip already has a SPAGE_SIZE define, which is used by the dma_map
>    call immediately following; SPAGE_ORDER is the corresponding lg2 size
> - Tegra already has size constants for its two allocations
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
For exynos-iommu:
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
> ---
>   drivers/iommu/exynos-iommu.c   |  4 ++--
>   drivers/iommu/iommu-pages.h    | 26 --------------------------
>   drivers/iommu/rockchip-iommu.c |  6 ++++--
>   drivers/iommu/tegra-smmu.c     |  4 ++--
>   4 files changed, 8 insertions(+), 32 deletions(-)
>
> diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
> index 1019e08b43b71c..74337081278551 100644
> --- a/drivers/iommu/exynos-iommu.c
> +++ b/drivers/iommu/exynos-iommu.c
> @@ -902,11 +902,11 @@ static struct iommu_domain *exynos_iommu_domain_alloc_paging(struct device *dev)
>   	if (!domain)
>   		return NULL;
>   
> -	domain->pgtable = iommu_alloc_pages(GFP_KERNEL, 2);
> +	domain->pgtable = iommu_alloc_pages_sz(GFP_KERNEL, SZ_16K);
>   	if (!domain->pgtable)
>   		goto err_pgtable;
>   
> -	domain->lv2entcnt = iommu_alloc_pages(GFP_KERNEL, 1);
> +	domain->lv2entcnt = iommu_alloc_pages_sz(GFP_KERNEL, SZ_8K);
>   	if (!domain->lv2entcnt)
>   		goto err_counter;
>   
> diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h
> index 3c4575d637da6d..4513fbc76260cd 100644
> --- a/drivers/iommu/iommu-pages.h
> +++ b/drivers/iommu/iommu-pages.h
> @@ -100,20 +100,6 @@ static inline void *iommu_alloc_pages_node(int nid, gfp_t gfp,
>   	return iommu_alloc_pages_node_sz(nid, gfp, 1 << (order + PAGE_SHIFT));
>   }
>   
> -/**
> - * iommu_alloc_pages - allocate a zeroed page of a given order
> - * @gfp: buddy allocator flags
> - * @order: page order
> - *
> - * returns the virtual address of the allocated page
> - * Prefer to use iommu_alloc_pages_lg2()
> - */
> -static inline void *iommu_alloc_pages(gfp_t gfp, int order)
> -{
> -	return iommu_alloc_pages_node_sz(NUMA_NO_NODE, gfp,
> -					 1 << (order + PAGE_SHIFT));
> -}
> -
>   /**
>    * iommu_alloc_pages_sz - Allocate a zeroed page of a given size from
>    *                          specific NUMA node
> @@ -141,16 +127,4 @@ static inline void *iommu_alloc_page_node(int nid, gfp_t gfp)
>   	return iommu_alloc_pages_node_sz(nid, gfp, PAGE_SIZE);
>   }
>   
> -/**
> - * iommu_alloc_page - allocate a zeroed page
> - * @gfp: buddy allocator flags
> - *
> - * returns the virtual address of the allocated page
> - * Prefer to use iommu_alloc_pages_lg2()
> - */
> -static inline void *iommu_alloc_page(gfp_t gfp)
> -{
> -	return iommu_alloc_pages_node_sz(NUMA_NO_NODE, gfp, PAGE_SIZE);
> -}
> -
>   #endif	/* __IOMMU_PAGES_H */
> diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
> index 798e85bd994d56..5af82072b03a17 100644
> --- a/drivers/iommu/rockchip-iommu.c
> +++ b/drivers/iommu/rockchip-iommu.c
> @@ -730,7 +730,8 @@ static u32 *rk_dte_get_page_table(struct rk_iommu_domain *rk_domain,
>   	if (rk_dte_is_pt_valid(dte))
>   		goto done;
>   
> -	page_table = iommu_alloc_page(GFP_ATOMIC | rk_ops->gfp_flags);
> +	page_table = iommu_alloc_pages_sz(GFP_ATOMIC | rk_ops->gfp_flags,
> +					  SPAGE_SIZE);
>   	if (!page_table)
>   		return ERR_PTR(-ENOMEM);
>   
> @@ -1064,7 +1065,8 @@ static struct iommu_domain *rk_iommu_domain_alloc_paging(struct device *dev)
>   	 * Each level1 (dt) and level2 (pt) table has 1024 4-byte entries.
>   	 * Allocate one 4 KiB page for each table.
>   	 */
> -	rk_domain->dt = iommu_alloc_page(GFP_KERNEL | rk_ops->gfp_flags);
> +	rk_domain->dt = iommu_alloc_pages_sz(GFP_KERNEL | rk_ops->gfp_flags,
> +					     SPAGE_SIZE);
>   	if (!rk_domain->dt)
>   		goto err_free_domain;
>   
> diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
> index 844682a41afa66..a9c35efde56969 100644
> --- a/drivers/iommu/tegra-smmu.c
> +++ b/drivers/iommu/tegra-smmu.c
> @@ -295,7 +295,7 @@ static struct iommu_domain *tegra_smmu_domain_alloc_paging(struct device *dev)
>   
>   	as->attr = SMMU_PD_READABLE | SMMU_PD_WRITABLE | SMMU_PD_NONSECURE;
>   
> -	as->pd = iommu_alloc_page(GFP_KERNEL | __GFP_DMA);
> +	as->pd = iommu_alloc_pages_sz(GFP_KERNEL | __GFP_DMA, SMMU_SIZE_PD);
>   	if (!as->pd) {
>   		kfree(as);
>   		return NULL;
> @@ -695,7 +695,7 @@ static struct tegra_pt *as_get_pde_page(struct tegra_smmu_as *as,
>   	if (gfpflags_allow_blocking(gfp))
>   		spin_unlock_irqrestore(&as->lock, *flags);
>   
> -	pt = iommu_alloc_page(gfp | __GFP_DMA);
> +	pt = iommu_alloc_pages_sz(gfp | __GFP_DMA, SMMU_SIZE_PT);
>   
>   	if (gfpflags_allow_blocking(gfp))
>   		spin_lock_irqsave(&as->lock, *flags);

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 16/23] iommu/pages: Allow sub page sizes to be passed into the allocator
  2025-02-25 19:39 ` [PATCH v3 16/23] iommu/pages: Allow sub page sizes to be passed into the allocator Jason Gunthorpe
@ 2025-02-26 12:22   ` Baolu Lu
  0 siblings, 0 replies; 55+ messages in thread
From: Baolu Lu @ 2025-02-26 12:22 UTC (permalink / raw)
  To: Jason Gunthorpe, Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: baolu.lu, Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

On 2025/2/26 3:39, Jason Gunthorpe wrote:
> Generally drivers have a specific idea of what their HW structure size should
> be. In a lot of cases this is related to PAGE_SIZE, but not always. ARM64,
> for example, allows a 4K IO page table size on a 64K CPU page table
> system.
> 
> Currently we don't have any good support for sub page allocations, but
> make the API accommodate this by accepting a sub page size from the caller
> and rounding up internally.
> 
> This is done by moving away from order as the size input and using size:
>    size == 1 << (order + PAGE_SHIFT)
> 
> Following patches convert drivers away from using order and try to specify
> allocation sizes independent of PAGE_SIZE.
> 
> Signed-off-by: Jason Gunthorpe<jgg@nvidia.com>

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
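
For anyone tracking the conversions, the order/size mapping works out as
below (a sketch, assuming a 4K PAGE_SIZE kernel; nid/gfp are placeholders):

        /* size == 1 << (order + PAGE_SHIFT) */
        p = iommu_alloc_pages_node_sz(nid, gfp, SZ_4K); /* was order 0 */
        p = iommu_alloc_pages_node_sz(nid, gfp, SZ_8K); /* was order 1 */
        /* sub page sizes are accepted and rounded up internally for now */
        p = iommu_alloc_pages_node_sz(nid, gfp, SZ_1K); /* still backed by 4K */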


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 20/23] iommu: Update various drivers to pass in lg2sz instead of order to iommu pages
  2025-02-25 19:39 ` [PATCH v3 20/23] iommu: Update various drivers to pass in lg2sz instead of order to iommu pages Jason Gunthorpe
@ 2025-02-26 12:24   ` Baolu Lu
  2025-03-12 12:59   ` Mostafa Saleh
  1 sibling, 0 replies; 55+ messages in thread
From: Baolu Lu @ 2025-02-26 12:24 UTC (permalink / raw)
  To: Jason Gunthorpe, Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: baolu.lu, Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

On 2025/2/26 3:39, Jason Gunthorpe wrote:
> Convert most of the places calling get_order() as an argument to the
> iommu-pages allocator into order_base_2() or the _sz flavour
> instead. These places already have an exact size; there is no particular
> reason to use order here.
> 
> Signed-off-by: Jason Gunthorpe<jgg@nvidia.com>
> ---
>   drivers/iommu/amd/init.c        | 29 +++++++++++++++--------------
>   drivers/iommu/intel/dmar.c      |  6 +++---
>   drivers/iommu/io-pgtable-arm.c  |  3 +--
>   drivers/iommu/io-pgtable-dart.c | 12 +++---------
>   drivers/iommu/sun50i-iommu.c    |  4 ++--
>   5 files changed, 24 insertions(+), 30 deletions(-)

For changes in intel iommu driver,

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
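
One nit worth spelling out, since the two helpers round up in different
units (a worked example, assuming a 4K PAGE_SIZE; illustrative only):

        BUILD_BUG_ON(get_order(SZ_8K) != 1);     /* order: two 4K pages */
        BUILD_BUG_ON(order_base_2(SZ_8K) != 13); /* log2 of the byte count */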


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 22/23] iommu/pages: Remove iommu_alloc_page_node()
  2025-02-25 19:39 ` [PATCH v3 22/23] iommu/pages: Remove iommu_alloc_page_node() Jason Gunthorpe
@ 2025-02-26 12:26   ` Baolu Lu
  0 siblings, 0 replies; 55+ messages in thread
From: Baolu Lu @ 2025-02-26 12:26 UTC (permalink / raw)
  To: Jason Gunthorpe, Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: baolu.lu, Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

On 2025/2/26 3:39, Jason Gunthorpe wrote:
> Use iommu_alloc_pages_node_sz() instead.
> 
> AMD and Intel are both using 4K pages for these structures since those
> drivers only work on 4K PAGE_SIZE.
> 
> riscv is also spec'd to use SZ_4K.
> 
> Signed-off-by: Jason Gunthorpe<jgg@nvidia.com>
> ---
>   drivers/iommu/amd/io_pgtable.c    |  8 +++++---
>   drivers/iommu/amd/io_pgtable_v2.c |  4 ++--
>   drivers/iommu/amd/iommu.c         |  2 +-
>   drivers/iommu/intel/iommu.c       | 13 ++++++++-----
>   drivers/iommu/intel/pasid.c       |  3 ++-
>   drivers/iommu/iommu-pages.h       | 13 -------------
>   drivers/iommu/riscv/iommu.c       |  7 ++++---
>   7 files changed, 22 insertions(+), 28 deletions(-)

For change in intel iommu driver,

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
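
The per-call-site conversion is mechanical; roughly (a sketch, the node
and flags here are illustrative):

        /* before */
        ptr = iommu_alloc_page_node(dev_to_node(dev), GFP_KERNEL);
        /* after: the 4K table size is explicit instead of implied by PAGE_SIZE */
        ptr = iommu_alloc_pages_node_sz(dev_to_node(dev), GFP_KERNEL, SZ_4K);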


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 23/23] iommu/pages: Remove iommu_alloc_pages_node()
  2025-02-25 19:39 ` [PATCH v3 23/23] iommu/pages: Remove iommu_alloc_pages_node() Jason Gunthorpe
@ 2025-02-26 12:30   ` Baolu Lu
  0 siblings, 0 replies; 55+ messages in thread
From: Baolu Lu @ 2025-02-26 12:30 UTC (permalink / raw)
  To: Jason Gunthorpe, Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: baolu.lu, Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

On 2025/2/26 3:39, Jason Gunthorpe wrote:
> Intel is the only thing that uses this now; convert to the size versions,
> trying to avoid PAGE_SHIFT.
> 
> Signed-off-by: Jason Gunthorpe<jgg@nvidia.com>

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 14/23] iommu/pages: Move from struct page to struct ioptdesc and folio
  2025-02-25 19:39 ` [PATCH v3 14/23] iommu/pages: Move from struct page to struct ioptdesc and folio Jason Gunthorpe
@ 2025-02-26 12:42   ` Baolu Lu
  2025-02-26 13:51     ` Jason Gunthorpe
  2025-02-27  5:17   ` Baolu Lu
  1 sibling, 1 reply; 55+ messages in thread
From: Baolu Lu @ 2025-02-26 12:42 UTC (permalink / raw)
  To: Jason Gunthorpe, Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: baolu.lu, Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

On 2025/2/26 3:39, Jason Gunthorpe wrote:
> This brings the iommu page table allocator into the modern world of having
> its own private page descriptor and not re-using fields from struct page
> for its own purpose. It follows the basic pattern of struct ptdesc which
> did this transformation for the CPU page table allocator.
> 
> Currently iommu-pages is pretty basic, so this isn't a huge benefit;
> however, I see a coming need for features that the CPU allocator has,
> like sub-PAGE_SIZE allocations and RCU freeing. This provides the base
> infrastructure to implement those cleanly.

I understand that this is intended as the starting point of having private
descriptors for folios allocated to iommu drivers. But I don't believe
this is currently the case after this patch, as the underlying memory
remains a struct folio. This patch merely uses an iommu-pages specific
structure pointer to reference it.

Could you please elaborate a bit on the future plans that would make it
a true implementation of iommu private page descriptors?

> 
> Remove numa_node_id() calls from the inlines and instead use NUMA_NO_NODE
> which will get switched to numa_mem_id(), which seems to be the right ID
> to use for memory allocations.
> 
> Signed-off-by: Jason Gunthorpe<jgg@nvidia.com>
> ---
>   drivers/iommu/iommu-pages.c | 54 ++++++++++++++++++++++++++-----------
>   drivers/iommu/iommu-pages.h | 43 ++++++++++++++++++++++++++---
>   2 files changed, 78 insertions(+), 19 deletions(-)

Thanks,
baolu


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 14/23] iommu/pages: Move from struct page to struct ioptdesc and folio
  2025-02-26 12:42   ` Baolu Lu
@ 2025-02-26 13:51     ` Jason Gunthorpe
  2025-02-27  5:17       ` Baolu Lu
  0 siblings, 1 reply; 55+ messages in thread
From: Jason Gunthorpe @ 2025-02-26 13:51 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, David Woodhouse,
	Heiko Stuebner, iommu, Jernej Skrabec, Jonathan Hunter,
	Joerg Roedel, Krzysztof Kozlowski, linux-arm-kernel, linux-riscv,
	linux-rockchip, linux-samsung-soc, linux-sunxi, linux-tegra,
	Marek Szyprowski, Hector Martin, Palmer Dabbelt, Paul Walmsley,
	Robin Murphy, Samuel Holland, Suravee Suthikulpanit, Sven Peter,
	Thierry Reding, Tomasz Jeznach, Krishna Reddy, Chen-Yu Tsai,
	Will Deacon, Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

On Wed, Feb 26, 2025 at 08:42:23PM +0800, Baolu Lu wrote:
> On 2025/2/26 3:39, Jason Gunthorpe wrote:
> > This brings the iommu page table allocator into the modern world of having
> > its own private page descriptor and not re-using fields from struct page
> > for its own purpose. It follows the basic pattern of struct ptdesc which
> > did this transformation for the CPU page table allocator.
> > 
> > Currently iommu-pages is pretty basic, so this isn't a huge benefit;
> > however, I see a coming need for features that the CPU allocator has,
> > like sub-PAGE_SIZE allocations and RCU freeing. This provides the base
> > infrastructure to implement those cleanly.
> 
> I understand that this is intended as the start point of having private
> descriptors for folios allocated to iommu drivers. But I don't believe
> this is currently the case after this patch, as the underlying memory
> remains a struct folio. This patch merely uses an iommu-pages specific
> structure pointer to reference it.

Right now the mm provides 64 bytes of per-page memory that is a struct
page.

You can call that 64 bytes a struct folio sometimes, and we have now
also been calling those bytes a struct XXdesc like this patch does.

This is all a slow incremental evolution toward giving each user of
the per-page memory its own unique type and understanding of what it
needs while removing use of the highly overloaded struct page.
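
The mechanical part is a type overlay on those same 64 bytes plus cast
helpers, something like this (a trimmed sketch; the real field list has
to mirror struct page's layout exactly):

        struct ioptdesc {
                unsigned long __page_flags;
                struct list_head iopt_freelist_elm;
                /* remaining struct page fields kept as opaque padding */
        };

        static inline struct ioptdesc *folio_ioptdesc(struct folio *folio)
        {
                return (struct ioptdesc *)folio;
        }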

Eventually Matthew wants to drop the 64 bytes down to 8 bytes and
allocate the per-page memory directly. This would allow each user to
use more/less memory depending on their need.

https://kernelnewbies.org/MatthewWilcox/Memdescs

When that happens the 

	folio = __folio_alloc_node(gfp | __GFP_ZERO, order, nid);

will turn into something maybe more like:

   ioptdesc = memdesc_alloc_node(gfp, order, nid, sizeof(struct ioptdesc));

And then the folio word would disappear from this code.

Right now things are going down Matthew's list:

https://kernelnewbies.org/MatthewWilcox/Memdescs/Path

This series is part of "Remove page->lru uses"

Jason


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 09/23] iommu/pages: Formalize the freelist API
  2025-02-26  6:56   ` Baolu Lu
@ 2025-02-26 17:31     ` Jason Gunthorpe
  2025-02-27  5:11       ` Baolu Lu
  0 siblings, 1 reply; 55+ messages in thread
From: Jason Gunthorpe @ 2025-02-26 17:31 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, David Woodhouse,
	Heiko Stuebner, iommu, Jernej Skrabec, Jonathan Hunter,
	Joerg Roedel, Krzysztof Kozlowski, linux-arm-kernel, linux-riscv,
	linux-rockchip, linux-samsung-soc, linux-sunxi, linux-tegra,
	Marek Szyprowski, Hector Martin, Palmer Dabbelt, Paul Walmsley,
	Robin Murphy, Samuel Holland, Suravee Suthikulpanit, Sven Peter,
	Thierry Reding, Tomasz Jeznach, Krishna Reddy, Chen-Yu Tsai,
	Will Deacon, Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

On Wed, Feb 26, 2025 at 02:56:29PM +0800, Baolu Lu wrote:
> On 2/26/25 03:39, Jason Gunthorpe wrote:
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 38c65e92ecd091..e414951c0af83f 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -326,6 +326,18 @@ typedef unsigned int ioasid_t;
> >   /* Read but do not clear any dirty bits */
> >   #define IOMMU_DIRTY_NO_CLEAR (1 << 0)
> > +/*
> > + * Pages allocated through iommu_alloc_pages_node() can be placed on this list
> > + * using iommu_pages_list_add(). Note: ONLY pages from iommu_alloc_pages_node()
> > + * can be used this way!
> > + */
> > +struct iommu_pages_list {
> > +	struct list_head pages;
> > +};
> > +
> > +#define IOMMU_PAGES_LIST_INIT(name) \
> > +	((struct iommu_pages_list){ .pages = LIST_HEAD_INIT(name.pages) })
> > +
> >   #ifdef CONFIG_IOMMU_API
> 
> Any reason why the above cannot be placed in the iommu-pages.h header
> file? My understanding is that iommu-pages is only for the iommu drivers
> and should not be accessible for external subsystems.

I wanted to do that, but the issue is the gather:

struct iommu_iotlb_gather {
	unsigned long		start;
	unsigned long		end;
	size_t			pgsize;
	struct iommu_pages_list	freelist;

The struct is inlined so it must be declared. I do not want to include
iommu-pages.h in this header.

Once the struct itself is there, it made sense to include the INIT too.
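
For the record, the intended driver-side flow is roughly (a sketch using
the helpers this series adds):

        struct iommu_pages_list freelist = IOMMU_PAGES_LIST_INIT(freelist);

        /* atomic context: threading a table on the list does not allocate */
        iommu_pages_list_add(&freelist, pt);

        /* ... after the IOTLB flush ... */
        iommu_put_pages_list(&freelist);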

FWIW I have a longstanding desire to split iommu.h into
internal-driver-facing and external-user-facing files..

Thanks,
Jason


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 09/23] iommu/pages: Formalize the freelist API
  2025-02-26 17:31     ` Jason Gunthorpe
@ 2025-02-27  5:11       ` Baolu Lu
  0 siblings, 0 replies; 55+ messages in thread
From: Baolu Lu @ 2025-02-27  5:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, David Woodhouse,
	Heiko Stuebner, iommu, Jernej Skrabec, Jonathan Hunter,
	Joerg Roedel, Krzysztof Kozlowski, linux-arm-kernel, linux-riscv,
	linux-rockchip, linux-samsung-soc, linux-sunxi, linux-tegra,
	Marek Szyprowski, Hector Martin, Palmer Dabbelt, Paul Walmsley,
	Robin Murphy, Samuel Holland, Suravee Suthikulpanit, Sven Peter,
	Thierry Reding, Tomasz Jeznach, Krishna Reddy, Chen-Yu Tsai,
	Will Deacon, Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

On 2/27/25 01:31, Jason Gunthorpe wrote:
> On Wed, Feb 26, 2025 at 02:56:29PM +0800, Baolu Lu wrote:
>> On 2/26/25 03:39, Jason Gunthorpe wrote:
>>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>>> index 38c65e92ecd091..e414951c0af83f 100644
>>> --- a/include/linux/iommu.h
>>> +++ b/include/linux/iommu.h
>>> @@ -326,6 +326,18 @@ typedef unsigned int ioasid_t;
>>>    /* Read but do not clear any dirty bits */
>>>    #define IOMMU_DIRTY_NO_CLEAR (1 << 0)
>>> +/*
>>> + * Pages allocated through iommu_alloc_pages_node() can be placed on this list
>>> + * using iommu_pages_list_add(). Note: ONLY pages from iommu_alloc_pages_node()
>>> + * can be used this way!
>>> + */
>>> +struct iommu_pages_list {
>>> +	struct list_head pages;
>>> +};
>>> +
>>> +#define IOMMU_PAGES_LIST_INIT(name) \
>>> +	((struct iommu_pages_list){ .pages = LIST_HEAD_INIT(name.pages) })
>>> +
>>>    #ifdef CONFIG_IOMMU_API
>> Any reason why the above cannot be placed in the iommu-pages.h header
>> file? My understanding is that iommu-pages is only for the iommu drivers
>> and should not be accessible for external subsystems.
> I wanted to do that, but the issue is the gather:
> 
> struct iommu_iotlb_gather {
> 	unsigned long		start;
> 	unsigned long		end;
> 	size_t			pgsize;
> 	struct iommu_pages_list	freelist;
> 
> The struct is inlined so it must be declared. I do not want to include
> iommu-pages.h in this header.
> 
> Once the struct itself is there, it made sense to include the INIT too.
> 
> FWIW I have a longstanding desire to split iommu.h into
> internal-driver-facing and external-user-facing files..

Okay, thanks!


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 14/23] iommu/pages: Move from struct page to struct ioptdesc and folio
  2025-02-26 13:51     ` Jason Gunthorpe
@ 2025-02-27  5:17       ` Baolu Lu
  0 siblings, 0 replies; 55+ messages in thread
From: Baolu Lu @ 2025-02-27  5:17 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, David Woodhouse,
	Heiko Stuebner, iommu, Jernej Skrabec, Jonathan Hunter,
	Joerg Roedel, Krzysztof Kozlowski, linux-arm-kernel, linux-riscv,
	linux-rockchip, linux-samsung-soc, linux-sunxi, linux-tegra,
	Marek Szyprowski, Hector Martin, Palmer Dabbelt, Paul Walmsley,
	Robin Murphy, Samuel Holland, Suravee Suthikulpanit, Sven Peter,
	Thierry Reding, Tomasz Jeznach, Krishna Reddy, Chen-Yu Tsai,
	Will Deacon, Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

On 2/26/25 21:51, Jason Gunthorpe wrote:
> On Wed, Feb 26, 2025 at 08:42:23PM +0800, Baolu Lu wrote:
>> On 2025/2/26 3:39, Jason Gunthorpe wrote:
>>> This brings the iommu page table allocator into the modern world of having
>>> its own private page descriptor and not re-using fields from struct page
>>> for its own purpose. It follows the basic pattern of struct ptdesc which
>>> did this transformation for the CPU page table allocator.
>>>
>>> Currently iommu-pages is pretty basic, so this isn't a huge benefit;
>>> however, I see a coming need for features that the CPU allocator has,
>>> like sub-PAGE_SIZE allocations and RCU freeing. This provides the base
>>> infrastructure to implement those cleanly.
> I understand that this is intended as the starting point of having private
>> descriptors for folios allocated to iommu drivers. But I don't believe
>> this is currently the case after this patch, as the underlying memory
>> remains a struct folio. This patch merely uses an iommu-pages specific
>> structure pointer to reference it.
> Right now the mm provides 64 bytes of per-page memory that is a struct
> page.
> 
> You can call that 64 bytes a struct folio sometimes, and we have now
> also been calling those bytes a struct XXdesc like this patch does.
> 
> This is all a slow incremental evolution toward giving each user of
> the per-page memory its own unique type and understanding of what it
> needs while removing use of the highly overloaded struct page.
> 
> Eventually Matthew wants to drop the 64 bytes down to 8 bytes and
> allocate the per-page memory directly. This would allow each user to
> use more/less memory depending on their need.
> 
> https://kernelnewbies.org/MatthewWilcox/Memdescs
> 
> When that happens the
> 
> 	folio = __folio_alloc_node(gfp | __GFP_ZERO, order, nid);
> 
> will turn into something maybe more like:
> 
>     ioptdesc = memdesc_alloc_node(gfp, order, nid, sizeof(struct ioptdesc));
> 
> And then the folio word would disappear from this code.
> 
> Right now things are going down Matthew's list:
> 
> https://kernelnewbies.org/MatthewWilcox/Memdescs/Path
> 
> This series is part of "Remove page->lru uses"

Cool! Thank you for the explanation.

Thanks,
baolu


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 14/23] iommu/pages: Move from struct page to struct ioptdesc and folio
  2025-02-25 19:39 ` [PATCH v3 14/23] iommu/pages: Move from struct page to struct ioptdesc and folio Jason Gunthorpe
  2025-02-26 12:42   ` Baolu Lu
@ 2025-02-27  5:17   ` Baolu Lu
  1 sibling, 0 replies; 55+ messages in thread
From: Baolu Lu @ 2025-02-27  5:17 UTC (permalink / raw)
  To: Jason Gunthorpe, Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

On 2/26/25 03:39, Jason Gunthorpe wrote:
> This brings the iommu page table allocator into the modern world of having
> its own private page descriptor and not re-using fields from struct page
> for its own purpose. It follows the basic pattern of struct ptdesc which
> did this transformation for the CPU page table allocator.
> 
> Currently iommu-pages is pretty basic, so this isn't a huge benefit;
> however, I see a coming need for features that the CPU allocator has,
> like sub-PAGE_SIZE allocations and RCU freeing. This provides the base
> infrastructure to implement those cleanly.
> 
> Remove numa_node_id() calls from the inlines and instead use NUMA_NO_NODE
> which will get switched to numa_mem_id(), which seems to be the right ID
> to use for memory allocations.
> 
> Signed-off-by: Jason Gunthorpe<jgg@nvidia.com>

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 08/23] iommu/vtd: Use virt_to_phys()
  2025-02-25 19:39 ` [PATCH v3 08/23] iommu/vtd: Use virt_to_phys() Jason Gunthorpe
@ 2025-03-10  2:21   ` Baolu Lu
  0 siblings, 0 replies; 55+ messages in thread
From: Baolu Lu @ 2025-03-10  2:21 UTC (permalink / raw)
  To: Jason Gunthorpe, Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon
  Cc: Bagas Sanjaya, Joerg Roedel, Pasha Tatashin, patches,
	David Rientjes, Matthew Wilcox

On 2/26/25 03:39, Jason Gunthorpe wrote:
> If all the inlines are unwound, virt_to_dma_pfn() is simply:
>     return page_to_pfn(virt_to_page(p)) << (PAGE_SHIFT - VTD_PAGE_SHIFT);
> 
> Which can be re-arranged to:
>     (page_to_pfn(virt_to_page(p)) << PAGE_SHIFT) >> VTD_PAGE_SHIFT
> 
> The only caller is:
>     ((uint64_t)virt_to_dma_pfn(tmp_page) << VTD_PAGE_SHIFT)
> 
> re-arranged to:
>     ((page_to_pfn(virt_to_page(tmp_page)) << PAGE_SHIFT) >> VTD_PAGE_SHIFT) << VTD_PAGE_SHIFT
> 
> Which simplifies to:
>     page_to_pfn(virt_to_page(tmp_page)) << PAGE_SHIFT
> 
> That is the same as virt_to_phys(tmp_page), so just remove all of this.
> 
> Reviewed-by: Lu Baolu<baolu.lu@linux.intel.com>
> Signed-off-by: Jason Gunthorpe<jgg@nvidia.com>
> ---
>   drivers/iommu/intel/iommu.c |  3 ++-
>   drivers/iommu/intel/iommu.h | 19 -------------------
>   2 files changed, 2 insertions(+), 20 deletions(-)

Queued this cleanup patch for iommu/vt-d.
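
For reference, the net change at that single call site is just (a sketch,
flag bits omitted):

        /* before */
        pteval = (uint64_t)virt_to_dma_pfn(tmp_page) << VTD_PAGE_SHIFT;
        /* after: every shift cancels out */
        pteval = virt_to_phys(tmp_page);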

Thanks,
baolu


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 03/23] iommu/pages: Remove __iommu_alloc_pages()/__iommu_free_pages()
  2025-02-25 19:39 ` [PATCH v3 03/23] iommu/pages: Remove __iommu_alloc_pages()/__iommu_free_pages() Jason Gunthorpe
  2025-02-26  6:25   ` Baolu Lu
@ 2025-03-12 11:43   ` Mostafa Saleh
  1 sibling, 0 replies; 55+ messages in thread
From: Mostafa Saleh @ 2025-03-12 11:43 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon, Bagas Sanjaya,
	Joerg Roedel, Pasha Tatashin, patches, David Rientjes,
	Matthew Wilcox

On Tue, Feb 25, 2025 at 03:39:20PM -0400, Jason Gunthorpe wrote:
> These were only used by tegra-smmu and leaked the struct page out of the
> API. Delete them since tegra-smmu has been converted to the other APIs.
> 
> In the process flatten the call tree so we have fewer one-line functions
> calling other one-line functions. iommu_alloc_pages_node() is the real
> allocator and everything else can just call it directly.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>

> ---
>  drivers/iommu/iommu-pages.h | 49 ++++++-------------------------------
>  1 file changed, 7 insertions(+), 42 deletions(-)
> 
> diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h
> index 82ebf00330811c..0ca2437989a0e1 100644
> --- a/drivers/iommu/iommu-pages.h
> +++ b/drivers/iommu/iommu-pages.h
> @@ -46,40 +46,6 @@ static inline void __iommu_free_account(struct page *page, int order)
>  	mod_lruvec_page_state(page, NR_SECONDARY_PAGETABLE, -pgcnt);
>  }
>  
> -/**
> - * __iommu_alloc_pages - allocate a zeroed page of a given order.
> - * @gfp: buddy allocator flags
> - * @order: page order
> - *
> - * returns the head struct page of the allocated page.
> - */
> -static inline struct page *__iommu_alloc_pages(gfp_t gfp, int order)
> -{
> -	struct page *page;
> -
> -	page = alloc_pages(gfp | __GFP_ZERO, order);
> -	if (unlikely(!page))
> -		return NULL;
> -
> -	__iommu_alloc_account(page, order);
> -
> -	return page;
> -}
> -
> -/**
> - * __iommu_free_pages - free page of a given order
> - * @page: head struct page of the page
> - * @order: page order
> - */
> -static inline void __iommu_free_pages(struct page *page, int order)
> -{
> -	if (!page)
> -		return;
> -
> -	__iommu_free_account(page, order);
> -	__free_pages(page, order);
> -}
> -
>  /**
>   * iommu_alloc_pages_node - allocate a zeroed page of a given order from
>   * specific NUMA node.
> @@ -110,12 +76,7 @@ static inline void *iommu_alloc_pages_node(int nid, gfp_t gfp, int order)
>   */
>  static inline void *iommu_alloc_pages(gfp_t gfp, int order)
>  {
> -	struct page *page = __iommu_alloc_pages(gfp, order);
> -
> -	if (unlikely(!page))
> -		return NULL;
> -
> -	return page_address(page);
> +	return iommu_alloc_pages_node(numa_node_id(), gfp, order);
>  }
>  
>  /**
> @@ -138,7 +99,7 @@ static inline void *iommu_alloc_page_node(int nid, gfp_t gfp)
>   */
>  static inline void *iommu_alloc_page(gfp_t gfp)
>  {
> -	return iommu_alloc_pages(gfp, 0);
> +	return iommu_alloc_pages_node(numa_node_id(), gfp, 0);
>  }
>  
>  /**
> @@ -148,10 +109,14 @@ static inline void *iommu_alloc_page(gfp_t gfp)
>   */
>  static inline void iommu_free_pages(void *virt, int order)
>  {
> +	struct page *page;
> +
>  	if (!virt)
>  		return;
>  
> -	__iommu_free_pages(virt_to_page(virt), order);
> +	page = virt_to_page(virt);
> +	__iommu_free_account(page, order);
> +	__free_pages(page, order);
>  }
>  
>  /**
> -- 
> 2.43.0
> 


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 05/23] iommu/pages: Remove the order argument to iommu_free_pages()
  2025-02-25 19:39 ` [PATCH v3 05/23] iommu/pages: Remove the order argument to iommu_free_pages() Jason Gunthorpe
  2025-02-26  6:32   ` Baolu Lu
@ 2025-03-12 11:43   ` Mostafa Saleh
  1 sibling, 0 replies; 55+ messages in thread
From: Mostafa Saleh @ 2025-03-12 11:43 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon, Bagas Sanjaya,
	Joerg Roedel, Pasha Tatashin, patches, David Rientjes,
	Matthew Wilcox

On Tue, Feb 25, 2025 at 03:39:22PM -0400, Jason Gunthorpe wrote:
> Now that we have a folio under the allocation, iommu_free_pages() can know
> the order of the original allocation and do the correct thing to free it.
> 
> The next patch will rename iommu_free_page() to iommu_free_pages() so we
> have naming consistency with iommu_alloc_pages_node().
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
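
The trick that allows dropping the order argument: the allocation side
now uses a compound page, so the free path can recover the order on its
own. A minimal sketch of that recovery:

        /* alloc passes __GFP_COMP, so at free time: */
        unsigned int order = folio_order(page_folio(virt_to_page(virt)));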

> ---
>  drivers/iommu/amd/init.c            | 28 +++++++++++-----------------
>  drivers/iommu/amd/ppr.c             |  2 +-
>  drivers/iommu/exynos-iommu.c        |  8 ++++----
>  drivers/iommu/intel/irq_remapping.c |  4 ++--
>  drivers/iommu/intel/pasid.c         |  3 +--
>  drivers/iommu/intel/pasid.h         |  1 -
>  drivers/iommu/intel/prq.c           |  4 ++--
>  drivers/iommu/io-pgtable-arm.c      |  4 ++--
>  drivers/iommu/io-pgtable-dart.c     | 10 ++++------
>  drivers/iommu/iommu-pages.h         |  9 +++++----
>  drivers/iommu/riscv/iommu.c         |  6 ++----
>  drivers/iommu/sun50i-iommu.c        |  2 +-
>  12 files changed, 35 insertions(+), 46 deletions(-)
> 
> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
> index c5cd92edada061..f47ff0e0c75f4e 100644
> --- a/drivers/iommu/amd/init.c
> +++ b/drivers/iommu/amd/init.c
> @@ -653,8 +653,7 @@ static inline int __init alloc_dev_table(struct amd_iommu_pci_seg *pci_seg)
>  
>  static inline void free_dev_table(struct amd_iommu_pci_seg *pci_seg)
>  {
> -	iommu_free_pages(pci_seg->dev_table,
> -			 get_order(pci_seg->dev_table_size));
> +	iommu_free_pages(pci_seg->dev_table);
>  	pci_seg->dev_table = NULL;
>  }
>  
> @@ -671,8 +670,7 @@ static inline int __init alloc_rlookup_table(struct amd_iommu_pci_seg *pci_seg)
>  
>  static inline void free_rlookup_table(struct amd_iommu_pci_seg *pci_seg)
>  {
> -	iommu_free_pages(pci_seg->rlookup_table,
> -			 get_order(pci_seg->rlookup_table_size));
> +	iommu_free_pages(pci_seg->rlookup_table);
>  	pci_seg->rlookup_table = NULL;
>  }
>  
> @@ -691,8 +689,7 @@ static inline int __init alloc_irq_lookup_table(struct amd_iommu_pci_seg *pci_se
>  static inline void free_irq_lookup_table(struct amd_iommu_pci_seg *pci_seg)
>  {
>  	kmemleak_free(pci_seg->irq_lookup_table);
> -	iommu_free_pages(pci_seg->irq_lookup_table,
> -			 get_order(pci_seg->rlookup_table_size));
> +	iommu_free_pages(pci_seg->irq_lookup_table);
>  	pci_seg->irq_lookup_table = NULL;
>  }
>  
> @@ -716,8 +713,7 @@ static int __init alloc_alias_table(struct amd_iommu_pci_seg *pci_seg)
>  
>  static void __init free_alias_table(struct amd_iommu_pci_seg *pci_seg)
>  {
> -	iommu_free_pages(pci_seg->alias_table,
> -			 get_order(pci_seg->alias_table_size));
> +	iommu_free_pages(pci_seg->alias_table);
>  	pci_seg->alias_table = NULL;
>  }
>  
> @@ -826,7 +822,7 @@ static void iommu_disable_command_buffer(struct amd_iommu *iommu)
>  
>  static void __init free_command_buffer(struct amd_iommu *iommu)
>  {
> -	iommu_free_pages(iommu->cmd_buf, get_order(CMD_BUFFER_SIZE));
> +	iommu_free_pages(iommu->cmd_buf);
>  }
>  
>  void *__init iommu_alloc_4k_pages(struct amd_iommu *iommu, gfp_t gfp,
> @@ -838,7 +834,7 @@ void *__init iommu_alloc_4k_pages(struct amd_iommu *iommu, gfp_t gfp,
>  	if (buf &&
>  	    check_feature(FEATURE_SNP) &&
>  	    set_memory_4k((unsigned long)buf, (1 << order))) {
> -		iommu_free_pages(buf, order);
> +		iommu_free_pages(buf);
>  		buf = NULL;
>  	}
>  
> @@ -882,14 +878,14 @@ static void iommu_disable_event_buffer(struct amd_iommu *iommu)
>  
>  static void __init free_event_buffer(struct amd_iommu *iommu)
>  {
> -	iommu_free_pages(iommu->evt_buf, get_order(EVT_BUFFER_SIZE));
> +	iommu_free_pages(iommu->evt_buf);
>  }
>  
>  static void free_ga_log(struct amd_iommu *iommu)
>  {
>  #ifdef CONFIG_IRQ_REMAP
> -	iommu_free_pages(iommu->ga_log, get_order(GA_LOG_SIZE));
> -	iommu_free_pages(iommu->ga_log_tail, get_order(8));
> +	iommu_free_pages(iommu->ga_log);
> +	iommu_free_pages(iommu->ga_log_tail);
>  #endif
>  }
>  
> @@ -2781,8 +2777,7 @@ static void early_enable_iommus(void)
>  
>  		for_each_pci_segment(pci_seg) {
>  			if (pci_seg->old_dev_tbl_cpy != NULL) {
> -				iommu_free_pages(pci_seg->old_dev_tbl_cpy,
> -						 get_order(pci_seg->dev_table_size));
> +				iommu_free_pages(pci_seg->old_dev_tbl_cpy);
>  				pci_seg->old_dev_tbl_cpy = NULL;
>  			}
>  		}
> @@ -2795,8 +2790,7 @@ static void early_enable_iommus(void)
>  		pr_info("Copied DEV table from previous kernel.\n");
>  
>  		for_each_pci_segment(pci_seg) {
> -			iommu_free_pages(pci_seg->dev_table,
> -					 get_order(pci_seg->dev_table_size));
> +			iommu_free_pages(pci_seg->dev_table);
>  			pci_seg->dev_table = pci_seg->old_dev_tbl_cpy;
>  		}
>  
> diff --git a/drivers/iommu/amd/ppr.c b/drivers/iommu/amd/ppr.c
> index 7c67d69f0b8cad..e6767c057d01fa 100644
> --- a/drivers/iommu/amd/ppr.c
> +++ b/drivers/iommu/amd/ppr.c
> @@ -48,7 +48,7 @@ void amd_iommu_enable_ppr_log(struct amd_iommu *iommu)
>  
>  void __init amd_iommu_free_ppr_log(struct amd_iommu *iommu)
>  {
> -	iommu_free_pages(iommu->ppr_log, get_order(PPR_LOG_SIZE));
> +	iommu_free_pages(iommu->ppr_log);
>  }
>  
>  /*
> diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
> index c666ecab955d21..1019e08b43b71c 100644
> --- a/drivers/iommu/exynos-iommu.c
> +++ b/drivers/iommu/exynos-iommu.c
> @@ -932,9 +932,9 @@ static struct iommu_domain *exynos_iommu_domain_alloc_paging(struct device *dev)
>  	return &domain->domain;
>  
>  err_lv2ent:
> -	iommu_free_pages(domain->lv2entcnt, 1);
> +	iommu_free_pages(domain->lv2entcnt);
>  err_counter:
> -	iommu_free_pages(domain->pgtable, 2);
> +	iommu_free_pages(domain->pgtable);
>  err_pgtable:
>  	kfree(domain);
>  	return NULL;
> @@ -975,8 +975,8 @@ static void exynos_iommu_domain_free(struct iommu_domain *iommu_domain)
>  					phys_to_virt(base));
>  		}
>  
> -	iommu_free_pages(domain->pgtable, 2);
> -	iommu_free_pages(domain->lv2entcnt, 1);
> +	iommu_free_pages(domain->pgtable);
> +	iommu_free_pages(domain->lv2entcnt);
>  	kfree(domain);
>  }
>  
> diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
> index ad795c772f21b5..d6b796f8f100cd 100644
> --- a/drivers/iommu/intel/irq_remapping.c
> +++ b/drivers/iommu/intel/irq_remapping.c
> @@ -620,7 +620,7 @@ static int intel_setup_irq_remapping(struct intel_iommu *iommu)
>  out_free_bitmap:
>  	bitmap_free(bitmap);
>  out_free_pages:
> -	iommu_free_pages(ir_table_base, INTR_REMAP_PAGE_ORDER);
> +	iommu_free_pages(ir_table_base);
>  out_free_table:
>  	kfree(ir_table);
>  
> @@ -641,7 +641,7 @@ static void intel_teardown_irq_remapping(struct intel_iommu *iommu)
>  			irq_domain_free_fwnode(fn);
>  			iommu->ir_domain = NULL;
>  		}
> -		iommu_free_pages(iommu->ir_table->base, INTR_REMAP_PAGE_ORDER);
> +		iommu_free_pages(iommu->ir_table->base);
>  		bitmap_free(iommu->ir_table->bitmap);
>  		kfree(iommu->ir_table);
>  		iommu->ir_table = NULL;
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index fb59a7d35958f5..00da94b1c4c907 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -67,7 +67,6 @@ int intel_pasid_alloc_table(struct device *dev)
>  	}
>  
>  	pasid_table->table = dir;
> -	pasid_table->order = order;
>  	pasid_table->max_pasid = 1 << (order + PAGE_SHIFT + 3);
>  	info->pasid_table = pasid_table;
>  
> @@ -100,7 +99,7 @@ void intel_pasid_free_table(struct device *dev)
>  		iommu_free_page(table);
>  	}
>  
> -	iommu_free_pages(pasid_table->table, pasid_table->order);
> +	iommu_free_pages(pasid_table->table);
>  	kfree(pasid_table);
>  }
>  
> diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
> index 668d8ece6b143c..fd0fd1a0df84cc 100644
> --- a/drivers/iommu/intel/pasid.h
> +++ b/drivers/iommu/intel/pasid.h
> @@ -47,7 +47,6 @@ struct pasid_entry {
>  /* The representative of a PASID table */
>  struct pasid_table {
>  	void			*table;		/* pasid table pointer */
> -	int			order;		/* page order of pasid table */
>  	u32			max_pasid;	/* max pasid */
>  };
>  
> diff --git a/drivers/iommu/intel/prq.c b/drivers/iommu/intel/prq.c
> index c2d792db52c3e2..01ecafed31453c 100644
> --- a/drivers/iommu/intel/prq.c
> +++ b/drivers/iommu/intel/prq.c
> @@ -338,7 +338,7 @@ int intel_iommu_enable_prq(struct intel_iommu *iommu)
>  	dmar_free_hwirq(irq);
>  	iommu->pr_irq = 0;
>  free_prq:
> -	iommu_free_pages(iommu->prq, PRQ_ORDER);
> +	iommu_free_pages(iommu->prq);
>  	iommu->prq = NULL;
>  
>  	return ret;
> @@ -361,7 +361,7 @@ int intel_iommu_finish_prq(struct intel_iommu *iommu)
>  		iommu->iopf_queue = NULL;
>  	}
>  
> -	iommu_free_pages(iommu->prq, PRQ_ORDER);
> +	iommu_free_pages(iommu->prq);
>  	iommu->prq = NULL;
>  
>  	return 0;
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 7632c80edea63a..62df2528d020b2 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -300,7 +300,7 @@ static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
>  	if (cfg->free)
>  		cfg->free(cookie, pages, size);
>  	else
> -		iommu_free_pages(pages, order);
> +		iommu_free_pages(pages);
>  
>  	return NULL;
>  }
> @@ -316,7 +316,7 @@ static void __arm_lpae_free_pages(void *pages, size_t size,
>  	if (cfg->free)
>  		cfg->free(cookie, pages, size);
>  	else
> -		iommu_free_pages(pages, get_order(size));
> +		iommu_free_pages(pages);
>  }
>  
>  static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
> diff --git a/drivers/iommu/io-pgtable-dart.c b/drivers/iommu/io-pgtable-dart.c
> index c004640640ee50..7efcaea0bd5c86 100644
> --- a/drivers/iommu/io-pgtable-dart.c
> +++ b/drivers/iommu/io-pgtable-dart.c
> @@ -262,7 +262,7 @@ static int dart_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
>  
>  		pte = dart_install_table(cptep, ptep, 0, data);
>  		if (pte)
> -			iommu_free_pages(cptep, get_order(tblsz));
> +			iommu_free_pages(cptep);
>  
>  		/* L2 table is present (now) */
>  		pte = READ_ONCE(*ptep);
> @@ -423,8 +423,7 @@ apple_dart_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
>  
>  out_free_data:
>  	while (--i >= 0) {
> -		iommu_free_pages(data->pgd[i],
> -				 get_order(DART_GRANULE(data)));
> +		iommu_free_pages(data->pgd[i]);
>  	}
>  	kfree(data);
>  	return NULL;
> @@ -433,7 +432,6 @@ apple_dart_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
>  static void apple_dart_free_pgtable(struct io_pgtable *iop)
>  {
>  	struct dart_io_pgtable *data = io_pgtable_to_data(iop);
> -	int order = get_order(DART_GRANULE(data));
>  	dart_iopte *ptep, *end;
>  	int i;
>  
> @@ -445,9 +443,9 @@ static void apple_dart_free_pgtable(struct io_pgtable *iop)
>  			dart_iopte pte = *ptep++;
>  
>  			if (pte)
> -				iommu_free_pages(iopte_deref(pte, data), order);
> +				iommu_free_pages(iopte_deref(pte, data));
>  		}
> -		iommu_free_pages(data->pgd[i], order);
> +		iommu_free_pages(data->pgd[i]);
>  	}
>  
>  	kfree(data);
> diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h
> index 26b91940bdc146..88587da1782b94 100644
> --- a/drivers/iommu/iommu-pages.h
> +++ b/drivers/iommu/iommu-pages.h
> @@ -105,11 +105,12 @@ static inline void *iommu_alloc_page(gfp_t gfp)
>  }
>  
>  /**
> - * iommu_free_pages - free page of a given order
> + * iommu_free_pages - free pages
>   * @virt: virtual address of the page to be freed.
> - * @order: page order
> + *
> + * The page must have been allocated by iommu_alloc_pages_node()
>   */
> -static inline void iommu_free_pages(void *virt, int order)
> +static inline void iommu_free_pages(void *virt)
>  {
>  	struct page *page;
>  
> @@ -127,7 +128,7 @@ static inline void iommu_free_pages(void *virt, int order)
>   */
>  static inline void iommu_free_page(void *virt)
>  {
> -	iommu_free_pages(virt, 0);
> +	iommu_free_pages(virt);
>  }
>  
>  /**
> diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> index 8f049d4a0e2cb8..1868468d018a28 100644
> --- a/drivers/iommu/riscv/iommu.c
> +++ b/drivers/iommu/riscv/iommu.c
> @@ -48,14 +48,13 @@ static DEFINE_IDA(riscv_iommu_pscids);
>  /* Device resource-managed allocations */
>  struct riscv_iommu_devres {
>  	void *addr;
> -	int order;
>  };
>  
>  static void riscv_iommu_devres_pages_release(struct device *dev, void *res)
>  {
>  	struct riscv_iommu_devres *devres = res;
>  
> -	iommu_free_pages(devres->addr, devres->order);
> +	iommu_free_pages(devres->addr);
>  }
>  
>  static int riscv_iommu_devres_pages_match(struct device *dev, void *res, void *p)
> @@ -80,12 +79,11 @@ static void *riscv_iommu_get_pages(struct riscv_iommu_device *iommu, int order)
>  			      sizeof(struct riscv_iommu_devres), GFP_KERNEL);
>  
>  	if (unlikely(!devres)) {
> -		iommu_free_pages(addr, order);
> +		iommu_free_pages(addr);
>  		return NULL;
>  	}
>  
>  	devres->addr = addr;
> -	devres->order = order;
>  
>  	devres_add(iommu->dev, devres);
>  
> diff --git a/drivers/iommu/sun50i-iommu.c b/drivers/iommu/sun50i-iommu.c
> index 8d8f11854676c0..6385560dbc3fb0 100644
> --- a/drivers/iommu/sun50i-iommu.c
> +++ b/drivers/iommu/sun50i-iommu.c
> @@ -713,7 +713,7 @@ static void sun50i_iommu_domain_free(struct iommu_domain *domain)
>  {
>  	struct sun50i_iommu_domain *sun50i_domain = to_sun50i_domain(domain);
>  
> -	iommu_free_pages(sun50i_domain->dt, get_order(DT_SIZE));
> +	iommu_free_pages(sun50i_domain->dt);
>  	sun50i_domain->dt = NULL;
>  
>  	kfree(sun50i_domain);
> -- 
> 2.43.0
> 


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 06/23] iommu/pages: Remove iommu_free_page()
  2025-02-25 19:39 ` [PATCH v3 06/23] iommu/pages: Remove iommu_free_page() Jason Gunthorpe
  2025-02-26  6:34   ` Baolu Lu
@ 2025-03-12 11:44   ` Mostafa Saleh
  1 sibling, 0 replies; 55+ messages in thread
From: Mostafa Saleh @ 2025-03-12 11:44 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon, Bagas Sanjaya,
	Joerg Roedel, Pasha Tatashin, patches, David Rientjes,
	Matthew Wilcox

On Tue, Feb 25, 2025 at 03:39:23PM -0400, Jason Gunthorpe wrote:
> Use iommu_free_pages() instead.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>

> ---
>  drivers/iommu/amd/init.c          |  2 +-
>  drivers/iommu/amd/io_pgtable.c    |  4 ++--
>  drivers/iommu/amd/io_pgtable_v2.c |  8 ++++----
>  drivers/iommu/amd/iommu.c         |  4 ++--
>  drivers/iommu/intel/dmar.c        |  4 ++--
>  drivers/iommu/intel/iommu.c       | 12 ++++++------
>  drivers/iommu/intel/pasid.c       |  4 ++--
>  drivers/iommu/iommu-pages.h       |  9 ---------
>  drivers/iommu/riscv/iommu.c       |  6 +++---
>  drivers/iommu/rockchip-iommu.c    |  8 ++++----
>  drivers/iommu/tegra-smmu.c        | 12 ++++++------
>  11 files changed, 32 insertions(+), 41 deletions(-)
> 
> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
> index f47ff0e0c75f4e..73ebcb958ad864 100644
> --- a/drivers/iommu/amd/init.c
> +++ b/drivers/iommu/amd/init.c
> @@ -955,7 +955,7 @@ static int __init alloc_cwwb_sem(struct amd_iommu *iommu)
>  static void __init free_cwwb_sem(struct amd_iommu *iommu)
>  {
>  	if (iommu->cmd_sem)
> -		iommu_free_page((void *)iommu->cmd_sem);
> +		iommu_free_pages((void *)iommu->cmd_sem);
>  }
>  
>  static void iommu_enable_xt(struct amd_iommu *iommu)
> diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
> index f3399087859fd1..025d8a3fe9cb78 100644
> --- a/drivers/iommu/amd/io_pgtable.c
> +++ b/drivers/iommu/amd/io_pgtable.c
> @@ -153,7 +153,7 @@ static bool increase_address_space(struct amd_io_pgtable *pgtable,
>  
>  out:
>  	spin_unlock_irqrestore(&domain->lock, flags);
> -	iommu_free_page(pte);
> +	iommu_free_pages(pte);
>  
>  	return ret;
>  }
> @@ -229,7 +229,7 @@ static u64 *alloc_pte(struct amd_io_pgtable *pgtable,
>  
>  			/* pte could have been changed somewhere. */
>  			if (!try_cmpxchg64(pte, &__pte, __npte))
> -				iommu_free_page(page);
> +				iommu_free_pages(page);
>  			else if (IOMMU_PTE_PRESENT(__pte))
>  				*updated = true;
>  
> diff --git a/drivers/iommu/amd/io_pgtable_v2.c b/drivers/iommu/amd/io_pgtable_v2.c
> index c616de2c5926ec..cce3fc9861ef77 100644
> --- a/drivers/iommu/amd/io_pgtable_v2.c
> +++ b/drivers/iommu/amd/io_pgtable_v2.c
> @@ -121,10 +121,10 @@ static void free_pgtable(u64 *pt, int level)
>  		if (level > 2)
>  			free_pgtable(p, level - 1);
>  		else
> -			iommu_free_page(p);
> +			iommu_free_pages(p);
>  	}
>  
> -	iommu_free_page(pt);
> +	iommu_free_pages(pt);
>  }
>  
>  /* Allocate page table */
> @@ -159,7 +159,7 @@ static u64 *v2_alloc_pte(int nid, u64 *pgd, unsigned long iova,
>  			__npte = set_pgtable_attr(page);
>  			/* pte could have been changed somewhere. */
>  			if (!try_cmpxchg64(pte, &__pte, __npte))
> -				iommu_free_page(page);
> +				iommu_free_pages(page);
>  			else if (IOMMU_PTE_PRESENT(__pte))
>  				*updated = true;
>  
> @@ -181,7 +181,7 @@ static u64 *v2_alloc_pte(int nid, u64 *pgd, unsigned long iova,
>  		if (pg_size == IOMMU_PAGE_SIZE_1G)
>  			free_pgtable(__pte, end_level - 1);
>  		else if (pg_size == IOMMU_PAGE_SIZE_2M)
> -			iommu_free_page(__pte);
> +			iommu_free_pages(__pte);
>  	}
>  
>  	return pte;
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index b48a72bd7b23df..e23d104d177ad9 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -1812,7 +1812,7 @@ static void free_gcr3_tbl_level1(u64 *tbl)
>  
>  		ptr = iommu_phys_to_virt(tbl[i] & PAGE_MASK);
>  
> -		iommu_free_page(ptr);
> +		iommu_free_pages(ptr);
>  	}
>  }
>  
> @@ -1845,7 +1845,7 @@ static void free_gcr3_table(struct gcr3_tbl_info *gcr3_info)
>  	/* Free per device domain ID */
>  	pdom_id_free(gcr3_info->domid);
>  
> -	iommu_free_page(gcr3_info->gcr3_tbl);
> +	iommu_free_pages(gcr3_info->gcr3_tbl);
>  	gcr3_info->gcr3_tbl = NULL;
>  }
>  
> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> index 9f424acf474e94..c812c83d77da10 100644
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -1187,7 +1187,7 @@ static void free_iommu(struct intel_iommu *iommu)
>  	}
>  
>  	if (iommu->qi) {
> -		iommu_free_page(iommu->qi->desc);
> +		iommu_free_pages(iommu->qi->desc);
>  		kfree(iommu->qi->desc_status);
>  		kfree(iommu->qi);
>  	}
> @@ -1714,7 +1714,7 @@ int dmar_enable_qi(struct intel_iommu *iommu)
>  
>  	qi->desc_status = kcalloc(QI_LENGTH, sizeof(int), GFP_ATOMIC);
>  	if (!qi->desc_status) {
> -		iommu_free_page(qi->desc);
> +		iommu_free_pages(qi->desc);
>  		kfree(qi);
>  		iommu->qi = NULL;
>  		return -ENOMEM;
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index cc46098f875b16..1e73bfa00329ae 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -571,17 +571,17 @@ static void free_context_table(struct intel_iommu *iommu)
>  	for (i = 0; i < ROOT_ENTRY_NR; i++) {
>  		context = iommu_context_addr(iommu, i, 0, 0);
>  		if (context)
> -			iommu_free_page(context);
> +			iommu_free_pages(context);
>  
>  		if (!sm_supported(iommu))
>  			continue;
>  
>  		context = iommu_context_addr(iommu, i, 0x80, 0);
>  		if (context)
> -			iommu_free_page(context);
> +			iommu_free_pages(context);
>  	}
>  
> -	iommu_free_page(iommu->root_entry);
> +	iommu_free_pages(iommu->root_entry);
>  	iommu->root_entry = NULL;
>  }
>  
> @@ -744,7 +744,7 @@ static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain,
>  			tmp = 0ULL;
>  			if (!try_cmpxchg64(&pte->val, &tmp, pteval))
>  				/* Someone else set it while we were thinking; use theirs. */
> -				iommu_free_page(tmp_page);
> +				iommu_free_pages(tmp_page);
>  			else
>  				domain_flush_cache(domain, pte, sizeof(*pte));
>  		}
> @@ -857,7 +857,7 @@ static void dma_pte_free_level(struct dmar_domain *domain, int level,
>  		      last_pfn < level_pfn + level_size(level) - 1)) {
>  			dma_clear_pte(pte);
>  			domain_flush_cache(domain, pte, sizeof(*pte));
> -			iommu_free_page(level_pte);
> +			iommu_free_pages(level_pte);
>  		}
>  next:
>  		pfn += level_size(level);
> @@ -881,7 +881,7 @@ static void dma_pte_free_pagetable(struct dmar_domain *domain,
>  
>  	/* free pgd */
>  	if (start_pfn == 0 && last_pfn == DOMAIN_MAX_PFN(domain->gaw)) {
> -		iommu_free_page(domain->pgd);
> +		iommu_free_pages(domain->pgd);
>  		domain->pgd = NULL;
>  	}
>  }
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index 00da94b1c4c907..4249f12db7fc43 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -96,7 +96,7 @@ void intel_pasid_free_table(struct device *dev)
>  	max_pde = pasid_table->max_pasid >> PASID_PDE_SHIFT;
>  	for (i = 0; i < max_pde; i++) {
>  		table = get_pasid_table_from_pde(&dir[i]);
> -		iommu_free_page(table);
> +		iommu_free_pages(table);
>  	}
>  
>  	iommu_free_pages(pasid_table->table);
> @@ -160,7 +160,7 @@ static struct pasid_entry *intel_pasid_get_entry(struct device *dev, u32 pasid)
>  		tmp = 0ULL;
>  		if (!try_cmpxchg64(&dir[dir_index].val, &tmp,
>  				   (u64)virt_to_phys(entries) | PASID_PTE_PRESENT)) {
> -			iommu_free_page(entries);
> +			iommu_free_pages(entries);
>  			goto retry;
>  		}
>  		if (!ecap_coherent(info->iommu->ecap)) {
> diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h
> index 88587da1782b94..fcd17b94f7b830 100644
> --- a/drivers/iommu/iommu-pages.h
> +++ b/drivers/iommu/iommu-pages.h
> @@ -122,15 +122,6 @@ static inline void iommu_free_pages(void *virt)
>  	put_page(page);
>  }
>  
> -/**
> - * iommu_free_page - free page
> - * @virt: virtual address of the page to be freed.
> - */
> -static inline void iommu_free_page(void *virt)
> -{
> -	iommu_free_pages(virt);
> -}
> -
>  /**
>   * iommu_put_pages_list - free a list of pages.
>   * @page: the head of the lru list to be freed.
> diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> index 1868468d018a28..4fe07343d84e61 100644
> --- a/drivers/iommu/riscv/iommu.c
> +++ b/drivers/iommu/riscv/iommu.c
> @@ -1105,7 +1105,7 @@ static void riscv_iommu_pte_free(struct riscv_iommu_domain *domain,
>  	if (freelist)
>  		list_add_tail(&virt_to_page(ptr)->lru, freelist);
>  	else
> -		iommu_free_page(ptr);
> +		iommu_free_pages(ptr);
>  }
>  
>  static unsigned long *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
> @@ -1148,7 +1148,7 @@ static unsigned long *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
>  			old = pte;
>  			pte = _io_pte_entry(virt_to_pfn(addr), _PAGE_TABLE);
>  			if (cmpxchg_relaxed(ptr, old, pte) != old) {
> -				iommu_free_page(addr);
> +				iommu_free_pages(addr);
>  				goto pte_retry;
>  			}
>  		}
> @@ -1393,7 +1393,7 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
>  	domain->pscid = ida_alloc_range(&riscv_iommu_pscids, 1,
>  					RISCV_IOMMU_MAX_PSCID, GFP_KERNEL);
>  	if (domain->pscid < 0) {
> -		iommu_free_page(domain->pgd_root);
> +		iommu_free_pages(domain->pgd_root);
>  		kfree(domain);
>  		return ERR_PTR(-ENOMEM);
>  	}
> diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
> index 323cc665c35703..798e85bd994d56 100644
> --- a/drivers/iommu/rockchip-iommu.c
> +++ b/drivers/iommu/rockchip-iommu.c
> @@ -737,7 +737,7 @@ static u32 *rk_dte_get_page_table(struct rk_iommu_domain *rk_domain,
>  	pt_dma = dma_map_single(dma_dev, page_table, SPAGE_SIZE, DMA_TO_DEVICE);
>  	if (dma_mapping_error(dma_dev, pt_dma)) {
>  		dev_err(dma_dev, "DMA mapping error while allocating page table\n");
> -		iommu_free_page(page_table);
> +		iommu_free_pages(page_table);
>  		return ERR_PTR(-ENOMEM);
>  	}
>  
> @@ -1086,7 +1086,7 @@ static struct iommu_domain *rk_iommu_domain_alloc_paging(struct device *dev)
>  	return &rk_domain->domain;
>  
>  err_free_dt:
> -	iommu_free_page(rk_domain->dt);
> +	iommu_free_pages(rk_domain->dt);
>  err_free_domain:
>  	kfree(rk_domain);
>  
> @@ -1107,13 +1107,13 @@ static void rk_iommu_domain_free(struct iommu_domain *domain)
>  			u32 *page_table = phys_to_virt(pt_phys);
>  			dma_unmap_single(dma_dev, pt_phys,
>  					 SPAGE_SIZE, DMA_TO_DEVICE);
> -			iommu_free_page(page_table);
> +			iommu_free_pages(page_table);
>  		}
>  	}
>  
>  	dma_unmap_single(dma_dev, rk_domain->dt_dma,
>  			 SPAGE_SIZE, DMA_TO_DEVICE);
> -	iommu_free_page(rk_domain->dt);
> +	iommu_free_pages(rk_domain->dt);
>  
>  	kfree(rk_domain);
>  }
> diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
> index c134647292fb22..844682a41afa66 100644
> --- a/drivers/iommu/tegra-smmu.c
> +++ b/drivers/iommu/tegra-smmu.c
> @@ -303,7 +303,7 @@ static struct iommu_domain *tegra_smmu_domain_alloc_paging(struct device *dev)
>  
>  	as->count = kcalloc(SMMU_NUM_PDE, sizeof(u32), GFP_KERNEL);
>  	if (!as->count) {
> -		iommu_free_page(as->pd);
> +		iommu_free_pages(as->pd);
>  		kfree(as);
>  		return NULL;
>  	}
> @@ -311,7 +311,7 @@ static struct iommu_domain *tegra_smmu_domain_alloc_paging(struct device *dev)
>  	as->pts = kcalloc(SMMU_NUM_PDE, sizeof(*as->pts), GFP_KERNEL);
>  	if (!as->pts) {
>  		kfree(as->count);
> -		iommu_free_page(as->pd);
> +		iommu_free_pages(as->pd);
>  		kfree(as);
>  		return NULL;
>  	}
> @@ -608,14 +608,14 @@ static u32 *as_get_pte(struct tegra_smmu_as *as, dma_addr_t iova,
>  		dma = dma_map_single(smmu->dev, pt, SMMU_SIZE_PT,
>  				     DMA_TO_DEVICE);
>  		if (dma_mapping_error(smmu->dev, dma)) {
> -			iommu_free_page(pt);
> +			iommu_free_pages(pt);
>  			return NULL;
>  		}
>  
>  		if (!smmu_dma_addr_valid(smmu, dma)) {
>  			dma_unmap_single(smmu->dev, dma, SMMU_SIZE_PT,
>  					 DMA_TO_DEVICE);
> -			iommu_free_page(pt);
> +			iommu_free_pages(pt);
>  			return NULL;
>  		}
>  
> @@ -656,7 +656,7 @@ static void tegra_smmu_pte_put_use(struct tegra_smmu_as *as, unsigned long iova)
>  
>  		dma_unmap_single(smmu->dev, pte_dma, SMMU_SIZE_PT,
>  				 DMA_TO_DEVICE);
> -		iommu_free_page(pt);
> +		iommu_free_pages(pt);
>  		as->pts[pde] = NULL;
>  	}
>  }
> @@ -707,7 +707,7 @@ static struct tegra_pt *as_get_pde_page(struct tegra_smmu_as *as,
>  	 */
>  	if (as->pts[pde]) {
>  		if (pt)
> -			iommu_free_page(pt);
> +			iommu_free_pages(pt);
>  
>  		pt = as->pts[pde];
>  	}
> -- 
> 2.43.0
> 


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 07/23] iommu/pages: De-inline the substantial functions
  2025-02-25 19:39 ` [PATCH v3 07/23] iommu/pages: De-inline the substantial functions Jason Gunthorpe
  2025-02-26  6:43   ` Baolu Lu
@ 2025-03-12 12:45   ` Mostafa Saleh
  1 sibling, 0 replies; 55+ messages in thread
From: Mostafa Saleh @ 2025-03-12 12:45 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon, Bagas Sanjaya,
	Joerg Roedel, Pasha Tatashin, patches, David Rientjes,
	Matthew Wilcox

On Tue, Feb 25, 2025 at 03:39:24PM -0400, Jason Gunthorpe wrote:
> These are called in a lot of places and are not trivial. Move them to the
> core module.
> 
> Tidy some of the comments and function arguments, fold
> __iommu_alloc_account() into its only caller, and change
> __iommu_free_account() into __iommu_free_page() to remove some
> duplication.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>

> ---
>  drivers/iommu/Makefile      |   1 +
>  drivers/iommu/iommu-pages.c |  84 +++++++++++++++++++++++++++++
>  drivers/iommu/iommu-pages.h | 103 ++----------------------------------
>  3 files changed, 90 insertions(+), 98 deletions(-)
>  create mode 100644 drivers/iommu/iommu-pages.c
> 
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 5e5a83c6c2aae2..fe91d770abe16c 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -1,6 +1,7 @@
>  # SPDX-License-Identifier: GPL-2.0
>  obj-y += amd/ intel/ arm/ iommufd/ riscv/
>  obj-$(CONFIG_IOMMU_API) += iommu.o
> +obj-$(CONFIG_IOMMU_SUPPORT) += iommu-pages.o
>  obj-$(CONFIG_IOMMU_API) += iommu-traces.o
>  obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
>  obj-$(CONFIG_IOMMU_DEBUGFS) += iommu-debugfs.o
> diff --git a/drivers/iommu/iommu-pages.c b/drivers/iommu/iommu-pages.c
> new file mode 100644
> index 00000000000000..31ff83ffaf0106
> --- /dev/null
> +++ b/drivers/iommu/iommu-pages.c
> @@ -0,0 +1,84 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2024, Google LLC.
> + * Pasha Tatashin <pasha.tatashin@soleen.com>
> + */
> +#include "iommu-pages.h"
> +#include <linux/gfp.h>
> +#include <linux/mm.h>
> +
> +/**
> + * iommu_alloc_pages_node - Allocate a zeroed page of a given order from
> + *                          a specific NUMA node
> + * @nid: memory NUMA node id
> + * @gfp: buddy allocator flags
> + * @order: page order
> + *
> + * Returns the virtual address of the allocated page. The page must be
> + * freed either by calling iommu_free_pages() or via iommu_put_pages_list().
> + */
> +void *iommu_alloc_pages_node(int nid, gfp_t gfp, unsigned int order)
> +{
> +	const unsigned long pgcnt = 1UL << order;
> +	struct page *page;
> +
> +	page = alloc_pages_node(nid, gfp | __GFP_ZERO | __GFP_COMP, order);
> +	if (unlikely(!page))
> +		return NULL;
> +
> +	/*
> +	 * All page allocations that should be reported as "iommu-pagetables"
> +	 * to userspace must use one of the functions below. This includes
> +	 * allocations of page-tables and other per-iommu_domain configuration
> +	 * structures.
> +	 *
> +	 * This is necessary for the proper accounting as IOMMU state can be
> +	 * rather large, i.e. multiple gigabytes in size.
> +	 */
> +	mod_node_page_state(page_pgdat(page), NR_IOMMU_PAGES, pgcnt);
> +	mod_lruvec_page_state(page, NR_SECONDARY_PAGETABLE, pgcnt);
> +
> +	return page_address(page);
> +}
> +EXPORT_SYMBOL_GPL(iommu_alloc_pages_node);
> +
> +static void __iommu_free_page(struct page *page)
> +{
> +	unsigned int order = folio_order(page_folio(page));
> +	const unsigned long pgcnt = 1UL << order;
> +
> +	mod_node_page_state(page_pgdat(page), NR_IOMMU_PAGES, -pgcnt);
> +	mod_lruvec_page_state(page, NR_SECONDARY_PAGETABLE, -pgcnt);
> +	put_page(page);
> +}
> +
> +/**
> + * iommu_free_pages - free pages
> + * @virt: virtual address of the page to be freed.
> + *
> + * The page must have been allocated by iommu_alloc_pages_node()
> + */
> +void iommu_free_pages(void *virt)
> +{
> +	if (!virt)
> +		return;
> +	__iommu_free_page(virt_to_page(virt));
> +}
> +EXPORT_SYMBOL_GPL(iommu_free_pages);
> +
> +/**
> + * iommu_put_pages_list - free a list of pages.
> + * @head: the head of the lru list to be freed.
> + *
> + * Frees a list of pages allocated by iommu_alloc_pages_node().
> + */
> +void iommu_put_pages_list(struct list_head *head)
> +{
> +	while (!list_empty(head)) {
> +		struct page *p = list_entry(head->prev, struct page, lru);
> +
> +		list_del(&p->lru);
> +		__iommu_free_page(p);
> +	}
> +}
> +EXPORT_SYMBOL_GPL(iommu_put_pages_list);
> diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h
> index fcd17b94f7b830..e3c35aa14ad716 100644
> --- a/drivers/iommu/iommu-pages.h
> +++ b/drivers/iommu/iommu-pages.h
> @@ -7,67 +7,12 @@
>  #ifndef __IOMMU_PAGES_H
>  #define __IOMMU_PAGES_H
>  
> -#include <linux/vmstat.h>
> -#include <linux/gfp.h>
> -#include <linux/mm.h>
> +#include <linux/types.h>
> +#include <linux/topology.h>
>  
> -/*
> - * All page allocations that should be reported to as "iommu-pagetables" to
> - * userspace must use one of the functions below.  This includes allocations of
> - * page-tables and other per-iommu_domain configuration structures.
> - *
> - * This is necessary for the proper accounting as IOMMU state can be rather
> - * large, i.e. multiple gigabytes in size.
> - */
> -
> -/**
> - * __iommu_alloc_account - account for newly allocated page.
> - * @page: head struct page of the page.
> - * @order: order of the page
> - */
> -static inline void __iommu_alloc_account(struct page *page, int order)
> -{
> -	const long pgcnt = 1l << order;
> -
> -	mod_node_page_state(page_pgdat(page), NR_IOMMU_PAGES, pgcnt);
> -	mod_lruvec_page_state(page, NR_SECONDARY_PAGETABLE, pgcnt);
> -}
> -
> -/**
> - * __iommu_free_account - account a page that is about to be freed.
> - * @page: head struct page of the page.
> - * @order: order of the page
> - */
> -static inline void __iommu_free_account(struct page *page)
> -{
> -	unsigned int order = folio_order(page_folio(page));
> -	const long pgcnt = 1l << order;
> -
> -	mod_node_page_state(page_pgdat(page), NR_IOMMU_PAGES, -pgcnt);
> -	mod_lruvec_page_state(page, NR_SECONDARY_PAGETABLE, -pgcnt);
> -}
> -
> -/**
> - * iommu_alloc_pages_node - allocate a zeroed page of a given order from
> - * specific NUMA node.
> - * @nid: memory NUMA node id
> - * @gfp: buddy allocator flags
> - * @order: page order
> - *
> - * returns the virtual address of the allocated page
> - */
> -static inline void *iommu_alloc_pages_node(int nid, gfp_t gfp, int order)
> -{
> -	struct page *page =
> -		alloc_pages_node(nid, gfp | __GFP_ZERO | __GFP_COMP, order);
> -
> -	if (unlikely(!page))
> -		return NULL;
> -
> -	__iommu_alloc_account(page, order);
> -
> -	return page_address(page);
> -}
> +void *iommu_alloc_pages_node(int nid, gfp_t gfp, unsigned int order);
> +void iommu_free_pages(void *virt);
> +void iommu_put_pages_list(struct list_head *head);
>  
>  /**
>   * iommu_alloc_pages - allocate a zeroed page of a given order
> @@ -104,42 +49,4 @@ static inline void *iommu_alloc_page(gfp_t gfp)
>  	return iommu_alloc_pages_node(numa_node_id(), gfp, 0);
>  }
>  
> -/**
> - * iommu_free_pages - free pages
> - * @virt: virtual address of the page to be freed.
> - *
> - * The page must have have been allocated by iommu_alloc_pages_node()
> - */
> -static inline void iommu_free_pages(void *virt)
> -{
> -	struct page *page;
> -
> -	if (!virt)
> -		return;
> -
> -	page = virt_to_page(virt);
> -	__iommu_free_account(page);
> -	put_page(page);
> -}
> -
> -/**
> - * iommu_put_pages_list - free a list of pages.
> - * @page: the head of the lru list to be freed.
> - *
> - * There are no locking requirement for these pages, as they are going to be
> - * put on a free list as soon as refcount reaches 0. Pages are put on this LRU
> - * list once they are removed from the IOMMU page tables. However, they can
> - * still be access through debugfs.
> - */
> -static inline void iommu_put_pages_list(struct list_head *page)
> -{
> -	while (!list_empty(page)) {
> -		struct page *p = list_entry(page->prev, struct page, lru);
> -
> -		list_del(&p->lru);
> -		__iommu_free_account(p);
> -		put_page(p);
> -	}
> -}
> -
>  #endif	/* __IOMMU_PAGES_H */
> -- 
> 2.43.0
> 
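
For illustration, a minimal sketch of how a driver is expected to use this
API (a hypothetical fragment, not from the patch; "dev" and the surrounding
error handling are assumed):

	LIST_HEAD(freelist);
	u64 *table;

	/* Allocate a zeroed page-table page on the device's NUMA node. */
	table = iommu_alloc_pages_node(dev_to_node(dev), GFP_KERNEL, 0);
	if (!table)
		return -ENOMEM;

	/*
	 * On teardown, thread retired pages through their struct page
	 * lru member and batch-free them only after the IOTLB flush,
	 * so the HW never walks freed memory.
	 */
	list_add_tail(&virt_to_page(table)->lru, &freelist);
	/* ... issue and wait for the IOTLB invalidation ... */
	iommu_put_pages_list(&freelist);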


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 15/23] iommu/pages: Move the __GFP_HIGHMEM checks into the common code
  2025-02-25 19:39 ` [PATCH v3 15/23] iommu/pages: Move the __GFP_HIGHMEM checks into the common code Jason Gunthorpe
@ 2025-03-12 12:45   ` Mostafa Saleh
  0 siblings, 0 replies; 55+ messages in thread
From: Mostafa Saleh @ 2025-03-12 12:45 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon, Bagas Sanjaya,
	Joerg Roedel, Pasha Tatashin, patches, David Rientjes,
	Matthew Wilcox

On Tue, Feb 25, 2025 at 03:39:32PM -0400, Jason Gunthorpe wrote:
> The entire allocator API is built around using the kernel virtual address,
> so it is illegal to pass __GFP_HIGHMEM in as a GFP flag. Block it in the
> common code. Remove the duplicated checks from drivers.
> 
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
> ---
>  drivers/iommu/io-pgtable-arm.c  | 2 --
>  drivers/iommu/io-pgtable-dart.c | 1 -
>  drivers/iommu/iommu-pages.c     | 4 ++++
>  3 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 62df2528d020b2..08d0f62abe8a09 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -267,8 +267,6 @@ static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
>  	dma_addr_t dma;
>  	void *pages;
>  
> -	VM_BUG_ON((gfp & __GFP_HIGHMEM));
> -
>  	if (cfg->alloc)
>  		pages = cfg->alloc(cookie, size, gfp);
>  	else
> diff --git a/drivers/iommu/io-pgtable-dart.c b/drivers/iommu/io-pgtable-dart.c
> index 7efcaea0bd5c86..ebf330e67bfa30 100644
> --- a/drivers/iommu/io-pgtable-dart.c
> +++ b/drivers/iommu/io-pgtable-dart.c
> @@ -111,7 +111,6 @@ static void *__dart_alloc_pages(size_t size, gfp_t gfp)
>  {
>  	int order = get_order(size);
>  
> -	VM_BUG_ON((gfp & __GFP_HIGHMEM));
>  	return iommu_alloc_pages(gfp, order);
>  }
>  
> diff --git a/drivers/iommu/iommu-pages.c b/drivers/iommu/iommu-pages.c
> index 3077df642adb1f..a7eed09420a231 100644
> --- a/drivers/iommu/iommu-pages.c
> +++ b/drivers/iommu/iommu-pages.c
> @@ -37,6 +37,10 @@ void *iommu_alloc_pages_node(int nid, gfp_t gfp, unsigned int order)
>  	const unsigned long pgcnt = 1UL << order;
>  	struct folio *folio;
>  
> +	/* This uses page_address() on the memory. */
> +	if (WARN_ON(gfp & __GFP_HIGHMEM))
> +		return NULL;
> +
>  	/*
>  	 * __folio_alloc_node() does not handle NUMA_NO_NODE like
>  	 * alloc_pages_node() did.
> -- 
> 2.43.0
> 
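
As a usage note, a sketch (not from the patch) of what a caller now sees;
"nid" is assumed:

	/*
	 * Highmem pages have no permanent kernel mapping, so
	 * page_address() cannot be used on them. The common code now
	 * WARNs and returns NULL instead of each driver carrying its
	 * own VM_BUG_ON().
	 */
	void *pt = iommu_alloc_pages_node(nid, GFP_KERNEL | __GFP_HIGHMEM, 0);

	if (!pt)	/* always taken when __GFP_HIGHMEM is passed */
		return -ENOMEM;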


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 20/23] iommu: Update various drivers to pass in lg2sz instead of order to iommu pages
  2025-02-25 19:39 ` [PATCH v3 20/23] iommu: Update various drivers to pass in lg2sz instead of order to iommu pages Jason Gunthorpe
  2025-02-26 12:24   ` Baolu Lu
@ 2025-03-12 12:59   ` Mostafa Saleh
  2025-03-17 13:35     ` Jason Gunthorpe
  1 sibling, 1 reply; 55+ messages in thread
From: Mostafa Saleh @ 2025-03-12 12:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon, Bagas Sanjaya,
	Joerg Roedel, Pasha Tatashin, patches, David Rientjes,
	Matthew Wilcox

On Tue, Feb 25, 2025 at 03:39:37PM -0400, Jason Gunthorpe wrote:
> Convert most of the places calling get_order() as an argument to the
> iommu-pages allocator into order_base_2() or the _sz flavour
> instead. These places already have an exact size; there is no particular
> reason to use order here.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/amd/init.c        | 29 +++++++++++++++--------------
>  drivers/iommu/intel/dmar.c      |  6 +++---
>  drivers/iommu/io-pgtable-arm.c  |  3 +--
>  drivers/iommu/io-pgtable-dart.c | 12 +++---------
>  drivers/iommu/sun50i-iommu.c    |  4 ++--
>  5 files changed, 24 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
> index e3f4283ebbc201..a5720df7b22397 100644
> --- a/drivers/iommu/amd/init.c
> +++ b/drivers/iommu/amd/init.c
> @@ -635,8 +635,8 @@ static int __init find_last_devid_acpi(struct acpi_table_header *table, u16 pci_
>  /* Allocate per PCI segment device table */
>  static inline int __init alloc_dev_table(struct amd_iommu_pci_seg *pci_seg)
>  {
> -	pci_seg->dev_table = iommu_alloc_pages(GFP_KERNEL | GFP_DMA32,
> -					       get_order(pci_seg->dev_table_size));
> +	pci_seg->dev_table = iommu_alloc_pages_sz(GFP_KERNEL | GFP_DMA32,
> +						  pci_seg->dev_table_size);
>  	if (!pci_seg->dev_table)
>  		return -ENOMEM;
>  
> @@ -716,8 +716,7 @@ static void __init free_alias_table(struct amd_iommu_pci_seg *pci_seg)
>   */
>  static int __init alloc_command_buffer(struct amd_iommu *iommu)
>  {
> -	iommu->cmd_buf = iommu_alloc_pages(GFP_KERNEL,
> -					   get_order(CMD_BUFFER_SIZE));
> +	iommu->cmd_buf = iommu_alloc_pages_sz(GFP_KERNEL, CMD_BUFFER_SIZE);
>  
>  	return iommu->cmd_buf ? 0 : -ENOMEM;
>  }
> @@ -820,14 +819,16 @@ static void __init free_command_buffer(struct amd_iommu *iommu)
>  void *__init iommu_alloc_4k_pages(struct amd_iommu *iommu, gfp_t gfp,
>  				  size_t size)
>  {
> -	int order = get_order(size);
> -	void *buf = iommu_alloc_pages(gfp, order);
> +	void *buf;
>  
> -	if (buf &&
> -	    check_feature(FEATURE_SNP) &&
> -	    set_memory_4k((unsigned long)buf, (1 << order))) {
> +	size = PAGE_ALIGN(size);
> +	buf = iommu_alloc_pages_sz(gfp, size);
> +	if (!buf)
> +		return NULL;
> +	if (check_feature(FEATURE_SNP) &&
> +	    set_memory_4k((unsigned long)buf, size / PAGE_SIZE)) {
>  		iommu_free_pages(buf);
> -		buf = NULL;
> +		return NULL;
>  	}
>  
>  	return buf;
> @@ -922,11 +923,11 @@ static int iommu_init_ga_log(struct amd_iommu *iommu)
>  	if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir))
>  		return 0;
>  
> -	iommu->ga_log = iommu_alloc_pages(GFP_KERNEL, get_order(GA_LOG_SIZE));
> +	iommu->ga_log = iommu_alloc_pages_sz(GFP_KERNEL, GA_LOG_SIZE);
>  	if (!iommu->ga_log)
>  		goto err_out;
>  
> -	iommu->ga_log_tail = iommu_alloc_pages(GFP_KERNEL, get_order(8));
> +	iommu->ga_log_tail = iommu_alloc_pages_sz(GFP_KERNEL, 8);
>  	if (!iommu->ga_log_tail)
>  		goto err_out;
>  
> @@ -1021,8 +1022,8 @@ static bool __copy_device_table(struct amd_iommu *iommu)
>  	if (!old_devtb)
>  		return false;
>  
> -	pci_seg->old_dev_tbl_cpy = iommu_alloc_pages(GFP_KERNEL | GFP_DMA32,
> -						     get_order(pci_seg->dev_table_size));
> +	pci_seg->old_dev_tbl_cpy = iommu_alloc_pages_sz(
> +		GFP_KERNEL | GFP_DMA32, pci_seg->dev_table_size);
>  	if (pci_seg->old_dev_tbl_cpy == NULL) {
>  		pr_err("Failed to allocate memory for copying old device table!\n");
>  		memunmap(old_devtb);
> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> index c812c83d77da10..4c7ce92acf6976 100644
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -1681,7 +1681,6 @@ int dmar_enable_qi(struct intel_iommu *iommu)
>  {
>  	struct q_inval *qi;
>  	void *desc;
> -	int order;
>  
>  	if (!ecap_qis(iommu->ecap))
>  		return -ENOENT;
> @@ -1702,8 +1701,9 @@ int dmar_enable_qi(struct intel_iommu *iommu)
>  	 * Need two pages to accommodate 256 descriptors of 256 bits each
>  	 * if the remapping hardware supports scalable mode translation.
>  	 */
> -	order = ecap_smts(iommu->ecap) ? 1 : 0;
> -	desc = iommu_alloc_pages_node(iommu->node, GFP_ATOMIC, order);
> +	desc = iommu_alloc_pages_node_sz(iommu->node, GFP_ATOMIC,
> +					 ecap_smts(iommu->ecap) ? SZ_8K :
> +								  SZ_4K);
>  	if (!desc) {
>  		kfree(qi);
>  		iommu->qi = NULL;
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 08d0f62abe8a09..d13149ec5be77e 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -263,14 +263,13 @@ static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
>  				    void *cookie)
>  {
>  	struct device *dev = cfg->iommu_dev;
> -	int order = get_order(size);
>  	dma_addr_t dma;
>  	void *pages;
>  
>  	if (cfg->alloc)
>  		pages = cfg->alloc(cookie, size, gfp);
>  	else
> -		pages = iommu_alloc_pages_node(dev_to_node(dev), gfp, order);
> +		pages = iommu_alloc_pages_node_sz(dev_to_node(dev), gfp, size);

Although the current implementation of iommu_alloc_pages_node_sz() rounds the
size up to an order, relying on that is not correct according to the API
definition: "The returned allocation is round_up_pow_two(size) big, and is
physically aligned to its size."

SMMUv3 has special alignment requirements when there is a small number of
entries at the start level; according to the manual:
	A 64-byte minimum alignment on starting-level translation table addresses
	is imposed when TG0 selects 64KB granules and the effective IPS value
	indicates 52-bit output. In this case bits [5:0] are treated as zero.

And according to the Arm ARM (e.g. D24.2.195 in version L):
	- Bits A[(x-1):0] of the stage 1 translation table base address are zero.
	... The smallest permitted value of x is 5.
Which is 32 bytes.

For cases such as the following (both valid in Linux; the first is worked out
below):
- S1 with a 40-bit IAS and 4K granules: the start level has 2 entries
  (16 bytes), but the alignment must be at least 32 bytes.

- Similarly with 16K granules and 48 bits.
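
To make the arithmetic concrete, the first case works out as:

	40-bit IAS, 4K granule: 12 offset bits + 3 levels x 9 bits = 39 bits,
	leaving 1 bit for the start level, i.e. 2 entries x 8 bytes = 16 bytes.
	round_up_pow_two(16) = 16, so the API only promises 16-byte alignment,
	below the architectural minimum of 2^5 = 32 bytes.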

I'd say aligning the size, or enforcing a minimum of 64 bytes before calling
the function, would be enough (or we could change the API to state that
allocations are rounded up to an order).

Thanks,
Mostafa

>  
>  	if (!pages)
>  		return NULL;
> diff --git a/drivers/iommu/io-pgtable-dart.c b/drivers/iommu/io-pgtable-dart.c
> index ebf330e67bfa30..a0988669bb951a 100644
> --- a/drivers/iommu/io-pgtable-dart.c
> +++ b/drivers/iommu/io-pgtable-dart.c
> @@ -107,13 +107,6 @@ static phys_addr_t iopte_to_paddr(dart_iopte pte,
>  	return paddr;
>  }
>  
> -static void *__dart_alloc_pages(size_t size, gfp_t gfp)
> -{
> -	int order = get_order(size);
> -
> -	return iommu_alloc_pages(gfp, order);
> -}
> -
>  static int dart_init_pte(struct dart_io_pgtable *data,
>  			     unsigned long iova, phys_addr_t paddr,
>  			     dart_iopte prot, int num_entries,
> @@ -255,7 +248,7 @@ static int dart_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
>  
>  	/* no L2 table present */
>  	if (!pte) {
> -		cptep = __dart_alloc_pages(tblsz, gfp);
> +		cptep = iommu_alloc_pages_sz(gfp, tblsz);
>  		if (!cptep)
>  			return -ENOMEM;
>  
> @@ -412,7 +405,8 @@ apple_dart_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
>  	cfg->apple_dart_cfg.n_ttbrs = 1 << data->tbl_bits;
>  
>  	for (i = 0; i < cfg->apple_dart_cfg.n_ttbrs; ++i) {
> -		data->pgd[i] = __dart_alloc_pages(DART_GRANULE(data), GFP_KERNEL);
> +		data->pgd[i] =
> +			iommu_alloc_pages_sz(GFP_KERNEL, DART_GRANULE(data));
>  		if (!data->pgd[i])
>  			goto out_free_data;
>  		cfg->apple_dart_cfg.ttbr[i] = virt_to_phys(data->pgd[i]);
> diff --git a/drivers/iommu/sun50i-iommu.c b/drivers/iommu/sun50i-iommu.c
> index 6385560dbc3fb0..76c9620af4bba8 100644
> --- a/drivers/iommu/sun50i-iommu.c
> +++ b/drivers/iommu/sun50i-iommu.c
> @@ -690,8 +690,8 @@ sun50i_iommu_domain_alloc_paging(struct device *dev)
>  	if (!sun50i_domain)
>  		return NULL;
>  
> -	sun50i_domain->dt = iommu_alloc_pages(GFP_KERNEL | GFP_DMA32,
> -					      get_order(DT_SIZE));
> +	sun50i_domain->dt =
> +		iommu_alloc_pages_sz(GFP_KERNEL | GFP_DMA32, DT_SIZE);
>  	if (!sun50i_domain->dt)
>  		goto err_free_domain;
>  
> -- 
> 2.43.0
> 


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 20/23] iommu: Update various drivers to pass in lg2sz instead of order to iommu pages
  2025-03-12 12:59   ` Mostafa Saleh
@ 2025-03-17 13:35     ` Jason Gunthorpe
  2025-03-18 10:46       ` Mostafa Saleh
  0 siblings, 1 reply; 55+ messages in thread
From: Jason Gunthorpe @ 2025-03-17 13:35 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon, Bagas Sanjaya,
	Joerg Roedel, Pasha Tatashin, patches, David Rientjes,
	Matthew Wilcox

On Wed, Mar 12, 2025 at 12:59:00PM +0000, Mostafa Saleh wrote:
> > --- a/drivers/iommu/io-pgtable-arm.c
> > +++ b/drivers/iommu/io-pgtable-arm.c
> > @@ -263,14 +263,13 @@ static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
> >  				    void *cookie)
> >  {
> >  	struct device *dev = cfg->iommu_dev;
> > -	int order = get_order(size);
> >  	dma_addr_t dma;
> >  	void *pages;
> >  
> >  	if (cfg->alloc)
> >  		pages = cfg->alloc(cookie, size, gfp);
> >  	else
> > -		pages = iommu_alloc_pages_node(dev_to_node(dev), gfp, order);
> > +		pages = iommu_alloc_pages_node_sz(dev_to_node(dev), gfp, size);
> 
> > Although the current implementation of iommu_alloc_pages_node_sz() rounds the
> > size up to an order, relying on that is not correct according to the API
> > definition: "The returned allocation is round_up_pow_two(size) big, and is
> > physically aligned to its size."

Yes... the current implementation is limited to full PAGE_SIZE only; the
documentation imagines a future where it is not. Drivers should ideally not
assume the PAGE_SIZE limit during this conversion.

> > I'd say aligning the size, or enforcing a minimum of 64 bytes before calling
> > the function, would be enough (or we could change the API to state that
> > allocations are rounded up to an order).

OK, like this:

	if (cfg->alloc) {
		pages = cfg->alloc(cookie, size, gfp);
	} else {
		/*
		 * For very small starting-level translation tables the HW
		 * requires a minimum alignment of at least 64 to cover all
		 * cases.
		 */
		pages = iommu_alloc_pages_node_sz(dev_to_node(dev), gfp,
						  max(size, 64));
	}

Thanks,
Jason


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 20/23] iommu: Update various drivers to pass in lg2sz instead of order to iommu pages
  2025-03-17 13:35     ` Jason Gunthorpe
@ 2025-03-18 10:46       ` Mostafa Saleh
  2025-03-18 10:57         ` Robin Murphy
  0 siblings, 1 reply; 55+ messages in thread
From: Mostafa Saleh @ 2025-03-18 10:46 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Robin Murphy, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon, Bagas Sanjaya,
	Joerg Roedel, Pasha Tatashin, patches, David Rientjes,
	Matthew Wilcox

On Mon, Mar 17, 2025 at 10:35:00AM -0300, Jason Gunthorpe wrote:
> On Wed, Mar 12, 2025 at 12:59:00PM +0000, Mostafa Saleh wrote:
> > > --- a/drivers/iommu/io-pgtable-arm.c
> > > +++ b/drivers/iommu/io-pgtable-arm.c
> > > @@ -263,14 +263,13 @@ static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
> > >  				    void *cookie)
> > >  {
> > >  	struct device *dev = cfg->iommu_dev;
> > > -	int order = get_order(size);
> > >  	dma_addr_t dma;
> > >  	void *pages;
> > >  
> > >  	if (cfg->alloc)
> > >  		pages = cfg->alloc(cookie, size, gfp);
> > >  	else
> > > -		pages = iommu_alloc_pages_node(dev_to_node(dev), gfp, order);
> > > +		pages = iommu_alloc_pages_node_sz(dev_to_node(dev), gfp, size);
> > 
> > Although the current implementation of iommu_alloc_pages_node_sz() rounds the
> > size up to an order, relying on that is not correct according to the API
> > definition: "The returned allocation is round_up_pow_two(size) big, and is
> > physically aligned to its size."
> 
> Yes... the current implementation is limited to full PAGE_SIZE only; the
> documentation imagines a future where it is not. Drivers should ideally not
> assume the PAGE_SIZE limit during this conversion.
> 
> > I'd say aligning the size, or enforcing a minimum of 64 bytes before calling
> > the function, would be enough (or we could change the API to state that
> > allocations are rounded up to an order).
> 
> OK, like this:
> 
> 	if (cfg->alloc) {
> 		pages = cfg->alloc(cookie, size, gfp);
> 	} else {
> 		/*
> 		 * For very small starting-level translation tables the HW
> 		 * requires a minimum alignment of at least 64 to cover all
> 		 * cases.
> 		 */
> 		pages = iommu_alloc_pages_node_sz(dev_to_node(dev), gfp,
> 						  max(size, 64));
> 	}

Yes, that looks good.

Thanks,
Mostafa

> 
> Thanks,
> Jason


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 20/23] iommu: Update various drivers to pass in lg2sz instead of order to iommu pages
  2025-03-18 10:46       ` Mostafa Saleh
@ 2025-03-18 10:57         ` Robin Murphy
  0 siblings, 0 replies; 55+ messages in thread
From: Robin Murphy @ 2025-03-18 10:57 UTC (permalink / raw)
  To: Mostafa Saleh, Jason Gunthorpe
  Cc: Alim Akhtar, Alyssa Rosenzweig, Albert Ou, asahi, Lu Baolu,
	David Woodhouse, Heiko Stuebner, iommu, Jernej Skrabec,
	Jonathan Hunter, Joerg Roedel, Krzysztof Kozlowski,
	linux-arm-kernel, linux-riscv, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, Marek Szyprowski, Hector Martin,
	Palmer Dabbelt, Paul Walmsley, Samuel Holland,
	Suravee Suthikulpanit, Sven Peter, Thierry Reding, Tomasz Jeznach,
	Krishna Reddy, Chen-Yu Tsai, Will Deacon, Bagas Sanjaya,
	Joerg Roedel, Pasha Tatashin, patches, David Rientjes,
	Matthew Wilcox

On 2025-03-18 10:46 am, Mostafa Saleh wrote:
> On Mon, Mar 17, 2025 at 10:35:00AM -0300, Jason Gunthorpe wrote:
>> On Wed, Mar 12, 2025 at 12:59:00PM +0000, Mostafa Saleh wrote:
>>>> --- a/drivers/iommu/io-pgtable-arm.c
>>>> +++ b/drivers/iommu/io-pgtable-arm.c
>>>> @@ -263,14 +263,13 @@ static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
>>>>   				    void *cookie)
>>>>   {
>>>>   	struct device *dev = cfg->iommu_dev;
>>>> -	int order = get_order(size);
>>>>   	dma_addr_t dma;
>>>>   	void *pages;
>>>>   
>>>>   	if (cfg->alloc)
>>>>   		pages = cfg->alloc(cookie, size, gfp);
>>>>   	else
>>>> -		pages = iommu_alloc_pages_node(dev_to_node(dev), gfp, order);
>>>> +		pages = iommu_alloc_pages_node_sz(dev_to_node(dev), gfp, size);
>>>
>>> Although the current implementation of iommu_alloc_pages_node_sz() rounds the
>>> size up to an order, relying on that is not correct according to the API
>>> definition: "The returned allocation is round_up_pow_two(size) big, and is
>>> physically aligned to its size."
>>
>> Yes... the current implementation is limited to full PAGE_SIZE only; the
>> documentation imagines a future where it is not. Drivers should ideally not
>> assume the PAGE_SIZE limit during this conversion.
>>
>>> I'd say aligning the size, or enforcing a minimum of 64 bytes before calling
>>> the function, would be enough (or we could change the API to state that
>>> allocations are rounded up to an order).
>>
>> OK, like this:
>>
>> 	if (cfg->alloc) {
>> 		pages = cfg->alloc(cookie, size, gfp);
>> 	} else {
>> 		/*
>> 		 * For very small starting-level translation tables the HW
>> 		 * requires a minimum alignment of at least 64 to cover all
>> 		 * cases.
>> 		 */
>> 		pages = iommu_alloc_pages_node_sz(dev_to_node(dev), gfp,
>> 						  max(size, 64));
>> 	}
> 
> Yes, that looks good.

For completeness, though, it really wants to cover both paths, so an
unconditional "size = max(size, 64);" further up would be even better.
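
Sketched against the hunk above (same variables as __arm_lpae_alloc_pages()),
that would look like:

	/*
	 * Pad the size before either path, so the custom allocator is
	 * also asked for at least 64 bytes.
	 */
	size = max(size, 64);

	if (cfg->alloc)
		pages = cfg->alloc(cookie, size, gfp);
	else
		pages = iommu_alloc_pages_node_sz(dev_to_node(dev), gfp, size);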

Thanks,
Robin.


^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2025-03-18 10:59 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-02-25 19:39 [PATCH v3 00/23] iommu: Further abstract iommu-pages Jason Gunthorpe
2025-02-25 19:39 ` [PATCH v3 01/23] iommu/terga: Do not use struct page as the handle for as->pd memory Jason Gunthorpe
2025-02-25 19:39 ` [PATCH v3 02/23] iommu/tegra: Do not use struct page as the handle for pts Jason Gunthorpe
2025-02-25 19:39 ` [PATCH v3 03/23] iommu/pages: Remove __iommu_alloc_pages()/__iommu_free_pages() Jason Gunthorpe
2025-02-26  6:25   ` Baolu Lu
2025-03-12 11:43   ` Mostafa Saleh
2025-02-25 19:39 ` [PATCH v3 04/23] iommu/pages: Make iommu_put_pages_list() work with high order allocations Jason Gunthorpe
2025-02-26  6:28   ` Baolu Lu
2025-02-25 19:39 ` [PATCH v3 05/23] iommu/pages: Remove the order argument to iommu_free_pages() Jason Gunthorpe
2025-02-26  6:32   ` Baolu Lu
2025-03-12 11:43   ` Mostafa Saleh
2025-02-25 19:39 ` [PATCH v3 06/23] iommu/pages: Remove iommu_free_page() Jason Gunthorpe
2025-02-26  6:34   ` Baolu Lu
2025-03-12 11:44   ` Mostafa Saleh
2025-02-25 19:39 ` [PATCH v3 07/23] iommu/pages: De-inline the substantial functions Jason Gunthorpe
2025-02-26  6:43   ` Baolu Lu
2025-03-12 12:45   ` Mostafa Saleh
2025-02-25 19:39 ` [PATCH v3 08/23] iommu/vtd: Use virt_to_phys() Jason Gunthorpe
2025-03-10  2:21   ` Baolu Lu
2025-02-25 19:39 ` [PATCH v3 09/23] iommu/pages: Formalize the freelist API Jason Gunthorpe
2025-02-26  6:56   ` Baolu Lu
2025-02-26 17:31     ` Jason Gunthorpe
2025-02-27  5:11       ` Baolu Lu
2025-02-25 19:39 ` [PATCH v3 10/23] iommu/riscv: Convert to use struct iommu_pages_list Jason Gunthorpe
2025-02-25 19:39 ` [PATCH v3 11/23] iommu/amd: " Jason Gunthorpe
2025-02-25 19:39 ` [PATCH v3 12/23] iommu: Change iommu_iotlb_gather to use iommu_page_list Jason Gunthorpe
2025-02-26  7:02   ` Baolu Lu
2025-02-25 19:39 ` [PATCH v3 13/23] iommu/pages: Remove iommu_put_pages_list_old and the _Generic Jason Gunthorpe
2025-02-26  7:04   ` Baolu Lu
2025-02-25 19:39 ` [PATCH v3 14/23] iommu/pages: Move from struct page to struct ioptdesc and folio Jason Gunthorpe
2025-02-26 12:42   ` Baolu Lu
2025-02-26 13:51     ` Jason Gunthorpe
2025-02-27  5:17       ` Baolu Lu
2025-02-27  5:17   ` Baolu Lu
2025-02-25 19:39 ` [PATCH v3 15/23] iommu/pages: Move the __GFP_HIGHMEM checks into the common code Jason Gunthorpe
2025-03-12 12:45   ` Mostafa Saleh
2025-02-25 19:39 ` [PATCH v3 16/23] iommu/pages: Allow sub page sizes to be passed into the allocator Jason Gunthorpe
2025-02-26 12:22   ` Baolu Lu
2025-02-25 19:39 ` [PATCH v3 17/23] iommu/amd: Change rlookup, irq_lookup, and alias to use kvalloc() Jason Gunthorpe
2025-02-25 19:39 ` [PATCH v3 18/23] iommu/amd: Use roundup_pow_two() instead of get_order() Jason Gunthorpe
2025-02-25 19:39 ` [PATCH v3 19/23] iommu/riscv: Update to use iommu_alloc_pages_node_lg2() Jason Gunthorpe
2025-02-25 19:39 ` [PATCH v3 20/23] iommu: Update various drivers to pass in lg2sz instead of order to iommu pages Jason Gunthorpe
2025-02-26 12:24   ` Baolu Lu
2025-03-12 12:59   ` Mostafa Saleh
2025-03-17 13:35     ` Jason Gunthorpe
2025-03-18 10:46       ` Mostafa Saleh
2025-03-18 10:57         ` Robin Murphy
2025-02-25 19:39 ` [PATCH v3 21/23] iommu/pages: Remove iommu_alloc_page/pages() Jason Gunthorpe
2025-02-26  9:15   ` Marek Szyprowski
2025-02-25 19:39 ` [PATCH v3 22/23] iommu/pages: Remove iommu_alloc_page_node() Jason Gunthorpe
2025-02-26 12:26   ` Baolu Lu
2025-02-25 19:39 ` [PATCH v3 23/23] iommu/pages: Remove iommu_alloc_pages_node() Jason Gunthorpe
2025-02-26 12:30   ` Baolu Lu
2025-02-25 20:18 ` [PATCH v3 00/23] iommu: Further abstract iommu-pages Nicolin Chen
2025-02-25 23:17 ` Alejandro Jimenez
