public inbox for iommu@lists.linux-foundation.org
 help / color / mirror / Atom feed
* [PATCH 0/3] Let iommupt manage changes in page size internally
@ 2026-01-12 14:49 Jason Gunthorpe
  2026-01-12 14:49 ` [PATCH 1/3] iommupt: Make pt_feature() always_inline Jason Gunthorpe
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Jason Gunthorpe @ 2026-01-12 14:49 UTC (permalink / raw)
  To: iommu, Joerg Roedel, Robin Murphy, Will Deacon
  Cc: Alejandro Jimenez, Joerg Roedel, Kevin Tian, kernel test robot,
	Pasha Tatashin, patches, Samiullah Khawaja

Currently the core code has some helpers that use iommu_pgsize() to
fragment operations into single page-size chunks and then the driver has a
simplified single-page size implementation. This was helpful in
simplifying the driver code.

However, iommupt has a single shared implementation for all formats so we
can accept a little more complexity. Have the core code directly call
iommupt with the requested range to map/unmap and rely on it to change the
page size across the range as required.

The iommupt implementation of unmap is already fine to work like this, and
the map implementation can reset its walking parameters in-place with a
little more code.

The net result is about a 5% performance bump in the simple iommupt
map/unmap benchmarks with aligned mappings, and probably more for
unaligned/oddly sized ranges that change page sizes.

Introduce an iommupt_from_domain() function as a general way to convert
an iommu_domain to a struct pt_iommu if it is an iommupt-based domain. I
expect to keep using this as more optimizations are introduced.

Jason Gunthorpe (3):
  iommupt: Make pt_feature() always_inline
  iommupt: Directly call iommupt's unmap_range()
  iommupt: Avoid rewalking during map

 drivers/iommu/generic_pt/iommu_pt.h         | 156 ++++++++++----------
 drivers/iommu/generic_pt/kunit_generic_pt.h |  12 ++
 drivers/iommu/generic_pt/pt_defs.h          |   4 +-
 drivers/iommu/generic_pt/pt_iter.h          |  22 +++
 drivers/iommu/iommu.c                       |  43 +++++-
 include/linux/generic_pt/iommu.h            |  69 +++++++--
 include/linux/iommu.h                       |   1 +
 7 files changed, 213 insertions(+), 94 deletions(-)


base-commit: 0816b0730a71ba553b2000e7ecd6429ee61d9c2c
-- 
2.43.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/3] iommupt: Make pt_feature() always_inline
  2026-01-12 14:49 [PATCH 0/3] Let iommupt manage changes in page size internally Jason Gunthorpe
@ 2026-01-12 14:49 ` Jason Gunthorpe
  2026-01-14  8:15   ` Tian, Kevin
  2026-01-12 14:49 ` [PATCH 2/3] iommupt: Directly call iommupt's unmap_range() Jason Gunthorpe
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Jason Gunthorpe @ 2026-01-12 14:49 UTC (permalink / raw)
  To: iommu, Joerg Roedel, Robin Murphy, Will Deacon
  Cc: Alejandro Jimenez, Joerg Roedel, Kevin Tian, kernel test robot,
	Pasha Tatashin, patches, Samiullah Khawaja

gcc 8.5 on powerpc does not automatically inline these functions even
though they evaluate to constants in key cases. Since the constant
propagation is essential for some code elimination and build-time checks,
this causes a build failure:

 ERROR: modpost: "__pt_no_sw_bit" [drivers/iommu/generic_pt/fmt/iommu_amdv1.ko] undefined!

Caused by this:

	if (pts_feature(&pts, PT_FEAT_DMA_INCOHERENT) &&
	    !pt_test_sw_bit_acquire(&pts,
				    SW_BIT_CACHE_FLUSH_DONE))
		flush_writes_item(&pts);

Here pts_feature() evaluates to a constant false. Mark both functions
__always_inline to force the constant evaluation and trigger the code
elimination.

Fixes: 7c5b184db714 ("genpt: Generic Page Table base API")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202512230720.9y9DtWIo-lkp@intel.com/
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/generic_pt/pt_defs.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/generic_pt/pt_defs.h b/drivers/iommu/generic_pt/pt_defs.h
index c25544d72f979a..707b3b0282fad7 100644
--- a/drivers/iommu/generic_pt/pt_defs.h
+++ b/drivers/iommu/generic_pt/pt_defs.h
@@ -202,7 +202,7 @@ static inline bool pt_table_install32(struct pt_state *pts, u32 table_entry)
 
 #define PT_SUPPORTED_FEATURE(feature_nr) (PT_SUPPORTED_FEATURES & BIT(feature_nr))
 
-static inline bool pt_feature(const struct pt_common *common,
+static __always_inline bool pt_feature(const struct pt_common *common,
 			      unsigned int feature_nr)
 {
 	if (PT_FORCE_ENABLED_FEATURES & BIT(feature_nr))
@@ -212,7 +212,7 @@ static inline bool pt_feature(const struct pt_common *common,
 	return common->features & BIT(feature_nr);
 }
 
-static inline bool pts_feature(const struct pt_state *pts,
+static __always_inline bool pts_feature(const struct pt_state *pts,
 			       unsigned int feature_nr)
 {
 	return pt_feature(pts->range->common, feature_nr);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/3] iommupt: Directly call iommupt's unmap_range()
  2026-01-12 14:49 [PATCH 0/3] Let iommupt manage changes in page size internally Jason Gunthorpe
  2026-01-12 14:49 ` [PATCH 1/3] iommupt: Make pt_feature() always_inline Jason Gunthorpe
@ 2026-01-12 14:49 ` Jason Gunthorpe
  2026-01-15  6:17   ` Tian, Kevin
  2026-01-15 18:26   ` Samiullah Khawaja
  2026-01-12 14:49 ` [PATCH 3/3] iommupt: Avoid rewalking during map Jason Gunthorpe
  2026-01-18  9:47 ` [PATCH 0/3] Let iommupt manage changes in page size internally Joerg Roedel
  3 siblings, 2 replies; 12+ messages in thread
From: Jason Gunthorpe @ 2026-01-12 14:49 UTC (permalink / raw)
  To: iommu, Joerg Roedel, Robin Murphy, Will Deacon
  Cc: Alejandro Jimenez, Joerg Roedel, Kevin Tian, kernel test robot,
	Pasha Tatashin, patches, Samiullah Khawaja

The common algorithm in iommupt does not require the iommu_pgsize()
calculations; it can directly unmap an arbitrary range. Add a new function
pointer for an iommupt unmap_range op and make __iommu_unmap() call it
directly.

Gives about a 5% gain on single page unmappings.

The function pointer is run through pt_iommu_ops instead of
iommu_domain_ops to discourage using it outside iommupt. All drivers with
their own page tables should continue to use the simplified
map/unmap_pages() style interfaces.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/generic_pt/iommu_pt.h | 29 ++++------------------
 drivers/iommu/iommu.c               | 18 +++++++++++---
 include/linux/generic_pt/iommu.h    | 37 ++++++++++++++++++++++++-----
 include/linux/iommu.h               |  1 +
 4 files changed, 51 insertions(+), 34 deletions(-)

diff --git a/drivers/iommu/generic_pt/iommu_pt.h b/drivers/iommu/generic_pt/iommu_pt.h
index 3327116a441cac..5c898c01659798 100644
--- a/drivers/iommu/generic_pt/iommu_pt.h
+++ b/drivers/iommu/generic_pt/iommu_pt.h
@@ -1012,34 +1012,12 @@ static __maybe_unused int __unmap_range(struct pt_range *range, void *arg,
 	return ret;
 }
 
-/**
- * unmap_pages() - Make a range of IOVA empty/not present
- * @domain: Domain to manipulate
- * @iova: IO virtual address to start
- * @pgsize: Length of each page
- * @pgcount: Length of the range in pgsize units starting from @iova
- * @iotlb_gather: Gather struct that must be flushed on return
- *
- * unmap_pages() will remove a translation created by map_pages(). It cannot
- * subdivide a mapping created by map_pages(), so it should be called with IOVA
- * ranges that match those passed to map_pages(). The IOVA range can aggregate
- * contiguous map_pages() calls so long as no individual range is split.
- *
- * Context: The caller must hold a write range lock that includes
- * the whole range.
- *
- * Returns: Number of bytes of VA unmapped. iova + res will be the point
- * unmapping stopped.
- */
-size_t DOMAIN_NS(unmap_pages)(struct iommu_domain *domain, unsigned long iova,
-			      size_t pgsize, size_t pgcount,
+static size_t NS(unmap_range)(struct pt_iommu *iommu_table, dma_addr_t iova,
+			      dma_addr_t len,
 			      struct iommu_iotlb_gather *iotlb_gather)
 {
-	struct pt_iommu *iommu_table =
-		container_of(domain, struct pt_iommu, domain);
 	struct pt_unmap_args unmap = { .free_list = IOMMU_PAGES_LIST_INIT(
 					       unmap.free_list) };
-	pt_vaddr_t len = pgsize * pgcount;
 	struct pt_range range;
 	int ret;
 
@@ -1054,7 +1032,6 @@ size_t DOMAIN_NS(unmap_pages)(struct iommu_domain *domain, unsigned long iova,
 
 	return unmap.unmapped;
 }
-EXPORT_SYMBOL_NS_GPL(DOMAIN_NS(unmap_pages), "GENERIC_PT_IOMMU");
 
 static void NS(get_info)(struct pt_iommu *iommu_table,
 			 struct pt_iommu_info *info)
@@ -1102,6 +1079,7 @@ static void NS(deinit)(struct pt_iommu *iommu_table)
 }
 
 static const struct pt_iommu_ops NS(ops) = {
+	.unmap_range = NS(unmap_range),
 #if IS_ENABLED(CONFIG_IOMMUFD_DRIVER) && defined(pt_entry_is_write_dirty) && \
 	IS_ENABLED(CONFIG_IOMMUFD_TEST) && defined(pt_entry_make_write_dirty)
 	.set_dirty = NS(set_dirty),
@@ -1164,6 +1142,7 @@ static int pt_iommu_init_domain(struct pt_iommu *iommu_table,
 
 	domain->type = __IOMMU_DOMAIN_PAGING;
 	domain->pgsize_bitmap = info.pgsize_bitmap;
+	domain->is_iommupt = true;
 
 	if (pt_feature(common, PT_FEAT_DYNAMIC_TOP))
 		range = _pt_top_range(common,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 2ca990dfbb884f..000dd6c374877b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -34,6 +34,7 @@
 #include <linux/sched/mm.h>
 #include <linux/msi.h>
 #include <uapi/linux/iommufd.h>
+#include <linux/generic_pt/iommu.h>
 
 #include "dma-iommu.h"
 #include "iommu-priv.h"
@@ -2596,9 +2597,9 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
 }
 EXPORT_SYMBOL_GPL(iommu_map);
 
-static size_t __iommu_unmap(struct iommu_domain *domain,
-			    unsigned long iova, size_t size,
-			    struct iommu_iotlb_gather *iotlb_gather)
+static size_t
+__iommu_unmap_domain_pgtbl(struct iommu_domain *domain, unsigned long iova,
+			   size_t size, struct iommu_iotlb_gather *iotlb_gather)
 {
 	const struct iommu_domain_ops *ops = domain->ops;
 	size_t unmapped_page, unmapped = 0;
@@ -2650,6 +2651,17 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
 	return unmapped;
 }
 
+static size_t __iommu_unmap(struct iommu_domain *domain, unsigned long iova,
+			    size_t size,
+			    struct iommu_iotlb_gather *iotlb_gather)
+{
+	struct pt_iommu *pt = iommupt_from_domain(domain);
+
+	if (pt)
+		return pt->ops->unmap_range(pt, iova, size, iotlb_gather);
+	return __iommu_unmap_domain_pgtbl(domain, iova, size, iotlb_gather);
+}
+
 /**
  * iommu_unmap() - Remove mappings from a range of IOVA
  * @domain: Domain to manipulate
diff --git a/include/linux/generic_pt/iommu.h b/include/linux/generic_pt/iommu.h
index 9eefbb74efd087..f094f8f44e4e8a 100644
--- a/include/linux/generic_pt/iommu.h
+++ b/include/linux/generic_pt/iommu.h
@@ -66,6 +66,13 @@ struct pt_iommu {
 	struct device *iommu_device;
 };
 
+static inline struct pt_iommu *iommupt_from_domain(struct iommu_domain *domain)
+{
+	if (!IS_ENABLED(CONFIG_IOMMU_PT) || !domain->is_iommupt)
+		return NULL;
+	return container_of(domain, struct pt_iommu, domain);
+}
+
 /**
  * struct pt_iommu_info - Details about the IOMMU page table
  *
@@ -80,6 +87,29 @@ struct pt_iommu_info {
 };
 
 struct pt_iommu_ops {
+	/**
+	 * @unmap_range: Make a range of IOVA empty/not present
+	 * @iommu_table: Table to manipulate
+	 * @iova: IO virtual address to start
+	 * @len: Length of the range starting from @iova
+	 * @iotlb_gather: Gather struct that must be flushed on return
+	 *
+	 * unmap_range() will remove a translation created by map_range(). It
+	 * cannot subdivide a mapping created by map_range(), so it should be
+	 * called with IOVA ranges that match those passed to map_range(). The
+	 * IOVA range can aggregate contiguous map_range() calls so long as no
+	 * individual range is split.
+	 *
+	 * Context: The caller must hold a write range lock that includes
+	 * the whole range.
+	 *
+	 * Returns: Number of bytes of VA unmapped. iova + res will be the
+	 * point unmapping stopped.
+	 */
+	size_t (*unmap_range)(struct pt_iommu *iommu_table, dma_addr_t iova,
+			      dma_addr_t len,
+			      struct iommu_iotlb_gather *iotlb_gather);
+
 	/**
 	 * @set_dirty: Make the iova write dirty
 	 * @iommu_table: Table to manipulate
@@ -198,10 +228,6 @@ struct pt_iommu_cfg {
 				       unsigned long iova, phys_addr_t paddr,  \
 				       size_t pgsize, size_t pgcount,          \
 				       int prot, gfp_t gfp, size_t *mapped);   \
-	size_t pt_iommu_##fmt##_unmap_pages(                                   \
-		struct iommu_domain *domain, unsigned long iova,               \
-		size_t pgsize, size_t pgcount,                                 \
-		struct iommu_iotlb_gather *iotlb_gather);                      \
 	int pt_iommu_##fmt##_read_and_clear_dirty(                             \
 		struct iommu_domain *domain, unsigned long iova, size_t size,  \
 		unsigned long flags, struct iommu_dirty_bitmap *dirty);        \
@@ -223,8 +249,7 @@ struct pt_iommu_cfg {
  */
 #define IOMMU_PT_DOMAIN_OPS(fmt)                        \
 	.iova_to_phys = &pt_iommu_##fmt##_iova_to_phys, \
-	.map_pages = &pt_iommu_##fmt##_map_pages,       \
-	.unmap_pages = &pt_iommu_##fmt##_unmap_pages
+	.map_pages = &pt_iommu_##fmt##_map_pages
 #define IOMMU_PT_DIRTY_OPS(fmt) \
 	.read_and_clear_dirty = &pt_iommu_##fmt##_read_and_clear_dirty
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 8c66284a91a8b0..0e8c9d31796bfd 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -223,6 +223,7 @@ enum iommu_domain_cookie_type {
 struct iommu_domain {
 	unsigned type;
 	enum iommu_domain_cookie_type cookie_type;
+	bool is_iommupt;
 	const struct iommu_domain_ops *ops;
 	const struct iommu_dirty_ops *dirty_ops;
 	const struct iommu_ops *owner; /* Whose domain_alloc we came from */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/3] iommupt: Avoid rewalking during map
  2026-01-12 14:49 [PATCH 0/3] Let iommupt manage changes in page size internally Jason Gunthorpe
  2026-01-12 14:49 ` [PATCH 1/3] iommupt: Make pt_feature() always_inline Jason Gunthorpe
  2026-01-12 14:49 ` [PATCH 2/3] iommupt: Directly call iommupt's unmap_range() Jason Gunthorpe
@ 2026-01-12 14:49 ` Jason Gunthorpe
  2026-01-15  4:12   ` Samiullah Khawaja
  2026-01-15  6:44   ` Tian, Kevin
  2026-01-18  9:47 ` [PATCH 0/3] Let iommupt manage changes in page size internally Joerg Roedel
  3 siblings, 2 replies; 12+ messages in thread
From: Jason Gunthorpe @ 2026-01-12 14:49 UTC (permalink / raw)
  To: iommu, Joerg Roedel, Robin Murphy, Will Deacon
  Cc: Alejandro Jimenez, Joerg Roedel, Kevin Tian, kernel test robot,
	Pasha Tatashin, patches, Samiullah Khawaja

Currently the core code provides a simplified interface to drivers where
it fragments a requested multi-page map into single page size steps after
doing all the calculations to figure out what page size is
appropriate. Each step rewalks the page tables from the start.

Since iommupt has a single implementation of the mapping algorithm it can
internally compute each step as it goes while retaining its current
position in the walk.

Add a new function pt_pgsz_count() which computes the number of leaf
entries in the next same-page-size fragment of a large mapping operation.

Compute the next fragment when all the leaf entries of the current
fragment have been written, then continue walking from the current
point.

The function pointer is run through pt_iommu_ops instead of
iommu_domain_ops to discourage using it outside iommupt. All drivers with
their own page tables should continue to use the simplified map_pages()
style interfaces.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/generic_pt/iommu_pt.h         | 127 ++++++++++++--------
 drivers/iommu/generic_pt/kunit_generic_pt.h |  12 ++
 drivers/iommu/generic_pt/pt_iter.h          |  22 ++++
 drivers/iommu/iommu.c                       |  25 +++-
 include/linux/generic_pt/iommu.h            |  34 +++++-
 5 files changed, 161 insertions(+), 59 deletions(-)

diff --git a/drivers/iommu/generic_pt/iommu_pt.h b/drivers/iommu/generic_pt/iommu_pt.h
index 5c898c01659798..14a73d3b291b42 100644
--- a/drivers/iommu/generic_pt/iommu_pt.h
+++ b/drivers/iommu/generic_pt/iommu_pt.h
@@ -467,6 +467,7 @@ struct pt_iommu_map_args {
 	pt_oaddr_t oa;
 	unsigned int leaf_pgsize_lg2;
 	unsigned int leaf_level;
+	pt_vaddr_t num_leaves;
 };
 
 /*
@@ -519,11 +520,15 @@ static int clear_contig(const struct pt_state *start_pts,
 static int __map_range_leaf(struct pt_range *range, void *arg,
 			    unsigned int level, struct pt_table_p *table)
 {
+	struct pt_iommu *iommu_table = iommu_from_common(range->common);
 	struct pt_state pts = pt_init(range, level, table);
 	struct pt_iommu_map_args *map = arg;
 	unsigned int leaf_pgsize_lg2 = map->leaf_pgsize_lg2;
 	unsigned int start_index;
 	pt_oaddr_t oa = map->oa;
+	unsigned int num_leaves;
+	unsigned int orig_end;
+	pt_vaddr_t last_va;
 	unsigned int step;
 	bool need_contig;
 	int ret = 0;
@@ -537,6 +542,15 @@ static int __map_range_leaf(struct pt_range *range, void *arg,
 
 	_pt_iter_first(&pts);
 	start_index = pts.index;
+	orig_end = pts.end_index;
+	if (pts.index + map->num_leaves < pts.end_index) {
+		/* Need to stop in the middle of the table to change sizes */
+		pts.end_index = pts.index + map->num_leaves;
+		num_leaves = 0;
+	} else {
+		num_leaves = map->num_leaves - (pts.end_index - pts.index);
+	}
+
 	do {
 		pts.type = pt_load_entry_raw(&pts);
 		if (pts.type != PT_ENTRY_EMPTY || need_contig) {
@@ -562,7 +576,40 @@ static int __map_range_leaf(struct pt_range *range, void *arg,
 	flush_writes_range(&pts, start_index, pts.index);
 
 	map->oa = oa;
-	return ret;
+	map->num_leaves = num_leaves;
+	if (ret || num_leaves)
+		return ret;
+
+	/* range->va is not valid if we reached the end of the table */
+	pts.index -= step;
+	pt_index_to_va(&pts);
+	pts.index += step;
+	last_va = range->va + log2_to_int(leaf_pgsize_lg2);
+
+	if (last_va - 1 == range->last_va) {
+		PT_WARN_ON(pts.index != orig_end);
+		return 0;
+	}
+
+	/*
+	 * Reached a point where the page size changed, compute the new
+	 * parameters.
+	 */
+	map->leaf_pgsize_lg2 = pt_compute_best_pgsize(
+		iommu_table->domain.pgsize_bitmap, last_va, range->last_va, oa);
+	map->leaf_level =
+		pt_pgsz_lg2_to_level(range->common, map->leaf_pgsize_lg2);
+	map->num_leaves = pt_pgsz_count(iommu_table->domain.pgsize_bitmap,
+					last_va, range->last_va, oa,
+					map->leaf_pgsize_lg2);
+
+	/* Didn't finish this table level, caller will repeat it */
+	if (pts.index != orig_end) {
+		if (pts.index != start_index)
+			pt_index_to_va(&pts);
+		return -EAGAIN;
+	}
+	return 0;
 }
 
 static int __map_range(struct pt_range *range, void *arg, unsigned int level,
@@ -585,14 +632,9 @@ static int __map_range(struct pt_range *range, void *arg, unsigned int level,
 			if (pts.type != PT_ENTRY_EMPTY)
 				return -EADDRINUSE;
 			ret = pt_iommu_new_table(&pts, &map->attrs);
-			if (ret) {
-				/*
-				 * Racing with another thread installing a table
-				 */
-				if (ret == -EAGAIN)
-					continue;
+			/* EAGAIN on a race will loop again */
+			if (ret)
 				return ret;
-			}
 		} else {
 			pts.table_lower = pt_table_ptr(&pts);
 			/*
@@ -616,10 +658,12 @@ static int __map_range(struct pt_range *range, void *arg, unsigned int level,
 		 * The already present table can possibly be shared with another
 		 * concurrent map.
 		 */
-		if (map->leaf_level == level - 1)
-			ret = pt_descend(&pts, arg, __map_range_leaf);
-		else
-			ret = pt_descend(&pts, arg, __map_range);
+		do {
+			if (map->leaf_level == level - 1)
+				ret = pt_descend(&pts, arg, __map_range_leaf);
+			else
+				ret = pt_descend(&pts, arg, __map_range);
+		} while (ret == -EAGAIN);
 		if (ret)
 			return ret;
 
@@ -627,6 +671,8 @@ static int __map_range(struct pt_range *range, void *arg, unsigned int level,
 		pt_index_to_va(&pts);
 		if (pts.index >= pts.end_index)
 			break;
+		if (map->leaf_level == level)
+			return -EAGAIN;
 	} while (true);
 	return 0;
 }
@@ -798,12 +844,13 @@ static int check_map_range(struct pt_iommu *iommu_table, struct pt_range *range,
 static int do_map(struct pt_range *range, struct pt_common *common,
 		  bool single_page, struct pt_iommu_map_args *map)
 {
+	int ret;
+
 	/*
 	 * The __map_single_page() fast path does not support DMA_INCOHERENT
 	 * flushing to keep its .text small.
 	 */
 	if (single_page && !pt_feature(common, PT_FEAT_DMA_INCOHERENT)) {
-		int ret;
 
 		ret = pt_walk_range(range, __map_single_page, map);
 		if (ret != -EAGAIN)
@@ -811,50 +858,25 @@ static int do_map(struct pt_range *range, struct pt_common *common,
 		/* EAGAIN falls through to the full path */
 	}
 
-	if (map->leaf_level == range->top_level)
-		return pt_walk_range(range, __map_range_leaf, map);
-	return pt_walk_range(range, __map_range, map);
+	do {
+		if (map->leaf_level == range->top_level)
+			ret = pt_walk_range(range, __map_range_leaf, map);
+		else
+			ret = pt_walk_range(range, __map_range, map);
+	} while (ret == -EAGAIN);
+	return ret;
 }
 
-/**
- * map_pages() - Install translation for an IOVA range
- * @domain: Domain to manipulate
- * @iova: IO virtual address to start
- * @paddr: Physical/Output address to start
- * @pgsize: Length of each page
- * @pgcount: Length of the range in pgsize units starting from @iova
- * @prot: A bitmap of IOMMU_READ/WRITE/CACHE/NOEXEC/MMIO
- * @gfp: GFP flags for any memory allocations
- * @mapped: Total bytes successfully mapped
- *
- * The range starting at IOVA will have paddr installed into it. The caller
- * must specify a valid pgsize and pgcount to segment the range into compatible
- * blocks.
- *
- * On error the caller will probably want to invoke unmap on the range from iova
- * up to the amount indicated by @mapped to return the table back to an
- * unchanged state.
- *
- * Context: The caller must hold a write range lock that includes the whole
- * range.
- *
- * Returns: -ERRNO on failure, 0 on success. The number of bytes of VA that were
- * mapped are added to @mapped, @mapped is not zerod first.
- */
-int DOMAIN_NS(map_pages)(struct iommu_domain *domain, unsigned long iova,
-			 phys_addr_t paddr, size_t pgsize, size_t pgcount,
-			 int prot, gfp_t gfp, size_t *mapped)
+static int NS(map_range)(struct pt_iommu *iommu_table, dma_addr_t iova,
+			 phys_addr_t paddr, dma_addr_t len, unsigned int prot,
+			 gfp_t gfp, size_t *mapped)
 {
-	struct pt_iommu *iommu_table =
-		container_of(domain, struct pt_iommu, domain);
 	pt_vaddr_t pgsize_bitmap = iommu_table->domain.pgsize_bitmap;
 	struct pt_common *common = common_from_iommu(iommu_table);
 	struct iommu_iotlb_gather iotlb_gather;
-	pt_vaddr_t len = pgsize * pgcount;
 	struct pt_iommu_map_args map = {
 		.iotlb_gather = &iotlb_gather,
 		.oa = paddr,
-		.leaf_pgsize_lg2 = vaffs(pgsize),
 	};
 	bool single_page = false;
 	struct pt_range range;
@@ -882,13 +904,13 @@ int DOMAIN_NS(map_pages)(struct iommu_domain *domain, unsigned long iova,
 		return ret;
 
 	/* Calculate target page size and level for the leaves */
-	if (pt_has_system_page_size(common) && pgsize == PAGE_SIZE &&
-	    pgcount == 1) {
+	if (pt_has_system_page_size(common) && len == PAGE_SIZE) {
 		PT_WARN_ON(!(pgsize_bitmap & PAGE_SIZE));
 		if (log2_mod(iova | paddr, PAGE_SHIFT))
 			return -ENXIO;
 		map.leaf_pgsize_lg2 = PAGE_SHIFT;
 		map.leaf_level = 0;
+		map.num_leaves = 1;
 		single_page = true;
 	} else {
 		map.leaf_pgsize_lg2 = pt_compute_best_pgsize(
@@ -897,6 +919,9 @@ int DOMAIN_NS(map_pages)(struct iommu_domain *domain, unsigned long iova,
 			return -ENXIO;
 		map.leaf_level =
 			pt_pgsz_lg2_to_level(common, map.leaf_pgsize_lg2);
+		map.num_leaves = pt_pgsz_count(pgsize_bitmap, range.va,
+					       range.last_va, paddr,
+					       map.leaf_pgsize_lg2);
 	}
 
 	ret = check_map_range(iommu_table, &range, &map);
@@ -919,7 +944,6 @@ int DOMAIN_NS(map_pages)(struct iommu_domain *domain, unsigned long iova,
 	*mapped += map.oa - paddr;
 	return ret;
 }
-EXPORT_SYMBOL_NS_GPL(DOMAIN_NS(map_pages), "GENERIC_PT_IOMMU");
 
 struct pt_unmap_args {
 	struct iommu_pages_list free_list;
@@ -1079,6 +1103,7 @@ static void NS(deinit)(struct pt_iommu *iommu_table)
 }
 
 static const struct pt_iommu_ops NS(ops) = {
+	.map_range = NS(map_range),
 	.unmap_range = NS(unmap_range),
 #if IS_ENABLED(CONFIG_IOMMUFD_DRIVER) && defined(pt_entry_is_write_dirty) && \
 	IS_ENABLED(CONFIG_IOMMUFD_TEST) && defined(pt_entry_make_write_dirty)
diff --git a/drivers/iommu/generic_pt/kunit_generic_pt.h b/drivers/iommu/generic_pt/kunit_generic_pt.h
index 68278bf15cfe07..374e475f591e15 100644
--- a/drivers/iommu/generic_pt/kunit_generic_pt.h
+++ b/drivers/iommu/generic_pt/kunit_generic_pt.h
@@ -312,6 +312,17 @@ static void test_best_pgsize(struct kunit *test)
 	}
 }
 
+static void test_pgsz_count(struct kunit *test)
+{
+	KUNIT_EXPECT_EQ(test,
+			pt_pgsz_count(SZ_4K, 0, SZ_1G - 1, 0, ilog2(SZ_4K)),
+			SZ_1G / SZ_4K);
+	KUNIT_EXPECT_EQ(test,
+			pt_pgsz_count(SZ_2M | SZ_4K, SZ_4K, SZ_1G - 1, SZ_4K,
+				      ilog2(SZ_4K)),
+			(SZ_2M - SZ_4K) / SZ_4K);
+}
+
 /*
  * Check that pt_install_table() and pt_table_pa() match
  */
@@ -770,6 +781,7 @@ static struct kunit_case generic_pt_test_cases[] = {
 	KUNIT_CASE_FMT(test_init),
 	KUNIT_CASE_FMT(test_bitops),
 	KUNIT_CASE_FMT(test_best_pgsize),
+	KUNIT_CASE_FMT(test_pgsz_count),
 	KUNIT_CASE_FMT(test_table_ptr),
 	KUNIT_CASE_FMT(test_max_va),
 	KUNIT_CASE_FMT(test_table_radix),
diff --git a/drivers/iommu/generic_pt/pt_iter.h b/drivers/iommu/generic_pt/pt_iter.h
index c0d8617cce2928..3e45dbde6b8327 100644
--- a/drivers/iommu/generic_pt/pt_iter.h
+++ b/drivers/iommu/generic_pt/pt_iter.h
@@ -569,6 +569,28 @@ static inline unsigned int pt_compute_best_pgsize(pt_vaddr_t pgsz_bitmap,
 	return pgsz_lg2;
 }
 
+/*
+ * Return the number of pgsize_lg2 leaf entries that can be mapped for
+ * va to oa. This accounts for any requirement to reduce or increase the page
+ * size across the VA range.
+ */
+static inline pt_vaddr_t pt_pgsz_count(pt_vaddr_t pgsz_bitmap, pt_vaddr_t va,
+				       pt_vaddr_t last_va, pt_oaddr_t oa,
+				       unsigned int pgsize_lg2)
+{
+	pt_vaddr_t len = last_va - va + 1;
+	pt_vaddr_t next_pgsizes = log2_set_mod(pgsz_bitmap, 0, pgsize_lg2 + 1);
+
+	if (next_pgsizes) {
+		unsigned int next_pgsize_lg2 = vaffs(next_pgsizes);
+
+		if (log2_mod(va ^ oa, next_pgsize_lg2) == 0)
+			len = min(len, log2_set_mod_max(va, next_pgsize_lg2) -
+					       va + 1);
+	}
+	return log2_div(len, pgsize_lg2);
+}
+
 #define _PT_MAKE_CALL_LEVEL(fn)                                          \
 	static __always_inline int fn(struct pt_range *range, void *arg, \
 				      unsigned int level,                \
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 000dd6c374877b..4c3b8184a2c28b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2501,8 +2501,9 @@ static size_t iommu_pgsize(struct iommu_domain *domain, unsigned long iova,
 	return pgsize;
 }
 
-int iommu_map_nosync(struct iommu_domain *domain, unsigned long iova,
-		phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
+static int __iommu_map_domain_pgtbl(struct iommu_domain *domain,
+				    unsigned long iova, phys_addr_t paddr,
+				    size_t size, int prot, gfp_t gfp)
 {
 	const struct iommu_domain_ops *ops = domain->ops;
 	unsigned long orig_iova = iova;
@@ -2580,6 +2581,26 @@ int iommu_sync_map(struct iommu_domain *domain, unsigned long iova, size_t size)
 	return ops->iotlb_sync_map(domain, iova, size);
 }
 
+int iommu_map_nosync(struct iommu_domain *domain, unsigned long iova,
+		phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
+{
+	struct pt_iommu *pt = iommupt_from_domain(domain);
+
+	if (pt) {
+		size_t mapped = 0;
+		int ret;
+
+		ret = pt->ops->map_range(pt, iova, paddr, size, prot, gfp,
+					 &mapped);
+		if (ret) {
+			iommu_unmap(domain, iova, mapped);
+			return ret;
+		}
+		return 0;
+	}
+	return __iommu_map_domain_pgtbl(domain, iova, paddr, size, prot, gfp);
+}
+
 int iommu_map(struct iommu_domain *domain, unsigned long iova,
 	      phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
 {
diff --git a/include/linux/generic_pt/iommu.h b/include/linux/generic_pt/iommu.h
index f094f8f44e4e8a..43cc98c9c55f70 100644
--- a/include/linux/generic_pt/iommu.h
+++ b/include/linux/generic_pt/iommu.h
@@ -87,6 +87,33 @@ struct pt_iommu_info {
 };
 
 struct pt_iommu_ops {
+	/**
+	 * @map_range: Install translation for an IOVA range
+	 * @iommu_table: Table to manipulate
+	 * @iova: IO virtual address to start
+	 * @paddr: Physical/Output address to start
+	 * @len: Length of the range starting from @iova
+	 * @prot: A bitmap of IOMMU_READ/WRITE/CACHE/NOEXEC/MMIO
+	 * @gfp: GFP flags for any memory allocations
+	 *
+	 * The range starting at IOVA will have paddr installed into it. The
+	 * range is automatically segmented into optimally sized table entries,
+	 * and can have any valid alignment.
+	 *
+	 * On error the caller will probably want to invoke unmap on the range
+	 * from iova up to the amount indicated by @mapped to return the table
+	 * back to an unchanged state.
+	 *
+	 * Context: The caller must hold a write range lock that includes
+	 * the whole range.
+	 *
+	 * Returns: -ERRNO on failure, 0 on success. The number of bytes of VA
+	 * that were mapped are added to @mapped, @mapped is not zeroed first.
+	 */
+	int (*map_range)(struct pt_iommu *iommu_table, dma_addr_t iova,
+			 phys_addr_t paddr, dma_addr_t len, unsigned int prot,
+			 gfp_t gfp, size_t *mapped);
+
 	/**
 	 * @unmap_range: Make a range of IOVA empty/not present
 	 * @iommu_table: Table to manipulate
@@ -224,10 +251,6 @@ struct pt_iommu_cfg {
 #define IOMMU_PROTOTYPES(fmt)                                                  \
 	phys_addr_t pt_iommu_##fmt##_iova_to_phys(struct iommu_domain *domain, \
 						  dma_addr_t iova);            \
-	int pt_iommu_##fmt##_map_pages(struct iommu_domain *domain,            \
-				       unsigned long iova, phys_addr_t paddr,  \
-				       size_t pgsize, size_t pgcount,          \
-				       int prot, gfp_t gfp, size_t *mapped);   \
 	int pt_iommu_##fmt##_read_and_clear_dirty(                             \
 		struct iommu_domain *domain, unsigned long iova, size_t size,  \
 		unsigned long flags, struct iommu_dirty_bitmap *dirty);        \
@@ -248,8 +271,7 @@ struct pt_iommu_cfg {
  * iommu_pt
  */
 #define IOMMU_PT_DOMAIN_OPS(fmt)                        \
-	.iova_to_phys = &pt_iommu_##fmt##_iova_to_phys, \
-	.map_pages = &pt_iommu_##fmt##_map_pages
+	.iova_to_phys = &pt_iommu_##fmt##_iova_to_phys
 #define IOMMU_PT_DIRTY_OPS(fmt) \
 	.read_and_clear_dirty = &pt_iommu_##fmt##_read_and_clear_dirty
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* RE: [PATCH 1/3] iommupt: Make pt_feature() always_inline
  2026-01-12 14:49 ` [PATCH 1/3] iommupt: Make pt_feature() always_inline Jason Gunthorpe
@ 2026-01-14  8:15   ` Tian, Kevin
  0 siblings, 0 replies; 12+ messages in thread
From: Tian, Kevin @ 2026-01-14  8:15 UTC (permalink / raw)
  To: Jason Gunthorpe, iommu@lists.linux.dev, Joerg Roedel,
	Robin Murphy, Will Deacon
  Cc: Alejandro Jimenez, Joerg Roedel, lkp, Pasha Tatashin,
	patches@lists.linux.dev, Samiullah Khawaja

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Monday, January 12, 2026 10:49 PM
> 
> gcc 8.5 on powerpc does not automatically inline these functions even
> though they evaluate to constants in key cases. Since the constant
> propagation is essential for some code elimination and built-time checks
> this causes a build failure:
> 
>  ERROR: modpost: "__pt_no_sw_bit"
> [drivers/iommu/generic_pt/fmt/iommu_amdv1.ko] undefined!
> 
> Caused by this:
> 
> 	if (pts_feature(&pts, PT_FEAT_DMA_INCOHERENT) &&
> 	    !pt_test_sw_bit_acquire(&pts,
> 				    SW_BIT_CACHE_FLUSH_DONE))
> 		flush_writes_item(&pts);
> 
> Where pts_feature() evaluates to a constant false. Mark them as
> __always_inline to force it to evaluate to a constant and trigger the code
> elimination.
> 
> Fixes: 7c5b184db714 ("genpt: Generic Page Table base API")
> Reported-by: kernel test robot <lkp@intel.com>
> Closes: https://lore.kernel.org/oe-kbuild-all/202512230720.9y9DtWIo-
> lkp@intel.com/
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

this was already sent separately and queued by Joerg.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] iommupt: Avoid rewalking during map
  2026-01-12 14:49 ` [PATCH 3/3] iommupt: Avoid rewalking during map Jason Gunthorpe
@ 2026-01-15  4:12   ` Samiullah Khawaja
  2026-01-19 23:40     ` Jason Gunthorpe
  2026-01-15  6:44   ` Tian, Kevin
  1 sibling, 1 reply; 12+ messages in thread
From: Samiullah Khawaja @ 2026-01-15  4:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, Robin Murphy, Will Deacon, Alejandro Jimenez,
	Joerg Roedel, Kevin Tian, kernel test robot, Pasha Tatashin,
	patches

On Mon, Jan 12, 2026 at 6:49 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> Currently the core code provides a simplified interface to drivers where
> it fragments a requested multi-page map into single page size steps after
> doing all the calculations to figure out what page size is
> appropriate. Each step rewalks the page tables from the start.
>
> Since iommupt has a single implementation of the mapping algorithm it can
> internally compute each step as it goes while retaining its current
> position in the walk.
>
> Add a new function pt_pgsz_count() which computes the same page size
> fragment of a large mapping operation.
>
> Compute the next fragment when all the leaf entries of the current
> fragment have been written, then continue walking from the current
> point.
>
> The function pointer is run through pt_iommu_ops instead of
> iommu_domain_ops to discourage using it outside iommupt. All drivers with
> their own page tables should continue to use the simplified map_pages()
> style interfaces.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/generic_pt/iommu_pt.h         | 127 ++++++++++++--------
>  drivers/iommu/generic_pt/kunit_generic_pt.h |  12 ++
>  drivers/iommu/generic_pt/pt_iter.h          |  22 ++++
>  drivers/iommu/iommu.c                       |  25 +++-
>  include/linux/generic_pt/iommu.h            |  34 +++++-
>  5 files changed, 161 insertions(+), 59 deletions(-)
>
> diff --git a/drivers/iommu/generic_pt/iommu_pt.h b/drivers/iommu/generic_pt/iommu_pt.h
> index 5c898c01659798..14a73d3b291b42 100644
> --- a/drivers/iommu/generic_pt/iommu_pt.h
> +++ b/drivers/iommu/generic_pt/iommu_pt.h
> @@ -467,6 +467,7 @@ struct pt_iommu_map_args {
>         pt_oaddr_t oa;
>         unsigned int leaf_pgsize_lg2;
>         unsigned int leaf_level;
> +       pt_vaddr_t num_leaves;
>  };
>
>  /*
> @@ -519,11 +520,15 @@ static int clear_contig(const struct pt_state *start_pts,
>  static int __map_range_leaf(struct pt_range *range, void *arg,
>                             unsigned int level, struct pt_table_p *table)
>  {
> +       struct pt_iommu *iommu_table = iommu_from_common(range->common);
>         struct pt_state pts = pt_init(range, level, table);
>         struct pt_iommu_map_args *map = arg;
>         unsigned int leaf_pgsize_lg2 = map->leaf_pgsize_lg2;
>         unsigned int start_index;
>         pt_oaddr_t oa = map->oa;
> +       unsigned int num_leaves;
> +       unsigned int orig_end;
> +       pt_vaddr_t last_va;
>         unsigned int step;
>         bool need_contig;
>         int ret = 0;
> @@ -537,6 +542,15 @@ static int __map_range_leaf(struct pt_range *range, void *arg,
>
>         _pt_iter_first(&pts);
>         start_index = pts.index;
> +       orig_end = pts.end_index;
> +       if (pts.index + map->num_leaves < pts.end_index) {
> +               /* Need to stop in the middle of the table to change sizes */
> +               pts.end_index = pts.index + map->num_leaves;
> +               num_leaves = 0;
> +       } else {
> +               num_leaves = map->num_leaves - (pts.end_index - pts.index);
> +       }
> +
>         do {
>                 pts.type = pt_load_entry_raw(&pts);
>                 if (pts.type != PT_ENTRY_EMPTY || need_contig) {
> @@ -562,7 +576,40 @@ static int __map_range_leaf(struct pt_range *range, void *arg,
>         flush_writes_range(&pts, start_index, pts.index);
>
>         map->oa = oa;
> -       return ret;
> +       map->num_leaves = num_leaves;
> +       if (ret || num_leaves)
> +               return ret;
> +
> +       /* range->va is not valid if we reached the end of the table */
> +       pts.index -= step;
> +       pt_index_to_va(&pts);
> +       pts.index += step;
> +       last_va = range->va + log2_to_int(leaf_pgsize_lg2);
> +
> +       if (last_va - 1 == range->last_va) {
> +               PT_WARN_ON(pts.index != orig_end);
> +               return 0;
> +       }
> +
> +       /*
> +        * Reached a point where the page size changed, compute the new
> +        * parameters.
> +        */
> +       map->leaf_pgsize_lg2 = pt_compute_best_pgsize(
> +               iommu_table->domain.pgsize_bitmap, last_va, range->last_va, oa);
> +       map->leaf_level =
> +               pt_pgsz_lg2_to_level(range->common, map->leaf_pgsize_lg2);
> +       map->num_leaves = pt_pgsz_count(iommu_table->domain.pgsize_bitmap,
> +                                       last_va, range->last_va, oa,
> +                                       map->leaf_pgsize_lg2);

The overall page walk with pgsize and leaf_level switching is great.
> +
> +       /* Didn't finish this table level, caller will repeat it */
> +       if (pts.index != orig_end) {
> +               if (pts.index != start_index)
> +                       pt_index_to_va(&pts);
> +               return -EAGAIN;
> +       }
> +       return 0;
>  }
>
>  static int __map_range(struct pt_range *range, void *arg, unsigned int level,
> @@ -585,14 +632,9 @@ static int __map_range(struct pt_range *range, void *arg, unsigned int level,
>                         if (pts.type != PT_ENTRY_EMPTY)
>                                 return -EADDRINUSE;
>                         ret = pt_iommu_new_table(&pts, &map->attrs);
> -                       if (ret) {
> -                               /*
> -                                * Racing with another thread installing a table
> -                                */
> -                               if (ret == -EAGAIN)
> -                                       continue;
> +                       /* EAGAIN on a race will loop again */
> +                       if (ret)
>                                 return ret;
> -                       }
>                 } else {
>                         pts.table_lower = pt_table_ptr(&pts);
>                         /*
> @@ -616,10 +658,12 @@ static int __map_range(struct pt_range *range, void *arg, unsigned int level,
>                  * The already present table can possibly be shared with another
>                  * concurrent map.
>                  */
> -               if (map->leaf_level == level - 1)
> -                       ret = pt_descend(&pts, arg, __map_range_leaf);
> -               else
> -                       ret = pt_descend(&pts, arg, __map_range);
> +               do {
> +                       if (map->leaf_level == level - 1)
> +                               ret = pt_descend(&pts, arg, __map_range_leaf);
> +                       else
> +                               ret = pt_descend(&pts, arg, __map_range);
> +               } while (ret == -EAGAIN);
>                 if (ret)
>                         return ret;
>
> @@ -627,6 +671,8 @@ static int __map_range(struct pt_range *range, void *arg, unsigned int level,
>                 pt_index_to_va(&pts);
>                 if (pts.index >= pts.end_index)
>                         break;
> +               if (map->leaf_level == level)
> +                       return -EAGAIN;

Could you please add a comment here? Maybe something that says that a
level switch happened and the caller needs to retry?
>         } while (true);
>         return 0;
>  }
> @@ -798,12 +844,13 @@ static int check_map_range(struct pt_iommu *iommu_table, struct pt_range *range,
>  static int do_map(struct pt_range *range, struct pt_common *common,
>                   bool single_page, struct pt_iommu_map_args *map)
>  {
> +       int ret;
> +
>         /*
>          * The __map_single_page() fast path does not support DMA_INCOHERENT
>          * flushing to keep its .text small.
>          */
>         if (single_page && !pt_feature(common, PT_FEAT_DMA_INCOHERENT)) {
> -               int ret;
>
>                 ret = pt_walk_range(range, __map_single_page, map);
>                 if (ret != -EAGAIN)
> @@ -811,50 +858,25 @@ static int do_map(struct pt_range *range, struct pt_common *common,
>                 /* EAGAIN falls through to the full path */
>         }
>
> -       if (map->leaf_level == range->top_level)
> -               return pt_walk_range(range, __map_range_leaf, map);
> -       return pt_walk_range(range, __map_range, map);
> +       do {
> +               if (map->leaf_level == range->top_level)
> +                       ret = pt_walk_range(range, __map_range_leaf, map);
> +               else
> +                       ret = pt_walk_range(range, __map_range, map);
> +       } while (ret == -EAGAIN);
> +       return ret;
>  }
>
> -/**
> - * map_pages() - Install translation for an IOVA range
> - * @domain: Domain to manipulate
> - * @iova: IO virtual address to start
> - * @paddr: Physical/Output address to start
> - * @pgsize: Length of each page
> - * @pgcount: Length of the range in pgsize units starting from @iova
> - * @prot: A bitmap of IOMMU_READ/WRITE/CACHE/NOEXEC/MMIO
> - * @gfp: GFP flags for any memory allocations
> - * @mapped: Total bytes successfully mapped
> - *
> - * The range starting at IOVA will have paddr installed into it. The caller
> - * must specify a valid pgsize and pgcount to segment the range into compatible
> - * blocks.
> - *
> - * On error the caller will probably want to invoke unmap on the range from iova
> - * up to the amount indicated by @mapped to return the table back to an
> - * unchanged state.
> - *
> - * Context: The caller must hold a write range lock that includes the whole
> - * range.
> - *
> - * Returns: -ERRNO on failure, 0 on success. The number of bytes of VA that were
> - * mapped are added to @mapped, @mapped is not zerod first.
> - */
> -int DOMAIN_NS(map_pages)(struct iommu_domain *domain, unsigned long iova,
> -                        phys_addr_t paddr, size_t pgsize, size_t pgcount,
> -                        int prot, gfp_t gfp, size_t *mapped)
> +static int NS(map_range)(struct pt_iommu *iommu_table, dma_addr_t iova,
> +                        phys_addr_t paddr, dma_addr_t len, unsigned int prot,
> +                        gfp_t gfp, size_t *mapped)
>  {
> -       struct pt_iommu *iommu_table =
> -               container_of(domain, struct pt_iommu, domain);
>         pt_vaddr_t pgsize_bitmap = iommu_table->domain.pgsize_bitmap;
>         struct pt_common *common = common_from_iommu(iommu_table);
>         struct iommu_iotlb_gather iotlb_gather;
> -       pt_vaddr_t len = pgsize * pgcount;
>         struct pt_iommu_map_args map = {
>                 .iotlb_gather = &iotlb_gather,
>                 .oa = paddr,
> -               .leaf_pgsize_lg2 = vaffs(pgsize),
>         };
>         bool single_page = false;
>         struct pt_range range;
> @@ -882,13 +904,13 @@ int DOMAIN_NS(map_pages)(struct iommu_domain *domain, unsigned long iova,
>                 return ret;
>
>         /* Calculate target page size and level for the leaves */
> -       if (pt_has_system_page_size(common) && pgsize == PAGE_SIZE &&
> -           pgcount == 1) {
> +       if (pt_has_system_page_size(common) && len == PAGE_SIZE) {
>                 PT_WARN_ON(!(pgsize_bitmap & PAGE_SIZE));
>                 if (log2_mod(iova | paddr, PAGE_SHIFT))
>                         return -ENXIO;
>                 map.leaf_pgsize_lg2 = PAGE_SHIFT;
>                 map.leaf_level = 0;
> +               map.num_leaves = 1;
>                 single_page = true;
>         } else {
>                 map.leaf_pgsize_lg2 = pt_compute_best_pgsize(
> @@ -897,6 +919,9 @@ int DOMAIN_NS(map_pages)(struct iommu_domain *domain, unsigned long iova,
>                         return -ENXIO;
>                 map.leaf_level =
>                         pt_pgsz_lg2_to_level(common, map.leaf_pgsize_lg2);
> +               map.num_leaves = pt_pgsz_count(pgsize_bitmap, range.va,
> +                                              range.last_va, paddr,
> +                                              map.leaf_pgsize_lg2);
>         }
>
>         ret = check_map_range(iommu_table, &range, &map);
> @@ -919,7 +944,6 @@ int DOMAIN_NS(map_pages)(struct iommu_domain *domain, unsigned long iova,
>         *mapped += map.oa - paddr;
>         return ret;
>  }
> -EXPORT_SYMBOL_NS_GPL(DOMAIN_NS(map_pages), "GENERIC_PT_IOMMU");
>
>  struct pt_unmap_args {
>         struct iommu_pages_list free_list;
> @@ -1079,6 +1103,7 @@ static void NS(deinit)(struct pt_iommu *iommu_table)
>  }
>
>  static const struct pt_iommu_ops NS(ops) = {
> +       .map_range = NS(map_range),
>         .unmap_range = NS(unmap_range),
>  #if IS_ENABLED(CONFIG_IOMMUFD_DRIVER) && defined(pt_entry_is_write_dirty) && \
>         IS_ENABLED(CONFIG_IOMMUFD_TEST) && defined(pt_entry_make_write_dirty)
> diff --git a/drivers/iommu/generic_pt/kunit_generic_pt.h b/drivers/iommu/generic_pt/kunit_generic_pt.h
> index 68278bf15cfe07..374e475f591e15 100644
> --- a/drivers/iommu/generic_pt/kunit_generic_pt.h
> +++ b/drivers/iommu/generic_pt/kunit_generic_pt.h
> @@ -312,6 +312,17 @@ static void test_best_pgsize(struct kunit *test)
>         }
>  }
>
> +static void test_pgsz_count(struct kunit *test)
> +{
> +       KUNIT_EXPECT_EQ(test,
> +                       pt_pgsz_count(SZ_4K, 0, SZ_1G - 1, 0, ilog2(SZ_4K)),
> +                       SZ_1G / SZ_4K);
> +       KUNIT_EXPECT_EQ(test,
> +                       pt_pgsz_count(SZ_2M | SZ_4K, SZ_4K, SZ_1G - 1, SZ_4K,
> +                                     ilog2(SZ_4K)),
> +                       (SZ_2M - SZ_4K) / SZ_4K);
> +}
> +
>  /*
>   * Check that pt_install_table() and pt_table_pa() match
>   */
> @@ -770,6 +781,7 @@ static struct kunit_case generic_pt_test_cases[] = {
>         KUNIT_CASE_FMT(test_init),
>         KUNIT_CASE_FMT(test_bitops),
>         KUNIT_CASE_FMT(test_best_pgsize),
> +       KUNIT_CASE_FMT(test_pgsz_count),
>         KUNIT_CASE_FMT(test_table_ptr),
>         KUNIT_CASE_FMT(test_max_va),
>         KUNIT_CASE_FMT(test_table_radix),
> diff --git a/drivers/iommu/generic_pt/pt_iter.h b/drivers/iommu/generic_pt/pt_iter.h
> index c0d8617cce2928..3e45dbde6b8327 100644
> --- a/drivers/iommu/generic_pt/pt_iter.h
> +++ b/drivers/iommu/generic_pt/pt_iter.h
> @@ -569,6 +569,28 @@ static inline unsigned int pt_compute_best_pgsize(pt_vaddr_t pgsz_bitmap,
>         return pgsz_lg2;
>  }
>
> +/*
> + * Return the number of pgsize_lg2 leaf entries that can be mapped from
> + * va to oa. This accounts for any requirement to reduce or increase the page
> + * size across the VA range.
> + */
> +static inline pt_vaddr_t pt_pgsz_count(pt_vaddr_t pgsz_bitmap, pt_vaddr_t va,
> +                                      pt_vaddr_t last_va, pt_oaddr_t oa,
> +                                      unsigned int pgsize_lg2)
> +{
> +       pt_vaddr_t len = last_va - va + 1;
> +       pt_vaddr_t next_pgsizes = log2_set_mod(pgsz_bitmap, 0, pgsize_lg2 + 1);
> +
> +       if (next_pgsizes) {
> +               unsigned int next_pgsize_lg2 = vaffs(next_pgsizes);
> +
> +               if (log2_mod(va ^ oa, next_pgsize_lg2) == 0)
> +                       len = min(len, log2_set_mod_max(va, next_pgsize_lg2) -
> +                                              va + 1);
> +       }
> +       return log2_div(len, pgsize_lg2);
> +}
> +
>  #define _PT_MAKE_CALL_LEVEL(fn)                                          \
>         static __always_inline int fn(struct pt_range *range, void *arg, \
>                                       unsigned int level,                \
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 000dd6c374877b..4c3b8184a2c28b 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2501,8 +2501,9 @@ static size_t iommu_pgsize(struct iommu_domain *domain, unsigned long iova,
>         return pgsize;
>  }
>
> -int iommu_map_nosync(struct iommu_domain *domain, unsigned long iova,
> -               phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
> +static int __iommu_map_domain_pgtbl(struct iommu_domain *domain,
> +                                   unsigned long iova, phys_addr_t paddr,
> +                                   size_t size, int prot, gfp_t gfp)
>  {
>         const struct iommu_domain_ops *ops = domain->ops;
>         unsigned long orig_iova = iova;
> @@ -2580,6 +2581,26 @@ int iommu_sync_map(struct iommu_domain *domain, unsigned long iova, size_t size)
>         return ops->iotlb_sync_map(domain, iova, size);
>  }
>
> +int iommu_map_nosync(struct iommu_domain *domain, unsigned long iova,
> +               phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
> +{
> +       struct pt_iommu *pt = iommupt_from_domain(domain);
> +
> +       if (pt) {
> +               size_t mapped = 0;
> +               int ret;
> +
> +               ret = pt->ops->map_range(pt, iova, paddr, size, prot, gfp,
> +                                        &mapped);
> +               if (ret) {
> +                       iommu_unmap(domain, iova, mapped);
> +                       return ret;
> +               }
> +               return 0;
> +       }
> +       return __iommu_map_domain_pgtbl(domain, iova, paddr, size, prot, gfp);
> +}
> +
>  int iommu_map(struct iommu_domain *domain, unsigned long iova,
>               phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
>  {
> diff --git a/include/linux/generic_pt/iommu.h b/include/linux/generic_pt/iommu.h
> index f094f8f44e4e8a..43cc98c9c55f70 100644
> --- a/include/linux/generic_pt/iommu.h
> +++ b/include/linux/generic_pt/iommu.h
> @@ -87,6 +87,33 @@ struct pt_iommu_info {
>  };
>
>  struct pt_iommu_ops {
> +       /**
> +        * @map_range: Install translation for an IOVA range
> +        * @iommu_table: Table to manipulate
> +        * @iova: IO virtual address to start
> +        * @paddr: Physical/Output address to start
> +        * @len: Length of the range starting from @iova
> +        * @prot: A bitmap of IOMMU_READ/WRITE/CACHE/NOEXEC/MMIO
> +        * @gfp: GFP flags for any memory allocations
> +        *
> +        * The range starting at IOVA will have paddr installed into it. The
> +        * range is automatically segmented into optimally sized table entries,
> +        * and can have any valid alignment.
> +        *
> +        * On error the caller will probably want to invoke unmap on the range
> +        * from iova up to the amount indicated by @mapped to return the table
> +        * back to an unchanged state.
> +        *
> +        * Context: The caller must hold a write range lock that includes
> +        * the whole range.
> +        *
> +        * Returns: -ERRNO on failure, 0 on success. The number of bytes of VA
> +        * that were mapped are added to @mapped, @mapped is not zeroed first.
> +        */
> +       int (*map_range)(struct pt_iommu *iommu_table, dma_addr_t iova,
> +                        phys_addr_t paddr, dma_addr_t len, unsigned int prot,
> +                        gfp_t gfp, size_t *mapped);
> +
>         /**
>          * @unmap_range: Make a range of IOVA empty/not present
>          * @iommu_table: Table to manipulate
> @@ -224,10 +251,6 @@ struct pt_iommu_cfg {
>  #define IOMMU_PROTOTYPES(fmt)                                                  \
>         phys_addr_t pt_iommu_##fmt##_iova_to_phys(struct iommu_domain *domain, \
>                                                   dma_addr_t iova);            \
> -       int pt_iommu_##fmt##_map_pages(struct iommu_domain *domain,            \
> -                                      unsigned long iova, phys_addr_t paddr,  \
> -                                      size_t pgsize, size_t pgcount,          \
> -                                      int prot, gfp_t gfp, size_t *mapped);   \
>         int pt_iommu_##fmt##_read_and_clear_dirty(                             \
>                 struct iommu_domain *domain, unsigned long iova, size_t size,  \
>                 unsigned long flags, struct iommu_dirty_bitmap *dirty);        \
> @@ -248,8 +271,7 @@ struct pt_iommu_cfg {
>   * iommu_pt
>   */
>  #define IOMMU_PT_DOMAIN_OPS(fmt)                        \
> -       .iova_to_phys = &pt_iommu_##fmt##_iova_to_phys, \
> -       .map_pages = &pt_iommu_##fmt##_map_pages
> +       .iova_to_phys = &pt_iommu_##fmt##_iova_to_phys
>  #define IOMMU_PT_DIRTY_OPS(fmt) \
>         .read_and_clear_dirty = &pt_iommu_##fmt##_read_and_clear_dirty
>
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH 2/3] iommupt: Directly call iommupt's unmap_range()
  2026-01-12 14:49 ` [PATCH 2/3] iommupt: Directly call iommupt's unmap_range() Jason Gunthorpe
@ 2026-01-15  6:17   ` Tian, Kevin
  2026-01-15 18:26   ` Samiullah Khawaja
  1 sibling, 0 replies; 12+ messages in thread
From: Tian, Kevin @ 2026-01-15  6:17 UTC (permalink / raw)
  To: Jason Gunthorpe, iommu@lists.linux.dev, Joerg Roedel,
	Robin Murphy, Will Deacon
  Cc: Alejandro Jimenez, Joerg Roedel, lkp, Pasha Tatashin,
	patches@lists.linux.dev, Samiullah Khawaja

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Monday, January 12, 2026 10:49 PM
> 
> The common algorithm in iommupt does not require the iommu_pgsize()
> calculations, it can directly unmap any arbitrary range. Add a new function
> pointer to directly call an iommupt unmap_range op and make
> __iommu_unmap() call it directly.
> 
> Gives about a 5% gain on single page unmappings.
> 
> The function pointer is run through pt_iommu_ops instead of
> iommu_domain_ops to discourage using it outside iommupt. All drivers with
> their own page tables should continue to use the simplified
> map/unmap_pages() style interfaces.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH 3/3] iommupt: Avoid rewalking during map
  2026-01-12 14:49 ` [PATCH 3/3] iommupt: Avoid rewalking during map Jason Gunthorpe
  2026-01-15  4:12   ` Samiullah Khawaja
@ 2026-01-15  6:44   ` Tian, Kevin
  2026-01-19 23:30     ` Jason Gunthorpe
  1 sibling, 1 reply; 12+ messages in thread
From: Tian, Kevin @ 2026-01-15  6:44 UTC (permalink / raw)
  To: Jason Gunthorpe, iommu@lists.linux.dev, Joerg Roedel,
	Robin Murphy, Will Deacon
  Cc: Alejandro Jimenez, Joerg Roedel, lkp, Pasha Tatashin,
	patches@lists.linux.dev, Samiullah Khawaja

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Monday, January 12, 2026 10:49 PM
>
> +int iommu_map_nosync(struct iommu_domain *domain, unsigned long
> iova,
> +		phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
> +{
> +	struct pt_iommu *pt = iommupt_from_domain(domain);
> +
> +	if (pt) {
> +		size_t mapped = 0;
> +		int ret;
> +
> +		ret = pt->ops->map_range(pt, iova, paddr, size, prot, gfp,
> +					 &mapped);
> +		if (ret) {
> +			iommu_unmap(domain, iova, mapped);
> +			return ret;
> +		}

lack of trace_map() here if succeeds

> +		return 0;
> +	}
> +	return __iommu_map_domain_pgtbl(domain, iova, paddr, size, prot,
> gfp);
> +}

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] iommupt: Directly call iommupt's unmap_range()
  2026-01-12 14:49 ` [PATCH 2/3] iommupt: Directly call iommupt's unmap_range() Jason Gunthorpe
  2026-01-15  6:17   ` Tian, Kevin
@ 2026-01-15 18:26   ` Samiullah Khawaja
  1 sibling, 0 replies; 12+ messages in thread
From: Samiullah Khawaja @ 2026-01-15 18:26 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, Robin Murphy, Will Deacon, Alejandro Jimenez,
	Joerg Roedel, Kevin Tian, kernel test robot, Pasha Tatashin,
	patches

On Mon, Jan 12, 2026 at 6:49 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> The common algorithm in iommupt does not require the iommu_pgsize()
> calculations, it can directly unmap any arbitrary range. Add a new function
> pointer to directly call an iommupt unmap_range op and make
> __iommu_unmap() call it directly.
>
> Gives about a 5% gain on single page unmappings.
>
> The function pointer is run through pt_iommu_ops instead of
> iommu_domain_ops to discourage using it outside iommupt. All drivers with
> their own page tables should continue to use the simplified
> map/unmap_pages() style interfaces.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/generic_pt/iommu_pt.h | 29 ++++------------------
>  drivers/iommu/iommu.c               | 18 +++++++++++---
>  include/linux/generic_pt/iommu.h    | 37 ++++++++++++++++++++++++-----
>  include/linux/iommu.h               |  1 +
>  4 files changed, 51 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/iommu/generic_pt/iommu_pt.h b/drivers/iommu/generic_pt/iommu_pt.h
> index 3327116a441cac..5c898c01659798 100644
> --- a/drivers/iommu/generic_pt/iommu_pt.h
> +++ b/drivers/iommu/generic_pt/iommu_pt.h
> @@ -1012,34 +1012,12 @@ static __maybe_unused int __unmap_range(struct pt_range *range, void *arg,
>         return ret;
>  }
>
> -/**
> - * unmap_pages() - Make a range of IOVA empty/not present
> - * @domain: Domain to manipulate
> - * @iova: IO virtual address to start
> - * @pgsize: Length of each page
> - * @pgcount: Length of the range in pgsize units starting from @iova
> - * @iotlb_gather: Gather struct that must be flushed on return
> - *
> - * unmap_pages() will remove a translation created by map_pages(). It cannot
> - * subdivide a mapping created by map_pages(), so it should be called with IOVA
> - * ranges that match those passed to map_pages(). The IOVA range can aggregate
> - * contiguous map_pages() calls so long as no individual range is split.
> - *
> - * Context: The caller must hold a write range lock that includes
> - * the whole range.
> - *
> - * Returns: Number of bytes of VA unmapped. iova + res will be the point
> - * unmapping stopped.
> - */
> -size_t DOMAIN_NS(unmap_pages)(struct iommu_domain *domain, unsigned long iova,
> -                             size_t pgsize, size_t pgcount,
> +static size_t NS(unmap_range)(struct pt_iommu *iommu_table, dma_addr_t iova,
> +                             dma_addr_t len,
>                               struct iommu_iotlb_gather *iotlb_gather)
>  {
> -       struct pt_iommu *iommu_table =
> -               container_of(domain, struct pt_iommu, domain);
>         struct pt_unmap_args unmap = { .free_list = IOMMU_PAGES_LIST_INIT(
>                                                unmap.free_list) };
> -       pt_vaddr_t len = pgsize * pgcount;
>         struct pt_range range;
>         int ret;
>
> @@ -1054,7 +1032,6 @@ size_t DOMAIN_NS(unmap_pages)(struct iommu_domain *domain, unsigned long iova,
>
>         return unmap.unmapped;
>  }
> -EXPORT_SYMBOL_NS_GPL(DOMAIN_NS(unmap_pages), "GENERIC_PT_IOMMU");
>
>  static void NS(get_info)(struct pt_iommu *iommu_table,
>                          struct pt_iommu_info *info)
> @@ -1102,6 +1079,7 @@ static void NS(deinit)(struct pt_iommu *iommu_table)
>  }
>
>  static const struct pt_iommu_ops NS(ops) = {
> +       .unmap_range = NS(unmap_range),
>  #if IS_ENABLED(CONFIG_IOMMUFD_DRIVER) && defined(pt_entry_is_write_dirty) && \
>         IS_ENABLED(CONFIG_IOMMUFD_TEST) && defined(pt_entry_make_write_dirty)
>         .set_dirty = NS(set_dirty),
> @@ -1164,6 +1142,7 @@ static int pt_iommu_init_domain(struct pt_iommu *iommu_table,
>
>         domain->type = __IOMMU_DOMAIN_PAGING;
>         domain->pgsize_bitmap = info.pgsize_bitmap;
> +       domain->is_iommupt = true;
>
>         if (pt_feature(common, PT_FEAT_DYNAMIC_TOP))
>                 range = _pt_top_range(common,
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 2ca990dfbb884f..000dd6c374877b 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -34,6 +34,7 @@
>  #include <linux/sched/mm.h>
>  #include <linux/msi.h>
>  #include <uapi/linux/iommufd.h>
> +#include <linux/generic_pt/iommu.h>
>
>  #include "dma-iommu.h"
>  #include "iommu-priv.h"
> @@ -2596,9 +2597,9 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
>  }
>  EXPORT_SYMBOL_GPL(iommu_map);
>
> -static size_t __iommu_unmap(struct iommu_domain *domain,
> -                           unsigned long iova, size_t size,
> -                           struct iommu_iotlb_gather *iotlb_gather)
> +static size_t
> +__iommu_unmap_domain_pgtbl(struct iommu_domain *domain, unsigned long iova,
> +                          size_t size, struct iommu_iotlb_gather *iotlb_gather)
>  {
>         const struct iommu_domain_ops *ops = domain->ops;
>         size_t unmapped_page, unmapped = 0;
> @@ -2650,6 +2651,17 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
>         return unmapped;
>  }
>
> +static size_t __iommu_unmap(struct iommu_domain *domain, unsigned long iova,
> +                           size_t size,
> +                           struct iommu_iotlb_gather *iotlb_gather)
> +{
> +       struct pt_iommu *pt = iommupt_from_domain(domain);
> +
> +       if (pt)
> +               return pt->ops->unmap_range(pt, iova, size, iotlb_gather);
> +       return __iommu_unmap_domain_pgtbl(domain, iova, size, iotlb_gather);
> +}
> +
>  /**
>   * iommu_unmap() - Remove mappings from a range of IOVA
>   * @domain: Domain to manipulate
> diff --git a/include/linux/generic_pt/iommu.h b/include/linux/generic_pt/iommu.h
> index 9eefbb74efd087..f094f8f44e4e8a 100644
> --- a/include/linux/generic_pt/iommu.h
> +++ b/include/linux/generic_pt/iommu.h
> @@ -66,6 +66,13 @@ struct pt_iommu {
>         struct device *iommu_device;
>  };
>
> +static inline struct pt_iommu *iommupt_from_domain(struct iommu_domain *domain)
> +{
> +       if (!IS_ENABLED(CONFIG_IOMMU_PT) || !domain->is_iommupt)
> +               return NULL;
> +       return container_of(domain, struct pt_iommu, domain);
> +}
> +
>  /**
>   * struct pt_iommu_info - Details about the IOMMU page table
>   *
> @@ -80,6 +87,29 @@ struct pt_iommu_info {
>  };
>
>  struct pt_iommu_ops {
> +       /**
> +        * @unmap_range: Make a range of IOVA empty/not present
> +        * @iommu_table: Table to manipulate
> +        * @iova: IO virtual address to start
> +        * @len: Length of the range starting from @iova
> +        * @iotlb_gather: Gather struct that must be flushed on return
> +        *
> +        * unmap_range() will remove a translation created by map_range(). It
> +        * cannot subdivide a mapping created by map_range(), so it should be
> +        * called with IOVA ranges that match those passed to map_range(). The
> +        * IOVA range can aggregate contiguous map_range() calls so long as no
> +        * individual range is split.
> +        *
> +        * Context: The caller must hold a write range lock that includes
> +        * the whole range.
> +        *
> +        * Returns: Number of bytes of VA unmapped. iova + res will be the
> +        * point unmapping stopped.
> +        */
> +       size_t (*unmap_range)(struct pt_iommu *iommu_table, dma_addr_t iova,
> +                             dma_addr_t len,
> +                             struct iommu_iotlb_gather *iotlb_gather);
> +
>         /**
>          * @set_dirty: Make the iova write dirty
>          * @iommu_table: Table to manipulate
> @@ -198,10 +228,6 @@ struct pt_iommu_cfg {
>                                        unsigned long iova, phys_addr_t paddr,  \
>                                        size_t pgsize, size_t pgcount,          \
>                                        int prot, gfp_t gfp, size_t *mapped);   \
> -       size_t pt_iommu_##fmt##_unmap_pages(                                   \
> -               struct iommu_domain *domain, unsigned long iova,               \
> -               size_t pgsize, size_t pgcount,                                 \
> -               struct iommu_iotlb_gather *iotlb_gather);                      \
>         int pt_iommu_##fmt##_read_and_clear_dirty(                             \
>                 struct iommu_domain *domain, unsigned long iova, size_t size,  \
>                 unsigned long flags, struct iommu_dirty_bitmap *dirty);        \
> @@ -223,8 +249,7 @@ struct pt_iommu_cfg {
>   */
>  #define IOMMU_PT_DOMAIN_OPS(fmt)                        \
>         .iova_to_phys = &pt_iommu_##fmt##_iova_to_phys, \
> -       .map_pages = &pt_iommu_##fmt##_map_pages,       \
> -       .unmap_pages = &pt_iommu_##fmt##_unmap_pages
> +       .map_pages = &pt_iommu_##fmt##_map_pages
>  #define IOMMU_PT_DIRTY_OPS(fmt) \
>         .read_and_clear_dirty = &pt_iommu_##fmt##_read_and_clear_dirty
>
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 8c66284a91a8b0..0e8c9d31796bfd 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -223,6 +223,7 @@ enum iommu_domain_cookie_type {
>  struct iommu_domain {
>         unsigned type;
>         enum iommu_domain_cookie_type cookie_type;
> +       bool is_iommupt;
>         const struct iommu_domain_ops *ops;
>         const struct iommu_dirty_ops *dirty_ops;
>         const struct iommu_ops *owner; /* Whose domain_alloc we came from */
> --
> 2.43.0
>

Reviewed-by: Samiullah Khawaja <skhawaja@google.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] Let iommupt manage changes in page size internally
  2026-01-12 14:49 [PATCH 0/3] Let iommupt manage changes in page size internally Jason Gunthorpe
                   ` (2 preceding siblings ...)
  2026-01-12 14:49 ` [PATCH 3/3] iommupt: Avoid rewalking during map Jason Gunthorpe
@ 2026-01-18  9:47 ` Joerg Roedel
  3 siblings, 0 replies; 12+ messages in thread
From: Joerg Roedel @ 2026-01-18  9:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Robin Murphy, Will Deacon, Alejandro Jimenez, Joerg Roedel,
	Kevin Tian, kernel test robot, Pasha Tatashin, patches,
	Samiullah Khawaja

Jason,

On Mon, Jan 12, 2026 at 10:49:01AM -0400, Jason Gunthorpe wrote:
> Jason Gunthorpe (3):
>   iommupt: Make pt_feature() always_inline
>   iommupt: Directly call iommupt's unmap_range()
>   iommupt: Avoid rewalking during map

Please re-spin patch 3 based on the comments and resend it together with patch
2.

-Joerg


* Re: [PATCH 3/3] iommupt: Avoid rewalking during map
  2026-01-15  6:44   ` Tian, Kevin
@ 2026-01-19 23:30     ` Jason Gunthorpe
  0 siblings, 0 replies; 12+ messages in thread
From: Jason Gunthorpe @ 2026-01-19 23:30 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: iommu@lists.linux.dev, Joerg Roedel, Robin Murphy, Will Deacon,
	Alejandro Jimenez, Joerg Roedel, lkp, Pasha Tatashin,
	patches@lists.linux.dev, Samiullah Khawaja

On Thu, Jan 15, 2026 at 06:44:06AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Monday, January 12, 2026 10:49 PM
> >
> > +int iommu_map_nosync(struct iommu_domain *domain, unsigned long
> > iova,
> > +		phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
> > +{
> > +	struct pt_iommu *pt = iommupt_from_domain(domain);
> > +
> > +	if (pt) {
> > +		size_t mapped = 0;
> > +		int ret;
> > +
> > +		ret = pt->ops->map_range(pt, iova, paddr, size, prot, gfp,
> > +					 &mapped);
> > +		if (ret) {
> > +			iommu_unmap(domain, iova, mapped);
> > +			return ret;
> > +		}
> 
> trace_map() is missing here on the success path

Ah, unmap has the same miss too, thanks

Jason


* Re: [PATCH 3/3] iommupt: Avoid rewalking during map
  2026-01-15  4:12   ` Samiullah Khawaja
@ 2026-01-19 23:40     ` Jason Gunthorpe
  0 siblings, 0 replies; 12+ messages in thread
From: Jason Gunthorpe @ 2026-01-19 23:40 UTC (permalink / raw)
  To: Samiullah Khawaja
  Cc: iommu, Joerg Roedel, Robin Murphy, Will Deacon, Alejandro Jimenez,
	Joerg Roedel, Kevin Tian, kernel test robot, Pasha Tatashin,
	patches

On Wed, Jan 14, 2026 at 08:12:50PM -0800, Samiullah Khawaja wrote:

> > @@ -627,6 +671,8 @@ static int __map_range(struct pt_range *range, void *arg, unsigned int level,
> >                 pt_index_to_va(&pts);
> >                 if (pts.index >= pts.end_index)
> >                         break;
> > +               if (map->leaf_level == level)
> > +                       return -EAGAIN;
> 
> Could you please add a comment here? Maybe something that says a
> level switch happened and the caller needs to retry.

		/*
		 * This level is currently running __map_range_leaf() which is
		 * not correct if the target level has been updated to this
		 * level. Have the caller invoke __map_range_leaf.
		 */
		if (map->leaf_level == level)
			return -EAGAIN;

Thanks,
Jason


end of thread, other threads:[~2026-01-19 23:40 UTC | newest]

Thread overview: 12+ messages
2026-01-12 14:49 [PATCH 0/3] Let iommupt manage changes in page size internally Jason Gunthorpe
2026-01-12 14:49 ` [PATCH 1/3] iommupt: Make pt_feature() always_inline Jason Gunthorpe
2026-01-14  8:15   ` Tian, Kevin
2026-01-12 14:49 ` [PATCH 2/3] iommupt: Directly call iommupt's unmap_range() Jason Gunthorpe
2026-01-15  6:17   ` Tian, Kevin
2026-01-15 18:26   ` Samiullah Khawaja
2026-01-12 14:49 ` [PATCH 3/3] iommupt: Avoid rewalking during map Jason Gunthorpe
2026-01-15  4:12   ` Samiullah Khawaja
2026-01-19 23:40     ` Jason Gunthorpe
2026-01-15  6:44   ` Tian, Kevin
2026-01-19 23:30     ` Jason Gunthorpe
2026-01-18  9:47 ` [PATCH 0/3] Let iommupt manage changes in page size internally Joerg Roedel
