public inbox for patches@lists.linux.dev
 help / color / mirror / Atom feed
* [PATCH 0/5] Improve the invalidation path in AMD
@ 2026-03-27 15:23 Jason Gunthorpe
  2026-03-27 15:23 ` [PATCH 1/5] iommu/amd: Simplify build_inv_address() Jason Gunthorpe
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Jason Gunthorpe @ 2026-03-27 15:23 UTC (permalink / raw)
  To: iommu, Joerg Roedel, Robin Murphy, Suravee Suthikulpanit,
	Will Deacon
  Cc: patches

Auditing this path reveals some unnecessary complexity and some
confusion around end/size/last and types that may cause issues at the
ULONG_MAX limit.

First, fix up the call chains so that the gather start/last is
retained faithfully throughout. This avoids converting to a
start/size, and avoids various type conversions along this path. Fix
everything so the theoretical cases like start=0 last=U64_MAX work
correctly.

Simplify all the calculations for alignment/etc to use trivial bit
maths instead of the complicated versions. These changes are all
intended to cause no functional change.

Finally, add support for PDE=0, deriving it from the gather. PDE=0
allows the HW to avoid scanning the page directory cache and
supporting it should improve performance.

Jason Gunthorpe (5):
  iommu/amd: Simplify build_inv_address()
  iommu/amd: Pass last in through to build_inv_address()
  iommu/amd: Have amd_iommu_domain_flush_pages() use last
  iommu/amd: Make CMD_INV_IOMMU_ALL_PAGES_ADDRESS match the spec
  iommu/amd: Control INVALIDATE_IOMMU_PAGES PDE from the gather

 drivers/iommu/amd/amd_iommu.h       |   4 +-
 drivers/iommu/amd/amd_iommu_types.h |   2 +-
 drivers/iommu/amd/iommu.c           | 175 +++++++++++++---------------
 drivers/iommu/amd/pasid.c           |   2 +-
 4 files changed, 87 insertions(+), 96 deletions(-)


base-commit: 6c8f61b36b7efec0f8bc284142f998d30f0a28ba
-- 
2.43.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/5] iommu/amd: Simplify build_inv_address()
  2026-03-27 15:23 [PATCH 0/5] Improve the invalidation path in AMD Jason Gunthorpe
@ 2026-03-27 15:23 ` Jason Gunthorpe
  2026-03-30  7:39   ` Wei Wang
  2026-03-27 15:23 ` [PATCH 2/5] iommu/amd: Pass last in through to build_inv_address() Jason Gunthorpe
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2026-03-27 15:23 UTC (permalink / raw)
  To: iommu, Joerg Roedel, Robin Murphy, Suravee Suthikulpanit,
	Will Deacon
  Cc: patches

This function is doing more work than it needs to:

 - iommu_num_pages() is pointless, the fls() is going to compute the
   required page size already.

 - It is easier to reason in terms of sz_lg2, which is 12 for a 4K
   size, than msb_diff, which is 11 for a 4K size.

 - Simplify the control flow to early exit on the out of range cases.

 - Use the usual last instead of end to signify an inclusive last
   address.

 - Use GENMASK to compute the 1's mask.

 - Use GENMASK to compute the address mask for the command layout,
   not PAGE_MASK.

 - Directly reference the spec language that defines the 52 bit
   limit.

No functional change intended.
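
As an illustration, the new flow can be sketched in userspace C. The
GENMASK_U64/fls64 helpers and the SIZE_MASK value below are simplified
stand-ins for the kernel definitions, not the real ones:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the kernel helpers; SIZE_MASK is illustrative only. */
#define GENMASK_U64(h, l) \
	((UINT64_MAX >> (63 - (h))) & (UINT64_MAX << (l)))
#define SIZE_MASK 0x1ULL	/* stands in for CMD_INV_IOMMU_PAGES_SIZE_MASK */
#define ALL_PAGES GENMASK_U64(62, 12)	/* "Entire Cache" address coding */

static unsigned int fls64_sketch(uint64_t x)
{
	unsigned int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

static uint64_t build_inv_address_sketch(uint64_t address, uint64_t last)
{
	unsigned int sz_lg2;

	address &= GENMASK_U64(63, 12);
	sz_lg2 = fls64_sketch(address ^ last);
	if (sz_lg2 <= 12)
		return address;		/* single 4K page, S bit clear */

	if (sz_lg2 > 52)		/* too big, flush everything */
		return ALL_PAGES | SIZE_MASK;

	/* fls() guarantees bit (sz_lg2 - 1) of address is 0; add the 1's */
	if (sz_lg2 > 13)
		address |= GENMASK_U64(sz_lg2 - 2, 12);
	return address | SIZE_MASK;
}
```

The 8K case sets no 1's, 16K sets bit 12, and so on, matching the
Table 14 style of encoding.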

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/amd/iommu.c | 50 +++++++++++++++++++--------------------
 1 file changed, 24 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 528ebdd26d71c9..28308759bb3e4d 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1267,39 +1267,37 @@ static void build_inv_dte(struct iommu_cmd *cmd, u16 devid)
  */
 static inline u64 build_inv_address(u64 address, size_t size)
 {
-	u64 pages, end, msb_diff;
+	u64 last = address + size - 1;
+	unsigned int sz_lg2;
 
-	pages = iommu_num_pages(address, size, PAGE_SIZE);
-
-	if (pages == 1)
-		return address & PAGE_MASK;
-
-	end = address + size - 1;
+	address &= GENMASK_U64(63, 12);
+	sz_lg2 = fls64(address ^ last);
+	if (sz_lg2 <= 12)
+		return address;
 
 	/*
-	 * msb_diff would hold the index of the most significant bit that
-	 * flipped between the start and end.
+	 * Encode sz_lg2 according to Table 14: Example Page Size Encodings
+	 *
+	 * See "Note *":
+	 *   Address bits 51:32 can be used to encode page sizes greater
+	 *   than 4 Gbytes.
+	 * Which we take to mean that the highest page size has bit
+	 *  [51]=0, [50:12]=1
+	 * and that coding happens when sz_lg2 is 52. Fall back to full
+	 * invalidation if the size is too big.
+	 *
 	 */
-	msb_diff = fls64(end ^ address) - 1;
+	if (unlikely(sz_lg2 > 52))
+		return (CMD_INV_IOMMU_ALL_PAGES_ADDRESS & PAGE_MASK) |
+		       CMD_INV_IOMMU_PAGES_SIZE_MASK;
 
 	/*
-	 * Bits 63:52 are sign extended. If for some reason bit 51 is different
-	 * between the start and the end, invalidate everything.
+	 * The sz_lg2 calculation with fls() ensures that:
+	 *   address & BIT(sz_lg2 - 1) == 0
+	 * Therefore only the 1's need to be added. 8KB requires no 1's
 	 */
-	if (unlikely(msb_diff > 51)) {
-		address = CMD_INV_IOMMU_ALL_PAGES_ADDRESS;
-	} else {
-		/*
-		 * The msb-bit must be clear on the address. Just set all the
-		 * lower bits.
-		 */
-		address |= (1ull << msb_diff) - 1;
-	}
-
-	/* Clear bits 11:0 */
-	address &= PAGE_MASK;
-
-	/* Set the size bit - we flush more than one 4kb page */
+	if (sz_lg2 > 13)
+		address |= GENMASK_U64(sz_lg2 - 2, 12);
 	return address | CMD_INV_IOMMU_PAGES_SIZE_MASK;
 }
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/5] iommu/amd: Pass last in through to build_inv_address()
  2026-03-27 15:23 [PATCH 0/5] Improve the invalidation path in AMD Jason Gunthorpe
  2026-03-27 15:23 ` [PATCH 1/5] iommu/amd: Simplify build_inv_address() Jason Gunthorpe
@ 2026-03-27 15:23 ` Jason Gunthorpe
  2026-03-30  8:03   ` Wei Wang
  2026-03-27 15:23 ` [PATCH 3/5] iommu/amd: Have amd_iommu_domain_flush_pages() use last Jason Gunthorpe
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2026-03-27 15:23 UTC (permalink / raw)
  To: iommu, Joerg Roedel, Robin Murphy, Suravee Suthikulpanit,
	Will Deacon
  Cc: patches

This is the trivial call chain below amd_iommu_domain_flush_pages().

Cases that are doing a full invalidate will pass a last of U64_MAX.

This avoids converting between size and last, and type confusion with
size_t, unsigned long and u64 all being used in different places along
the driver's invalidation path. Consistently use u64 in the internals.
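
The overflow this avoids can be shown with a small userspace sketch
(names below are illustrative, not kernel identifiers):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Why carry (address, last) instead of (address, size): a full-range
 * flush is start = 0, last = U64_MAX, and converting that to a size
 * wraps to 0 in 64-bit arithmetic. The reverse conversion wraps the
 * same way when the range covers the top of the address space.
 */
static uint64_t size_from_range(uint64_t start, uint64_t last)
{
	return last - start + 1;	/* wraps to 0 for the full range */
}

static uint64_t last_from_size(uint64_t start, uint64_t size)
{
	return start + size - 1;
}
```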

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/amd/amd_iommu.h |  2 +-
 drivers/iommu/amd/iommu.c     | 56 +++++++++++++++++------------------
 drivers/iommu/amd/pasid.c     |  2 +-
 3 files changed, 29 insertions(+), 31 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 1342e764a5486d..ef4ebecb003494 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -90,7 +90,7 @@ void amd_iommu_flush_all_caches(struct amd_iommu *iommu);
 void amd_iommu_domain_flush_pages(struct protection_domain *domain,
 				  u64 address, size_t size);
 void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
-				     ioasid_t pasid, u64 address, size_t size);
+				     ioasid_t pasid, u64 address, u64 last);
 
 #ifdef CONFIG_IRQ_REMAP
 int amd_iommu_create_irq_domain(struct amd_iommu *iommu);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 28308759bb3e4d..caab941b531bfe 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1265,9 +1265,8 @@ static void build_inv_dte(struct iommu_cmd *cmd, u16 devid)
  * Builds an invalidation address which is suitable for one page or multiple
 * pages. Sets the size bit (S) as needed if more than one page is flushed.
  */
-static inline u64 build_inv_address(u64 address, size_t size)
+static inline u64 build_inv_address(u64 address, u64 last)
 {
-	u64 last = address + size - 1;
 	unsigned int sz_lg2;
 
 	address &= GENMASK_U64(63, 12);
@@ -1302,10 +1301,10 @@ static inline u64 build_inv_address(u64 address, size_t size)
 }
 
 static void build_inv_iommu_pages(struct iommu_cmd *cmd, u64 address,
-				  size_t size, u16 domid,
+				  u64 last, u16 domid,
 				  ioasid_t pasid, bool gn)
 {
-	u64 inv_address = build_inv_address(address, size);
+	u64 inv_address = build_inv_address(address, last);
 
 	memset(cmd, 0, sizeof(*cmd));
 
@@ -1322,10 +1321,10 @@ static void build_inv_iommu_pages(struct iommu_cmd *cmd, u64 address,
 }
 
 static void build_inv_iotlb_pages(struct iommu_cmd *cmd, u16 devid, int qdep,
-				  u64 address, size_t size,
+				  u64 address, u64 last,
 				  ioasid_t pasid, bool gn)
 {
-	u64 inv_address = build_inv_address(address, size);
+	u64 inv_address = build_inv_address(address, last);
 
 	memset(cmd, 0, sizeof(*cmd));
 
@@ -1523,7 +1522,7 @@ static void amd_iommu_flush_tlb_all(struct amd_iommu *iommu)
 
 	for (dom_id = 0; dom_id <= last_bdf; ++dom_id) {
 		struct iommu_cmd cmd;
-		build_inv_iommu_pages(&cmd, 0, CMD_INV_IOMMU_ALL_PAGES_ADDRESS,
+		build_inv_iommu_pages(&cmd, 0, U64_MAX,
 				      dom_id, IOMMU_NO_PASID, false);
 		iommu_queue_command(iommu, &cmd);
 	}
@@ -1535,14 +1534,14 @@ static void amd_iommu_flush_tlb_domid(struct amd_iommu *iommu, u32 dom_id)
 {
 	struct iommu_cmd cmd;
 
-	build_inv_iommu_pages(&cmd, 0, CMD_INV_IOMMU_ALL_PAGES_ADDRESS,
+	build_inv_iommu_pages(&cmd, 0, U64_MAX,
 			      dom_id, IOMMU_NO_PASID, false);
 	iommu_queue_command(iommu, &cmd);
 
 	iommu_completion_wait(iommu);
 }
 
-static int iommu_flush_pages_v1_hdom_ids(struct protection_domain *pdom, u64 address, size_t size)
+static int iommu_flush_pages_v1_hdom_ids(struct protection_domain *pdom, u64 address, u64 last)
 {
 	int ret = 0;
 	struct amd_iommu_viommu *aviommu;
@@ -1559,7 +1558,7 @@ static int iommu_flush_pages_v1_hdom_ids(struct protection_domain *pdom, u64 add
 
 			pr_debug("%s: iommu=%#x, hdom_id=%#x\n", __func__,
 				 iommu->devid, gdom_info->hdom_id);
-			build_inv_iommu_pages(&cmd, address, size, gdom_info->hdom_id,
+			build_inv_iommu_pages(&cmd, address, last, gdom_info->hdom_id,
 					      IOMMU_NO_PASID, false);
 			ret |= iommu_queue_command(iommu, &cmd);
 		}
@@ -1616,14 +1615,14 @@ void amd_iommu_flush_all_caches(struct amd_iommu *iommu)
  * Command send function for flushing on-device TLB
  */
 static int device_flush_iotlb(struct iommu_dev_data *dev_data, u64 address,
-			      size_t size, ioasid_t pasid, bool gn)
+			      u64 last, ioasid_t pasid, bool gn)
 {
 	struct amd_iommu *iommu = get_amd_iommu_from_dev_data(dev_data);
 	struct iommu_cmd cmd;
 	int qdep = dev_data->ats_qdep;
 
 	build_inv_iotlb_pages(&cmd, dev_data->devid, qdep, address,
-			      size, pasid, gn);
+			      last, pasid, gn);
 
 	return iommu_queue_command(iommu, &cmd);
 }
@@ -1667,7 +1666,7 @@ static int device_flush_dte(struct iommu_dev_data *dev_data)
 
 	if (dev_data->ats_enabled) {
 		/* Invalidate the entire contents of an IOTLB */
-		ret = device_flush_iotlb(dev_data, 0, ~0UL,
+		ret = device_flush_iotlb(dev_data, 0, U64_MAX,
 					 IOMMU_NO_PASID, false);
 	}
 
@@ -1675,7 +1674,7 @@ static int device_flush_dte(struct iommu_dev_data *dev_data)
 }
 
 static int domain_flush_pages_v2(struct protection_domain *pdom,
-				 u64 address, size_t size)
+				 u64 address, u64 last)
 {
 	struct iommu_dev_data *dev_data;
 	struct iommu_cmd cmd;
@@ -1686,7 +1685,7 @@ static int domain_flush_pages_v2(struct protection_domain *pdom,
 		struct amd_iommu *iommu = get_amd_iommu_from_dev(dev_data->dev);
 		u16 domid = dev_data->gcr3_info.domid;
 
-		build_inv_iommu_pages(&cmd, address, size,
+		build_inv_iommu_pages(&cmd, address, last,
 				      domid, IOMMU_NO_PASID, true);
 
 		ret |= iommu_queue_command(iommu, &cmd);
@@ -1696,7 +1695,7 @@ static int domain_flush_pages_v2(struct protection_domain *pdom,
 }
 
 static int domain_flush_pages_v1(struct protection_domain *pdom,
-				 u64 address, size_t size)
+				 u64 address, u64 last)
 {
 	struct pdom_iommu_info *pdom_iommu_info;
 	struct iommu_cmd cmd;
@@ -1705,7 +1704,7 @@ static int domain_flush_pages_v1(struct protection_domain *pdom,
 
 	lockdep_assert_held(&pdom->lock);
 
-	build_inv_iommu_pages(&cmd, address, size,
+	build_inv_iommu_pages(&cmd, address, last,
 			      pdom->id, IOMMU_NO_PASID, false);
 
 	xa_for_each(&pdom->iommu_array, i, pdom_iommu_info) {
@@ -1725,7 +1724,7 @@ static int domain_flush_pages_v1(struct protection_domain *pdom,
 	 * See drivers/iommu/amd/nested.c: amd_iommu_alloc_domain_nested()
 	 */
 	if (!list_empty(&pdom->viommu_list))
-		ret |= iommu_flush_pages_v1_hdom_ids(pdom, address, size);
+		ret |= iommu_flush_pages_v1_hdom_ids(pdom, address, last);
 
 	return ret;
 }
@@ -1735,7 +1734,7 @@ static int domain_flush_pages_v1(struct protection_domain *pdom,
  * It flushes range of PTEs of the domain.
  */
 static void __domain_flush_pages(struct protection_domain *domain,
-				 u64 address, size_t size)
+				 u64 address, u64 last)
 {
 	struct iommu_dev_data *dev_data;
 	int ret = 0;
@@ -1746,9 +1745,9 @@ static void __domain_flush_pages(struct protection_domain *domain,
 
 	if (pdom_is_v2_pgtbl_mode(domain)) {
 		gn = true;
-		ret = domain_flush_pages_v2(domain, address, size);
+		ret = domain_flush_pages_v2(domain, address, last);
 	} else {
-		ret = domain_flush_pages_v1(domain, address, size);
+		ret = domain_flush_pages_v1(domain, address, last);
 	}
 
 	list_for_each_entry(dev_data, &domain->dev_list, list) {
@@ -1756,7 +1755,7 @@ static void __domain_flush_pages(struct protection_domain *domain,
 		if (!dev_data->ats_enabled)
 			continue;
 
-		ret |= device_flush_iotlb(dev_data, address, size, pasid, gn);
+		ret |= device_flush_iotlb(dev_data, address, last, pasid, gn);
 	}
 
 	WARN_ON(ret);
@@ -1768,7 +1767,7 @@ void amd_iommu_domain_flush_pages(struct protection_domain *domain,
 	lockdep_assert_held(&domain->lock);
 
 	if (likely(!amd_iommu_np_cache)) {
-		__domain_flush_pages(domain, address, size);
+		__domain_flush_pages(domain, address, address + size - 1);
 
 		/* Wait until IOMMU TLB and all device IOTLB flushes are complete */
 		domain_flush_complete(domain);
@@ -1805,7 +1804,7 @@ void amd_iommu_domain_flush_pages(struct protection_domain *domain,
 
 		flush_size = 1ul << min_alignment;
 
-		__domain_flush_pages(domain, address, flush_size);
+		__domain_flush_pages(domain, address, address + flush_size - 1);
 		address += flush_size;
 		size -= flush_size;
 	}
@@ -1822,17 +1821,17 @@ static void amd_iommu_domain_flush_all(struct protection_domain *domain)
 }
 
 void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
-				     ioasid_t pasid, u64 address, size_t size)
+				     ioasid_t pasid, u64 address, u64 last)
 {
 	struct iommu_cmd cmd;
 	struct amd_iommu *iommu = get_amd_iommu_from_dev(dev_data->dev);
 
-	build_inv_iommu_pages(&cmd, address, size,
+	build_inv_iommu_pages(&cmd, address, last,
 			      dev_data->gcr3_info.domid, pasid, true);
 	iommu_queue_command(iommu, &cmd);
 
 	if (dev_data->ats_enabled)
-		device_flush_iotlb(dev_data, address, size, pasid, true);
+		device_flush_iotlb(dev_data, address, last, pasid, true);
 
 	iommu_completion_wait(iommu);
 }
@@ -1840,8 +1839,7 @@ void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
 static void dev_flush_pasid_all(struct iommu_dev_data *dev_data,
 				ioasid_t pasid)
 {
-	amd_iommu_dev_flush_pasid_pages(dev_data, pasid, 0,
-					CMD_INV_IOMMU_ALL_PAGES_ADDRESS);
+	amd_iommu_dev_flush_pasid_pages(dev_data, pasid, 0, U64_MAX);
 }
 
 int amd_iommu_complete_ppr(struct device *dev, u32 pasid, int status, int tag)
diff --git a/drivers/iommu/amd/pasid.c b/drivers/iommu/amd/pasid.c
index 67eace9809f16a..d708c6532480a4 100644
--- a/drivers/iommu/amd/pasid.c
+++ b/drivers/iommu/amd/pasid.c
@@ -71,7 +71,7 @@ static void sva_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn,
 	for_each_pdom_dev_data(pdom_dev_data, sva_pdom) {
 		amd_iommu_dev_flush_pasid_pages(pdom_dev_data->dev_data,
 						pdom_dev_data->pasid,
-						start, end - start);
+						start, end - 1);
 	}
 
 	spin_unlock_irqrestore(&sva_pdom->lock, flags);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/5] iommu/amd: Have amd_iommu_domain_flush_pages() use last
  2026-03-27 15:23 [PATCH 0/5] Improve the invalidation path in AMD Jason Gunthorpe
  2026-03-27 15:23 ` [PATCH 1/5] iommu/amd: Simplify build_inv_address() Jason Gunthorpe
  2026-03-27 15:23 ` [PATCH 2/5] iommu/amd: Pass last in through to build_inv_address() Jason Gunthorpe
@ 2026-03-27 15:23 ` Jason Gunthorpe
  2026-04-01  6:33   ` Wei Wang
  2026-03-27 15:23 ` [PATCH 4/5] iommu/amd: Make CMD_INV_IOMMU_ALL_PAGES_ADDRESS match the spec Jason Gunthorpe
  2026-03-27 15:23 ` [PATCH 5/5] iommu/amd: Control INVALIDATE_IOMMU_PAGES PDE from the gather Jason Gunthorpe
  4 siblings, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2026-03-27 15:23 UTC (permalink / raw)
  To: iommu, Joerg Roedel, Robin Murphy, Suravee Suthikulpanit,
	Will Deacon
  Cc: patches

Finish clearing out the size/last/end switching by converting
amd_iommu_domain_flush_pages() to use last-based logic.

The new algorithm is simpler than the previous one. Ultimately all it
needs to do is select power-of-two sized chunks that are aligned to
address and no larger than the remaining distance to last.

The new version is fully safe for size = U64_MAX and last = U64_MAX.

Finally, the gather can be passed through natively without risking an
overflow in (gather->end - gather->start + 1).
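
The chunk selection can be modelled in userspace C. ilog2/__ffs64 are
open-coded stand-ins here, and the full-range (0, U64_MAX) case is
excluded since the patch diverts it before the loop:

```c
#include <assert.h>
#include <stdint.h>

static unsigned int ilog2_sketch(uint64_t x)
{
	unsigned int r = 0;

	while (x >>= 1)
		r++;
	return r;
}

static unsigned int ffs64_sketch(uint64_t x)
{
	unsigned int r = 0;

	while (!(x & 1)) {
		r++;
		x >>= 1;
	}
	return r;
}

/*
 * Emit naturally aligned power-of-two chunks no larger than the
 * remaining distance to last; return how many flushes were issued.
 */
static int count_flushes(uint64_t address, uint64_t last)
{
	int n = 0;

	while (address <= last) {
		unsigned int sz_lg2 = ilog2_sketch(last - address + 1);
		uint64_t flush_last;

		if (address && ffs64_sketch(address) < sz_lg2)
			sz_lg2 = ffs64_sketch(address);

		flush_last = address + (1ULL << sz_lg2) - 1;
		n++;				/* one flush per chunk */
		if (flush_last == UINT64_MAX)	/* check_add_overflow() in the patch */
			break;
		address = flush_last + 1;
	}
	return n;
}
```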

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/amd/amd_iommu.h |  2 +-
 drivers/iommu/amd/iommu.c     | 44 +++++++++++++----------------------
 2 files changed, 17 insertions(+), 29 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index ef4ebecb003494..67d4daaaf6f0ce 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -88,7 +88,7 @@ int amd_iommu_complete_ppr(struct device *dev, u32 pasid, int status, int tag);
  */
 void amd_iommu_flush_all_caches(struct amd_iommu *iommu);
 void amd_iommu_domain_flush_pages(struct protection_domain *domain,
-				  u64 address, size_t size);
+				  u64 address, u64 last);
 void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
 				     ioasid_t pasid, u64 address, u64 last);
 
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index caab941b531bfe..a3773a145af811 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1762,12 +1762,13 @@ static void __domain_flush_pages(struct protection_domain *domain,
 }
 
 void amd_iommu_domain_flush_pages(struct protection_domain *domain,
-				  u64 address, size_t size)
+				  u64 address, u64 last)
 {
 	lockdep_assert_held(&domain->lock);
 
-	if (likely(!amd_iommu_np_cache)) {
-		__domain_flush_pages(domain, address, address + size - 1);
+	if (likely(!amd_iommu_np_cache) ||
+	    unlikely(address == 0 && last == U64_MAX)) {
+		__domain_flush_pages(domain, address, last);
 
 		/* Wait until IOMMU TLB and all device IOTLB flushes are complete */
 		domain_flush_complete(domain);
@@ -1785,28 +1786,17 @@ void amd_iommu_domain_flush_pages(struct protection_domain *domain,
 	 * between the natural alignment of the address that we flush and the
 	 * greatest naturally aligned region that fits in the range.
 	 */
-	while (size != 0) {
-		int addr_alignment = __ffs(address);
-		int size_alignment = __fls(size);
-		int min_alignment;
-		size_t flush_size;
+	while (address <= last) {
+		unsigned int sz_lg2 = ilog2(last - address + 1);
+		u64 flush_last;
 
-		/*
-		 * size is always non-zero, but address might be zero, causing
-		 * addr_alignment to be negative. As the casting of the
-		 * argument in __ffs(address) to long might trim the high bits
-		 * of the address on x86-32, cast to long when doing the check.
-		 */
-		if (likely((unsigned long)address != 0))
-			min_alignment = min(addr_alignment, size_alignment);
-		else
-			min_alignment = size_alignment;
+		if (likely(address))
+			sz_lg2 = min_t(unsigned int, sz_lg2, __ffs64(address));
 
-		flush_size = 1ul << min_alignment;
-
-		__domain_flush_pages(domain, address, address + flush_size - 1);
-		address += flush_size;
-		size -= flush_size;
+		flush_last = address + (1ULL << sz_lg2) - 1;
+		__domain_flush_pages(domain, address, flush_last);
+		if (check_add_overflow(flush_last, 1, &address))
+			break;
 	}
 
 	/* Wait until IOMMU TLB and all device IOTLB flushes are complete */
@@ -1816,8 +1806,7 @@ void amd_iommu_domain_flush_pages(struct protection_domain *domain,
 /* Flush the whole IO/TLB for a given protection domain - including PDE */
 static void amd_iommu_domain_flush_all(struct protection_domain *domain)
 {
-	amd_iommu_domain_flush_pages(domain, 0,
-				     CMD_INV_IOMMU_ALL_PAGES_ADDRESS);
+	amd_iommu_domain_flush_pages(domain, 0, U64_MAX);
 }
 
 void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
@@ -2617,7 +2606,7 @@ static int amd_iommu_iotlb_sync_map(struct iommu_domain *dom,
 		return 0;
 
 	spin_lock_irqsave(&domain->lock, flags);
-	amd_iommu_domain_flush_pages(domain, iova, size);
+	amd_iommu_domain_flush_pages(domain, iova, iova + size - 1);
 	spin_unlock_irqrestore(&domain->lock, flags);
 	return 0;
 }
@@ -2639,8 +2628,7 @@ static void amd_iommu_iotlb_sync(struct iommu_domain *domain,
 	unsigned long flags;
 
 	spin_lock_irqsave(&dom->lock, flags);
-	amd_iommu_domain_flush_pages(dom, gather->start,
-				     gather->end - gather->start + 1);
+	amd_iommu_domain_flush_pages(dom, gather->start, gather->end);
 	spin_unlock_irqrestore(&dom->lock, flags);
 	iommu_put_pages_list(&gather->freelist);
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/5] iommu/amd: Make CMD_INV_IOMMU_ALL_PAGES_ADDRESS match the spec
  2026-03-27 15:23 [PATCH 0/5] Improve the invalidation path in AMD Jason Gunthorpe
                   ` (2 preceding siblings ...)
  2026-03-27 15:23 ` [PATCH 3/5] iommu/amd: Have amd_iommu_domain_flush_pages() use last Jason Gunthorpe
@ 2026-03-27 15:23 ` Jason Gunthorpe
  2026-04-01  6:42   ` Wei Wang
  2026-03-27 15:23 ` [PATCH 5/5] iommu/amd: Control INVALIDATE_IOMMU_PAGES PDE from the gather Jason Gunthorpe
  4 siblings, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2026-03-27 15:23 UTC (permalink / raw)
  To: iommu, Joerg Roedel, Robin Murphy, Suravee Suthikulpanit,
	Will Deacon
  Cc: patches

The spec in Table 14 defines the "Entire Cache" case as having the low
12 bits as zero. Indeed the command format doesn't even have the low
12 bits. Since there is only one user now, fix the constant to have 0s
in the low 12 bits instead of 1s and remove the masking.
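
The change can be checked with a couple of assertions (the OLD/NEW
names below are illustrative, not kernel identifiers):

```c
#include <assert.h>
#include <stdint.h>

/* Old and new values; only the low 12 bits differ. Per Table 14, the
 * "Entire Cache" coding has bits 62:12 set and bits 11:0 clear. */
#define OLD_ALL_PAGES_ADDRESS	0x7fffffffffffffffULL
#define NEW_ALL_PAGES_ADDRESS	0x7ffffffffffff000ULL
```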

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/amd/amd_iommu_types.h | 2 +-
 drivers/iommu/amd/iommu.c           | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index c685d3771436a2..c239c5fbe47461 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -221,7 +221,7 @@
 #define PPR_STATUS_MASK			0xf
 #define PPR_STATUS_SHIFT		12
 
-#define CMD_INV_IOMMU_ALL_PAGES_ADDRESS	0x7fffffffffffffffULL
+#define CMD_INV_IOMMU_ALL_PAGES_ADDRESS	0x7ffffffffffff000ULL
 
 /* macros and definitions for device table entries */
 #define DEV_ENTRY_VALID         0x00
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a3773a145af811..4e057fbc20021e 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1287,7 +1287,7 @@ static inline u64 build_inv_address(u64 address, u64 last)
 	 *
 	 */
 	if (unlikely(sz_lg2 > 52))
-		return (CMD_INV_IOMMU_ALL_PAGES_ADDRESS & PAGE_MASK) |
+		return CMD_INV_IOMMU_ALL_PAGES_ADDRESS |
 		       CMD_INV_IOMMU_PAGES_SIZE_MASK;
 
 	/*
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 5/5] iommu/amd: Control INVALIDATE_IOMMU_PAGES PDE from the gather
  2026-03-27 15:23 [PATCH 0/5] Improve the invalidation path in AMD Jason Gunthorpe
                   ` (3 preceding siblings ...)
  2026-03-27 15:23 ` [PATCH 4/5] iommu/amd: Make CMD_INV_IOMMU_ALL_PAGES_ADDRESS match the spec Jason Gunthorpe
@ 2026-03-27 15:23 ` Jason Gunthorpe
  2026-04-01  8:51   ` Wei Wang
  4 siblings, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2026-03-27 15:23 UTC (permalink / raw)
  To: iommu, Joerg Roedel, Robin Murphy, Suravee Suthikulpanit,
	Will Deacon
  Cc: patches

Now that AMD uses iommupt, it is easy to make use of the PDE bit. If
the gather has no free list, then no page directory entries were
changed.

Pass GN/PDE through the invalidation call chain in a u32 flags field
that is OR'd into data[2] and set it properly from the gather.
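
The flag derivation can be sketched as below; the mask values are
illustrative stand-ins for CMD_INV_IOMMU_PAGES_PDE_MASK and
CMD_INV_IOMMU_PAGES_GN_MASK, whose real values live in
amd_iommu_types.h:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PDE_FLAG (1u << 1)	/* illustrative, not the kernel value */
#define GN_FLAG  (1u << 2)	/* illustrative, not the kernel value */

/*
 * An empty gather freelist means no page-table pages were freed, so no
 * page directory entries changed and the hardware may skip scanning
 * its page directory cache (PDE=0).
 */
static uint32_t inv_flags_from_gather(bool freelist_empty, bool v2_pgtbl)
{
	uint32_t flags = freelist_empty ? 0 : PDE_FLAG;

	if (v2_pgtbl)		/* v2 page tables always set GN */
		flags |= GN_FLAG;
	return flags;
}
```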

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/amd/amd_iommu.h |  2 +-
 drivers/iommu/amd/iommu.c     | 61 +++++++++++++++++++----------------
 2 files changed, 35 insertions(+), 28 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 67d4daaaf6f0ce..364878a923edd0 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -88,7 +88,7 @@ int amd_iommu_complete_ppr(struct device *dev, u32 pasid, int status, int tag);
  */
 void amd_iommu_flush_all_caches(struct amd_iommu *iommu);
 void amd_iommu_domain_flush_pages(struct protection_domain *domain,
-				  u64 address, u64 last);
+				  u64 address, u64 last, u32 flags);
 void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
 				     ioasid_t pasid, u64 address, u64 last);
 
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 4e057fbc20021e..95f7dc6cf67eaf 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1301,8 +1301,8 @@ static inline u64 build_inv_address(u64 address, u64 last)
 }
 
 static void build_inv_iommu_pages(struct iommu_cmd *cmd, u64 address,
-				  u64 last, u16 domid,
-				  ioasid_t pasid, bool gn)
+				  u64 last, u16 domid, ioasid_t pasid,
+				  u32 flags)
 {
 	u64 inv_address = build_inv_address(address, last);
 
@@ -1311,12 +1311,9 @@ static void build_inv_iommu_pages(struct iommu_cmd *cmd, u64 address,
 	cmd->data[1] |= domid;
 	cmd->data[2]  = lower_32_bits(inv_address);
 	cmd->data[3]  = upper_32_bits(inv_address);
-	/* PDE bit - we want to flush everything, not only the PTEs */
-	cmd->data[2] |= CMD_INV_IOMMU_PAGES_PDE_MASK;
-	if (gn) {
+	cmd->data[2] |= flags;
+	if (flags & CMD_INV_IOMMU_PAGES_GN_MASK)
 		cmd->data[0] |= pasid;
-		cmd->data[2] |= CMD_INV_IOMMU_PAGES_GN_MASK;
-	}
 	CMD_SET_TYPE(cmd, CMD_INV_IOMMU_PAGES);
 }
 
@@ -1523,7 +1520,8 @@ static void amd_iommu_flush_tlb_all(struct amd_iommu *iommu)
 	for (dom_id = 0; dom_id <= last_bdf; ++dom_id) {
 		struct iommu_cmd cmd;
 		build_inv_iommu_pages(&cmd, 0, U64_MAX,
-				      dom_id, IOMMU_NO_PASID, false);
+				      dom_id, IOMMU_NO_PASID,
+				      CMD_INV_IOMMU_PAGES_PDE_MASK);
 		iommu_queue_command(iommu, &cmd);
 	}
 
@@ -1535,13 +1533,15 @@ static void amd_iommu_flush_tlb_domid(struct amd_iommu *iommu, u32 dom_id)
 	struct iommu_cmd cmd;
 
 	build_inv_iommu_pages(&cmd, 0, U64_MAX,
-			      dom_id, IOMMU_NO_PASID, false);
+			      dom_id, IOMMU_NO_PASID,
+			      CMD_INV_IOMMU_PAGES_PDE_MASK);
 	iommu_queue_command(iommu, &cmd);
 
 	iommu_completion_wait(iommu);
 }
 
-static int iommu_flush_pages_v1_hdom_ids(struct protection_domain *pdom, u64 address, u64 last)
+static int iommu_flush_pages_v1_hdom_ids(struct protection_domain *pdom,
+					 u64 address, u64 last, u32 flags)
 {
 	int ret = 0;
 	struct amd_iommu_viommu *aviommu;
@@ -1559,7 +1559,7 @@ static int iommu_flush_pages_v1_hdom_ids(struct protection_domain *pdom, u64 add
 			pr_debug("%s: iommu=%#x, hdom_id=%#x\n", __func__,
 				 iommu->devid, gdom_info->hdom_id);
 			build_inv_iommu_pages(&cmd, address, last, gdom_info->hdom_id,
-					      IOMMU_NO_PASID, false);
+					      IOMMU_NO_PASID, flags);
 			ret |= iommu_queue_command(iommu, &cmd);
 		}
 		xa_unlock(&aviommu->gdomid_array);
@@ -1674,7 +1674,7 @@ static int device_flush_dte(struct iommu_dev_data *dev_data)
 }
 
 static int domain_flush_pages_v2(struct protection_domain *pdom,
-				 u64 address, u64 last)
+				 u64 address, u64 last, u32 flags)
 {
 	struct iommu_dev_data *dev_data;
 	struct iommu_cmd cmd;
@@ -1685,8 +1685,9 @@ static int domain_flush_pages_v2(struct protection_domain *pdom,
 		struct amd_iommu *iommu = get_amd_iommu_from_dev(dev_data->dev);
 		u16 domid = dev_data->gcr3_info.domid;
 
-		build_inv_iommu_pages(&cmd, address, last,
-				      domid, IOMMU_NO_PASID, true);
+		build_inv_iommu_pages(&cmd, address, last, domid,
+				      IOMMU_NO_PASID,
+				      flags | CMD_INV_IOMMU_PAGES_GN_MASK);
 
 		ret |= iommu_queue_command(iommu, &cmd);
 	}
@@ -1695,7 +1696,7 @@ static int domain_flush_pages_v2(struct protection_domain *pdom,
 }
 
 static int domain_flush_pages_v1(struct protection_domain *pdom,
-				 u64 address, u64 last)
+				 u64 address, u64 last, u32 flags)
 {
 	struct pdom_iommu_info *pdom_iommu_info;
 	struct iommu_cmd cmd;
@@ -1705,7 +1706,7 @@ static int domain_flush_pages_v1(struct protection_domain *pdom,
 	lockdep_assert_held(&pdom->lock);
 
 	build_inv_iommu_pages(&cmd, address, last,
-			      pdom->id, IOMMU_NO_PASID, false);
+			      pdom->id, IOMMU_NO_PASID, flags);
 
 	xa_for_each(&pdom->iommu_array, i, pdom_iommu_info) {
 		/*
@@ -1724,7 +1725,7 @@ static int domain_flush_pages_v1(struct protection_domain *pdom,
 	 * See drivers/iommu/amd/nested.c: amd_iommu_alloc_domain_nested()
 	 */
 	if (!list_empty(&pdom->viommu_list))
-		ret |= iommu_flush_pages_v1_hdom_ids(pdom, address, last);
+		ret |= iommu_flush_pages_v1_hdom_ids(pdom, address, last, flags);
 
 	return ret;
 }
@@ -1734,7 +1735,7 @@ static int domain_flush_pages_v1(struct protection_domain *pdom,
  * It flushes range of PTEs of the domain.
  */
 static void __domain_flush_pages(struct protection_domain *domain,
-				 u64 address, u64 last)
+				 u64 address, u64 last, u32 flags)
 {
 	struct iommu_dev_data *dev_data;
 	int ret = 0;
@@ -1745,9 +1746,9 @@ static void __domain_flush_pages(struct protection_domain *domain,
 
 	if (pdom_is_v2_pgtbl_mode(domain)) {
 		gn = true;
-		ret = domain_flush_pages_v2(domain, address, last);
+		ret = domain_flush_pages_v2(domain, address, last, flags);
 	} else {
-		ret = domain_flush_pages_v1(domain, address, last);
+		ret = domain_flush_pages_v1(domain, address, last, flags);
 	}
 
 	list_for_each_entry(dev_data, &domain->dev_list, list) {
@@ -1762,13 +1763,13 @@ static void __domain_flush_pages(struct protection_domain *domain,
 }
 
 void amd_iommu_domain_flush_pages(struct protection_domain *domain,
-				  u64 address, u64 last)
+				  u64 address, u64 last, u32 flags)
 {
 	lockdep_assert_held(&domain->lock);
 
 	if (likely(!amd_iommu_np_cache) ||
 	    unlikely(address == 0 && last == U64_MAX)) {
-		__domain_flush_pages(domain, address, last);
+		__domain_flush_pages(domain, address, last, flags);
 
 		/* Wait until IOMMU TLB and all device IOTLB flushes are complete */
 		domain_flush_complete(domain);
@@ -1794,7 +1795,7 @@ void amd_iommu_domain_flush_pages(struct protection_domain *domain,
 			sz_lg2 = min_t(unsigned int, sz_lg2, __ffs64(address));
 
 		flush_last = address + (1ULL << sz_lg2) - 1;
-		__domain_flush_pages(domain, address, flush_last);
+		__domain_flush_pages(domain, address, flush_last, flags);
 		if (check_add_overflow(flush_last, 1, &address))
 			break;
 	}
@@ -1806,7 +1807,8 @@ void amd_iommu_domain_flush_pages(struct protection_domain *domain,
 /* Flush the whole IO/TLB for a given protection domain - including PDE */
 static void amd_iommu_domain_flush_all(struct protection_domain *domain)
 {
-	amd_iommu_domain_flush_pages(domain, 0, U64_MAX);
+	amd_iommu_domain_flush_pages(domain, 0, U64_MAX,
+				     CMD_INV_IOMMU_PAGES_PDE_MASK);
 }
 
 void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
@@ -1816,7 +1818,9 @@ void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
 	struct amd_iommu *iommu = get_amd_iommu_from_dev(dev_data->dev);
 
 	build_inv_iommu_pages(&cmd, address, last,
-			      dev_data->gcr3_info.domid, pasid, true);
+			      dev_data->gcr3_info.domid, pasid,
+			      CMD_INV_IOMMU_PAGES_GN_MASK |
+			      CMD_INV_IOMMU_PAGES_PDE_MASK);
 	iommu_queue_command(iommu, &cmd);
 
 	if (dev_data->ats_enabled)
@@ -2606,7 +2610,8 @@ static int amd_iommu_iotlb_sync_map(struct iommu_domain *dom,
 		return 0;
 
 	spin_lock_irqsave(&domain->lock, flags);
-	amd_iommu_domain_flush_pages(domain, iova, iova + size - 1);
+	amd_iommu_domain_flush_pages(domain, iova, iova + size - 1,
+				     CMD_INV_IOMMU_PAGES_PDE_MASK);
 	spin_unlock_irqrestore(&domain->lock, flags);
 	return 0;
 }
@@ -2628,7 +2633,9 @@ static void amd_iommu_iotlb_sync(struct iommu_domain *domain,
 	unsigned long flags;
 
 	spin_lock_irqsave(&dom->lock, flags);
-	amd_iommu_domain_flush_pages(dom, gather->start, gather->end);
+	amd_iommu_domain_flush_pages(dom, gather->start, gather->end,
+				     iommu_pages_list_empty(&gather->freelist) ?
+				     0 : CMD_INV_IOMMU_PAGES_PDE_MASK);
 	spin_unlock_irqrestore(&dom->lock, flags);
 	iommu_put_pages_list(&gather->freelist);
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/5] iommu/amd: Simplify build_inv_address()
  2026-03-27 15:23 ` [PATCH 1/5] iommu/amd: Simplify build_inv_address() Jason Gunthorpe
@ 2026-03-30  7:39   ` Wei Wang
  2026-03-30 12:05     ` Jason Gunthorpe
  0 siblings, 1 reply; 15+ messages in thread
From: Wei Wang @ 2026-03-30  7:39 UTC (permalink / raw)
  To: Jason Gunthorpe, iommu, Joerg Roedel, Robin Murphy,
	Suravee Suthikulpanit, Will Deacon
  Cc: patches

On 3/27/26 11:23 PM, Jason Gunthorpe wrote:
> This function is doing more work than it needs to:
> 
>   - iommu_num_pages() is pointless, the fls() is going to compute the
>     required page size already.
> 
>   - It is easier to understand as sz_lg2, which is 12 if size is 4K,
>     than msb_diff which is 11 if size is 4K.
> 
>   - Simplify the control flow to early exit on the out of range cases.
> 
>   - Use the usual last instead of end to signify an inclusive last
>     address.
> 
>   - Use GENMASK to compute the 1's mask.
> 
>   - Use GENMASK to compute the address mask for the command layout,
>     not PAGE_MASK.
> 
>   - Directly reference the spec language that defines the 52 bit
>     limit.
> 
> No functional change intended.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>   drivers/iommu/amd/iommu.c | 50 +++++++++++++++++++--------------------
>   1 file changed, 24 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 528ebdd26d71c9..28308759bb3e4d 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -1267,39 +1267,37 @@ static void build_inv_dte(struct iommu_cmd *cmd, u16 devid)
>    */
>   static inline u64 build_inv_address(u64 address, size_t size)
>   {
> -	u64 pages, end, msb_diff;
> +	u64 last = address + size - 1;
> +	unsigned int sz_lg2;

Would naming it shift (or order) be more conventional?


>   
> -	pages = iommu_num_pages(address, size, PAGE_SIZE);
> -
> -	if (pages == 1)
> -		return address & PAGE_MASK;
> -
> -	end = address + size - 1;
> +	address &= GENMASK_U64(63, 12);
> +	sz_lg2 = fls64(address ^ last);
> +	if (sz_lg2 <= 12)

then could use "if (shift <= PAGE_SHIFT)" here.

> +		return address;
>   
>   	/*
> -	 * msb_diff would hold the index of the most significant bit that
> -	 * flipped between the start and end.
> +	 * Encode sz_lg2 according to Table 14: Example Page Size Encodings
> +	 *
> +	 * See "Note *":
> +	 *   Address bits 51:32 can be used to encode page sizes greater
> +	 *   that 4 Gbytes.

"than" (seems a spec typo)

> +	 * Which we take to mean that the highest page size has bit
> +	 *  [51]=0, [50:12]=1
> +	 * and that coding happens when sz_lg2 is 52. Fall back to full
> +	 * invalidation if the size is too big.
> +	 *
>   	 */
> -	msb_diff = fls64(end ^ address) - 1;
> +	if (unlikely(sz_lg2 > 52))
> +		return (CMD_INV_IOMMU_ALL_PAGES_ADDRESS & PAGE_MASK) |
> +		       CMD_INV_IOMMU_PAGES_SIZE_MASK;


The spec mentions "Address bits 63:52 are zero-extended.", should we 
enforce it:

-#define CMD_INV_IOMMU_ALL_PAGES_ADDRESS        0x7ffffffffffff000ULL
+#define CMD_INV_IOMMU_ALL_PAGES_ADDRESS        GENMASK_ULL(51, 12)


also for the returned "address"

>   
>   	/*
> -	 * Bits 63:52 are sign extended. If for some reason bit 51 is different
> -	 * between the start and the end, invalidate everything.
> +	 * The sz_lg2 calculation with fls() ensures that:
> +	 *   address & BIT(sz_lg2 - 1) == 0
> +	 * Therefore only the 1's need to be added. 8KB requires no 1's
>   	 */
> -	if (unlikely(msb_diff > 51)) {
> -		address = CMD_INV_IOMMU_ALL_PAGES_ADDRESS;
> -	} else {
> -		/*
> -		 * The msb-bit must be clear on the address. Just set all the
> -		 * lower bits.
> -		 */
> -		address |= (1ull << msb_diff) - 1;
> -	}
> -
> -	/* Clear bits 11:0 */
> -	address &= PAGE_MASK;
> -
> -	/* Set the size bit - we flush more than one 4kb page */
> +	if (sz_lg2 > 13)
> +		address |= GENMASK_U64(sz_lg2 - 2, 12);
>   	return address | CMD_INV_IOMMU_PAGES_SIZE_MASK;
>   }
>   



* Re: [PATCH 2/5] iommu/amd: Pass last in through to build_inv_address()
  2026-03-27 15:23 ` [PATCH 2/5] iommu/amd: Pass last in through to build_inv_address() Jason Gunthorpe
@ 2026-03-30  8:03   ` Wei Wang
  2026-04-01 19:55     ` Jason Gunthorpe
  0 siblings, 1 reply; 15+ messages in thread
From: Wei Wang @ 2026-03-30  8:03 UTC (permalink / raw)
  To: Jason Gunthorpe, iommu, Joerg Roedel, Robin Murphy,
	Suravee Suthikulpanit, Will Deacon
  Cc: patches

On 3/27/26 11:23 PM, Jason Gunthorpe wrote:
> This is the trivial call chain below amd_iommu_domain_flush_pages().
> 
> Cases that are doing a full invalidate will pass a last of U64_MAX.
> 
> This avoids converting between size and last, and type confusion with
> size_t, unsigned long and u64 all being used in different places along
> the driver's invalidation path. Consistently use u64 in the internals.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>   drivers/iommu/amd/amd_iommu.h |  2 +-
>   drivers/iommu/amd/iommu.c     | 56 +++++++++++++++++------------------
>   drivers/iommu/amd/pasid.c     |  2 +-
>   3 files changed, 29 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
> index 1342e764a5486d..ef4ebecb003494 100644
> --- a/drivers/iommu/amd/amd_iommu.h
> +++ b/drivers/iommu/amd/amd_iommu.h
> @@ -90,7 +90,7 @@ void amd_iommu_flush_all_caches(struct amd_iommu *iommu);
>   void amd_iommu_domain_flush_pages(struct protection_domain *domain,
>   				  u64 address, size_t size);
>   void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
> -				     ioasid_t pasid, u64 address, size_t size);
> +				     ioasid_t pasid, u64 address, u64 last);

Using “addr_start/addr_last” or “start/last” would be more symmetrical 
and intuitive.


* Re: [PATCH 1/5] iommu/amd: Simplify build_inv_address()
  2026-03-30  7:39   ` Wei Wang
@ 2026-03-30 12:05     ` Jason Gunthorpe
  2026-03-31  2:01       ` Wei Wang
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2026-03-30 12:05 UTC (permalink / raw)
  To: Wei Wang
  Cc: iommu, Joerg Roedel, Robin Murphy, Suravee Suthikulpanit,
	Will Deacon, patches

On Mon, Mar 30, 2026 at 03:39:16PM +0800, Wei Wang wrote:

> > @@ -1267,39 +1267,37 @@ static void build_inv_dte(struct iommu_cmd *cmd, u16 devid)
> >    */
> >   static inline u64 build_inv_address(u64 address, size_t size)
> >   {
> > -	u64 pages, end, msb_diff;
> > +	u64 last = address + size - 1;
> > +	unsigned int sz_lg2;
> 
> Would naming it shift (or order) be more conventional?

I find that confusing, shift of what exactly? It is the size of the
invalidation encoded in log 2.

> > -	end = address + size - 1;
> > +	address &= GENMASK_U64(63, 12);
> > +	sz_lg2 = fls64(address ^ last);
> > +	if (sz_lg2 <= 12)
> 
> then could use "if (shift < PAGE_SHIFT)" here.

PAGE_SHIFT defines the kernel mm configuration, "12" is a hardware
constant from the iommu spec. They should not be intermixed.
 
> > +		return address;
> >   	/*
> > -	 * msb_diff would hold the index of the most significant bit that
> > -	 * flipped between the start and end.
> > +	 * Encode sz_lg2 according to Table 14: Example Page Size Encodings
> > +	 *
> > +	 * See "Note *":
> > +	 *   Address bits 51:32 can be used to encode page sizes greater
> > +	 *   that 4 Gbytes.
> 
> "than" (seems a spec typo)

Yeah, I'll leave it as is.
 
> > +	 * Which we take to mean that the highest page size has bit
> > +	 *  [51]=0, [50:12]=1
> > +	 * and that coding happens when sz_lg2 is 52. Fall back to full
> > +	 * invalidation if the size is too big.
> > +	 *
> >   	 */
> > -	msb_diff = fls64(end ^ address) - 1;
> > +	if (unlikely(sz_lg2 > 52))
> > +		return (CMD_INV_IOMMU_ALL_PAGES_ADDRESS & PAGE_MASK) |
> > +		       CMD_INV_IOMMU_PAGES_SIZE_MASK;
> 
> 
> The spec mentions "Address bits 63:52 are zero-extended.", should we enforce
> it:
> 
> -#define CMD_INV_IOMMU_ALL_PAGES_ADDRESS        0x7ffffffffffff000ULL
> +#define CMD_INV_IOMMU_ALL_PAGES_ADDRESS        GENMASK_ULL(51, 12)
> 
> also for the returned "address"

2.4.3 specifically calls out 

 The IOMMU invalidates all translation
 information associated with the DomainID for both nested and guest levels when S=1, PDE=1,
 GN=0, and Address[63:12]=7_FFFF_FFFF_FFFFh.

I wouldn't change it.

Jason


* Re: [PATCH 1/5] iommu/amd: Simplify build_inv_address()
  2026-03-30 12:05     ` Jason Gunthorpe
@ 2026-03-31  2:01       ` Wei Wang
  0 siblings, 0 replies; 15+ messages in thread
From: Wei Wang @ 2026-03-31  2:01 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, Robin Murphy, Suravee Suthikulpanit,
	Will Deacon, patches

On 3/30/26 8:05 PM, Jason Gunthorpe wrote:
> On Mon, Mar 30, 2026 at 03:39:16PM +0800, Wei Wang wrote:
> 
>>> @@ -1267,39 +1267,37 @@ static void build_inv_dte(struct iommu_cmd *cmd, u16 devid)
>>>     */
>>>    static inline u64 build_inv_address(u64 address, size_t size)
>>>    {
>>> -	u64 pages, end, msb_diff;
>>> +	u64 last = address + size - 1;
>>> +	unsigned int sz_lg2;
>>
>> Would naming it shift (or order) be more conventional?
> 
> I find that confusing, shift of what exactly? It is size of the
> invalidation encoded in log 2.

Just the number of bits to shift the value 1 by to get the
invalidation size: 1 << shift = size.

I thought it's similar to PAGE_SHIFT - the number of bits to shift the 
value 1 to get PAGE_SIZE (though it's also used to shift an address to 
get a frame number).

It's not critical though, up to your preference.

>>
>> The spec mentions "Address bits 63:52 are zero-extended.", should we enforce
>> it:
>>
>> -#define CMD_INV_IOMMU_ALL_PAGES_ADDRESS        0x7ffffffffffff000ULL
>> +#define CMD_INV_IOMMU_ALL_PAGES_ADDRESS        GENMASK_ULL(51, 12)
>>
>> also for the returned "address"
> 
> 2.4.3 specifically calls out
> 
>   The IOMMU invalidates all translation
>   information associated with the DomainID for both nested and guest levels when S=1, PDE=1,
>   GN=0, and Address[63:12]=7_FFFF_FFFF_FFFFh.
> 

OK. The INVALIDATE_ALL case is specialized out. It seems the “Address 
bits 63:52 are zero‑extended” requirement in the spec applies only to 
the encoded addresses (i.e., the returned address), as stated in Table 14.


* Re: [PATCH 3/5] iommu/amd: Have amd_iommu_domain_flush_pages() use last
  2026-03-27 15:23 ` [PATCH 3/5] iommu/amd: Have amd_iommu_domain_flush_pages() use last Jason Gunthorpe
@ 2026-04-01  6:33   ` Wei Wang
  0 siblings, 0 replies; 15+ messages in thread
From: Wei Wang @ 2026-04-01  6:33 UTC (permalink / raw)
  To: Jason Gunthorpe, iommu, Joerg Roedel, Robin Murphy,
	Suravee Suthikulpanit, Will Deacon
  Cc: patches

On 3/27/26 11:23 PM, Jason Gunthorpe wrote:
> Finish clearing out the size/last/end switching by converting
> amd_iommu_domain_flush_pages() to use last-based logic.
> 
> This algorithm is simpler than the previous. Ultimately all this wants
> to do is select powers of two that are aligned to address and not
> longer than the distance to last.
> 
> The new version is fully safe for size = U64_MAX and last = U64_MAX.
> 
> Finally, the gather can be passed through natively without risking an
> overflow in (gather->end - gather->start + 1).
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>   drivers/iommu/amd/amd_iommu.h |  2 +-
>   drivers/iommu/amd/iommu.c     | 44 +++++++++++++----------------------
>   2 files changed, 17 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
> index ef4ebecb003494..67d4daaaf6f0ce 100644
> --- a/drivers/iommu/amd/amd_iommu.h
> +++ b/drivers/iommu/amd/amd_iommu.h
> @@ -88,7 +88,7 @@ int amd_iommu_complete_ppr(struct device *dev, u32 pasid, int status, int tag);
>    */
>   void amd_iommu_flush_all_caches(struct amd_iommu *iommu);
>   void amd_iommu_domain_flush_pages(struct protection_domain *domain,
> -				  u64 address, size_t size);
> +				  u64 address, u64 last);
>   void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
>   				     ioasid_t pasid, u64 address, u64 last);
>   
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index caab941b531bfe..a3773a145af811 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -1762,12 +1762,13 @@ static void __domain_flush_pages(struct protection_domain *domain,
>   }
>   
>   void amd_iommu_domain_flush_pages(struct protection_domain *domain,
> -				  u64 address, size_t size)
> +				  u64 address, u64 last)
>   {
>   	lockdep_assert_held(&domain->lock);
>   
> -	if (likely(!amd_iommu_np_cache)) {
> -		__domain_flush_pages(domain, address, address + size - 1);
> +	if (likely(!amd_iommu_np_cache) ||
> +	    unlikely(address == 0 && last == U64_MAX)) {
> +		__domain_flush_pages(domain, address, last);
>   
>   		/* Wait until IOMMU TLB and all device IOTLB flushes are complete */
>   		domain_flush_complete(domain);
> @@ -1785,28 +1786,17 @@ void amd_iommu_domain_flush_pages(struct protection_domain *domain,
>   	 * between the natural alignment of the address that we flush and the
>   	 * greatest naturally aligned region that fits in the range.
>   	 */
> -	while (size != 0) {
> -		int addr_alignment = __ffs(address);
> -		int size_alignment = __fls(size);
> -		int min_alignment;
> -		size_t flush_size;
> +	while (address <= last) {
> +		unsigned int sz_lg2 = ilog2(last - address + 1);
> +		u64 flush_last;
>   
> -		/*
> -		 * size is always non-zero, but address might be zero, causing
> -		 * addr_alignment to be negative. As the casting of the
> -		 * argument in __ffs(address) to long might trim the high bits
> -		 * of the address on x86-32, cast to long when doing the check.
> -		 */
> -		if (likely((unsigned long)address != 0))
> -			min_alignment = min(addr_alignment, size_alignment);
> -		else
> -			min_alignment = size_alignment;
> +		if (likely(address))
> +			sz_lg2 = min_t(unsigned int, sz_lg2, __ffs64(address));
>   
> -		flush_size = 1ul << min_alignment;
> -
> -		__domain_flush_pages(domain, address, address + flush_size - 1);
> -		address += flush_size;
> -		size -= flush_size;
> +		flush_last = address + (1ULL << sz_lg2) - 1;
> +		__domain_flush_pages(domain, address, flush_last);
> +		if (check_add_overflow(flush_last, 1, &address))
> +			break;
>   	}
>   
>   	/* Wait until IOMMU TLB and all device IOTLB flushes are complete */
> @@ -1816,8 +1806,7 @@ void amd_iommu_domain_flush_pages(struct protection_domain *domain,
>   /* Flush the whole IO/TLB for a given protection domain - including PDE */
>   static void amd_iommu_domain_flush_all(struct protection_domain *domain)
>   {
> -	amd_iommu_domain_flush_pages(domain, 0,
> -				     CMD_INV_IOMMU_ALL_PAGES_ADDRESS);
> +	amd_iommu_domain_flush_pages(domain, 0, U64_MAX);
>   }
>   
>   void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
> @@ -2617,7 +2606,7 @@ static int amd_iommu_iotlb_sync_map(struct iommu_domain *dom,
>   		return 0;
>   
>   	spin_lock_irqsave(&domain->lock, flags);
> -	amd_iommu_domain_flush_pages(domain, iova, size);
> +	amd_iommu_domain_flush_pages(domain, iova, iova + size - 1);
>   	spin_unlock_irqrestore(&domain->lock, flags);
>   	return 0;
>   }
> @@ -2639,8 +2628,7 @@ static void amd_iommu_iotlb_sync(struct iommu_domain *domain,
>   	unsigned long flags;
>   
>   	spin_lock_irqsave(&dom->lock, flags);
> -	amd_iommu_domain_flush_pages(dom, gather->start,
> -				     gather->end - gather->start + 1);
> +	amd_iommu_domain_flush_pages(dom, gather->start, gather->end);
>   	spin_unlock_irqrestore(&dom->lock, flags);
>   	iommu_put_pages_list(&gather->freelist);
>   }

Looks good to me.

Reviewed-by: Wei Wang <wei.w.wang@hotmail.com>


* Re: [PATCH 4/5] iommu/amd: Make CMD_INV_IOMMU_ALL_PAGES_ADDRESS match the spec
  2026-03-27 15:23 ` [PATCH 4/5] iommu/amd: Make CMD_INV_IOMMU_ALL_PAGES_ADDRESS match the spec Jason Gunthorpe
@ 2026-04-01  6:42   ` Wei Wang
  2026-04-01 13:17     ` Jason Gunthorpe
  0 siblings, 1 reply; 15+ messages in thread
From: Wei Wang @ 2026-04-01  6:42 UTC (permalink / raw)
  To: Jason Gunthorpe, iommu, Joerg Roedel, Robin Murphy,
	Suravee Suthikulpanit, Will Deacon
  Cc: patches



On 3/27/26 11:23 PM, Jason Gunthorpe wrote:
> The spec in Table 14 defines the "Entire Cache" case as having the low
> 12 bits as zero. Indeed the command format doesn't even have the low
> 12 bits. Since there is only one user now, fix the constant to have 0
> in the low 12 bits instead of 1 and remove the masking.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>   drivers/iommu/amd/amd_iommu_types.h | 2 +-
>   drivers/iommu/amd/iommu.c           | 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
> index c685d3771436a2..c239c5fbe47461 100644
> --- a/drivers/iommu/amd/amd_iommu_types.h
> +++ b/drivers/iommu/amd/amd_iommu_types.h
> @@ -221,7 +221,7 @@
>   #define PPR_STATUS_MASK			0xf
>   #define PPR_STATUS_SHIFT		12
>   
> -#define CMD_INV_IOMMU_ALL_PAGES_ADDRESS	0x7fffffffffffffffULL
> +#define CMD_INV_IOMMU_ALL_PAGES_ADDRESS	0x7ffffffffffff000ULL


Nit: Using GENMASK_ULL(62, 12) here might be easier to read

Reviewed-by: Wei Wang <wei.w.wang@hotmail.com>


>   
>   /* macros and definitions for device table entries */
>   #define DEV_ENTRY_VALID         0x00
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index a3773a145af811..4e057fbc20021e 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -1287,7 +1287,7 @@ static inline u64 build_inv_address(u64 address, u64 last)
>   	 *
>   	 */
>   	if (unlikely(sz_lg2 > 52))
> -		return (CMD_INV_IOMMU_ALL_PAGES_ADDRESS & PAGE_MASK) |
> +		return CMD_INV_IOMMU_ALL_PAGES_ADDRESS |
>   		       CMD_INV_IOMMU_PAGES_SIZE_MASK;
>   
>   	/*



* Re: [PATCH 5/5] iommu/amd: Control INVALIDATE_IOMMU_PAGES PDE from the gather
  2026-03-27 15:23 ` [PATCH 5/5] iommu/amd: Control INVALIDATE_IOMMU_PAGES PDE from the gather Jason Gunthorpe
@ 2026-04-01  8:51   ` Wei Wang
  0 siblings, 0 replies; 15+ messages in thread
From: Wei Wang @ 2026-04-01  8:51 UTC (permalink / raw)
  To: Jason Gunthorpe, iommu, Joerg Roedel, Robin Murphy,
	Suravee Suthikulpanit, Will Deacon
  Cc: patches

On 3/27/26 11:23 PM, Jason Gunthorpe wrote:
> Now that AMD uses iommupt, it is easy to make use of the PDE bit. If
> the gather has no free list then no page directory entries were
> changed.
> 
> Pass GN/PDE through the invalidation call chain in a u32 flags field
> that is OR'd into data[2] and set it properly from the gather.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>   drivers/iommu/amd/amd_iommu.h |  2 +-
>   drivers/iommu/amd/iommu.c     | 61 +++++++++++++++++++----------------
>   2 files changed, 35 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
> index 67d4daaaf6f0ce..364878a923edd0 100644
> --- a/drivers/iommu/amd/amd_iommu.h
> +++ b/drivers/iommu/amd/amd_iommu.h
> @@ -88,7 +88,7 @@ int amd_iommu_complete_ppr(struct device *dev, u32 pasid, int status, int tag);
>    */
>   void amd_iommu_flush_all_caches(struct amd_iommu *iommu);
>   void amd_iommu_domain_flush_pages(struct protection_domain *domain,
> -				  u64 address, u64 last);
> +				  u64 address, u64 last, u32 flags);
>   void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
>   				     ioasid_t pasid, u64 address, u64 last);
>   
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 4e057fbc20021e..95f7dc6cf67eaf 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -1301,8 +1301,8 @@ static inline u64 build_inv_address(u64 address, u64 last)
>   }
>   
>   static void build_inv_iommu_pages(struct iommu_cmd *cmd, u64 address,
> -				  u64 last, u16 domid,
> -				  ioasid_t pasid, bool gn)
> +				  u64 last, u16 domid, ioasid_t pasid,
> +				  u32 flags)
>   {
>   	u64 inv_address = build_inv_address(address, last);
>   
> @@ -1311,12 +1311,9 @@ static void build_inv_iommu_pages(struct iommu_cmd *cmd, u64 address,
>   	cmd->data[1] |= domid;
>   	cmd->data[2]  = lower_32_bits(inv_address);
>   	cmd->data[3]  = upper_32_bits(inv_address);
> -	/* PDE bit - we want to flush everything, not only the PTEs */
> -	cmd->data[2] |= CMD_INV_IOMMU_PAGES_PDE_MASK;
> -	if (gn) {
> +	cmd->data[2] |= flags;
> +	if (flags & CMD_INV_IOMMU_PAGES_GN_MASK)
>   		cmd->data[0] |= pasid;
> -		cmd->data[2] |= CMD_INV_IOMMU_PAGES_GN_MASK;
> -	}
>   	CMD_SET_TYPE(cmd, CMD_INV_IOMMU_PAGES);
>   }
>   
> @@ -1523,7 +1520,8 @@ static void amd_iommu_flush_tlb_all(struct amd_iommu *iommu)
>   	for (dom_id = 0; dom_id <= last_bdf; ++dom_id) {
>   		struct iommu_cmd cmd;
>   		build_inv_iommu_pages(&cmd, 0, U64_MAX,
> -				      dom_id, IOMMU_NO_PASID, false);
> +				      dom_id, IOMMU_NO_PASID,
> +				      CMD_INV_IOMMU_PAGES_PDE_MASK);
>   		iommu_queue_command(iommu, &cmd);
>   	}
>   
> @@ -1535,13 +1533,15 @@ static void amd_iommu_flush_tlb_domid(struct amd_iommu *iommu, u32 dom_id)
>   	struct iommu_cmd cmd;
>   
>   	build_inv_iommu_pages(&cmd, 0, U64_MAX,
> -			      dom_id, IOMMU_NO_PASID, false);
> +			      dom_id, IOMMU_NO_PASID,
> +			      CMD_INV_IOMMU_PAGES_PDE_MASK);
>   	iommu_queue_command(iommu, &cmd);
>   
>   	iommu_completion_wait(iommu);
>   }
>   
> -static int iommu_flush_pages_v1_hdom_ids(struct protection_domain *pdom, u64 address, u64 last)
> +static int iommu_flush_pages_v1_hdom_ids(struct protection_domain *pdom,
> +					 u64 address, u64 last, u32 flags)
>   {
>   	int ret = 0;
>   	struct amd_iommu_viommu *aviommu;
> @@ -1559,7 +1559,7 @@ static int iommu_flush_pages_v1_hdom_ids(struct protection_domain *pdom, u64 add
>   			pr_debug("%s: iommu=%#x, hdom_id=%#x\n", __func__,
>   				 iommu->devid, gdom_info->hdom_id);
>   			build_inv_iommu_pages(&cmd, address, last, gdom_info->hdom_id,
> -					      IOMMU_NO_PASID, false);
> +					      IOMMU_NO_PASID, flags);
>   			ret |= iommu_queue_command(iommu, &cmd);
>   		}
>   		xa_unlock(&aviommu->gdomid_array);
> @@ -1674,7 +1674,7 @@ static int device_flush_dte(struct iommu_dev_data *dev_data)
>   }
>   
>   static int domain_flush_pages_v2(struct protection_domain *pdom,
> -				 u64 address, u64 last)
> +				 u64 address, u64 last, u32 flags)
>   {
>   	struct iommu_dev_data *dev_data;
>   	struct iommu_cmd cmd;
> @@ -1685,8 +1685,9 @@ static int domain_flush_pages_v2(struct protection_domain *pdom,
>   		struct amd_iommu *iommu = get_amd_iommu_from_dev(dev_data->dev);
>   		u16 domid = dev_data->gcr3_info.domid;
>   
> -		build_inv_iommu_pages(&cmd, address, last,
> -				      domid, IOMMU_NO_PASID, true);
> +		build_inv_iommu_pages(&cmd, address, last, domid,
> +				      IOMMU_NO_PASID,
> +				      flags | CMD_INV_IOMMU_PAGES_GN_MASK);
>   
>   		ret |= iommu_queue_command(iommu, &cmd);
>   	}
> @@ -1695,7 +1696,7 @@ static int domain_flush_pages_v2(struct protection_domain *pdom,
>   }
>   
>   static int domain_flush_pages_v1(struct protection_domain *pdom,
> -				 u64 address, u64 last)
> +				 u64 address, u64 last, u32 flags)
>   {
>   	struct pdom_iommu_info *pdom_iommu_info;
>   	struct iommu_cmd cmd;
> @@ -1705,7 +1706,7 @@ static int domain_flush_pages_v1(struct protection_domain *pdom,
>   	lockdep_assert_held(&pdom->lock);
>   
>   	build_inv_iommu_pages(&cmd, address, last,
> -			      pdom->id, IOMMU_NO_PASID, false);
> +			      pdom->id, IOMMU_NO_PASID, flags);
>   
>   	xa_for_each(&pdom->iommu_array, i, pdom_iommu_info) {
>   		/*
> @@ -1724,7 +1725,7 @@ static int domain_flush_pages_v1(struct protection_domain *pdom,
>   	 * See drivers/iommu/amd/nested.c: amd_iommu_alloc_domain_nested()
>   	 */
>   	if (!list_empty(&pdom->viommu_list))
> -		ret |= iommu_flush_pages_v1_hdom_ids(pdom, address, last);
> +		ret |= iommu_flush_pages_v1_hdom_ids(pdom, address, last, flags);
>   
>   	return ret;
>   }
> @@ -1734,7 +1735,7 @@ static int domain_flush_pages_v1(struct protection_domain *pdom,
>    * It flushes range of PTEs of the domain.
>    */
>   static void __domain_flush_pages(struct protection_domain *domain,
> -				 u64 address, u64 last)
> +				 u64 address, u64 last, u32 flags)
>   {
>   	struct iommu_dev_data *dev_data;
>   	int ret = 0;
> @@ -1745,9 +1746,9 @@ static void __domain_flush_pages(struct protection_domain *domain,
>   
>   	if (pdom_is_v2_pgtbl_mode(domain)) {
>   		gn = true;
> -		ret = domain_flush_pages_v2(domain, address, last);
> +		ret = domain_flush_pages_v2(domain, address, last, flags);
>   	} else {
> -		ret = domain_flush_pages_v1(domain, address, last);
> +		ret = domain_flush_pages_v1(domain, address, last, flags);
>   	}
>   
>   	list_for_each_entry(dev_data, &domain->dev_list, list) {
> @@ -1762,13 +1763,13 @@ static void __domain_flush_pages(struct protection_domain *domain,
>   }
>   
>   void amd_iommu_domain_flush_pages(struct protection_domain *domain,
> -				  u64 address, u64 last)
> +				  u64 address, u64 last, u32 flags)
>   {
>   	lockdep_assert_held(&domain->lock);
>   
>   	if (likely(!amd_iommu_np_cache) ||
>   	    unlikely(address == 0 && last == U64_MAX)) {
> -		__domain_flush_pages(domain, address, last);
> +		__domain_flush_pages(domain, address, last, flags);
>   
>   		/* Wait until IOMMU TLB and all device IOTLB flushes are complete */
>   		domain_flush_complete(domain);
> @@ -1794,7 +1795,7 @@ void amd_iommu_domain_flush_pages(struct protection_domain *domain,
>   			sz_lg2 = min_t(unsigned int, sz_lg2, __ffs64(address));
>   
>   		flush_last = address + (1ULL << sz_lg2) - 1;
> -		__domain_flush_pages(domain, address, flush_last);
> +		__domain_flush_pages(domain, address, flush_last, flags);
>   		if (check_add_overflow(flush_last, 1, &address))
>   			break;
>   	}
> @@ -1806,7 +1807,8 @@ void amd_iommu_domain_flush_pages(struct protection_domain *domain,
>   /* Flush the whole IO/TLB for a given protection domain - including PDE */
>   static void amd_iommu_domain_flush_all(struct protection_domain *domain)
>   {
> -	amd_iommu_domain_flush_pages(domain, 0, U64_MAX);
> +	amd_iommu_domain_flush_pages(domain, 0, U64_MAX,
> +				     CMD_INV_IOMMU_PAGES_PDE_MASK);
>   }
>   
>   void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
> @@ -1816,7 +1818,9 @@ void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
>   	struct amd_iommu *iommu = get_amd_iommu_from_dev(dev_data->dev);
>   
>   	build_inv_iommu_pages(&cmd, address, last,
> -			      dev_data->gcr3_info.domid, pasid, true);
> +			      dev_data->gcr3_info.domid, pasid,
> +			      CMD_INV_IOMMU_PAGES_GN_MASK |
> +			      CMD_INV_IOMMU_PAGES_PDE_MASK);
>   	iommu_queue_command(iommu, &cmd);
>   
>   	if (dev_data->ats_enabled)
> @@ -2606,7 +2610,8 @@ static int amd_iommu_iotlb_sync_map(struct iommu_domain *dom,
>   		return 0;
>   
>   	spin_lock_irqsave(&domain->lock, flags);
> -	amd_iommu_domain_flush_pages(domain, iova, iova + size - 1);
> +	amd_iommu_domain_flush_pages(domain, iova, iova + size - 1,
> +				     CMD_INV_IOMMU_PAGES_PDE_MASK);
>   	spin_unlock_irqrestore(&domain->lock, flags);
>   	return 0;
>   }
> @@ -2628,7 +2633,9 @@ static void amd_iommu_iotlb_sync(struct iommu_domain *domain,
>   	unsigned long flags;
>   
>   	spin_lock_irqsave(&dom->lock, flags);
> -	amd_iommu_domain_flush_pages(dom, gather->start, gather->end);
> +	amd_iommu_domain_flush_pages(dom, gather->start, gather->end,
> +				     iommu_pages_list_empty(&gather->freelist) ?
> +				     0 : CMD_INV_IOMMU_PAGES_PDE_MASK);
>   	spin_unlock_irqrestore(&dom->lock, flags);
>   	iommu_put_pages_list(&gather->freelist);
>   }


The logic for checking the gather freelist to conditionally set the PDE 
bit is a nice optimization.

Reviewed-by: Wei Wang <wei.w.wang@hotmail.com>




* Re: [PATCH 4/5] iommu/amd: Make CMD_INV_IOMMU_ALL_PAGES_ADDRESS match the spec
  2026-04-01  6:42   ` Wei Wang
@ 2026-04-01 13:17     ` Jason Gunthorpe
  0 siblings, 0 replies; 15+ messages in thread
From: Jason Gunthorpe @ 2026-04-01 13:17 UTC (permalink / raw)
  To: Wei Wang
  Cc: iommu, Joerg Roedel, Robin Murphy, Suravee Suthikulpanit,
	Will Deacon, patches

On Wed, Apr 01, 2026 at 02:42:52PM +0800, Wei Wang wrote:

> > diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
> > index c685d3771436a2..c239c5fbe47461 100644
> > --- a/drivers/iommu/amd/amd_iommu_types.h
> > +++ b/drivers/iommu/amd/amd_iommu_types.h
> > @@ -221,7 +221,7 @@
> >   #define PPR_STATUS_MASK			0xf
> >   #define PPR_STATUS_SHIFT		12
> > -#define CMD_INV_IOMMU_ALL_PAGES_ADDRESS	0x7fffffffffffffffULL
> > +#define CMD_INV_IOMMU_ALL_PAGES_ADDRESS	0x7ffffffffffff000ULL
> 
> 
> Nit: Using GENMASK_ULL(62, 12) here might be easier to read
>
> Reviewed-by: Wei Wang <wei.w.wang@hotmail.com>

I left it like this because it is the raw value the spec has:

 2.4.3:
 
   The IOMMU invalidates all translation
   information associated with the DomainID for both nested and guest levels when S=1, PDE=1,
   GN=0, and Address[63:12]=7_FFFF_FFFF_FFFFh.

Thanks,
Jason


* Re: [PATCH 2/5] iommu/amd: Pass last in through to build_inv_address()
  2026-03-30  8:03   ` Wei Wang
@ 2026-04-01 19:55     ` Jason Gunthorpe
  0 siblings, 0 replies; 15+ messages in thread
From: Jason Gunthorpe @ 2026-04-01 19:55 UTC (permalink / raw)
  To: Wei Wang
  Cc: iommu, Joerg Roedel, Robin Murphy, Suravee Suthikulpanit,
	Will Deacon, patches

On Mon, Mar 30, 2026 at 04:03:38PM +0800, Wei Wang wrote:
> On 3/27/26 11:23 PM, Jason Gunthorpe wrote:
> > This is the trivial call chain below amd_iommu_domain_flush_pages().
> > 
> > Cases that are doing a full invalidate will pass a last of U64_MAX.
> > 
> > This avoids converting between size and last, and type confusion with
> > size_t, unsigned long and u64 all being used in different places along
> > the driver's invalidation path. Consistently use u64 in the internals.
> > 
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > ---
> >   drivers/iommu/amd/amd_iommu.h |  2 +-
> >   drivers/iommu/amd/iommu.c     | 56 +++++++++++++++++------------------
> >   drivers/iommu/amd/pasid.c     |  2 +-
> >   3 files changed, 29 insertions(+), 31 deletions(-)
> > 
> > diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
> > index 1342e764a5486d..ef4ebecb003494 100644
> > --- a/drivers/iommu/amd/amd_iommu.h
> > +++ b/drivers/iommu/amd/amd_iommu.h
> > @@ -90,7 +90,7 @@ void amd_iommu_flush_all_caches(struct amd_iommu *iommu);
> >   void amd_iommu_domain_flush_pages(struct protection_domain *domain,
> >   				  u64 address, size_t size);
> >   void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
> > -				     ioasid_t pasid, u64 address, size_t size);
> > +				     ioasid_t pasid, u64 address, u64 last);
> 
> Using “addr_start/addr_last” or “start/last” would be more symmetrical and
> intuitive.

Okay, it doesn't make the diff look too bad!

Jason

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2026-04-01 19:55 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-27 15:23 [PATCH 0/5] Improve the invalidation path in AMD Jason Gunthorpe
2026-03-27 15:23 ` [PATCH 1/5] iommu/amd: Simplify build_inv_address() Jason Gunthorpe
2026-03-30  7:39   ` Wei Wang
2026-03-30 12:05     ` Jason Gunthorpe
2026-03-31  2:01       ` Wei Wang
2026-03-27 15:23 ` [PATCH 2/5] iommu/amd: Pass last in through to build_inv_address() Jason Gunthorpe
2026-03-30  8:03   ` Wei Wang
2026-04-01 19:55     ` Jason Gunthorpe
2026-03-27 15:23 ` [PATCH 3/5] iommu/amd: Have amd_iommu_domain_flush_pages() use last Jason Gunthorpe
2026-04-01  6:33   ` Wei Wang
2026-03-27 15:23 ` [PATCH 4/5] iommu/amd: Make CMD_INV_IOMMU_ALL_PAGES_ADDRESS match the spec Jason Gunthorpe
2026-04-01  6:42   ` Wei Wang
2026-04-01 13:17     ` Jason Gunthorpe
2026-03-27 15:23 ` [PATCH 5/5] iommu/amd: Control INVALIDATE_IOMMU_PAGES PDE from the gather Jason Gunthorpe
2026-04-01  8:51   ` Wei Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox