* [PATCH v2 0/5] Improve the invalidation path in AMD
@ 2026-04-03 12:26 Jason Gunthorpe
2026-04-03 12:26 ` [PATCH v2 1/5] iommu/amd: Simplify build_inv_address() Jason Gunthorpe
` (4 more replies)
0 siblings, 5 replies; 11+ messages in thread
From: Jason Gunthorpe @ 2026-04-03 12:26 UTC (permalink / raw)
To: iommu, Joerg Roedel, Robin Murphy, Suravee Suthikulpanit,
Will Deacon
Cc: patches, Wei Wang
Auditing this path reveals some unnecessary complexity and some
confusion around end/size/last and types that may cause issues at the
ULONG_MAX limit.
First, fix up the call chains so that the gather start/last is
retained faithfully throughout. This avoids converting to a
start/size, and avoids various type conversions along this path. Fix
everything so the theoretical cases like start=0 last=U64_MAX work
correctly.
Simplify all the calculations for alignment/etc to use trivial bit
maths instead of the complicated versions. These are all intended to
be no functional change.
Finally, add support for PDE=0, deriving it from the gather. PDE=0
allows the HW to avoid scanning the page directory cache and
supporting it should improve performance.
v2:
- Use start instead of address
v1: https://patch.msgid.link/r/0-v1-a7b77c25974e+115bf-iommupt_inv_amd_jgg@nvidia.com
Jason Gunthorpe (5):
iommu/amd: Simplify build_inv_address()
iommu/amd: Pass last in through to build_inv_address()
iommu/amd: Have amd_iommu_domain_flush_pages() use last
iommu/amd: Make CMD_INV_IOMMU_ALL_PAGES_ADDRESS match the spec
iommu/amd: Control INVALIDATE_IOMMU_PAGES PDE from the gather
drivers/iommu/amd/amd_iommu.h | 4 +-
drivers/iommu/amd/amd_iommu_types.h | 2 +-
drivers/iommu/amd/iommu.c | 177 +++++++++++++---------------
drivers/iommu/amd/pasid.c | 2 +-
4 files changed, 88 insertions(+), 97 deletions(-)
base-commit: 2f8407f149b2763146903649fdebf2cc575c3ae3
--
2.43.0
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v2 1/5] iommu/amd: Simplify build_inv_address()
2026-04-03 12:26 [PATCH v2 0/5] Improve the invalidation path in AMD Jason Gunthorpe
@ 2026-04-03 12:26 ` Jason Gunthorpe
2026-04-06 13:18 ` Wei Wang
2026-04-03 12:26 ` [PATCH v2 2/5] iommu/amd: Pass last in through to build_inv_address() Jason Gunthorpe
` (3 subsequent siblings)
4 siblings, 1 reply; 11+ messages in thread
From: Jason Gunthorpe @ 2026-04-03 12:26 UTC (permalink / raw)
To: iommu, Joerg Roedel, Robin Murphy, Suravee Suthikulpanit,
Will Deacon
Cc: patches, Wei Wang
This function is doing more work than it needs to:
- iommu_num_pages() is pointless; the fls() already computes the
required page size.
- It is easier to understand as sz_lg2, which is 12 for a 4K size,
than as msb_diff, which is 11 for a 4K size.
- Simplify the control flow to early exit on the out of range cases.
- Use the usual last instead of end to signify an inclusive last
address.
- Use GENMASK to compute the 1's mask.
- Use GENMASK to compute the address mask for the command layout,
not PAGE_MASK.
- Directly reference the spec language that defines the 52 bit
limit.
No functional change intended.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/amd/iommu.c | 50 +++++++++++++++++++--------------------
1 file changed, 24 insertions(+), 26 deletions(-)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 01171361f9bc22..e4a88879ca35d2 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1268,39 +1268,37 @@ static void build_inv_dte(struct iommu_cmd *cmd, u16 devid)
*/
static inline u64 build_inv_address(u64 address, size_t size)
{
- u64 pages, end, msb_diff;
+ u64 last = address + size - 1;
+ unsigned int sz_lg2;
- pages = iommu_num_pages(address, size, PAGE_SIZE);
-
- if (pages == 1)
- return address & PAGE_MASK;
-
- end = address + size - 1;
+ address &= GENMASK_U64(63, 12);
+ sz_lg2 = fls64(address ^ last);
+ if (sz_lg2 <= 12)
+ return address;
/*
- * msb_diff would hold the index of the most significant bit that
- * flipped between the start and end.
+ * Encode sz_lg2 according to Table 14: Example Page Size Encodings
+ *
+ * See "Note *":
+ * Address bits 51:32 can be used to encode page sizes greater
+ * than 4 Gbytes.
+ * Which we take to mean that the highest page size has bit
+ * [51]=0, [50:12]=1
+ * and that coding happens when sz_lg2 is 52. Fall back to full
+ * invalidation if the size is too big.
+ *
*/
- msb_diff = fls64(end ^ address) - 1;
+ if (unlikely(sz_lg2 > 52))
+ return (CMD_INV_IOMMU_ALL_PAGES_ADDRESS & PAGE_MASK) |
+ CMD_INV_IOMMU_PAGES_SIZE_MASK;
/*
- * Bits 63:52 are sign extended. If for some reason bit 51 is different
- * between the start and the end, invalidate everything.
+ * The sz_lg2 calculation with fls() ensures that:
+ * address & BIT(sz_lg2 - 1) == 0
+ * Therefore only the 1's need to be added. 8KB requires no 1's
*/
- if (unlikely(msb_diff > 51)) {
- address = CMD_INV_IOMMU_ALL_PAGES_ADDRESS;
- } else {
- /*
- * The msb-bit must be clear on the address. Just set all the
- * lower bits.
- */
- address |= (1ull << msb_diff) - 1;
- }
-
- /* Clear bits 11:0 */
- address &= PAGE_MASK;
-
- /* Set the size bit - we flush more than one 4kb page */
+ if (sz_lg2 > 13)
+ address |= GENMASK_U64(sz_lg2 - 2, 12);
return address | CMD_INV_IOMMU_PAGES_SIZE_MASK;
}
--
2.43.0
* [PATCH v2 2/5] iommu/amd: Pass last in through to build_inv_address()
2026-04-03 12:26 [PATCH v2 0/5] Improve the invalidation path in AMD Jason Gunthorpe
2026-04-03 12:26 ` [PATCH v2 1/5] iommu/amd: Simplify build_inv_address() Jason Gunthorpe
@ 2026-04-03 12:26 ` Jason Gunthorpe
2026-04-06 12:02 ` Wei Wang
2026-04-03 12:26 ` [PATCH v2 3/5] iommu/amd: Have amd_iommu_domain_flush_pages() use last Jason Gunthorpe
` (2 subsequent siblings)
4 siblings, 1 reply; 11+ messages in thread
From: Jason Gunthorpe @ 2026-04-03 12:26 UTC (permalink / raw)
To: iommu, Joerg Roedel, Robin Murphy, Suravee Suthikulpanit,
Will Deacon
Cc: patches, Wei Wang
Pass last through the trivial call chain below
amd_iommu_domain_flush_pages(). Cases that are doing a full invalidate
will pass a last of U64_MAX.
This avoids converting between size and last, and type confusion with
size_t, unsigned long and u64 all being used in different places along
the driver's invalidation path. Consistently use u64 in the internals.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/amd/amd_iommu.h | 2 +-
drivers/iommu/amd/iommu.c | 71 +++++++++++++++++------------------
drivers/iommu/amd/pasid.c | 2 +-
3 files changed, 36 insertions(+), 39 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 1342e764a5486d..831d457c8dad13 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -90,7 +90,7 @@ void amd_iommu_flush_all_caches(struct amd_iommu *iommu);
void amd_iommu_domain_flush_pages(struct protection_domain *domain,
u64 address, size_t size);
void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
- ioasid_t pasid, u64 address, size_t size);
+ ioasid_t pasid, u64 start, u64 last);
#ifdef CONFIG_IRQ_REMAP
int amd_iommu_create_irq_domain(struct amd_iommu *iommu);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index e4a88879ca35d2..9ea40ab8398048 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1266,9 +1266,8 @@ static void build_inv_dte(struct iommu_cmd *cmd, u16 devid)
* Builds an invalidation address which is suitable for one page or multiple
* pages. Sets the size bit (S) as needed if more than one page is flushed.
*/
-static inline u64 build_inv_address(u64 address, size_t size)
+static inline u64 build_inv_address(u64 address, u64 last)
{
- u64 last = address + size - 1;
unsigned int sz_lg2;
address &= GENMASK_U64(63, 12);
@@ -1302,11 +1301,10 @@ static inline u64 build_inv_address(u64 address, size_t size)
return address | CMD_INV_IOMMU_PAGES_SIZE_MASK;
}
-static void build_inv_iommu_pages(struct iommu_cmd *cmd, u64 address,
- size_t size, u16 domid,
- ioasid_t pasid, bool gn)
+static void build_inv_iommu_pages(struct iommu_cmd *cmd, u64 start, u64 last,
+ u16 domid, ioasid_t pasid, bool gn)
{
- u64 inv_address = build_inv_address(address, size);
+ u64 inv_address = build_inv_address(start, last);
memset(cmd, 0, sizeof(*cmd));
@@ -1323,10 +1321,9 @@ static void build_inv_iommu_pages(struct iommu_cmd *cmd, u64 address,
}
static void build_inv_iotlb_pages(struct iommu_cmd *cmd, u16 devid, int qdep,
- u64 address, size_t size,
- ioasid_t pasid, bool gn)
+ u64 start, u64 last, ioasid_t pasid, bool gn)
{
- u64 inv_address = build_inv_address(address, size);
+ u64 inv_address = build_inv_address(start, last);
memset(cmd, 0, sizeof(*cmd));
@@ -1524,7 +1521,7 @@ static void amd_iommu_flush_tlb_all(struct amd_iommu *iommu)
for (dom_id = 0; dom_id <= last_bdf; ++dom_id) {
struct iommu_cmd cmd;
- build_inv_iommu_pages(&cmd, 0, CMD_INV_IOMMU_ALL_PAGES_ADDRESS,
+ build_inv_iommu_pages(&cmd, 0, U64_MAX,
dom_id, IOMMU_NO_PASID, false);
iommu_queue_command(iommu, &cmd);
}
@@ -1536,14 +1533,15 @@ static void amd_iommu_flush_tlb_domid(struct amd_iommu *iommu, u32 dom_id)
{
struct iommu_cmd cmd;
- build_inv_iommu_pages(&cmd, 0, CMD_INV_IOMMU_ALL_PAGES_ADDRESS,
+ build_inv_iommu_pages(&cmd, 0, U64_MAX,
dom_id, IOMMU_NO_PASID, false);
iommu_queue_command(iommu, &cmd);
iommu_completion_wait(iommu);
}
-static int iommu_flush_pages_v1_hdom_ids(struct protection_domain *pdom, u64 address, size_t size)
+static int iommu_flush_pages_v1_hdom_ids(struct protection_domain *pdom,
+ u64 start, u64 last)
{
int ret = 0;
struct amd_iommu_viommu *aviommu;
@@ -1560,7 +1558,7 @@ static int iommu_flush_pages_v1_hdom_ids(struct protection_domain *pdom, u64 add
pr_debug("%s: iommu=%#x, hdom_id=%#x\n", __func__,
iommu->devid, gdom_info->hdom_id);
- build_inv_iommu_pages(&cmd, address, size, gdom_info->hdom_id,
+ build_inv_iommu_pages(&cmd, start, last, gdom_info->hdom_id,
IOMMU_NO_PASID, false);
ret |= iommu_queue_command(iommu, &cmd);
}
@@ -1616,15 +1614,15 @@ void amd_iommu_flush_all_caches(struct amd_iommu *iommu)
/*
* Command send function for flushing on-device TLB
*/
-static int device_flush_iotlb(struct iommu_dev_data *dev_data, u64 address,
- size_t size, ioasid_t pasid, bool gn)
+static int device_flush_iotlb(struct iommu_dev_data *dev_data, u64 start,
+ u64 last, ioasid_t pasid, bool gn)
{
struct amd_iommu *iommu = get_amd_iommu_from_dev_data(dev_data);
struct iommu_cmd cmd;
int qdep = dev_data->ats_qdep;
- build_inv_iotlb_pages(&cmd, dev_data->devid, qdep, address,
- size, pasid, gn);
+ build_inv_iotlb_pages(&cmd, dev_data->devid, qdep, start,
+ last, pasid, gn);
return iommu_queue_command(iommu, &cmd);
}
@@ -1668,15 +1666,15 @@ static int device_flush_dte(struct iommu_dev_data *dev_data)
if (dev_data->ats_enabled) {
/* Invalidate the entire contents of an IOTLB */
- ret = device_flush_iotlb(dev_data, 0, ~0UL,
+ ret = device_flush_iotlb(dev_data, 0, U64_MAX,
IOMMU_NO_PASID, false);
}
return ret;
}
-static int domain_flush_pages_v2(struct protection_domain *pdom,
- u64 address, size_t size)
+static int domain_flush_pages_v2(struct protection_domain *pdom, u64 start,
+ u64 last)
{
struct iommu_dev_data *dev_data;
struct iommu_cmd cmd;
@@ -1687,7 +1685,7 @@ static int domain_flush_pages_v2(struct protection_domain *pdom,
struct amd_iommu *iommu = get_amd_iommu_from_dev(dev_data->dev);
u16 domid = dev_data->gcr3_info.domid;
- build_inv_iommu_pages(&cmd, address, size,
+ build_inv_iommu_pages(&cmd, start, last,
domid, IOMMU_NO_PASID, true);
ret |= iommu_queue_command(iommu, &cmd);
@@ -1696,8 +1694,8 @@ static int domain_flush_pages_v2(struct protection_domain *pdom,
return ret;
}
-static int domain_flush_pages_v1(struct protection_domain *pdom,
- u64 address, size_t size)
+static int domain_flush_pages_v1(struct protection_domain *pdom, u64 start,
+ u64 last)
{
struct pdom_iommu_info *pdom_iommu_info;
struct iommu_cmd cmd;
@@ -1706,7 +1704,7 @@ static int domain_flush_pages_v1(struct protection_domain *pdom,
lockdep_assert_held(&pdom->lock);
- build_inv_iommu_pages(&cmd, address, size,
+ build_inv_iommu_pages(&cmd, start, last,
pdom->id, IOMMU_NO_PASID, false);
xa_for_each(&pdom->iommu_array, i, pdom_iommu_info) {
@@ -1726,7 +1724,7 @@ static int domain_flush_pages_v1(struct protection_domain *pdom,
* See drivers/iommu/amd/nested.c: amd_iommu_alloc_domain_nested()
*/
if (!list_empty(&pdom->viommu_list))
- ret |= iommu_flush_pages_v1_hdom_ids(pdom, address, size);
+ ret |= iommu_flush_pages_v1_hdom_ids(pdom, start, last);
return ret;
}
@@ -1735,8 +1733,8 @@ static int domain_flush_pages_v1(struct protection_domain *pdom,
* TLB invalidation function which is called from the mapping functions.
* It flushes range of PTEs of the domain.
*/
-static void __domain_flush_pages(struct protection_domain *domain,
- u64 address, size_t size)
+static void __domain_flush_pages(struct protection_domain *domain, u64 start,
+ u64 last)
{
struct iommu_dev_data *dev_data;
int ret = 0;
@@ -1747,9 +1745,9 @@ static void __domain_flush_pages(struct protection_domain *domain,
if (pdom_is_v2_pgtbl_mode(domain)) {
gn = true;
- ret = domain_flush_pages_v2(domain, address, size);
+ ret = domain_flush_pages_v2(domain, start, last);
} else {
- ret = domain_flush_pages_v1(domain, address, size);
+ ret = domain_flush_pages_v1(domain, start, last);
}
list_for_each_entry(dev_data, &domain->dev_list, list) {
@@ -1757,7 +1755,7 @@ static void __domain_flush_pages(struct protection_domain *domain,
if (!dev_data->ats_enabled)
continue;
- ret |= device_flush_iotlb(dev_data, address, size, pasid, gn);
+ ret |= device_flush_iotlb(dev_data, start, last, pasid, gn);
}
WARN_ON(ret);
@@ -1769,7 +1767,7 @@ void amd_iommu_domain_flush_pages(struct protection_domain *domain,
lockdep_assert_held(&domain->lock);
if (likely(!amd_iommu_np_cache)) {
- __domain_flush_pages(domain, address, size);
+ __domain_flush_pages(domain, address, address + size - 1);
/* Wait until IOMMU TLB and all device IOTLB flushes are complete */
domain_flush_complete(domain);
@@ -1806,7 +1804,7 @@ void amd_iommu_domain_flush_pages(struct protection_domain *domain,
flush_size = 1ul << min_alignment;
- __domain_flush_pages(domain, address, flush_size);
+ __domain_flush_pages(domain, address, address + flush_size - 1);
address += flush_size;
size -= flush_size;
}
@@ -1823,17 +1821,17 @@ static void amd_iommu_domain_flush_all(struct protection_domain *domain)
}
void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
- ioasid_t pasid, u64 address, size_t size)
+ ioasid_t pasid, u64 start, u64 last)
{
struct iommu_cmd cmd;
struct amd_iommu *iommu = get_amd_iommu_from_dev(dev_data->dev);
- build_inv_iommu_pages(&cmd, address, size,
+ build_inv_iommu_pages(&cmd, start, last,
dev_data->gcr3_info.domid, pasid, true);
iommu_queue_command(iommu, &cmd);
if (dev_data->ats_enabled)
- device_flush_iotlb(dev_data, address, size, pasid, true);
+ device_flush_iotlb(dev_data, start, last, pasid, true);
iommu_completion_wait(iommu);
}
@@ -1841,8 +1839,7 @@ void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
static void dev_flush_pasid_all(struct iommu_dev_data *dev_data,
ioasid_t pasid)
{
- amd_iommu_dev_flush_pasid_pages(dev_data, pasid, 0,
- CMD_INV_IOMMU_ALL_PAGES_ADDRESS);
+ amd_iommu_dev_flush_pasid_pages(dev_data, pasid, 0, U64_MAX);
}
int amd_iommu_complete_ppr(struct device *dev, u32 pasid, int status, int tag)
diff --git a/drivers/iommu/amd/pasid.c b/drivers/iommu/amd/pasid.c
index 67eace9809f16a..d708c6532480a4 100644
--- a/drivers/iommu/amd/pasid.c
+++ b/drivers/iommu/amd/pasid.c
@@ -71,7 +71,7 @@ static void sva_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn,
for_each_pdom_dev_data(pdom_dev_data, sva_pdom) {
amd_iommu_dev_flush_pasid_pages(pdom_dev_data->dev_data,
pdom_dev_data->pasid,
- start, end - start);
+ start, end - 1);
}
spin_unlock_irqrestore(&sva_pdom->lock, flags);
--
2.43.0
* [PATCH v2 3/5] iommu/amd: Have amd_iommu_domain_flush_pages() use last
2026-04-03 12:26 [PATCH v2 0/5] Improve the invalidation path in AMD Jason Gunthorpe
2026-04-03 12:26 ` [PATCH v2 1/5] iommu/amd: Simplify build_inv_address() Jason Gunthorpe
2026-04-03 12:26 ` [PATCH v2 2/5] iommu/amd: Pass last in through to build_inv_address() Jason Gunthorpe
@ 2026-04-03 12:26 ` Jason Gunthorpe
2026-04-03 12:26 ` [PATCH v2 4/5] iommu/amd: Make CMD_INV_IOMMU_ALL_PAGES_ADDRESS match the spec Jason Gunthorpe
2026-04-03 12:26 ` [PATCH v2 5/5] iommu/amd: Control INVALIDATE_IOMMU_PAGES PDE from the gather Jason Gunthorpe
4 siblings, 0 replies; 11+ messages in thread
From: Jason Gunthorpe @ 2026-04-03 12:26 UTC (permalink / raw)
To: iommu, Joerg Roedel, Robin Murphy, Suravee Suthikulpanit,
Will Deacon
Cc: patches, Wei Wang
Finish clearing out the size/last/end switching by converting
amd_iommu_domain_flush_pages() to use last-based logic.
This algorithm is simpler than the previous one. Ultimately all this
wants to do is select power-of-two chunks that are aligned to address
and no larger than the distance to last.
The new version is fully safe even for start = 0 and last = U64_MAX.
Finally, the gather can be passed through natively without risking an
overflow in (gather->end - gather->start + 1).
Reviewed-by: Wei Wang <wei.w.wang@hotmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/amd/amd_iommu.h | 2 +-
drivers/iommu/amd/iommu.c | 44 +++++++++++++----------------------
2 files changed, 17 insertions(+), 29 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 831d457c8dad13..92410b9cf544a9 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -88,7 +88,7 @@ int amd_iommu_complete_ppr(struct device *dev, u32 pasid, int status, int tag);
*/
void amd_iommu_flush_all_caches(struct amd_iommu *iommu);
void amd_iommu_domain_flush_pages(struct protection_domain *domain,
- u64 address, size_t size);
+ u64 address, u64 last);
void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
ioasid_t pasid, u64 start, u64 last);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 9ea40ab8398048..a058017eef2777 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1762,12 +1762,13 @@ static void __domain_flush_pages(struct protection_domain *domain, u64 start,
}
void amd_iommu_domain_flush_pages(struct protection_domain *domain,
- u64 address, size_t size)
+ u64 address, u64 last)
{
lockdep_assert_held(&domain->lock);
- if (likely(!amd_iommu_np_cache)) {
- __domain_flush_pages(domain, address, address + size - 1);
+ if (likely(!amd_iommu_np_cache) ||
+ unlikely(address == 0 && last == U64_MAX)) {
+ __domain_flush_pages(domain, address, last);
/* Wait until IOMMU TLB and all device IOTLB flushes are complete */
domain_flush_complete(domain);
@@ -1785,28 +1786,17 @@ void amd_iommu_domain_flush_pages(struct protection_domain *domain,
* between the natural alignment of the address that we flush and the
* greatest naturally aligned region that fits in the range.
*/
- while (size != 0) {
- int addr_alignment = __ffs(address);
- int size_alignment = __fls(size);
- int min_alignment;
- size_t flush_size;
+ while (address <= last) {
+ unsigned int sz_lg2 = ilog2(last - address + 1);
+ u64 flush_last;
- /*
- * size is always non-zero, but address might be zero, causing
- * addr_alignment to be negative. As the casting of the
- * argument in __ffs(address) to long might trim the high bits
- * of the address on x86-32, cast to long when doing the check.
- */
- if (likely((unsigned long)address != 0))
- min_alignment = min(addr_alignment, size_alignment);
- else
- min_alignment = size_alignment;
+ if (likely(address))
+ sz_lg2 = min_t(unsigned int, sz_lg2, __ffs64(address));
- flush_size = 1ul << min_alignment;
-
- __domain_flush_pages(domain, address, address + flush_size - 1);
- address += flush_size;
- size -= flush_size;
+ flush_last = address + (1ULL << sz_lg2) - 1;
+ __domain_flush_pages(domain, address, flush_last);
+ if (check_add_overflow(flush_last, 1, &address))
+ break;
}
/* Wait until IOMMU TLB and all device IOTLB flushes are complete */
@@ -1816,8 +1806,7 @@ void amd_iommu_domain_flush_pages(struct protection_domain *domain,
/* Flush the whole IO/TLB for a given protection domain - including PDE */
static void amd_iommu_domain_flush_all(struct protection_domain *domain)
{
- amd_iommu_domain_flush_pages(domain, 0,
- CMD_INV_IOMMU_ALL_PAGES_ADDRESS);
+ amd_iommu_domain_flush_pages(domain, 0, U64_MAX);
}
void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
@@ -2617,7 +2606,7 @@ static int amd_iommu_iotlb_sync_map(struct iommu_domain *dom,
return 0;
spin_lock_irqsave(&domain->lock, flags);
- amd_iommu_domain_flush_pages(domain, iova, size);
+ amd_iommu_domain_flush_pages(domain, iova, iova + size - 1);
spin_unlock_irqrestore(&domain->lock, flags);
return 0;
}
@@ -2639,8 +2628,7 @@ static void amd_iommu_iotlb_sync(struct iommu_domain *domain,
unsigned long flags;
spin_lock_irqsave(&dom->lock, flags);
- amd_iommu_domain_flush_pages(dom, gather->start,
- gather->end - gather->start + 1);
+ amd_iommu_domain_flush_pages(dom, gather->start, gather->end);
spin_unlock_irqrestore(&dom->lock, flags);
iommu_put_pages_list(&gather->freelist);
}
--
2.43.0
* [PATCH v2 4/5] iommu/amd: Make CMD_INV_IOMMU_ALL_PAGES_ADDRESS match the spec
2026-04-03 12:26 [PATCH v2 0/5] Improve the invalidation path in AMD Jason Gunthorpe
` (2 preceding siblings ...)
2026-04-03 12:26 ` [PATCH v2 3/5] iommu/amd: Have amd_iommu_domain_flush_pages() use last Jason Gunthorpe
@ 2026-04-03 12:26 ` Jason Gunthorpe
2026-04-03 12:26 ` [PATCH v2 5/5] iommu/amd: Control INVALIDATE_IOMMU_PAGES PDE from the gather Jason Gunthorpe
4 siblings, 0 replies; 11+ messages in thread
From: Jason Gunthorpe @ 2026-04-03 12:26 UTC (permalink / raw)
To: iommu, Joerg Roedel, Robin Murphy, Suravee Suthikulpanit,
Will Deacon
Cc: patches, Wei Wang
The spec in Table 14 defines the "Entire Cache" case as having the low
12 bits as zero. Indeed the command format doesn't even have the low
12 bits. Since there is only one user now, fix the constant to have 0s
in the low 12 bits instead of 1s and remove the masking.
Reviewed-by: Wei Wang <wei.w.wang@hotmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/amd/amd_iommu_types.h | 2 +-
drivers/iommu/amd/iommu.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index c685d3771436a2..c239c5fbe47461 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -221,7 +221,7 @@
#define PPR_STATUS_MASK 0xf
#define PPR_STATUS_SHIFT 12
-#define CMD_INV_IOMMU_ALL_PAGES_ADDRESS 0x7fffffffffffffffULL
+#define CMD_INV_IOMMU_ALL_PAGES_ADDRESS 0x7ffffffffffff000ULL
/* macros and definitions for device table entries */
#define DEV_ENTRY_VALID 0x00
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a058017eef2777..e3842b23481c99 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1288,7 +1288,7 @@ static inline u64 build_inv_address(u64 address, u64 last)
*
*/
if (unlikely(sz_lg2 > 52))
- return (CMD_INV_IOMMU_ALL_PAGES_ADDRESS & PAGE_MASK) |
+ return CMD_INV_IOMMU_ALL_PAGES_ADDRESS |
CMD_INV_IOMMU_PAGES_SIZE_MASK;
/*
--
2.43.0
* [PATCH v2 5/5] iommu/amd: Control INVALIDATE_IOMMU_PAGES PDE from the gather
2026-04-03 12:26 [PATCH v2 0/5] Improve the invalidation path in AMD Jason Gunthorpe
` (3 preceding siblings ...)
2026-04-03 12:26 ` [PATCH v2 4/5] iommu/amd: Make CMD_INV_IOMMU_ALL_PAGES_ADDRESS match the spec Jason Gunthorpe
@ 2026-04-03 12:26 ` Jason Gunthorpe
4 siblings, 0 replies; 11+ messages in thread
From: Jason Gunthorpe @ 2026-04-03 12:26 UTC (permalink / raw)
To: iommu, Joerg Roedel, Robin Murphy, Suravee Suthikulpanit,
Will Deacon
Cc: patches, Wei Wang
Now that AMD uses iommupt, it is easy to make use of the PDE bit. If
the gather has no free list then no page directory entries were
changed.
Pass GN/PDE through the invalidation call chain in a u32 flags field
that is OR'd into data[2] and set it properly from the gather.
Reviewed-by: Wei Wang <wei.w.wang@hotmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/amd/amd_iommu.h | 2 +-
drivers/iommu/amd/iommu.c | 78 +++++++++++++++++++----------------
2 files changed, 44 insertions(+), 36 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 92410b9cf544a9..1a6d9cae7b23b9 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -88,7 +88,7 @@ int amd_iommu_complete_ppr(struct device *dev, u32 pasid, int status, int tag);
*/
void amd_iommu_flush_all_caches(struct amd_iommu *iommu);
void amd_iommu_domain_flush_pages(struct protection_domain *domain,
- u64 address, u64 last);
+ u64 address, u64 last, u32 flags);
void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
ioasid_t pasid, u64 start, u64 last);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index e3842b23481c99..c804ea49624a1f 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1301,8 +1301,9 @@ static inline u64 build_inv_address(u64 address, u64 last)
return address | CMD_INV_IOMMU_PAGES_SIZE_MASK;
}
-static void build_inv_iommu_pages(struct iommu_cmd *cmd, u64 start, u64 last,
- u16 domid, ioasid_t pasid, bool gn)
+static void build_inv_iommu_pages(struct iommu_cmd *cmd, u64 start,
+ u64 last, u16 domid, ioasid_t pasid,
+ u32 flags)
{
u64 inv_address = build_inv_address(start, last);
@@ -1311,19 +1312,17 @@ static void build_inv_iommu_pages(struct iommu_cmd *cmd, u64 start, u64 last,
cmd->data[1] |= domid;
cmd->data[2] = lower_32_bits(inv_address);
cmd->data[3] = upper_32_bits(inv_address);
- /* PDE bit - we want to flush everything, not only the PTEs */
- cmd->data[2] |= CMD_INV_IOMMU_PAGES_PDE_MASK;
- if (gn) {
+ cmd->data[2] |= flags;
+ if (flags & CMD_INV_IOMMU_PAGES_GN_MASK)
cmd->data[0] |= pasid;
- cmd->data[2] |= CMD_INV_IOMMU_PAGES_GN_MASK;
- }
CMD_SET_TYPE(cmd, CMD_INV_IOMMU_PAGES);
}
static void build_inv_iotlb_pages(struct iommu_cmd *cmd, u16 devid, int qdep,
- u64 start, u64 last, ioasid_t pasid, bool gn)
+ u64 address, u64 last,
+ ioasid_t pasid, bool gn)
{
- u64 inv_address = build_inv_address(start, last);
+ u64 inv_address = build_inv_address(address, last);
memset(cmd, 0, sizeof(*cmd));
@@ -1522,7 +1521,8 @@ static void amd_iommu_flush_tlb_all(struct amd_iommu *iommu)
for (dom_id = 0; dom_id <= last_bdf; ++dom_id) {
struct iommu_cmd cmd;
build_inv_iommu_pages(&cmd, 0, U64_MAX,
- dom_id, IOMMU_NO_PASID, false);
+ dom_id, IOMMU_NO_PASID,
+ CMD_INV_IOMMU_PAGES_PDE_MASK);
iommu_queue_command(iommu, &cmd);
}
@@ -1534,14 +1534,15 @@ static void amd_iommu_flush_tlb_domid(struct amd_iommu *iommu, u32 dom_id)
struct iommu_cmd cmd;
build_inv_iommu_pages(&cmd, 0, U64_MAX,
- dom_id, IOMMU_NO_PASID, false);
+ dom_id, IOMMU_NO_PASID,
+ CMD_INV_IOMMU_PAGES_PDE_MASK);
iommu_queue_command(iommu, &cmd);
iommu_completion_wait(iommu);
}
static int iommu_flush_pages_v1_hdom_ids(struct protection_domain *pdom,
- u64 start, u64 last)
+ u64 start, u64 last, u32 flags)
{
int ret = 0;
struct amd_iommu_viommu *aviommu;
@@ -1559,7 +1560,7 @@ static int iommu_flush_pages_v1_hdom_ids(struct protection_domain *pdom,
pr_debug("%s: iommu=%#x, hdom_id=%#x\n", __func__,
iommu->devid, gdom_info->hdom_id);
build_inv_iommu_pages(&cmd, start, last, gdom_info->hdom_id,
- IOMMU_NO_PASID, false);
+ IOMMU_NO_PASID, flags);
ret |= iommu_queue_command(iommu, &cmd);
}
xa_unlock(&aviommu->gdomid_array);
@@ -1614,14 +1615,14 @@ void amd_iommu_flush_all_caches(struct amd_iommu *iommu)
/*
* Command send function for flushing on-device TLB
*/
-static int device_flush_iotlb(struct iommu_dev_data *dev_data, u64 start,
+static int device_flush_iotlb(struct iommu_dev_data *dev_data, u64 address,
u64 last, ioasid_t pasid, bool gn)
{
struct amd_iommu *iommu = get_amd_iommu_from_dev_data(dev_data);
struct iommu_cmd cmd;
int qdep = dev_data->ats_qdep;
- build_inv_iotlb_pages(&cmd, dev_data->devid, qdep, start,
+ build_inv_iotlb_pages(&cmd, dev_data->devid, qdep, address,
last, pasid, gn);
return iommu_queue_command(iommu, &cmd);
@@ -1673,8 +1674,8 @@ static int device_flush_dte(struct iommu_dev_data *dev_data)
return ret;
}
-static int domain_flush_pages_v2(struct protection_domain *pdom, u64 start,
- u64 last)
+static int domain_flush_pages_v2(struct protection_domain *pdom,
+ u64 start, u64 last, u32 flags)
{
struct iommu_dev_data *dev_data;
struct iommu_cmd cmd;
@@ -1685,8 +1686,9 @@ static int domain_flush_pages_v2(struct protection_domain *pdom, u64 start,
struct amd_iommu *iommu = get_amd_iommu_from_dev(dev_data->dev);
u16 domid = dev_data->gcr3_info.domid;
- build_inv_iommu_pages(&cmd, start, last,
- domid, IOMMU_NO_PASID, true);
+ build_inv_iommu_pages(&cmd, start, last, domid,
+ IOMMU_NO_PASID,
+ flags | CMD_INV_IOMMU_PAGES_GN_MASK);
ret |= iommu_queue_command(iommu, &cmd);
}
@@ -1694,8 +1696,8 @@ static int domain_flush_pages_v2(struct protection_domain *pdom, u64 start,
return ret;
}
-static int domain_flush_pages_v1(struct protection_domain *pdom, u64 start,
- u64 last)
+static int domain_flush_pages_v1(struct protection_domain *pdom,
+ u64 start, u64 last, u32 flags)
{
struct pdom_iommu_info *pdom_iommu_info;
struct iommu_cmd cmd;
@@ -1705,7 +1707,7 @@ static int domain_flush_pages_v1(struct protection_domain *pdom, u64 start,
lockdep_assert_held(&pdom->lock);
build_inv_iommu_pages(&cmd, start, last,
- pdom->id, IOMMU_NO_PASID, false);
+ pdom->id, IOMMU_NO_PASID, flags);
xa_for_each(&pdom->iommu_array, i, pdom_iommu_info) {
/*
@@ -1724,7 +1726,7 @@ static int domain_flush_pages_v1(struct protection_domain *pdom, u64 start,
* See drivers/iommu/amd/nested.c: amd_iommu_alloc_domain_nested()
*/
if (!list_empty(&pdom->viommu_list))
- ret |= iommu_flush_pages_v1_hdom_ids(pdom, start, last);
+ ret |= iommu_flush_pages_v1_hdom_ids(pdom, start, last, flags);
return ret;
}
@@ -1733,8 +1735,8 @@ static int domain_flush_pages_v1(struct protection_domain *pdom, u64 start,
* TLB invalidation function which is called from the mapping functions.
* It flushes range of PTEs of the domain.
*/
-static void __domain_flush_pages(struct protection_domain *domain, u64 start,
- u64 last)
+static void __domain_flush_pages(struct protection_domain *domain,
+ u64 address, u64 last, u32 flags)
{
struct iommu_dev_data *dev_data;
int ret = 0;
@@ -1745,9 +1747,9 @@ static void __domain_flush_pages(struct protection_domain *domain, u64 start,
if (pdom_is_v2_pgtbl_mode(domain)) {
gn = true;
- ret = domain_flush_pages_v2(domain, start, last);
+ ret = domain_flush_pages_v2(domain, address, last, flags);
} else {
- ret = domain_flush_pages_v1(domain, start, last);
+ ret = domain_flush_pages_v1(domain, address, last, flags);
}
list_for_each_entry(dev_data, &domain->dev_list, list) {
@@ -1755,20 +1757,20 @@ static void __domain_flush_pages(struct protection_domain *domain, u64 start,
if (!dev_data->ats_enabled)
continue;
- ret |= device_flush_iotlb(dev_data, start, last, pasid, gn);
+ ret |= device_flush_iotlb(dev_data, address, last, pasid, gn);
}
WARN_ON(ret);
}
void amd_iommu_domain_flush_pages(struct protection_domain *domain,
- u64 address, u64 last)
+ u64 address, u64 last, u32 flags)
{
lockdep_assert_held(&domain->lock);
if (likely(!amd_iommu_np_cache) ||
unlikely(address == 0 && last == U64_MAX)) {
- __domain_flush_pages(domain, address, last);
+ __domain_flush_pages(domain, address, last, flags);
/* Wait until IOMMU TLB and all device IOTLB flushes are complete */
domain_flush_complete(domain);
@@ -1794,7 +1796,7 @@ void amd_iommu_domain_flush_pages(struct protection_domain *domain,
sz_lg2 = min_t(unsigned int, sz_lg2, __ffs64(address));
flush_last = address + (1ULL << sz_lg2) - 1;
- __domain_flush_pages(domain, address, flush_last);
+ __domain_flush_pages(domain, address, flush_last, flags);
if (check_add_overflow(flush_last, 1, &address))
break;
}
@@ -1806,7 +1808,8 @@ void amd_iommu_domain_flush_pages(struct protection_domain *domain,
/* Flush the whole IO/TLB for a given protection domain - including PDE */
static void amd_iommu_domain_flush_all(struct protection_domain *domain)
{
- amd_iommu_domain_flush_pages(domain, 0, U64_MAX);
+ amd_iommu_domain_flush_pages(domain, 0, U64_MAX,
+ CMD_INV_IOMMU_PAGES_PDE_MASK);
}
void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
@@ -1816,7 +1819,9 @@ void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
struct amd_iommu *iommu = get_amd_iommu_from_dev(dev_data->dev);
build_inv_iommu_pages(&cmd, start, last,
- dev_data->gcr3_info.domid, pasid, true);
+ dev_data->gcr3_info.domid, pasid,
+ CMD_INV_IOMMU_PAGES_GN_MASK |
+ CMD_INV_IOMMU_PAGES_PDE_MASK);
iommu_queue_command(iommu, &cmd);
if (dev_data->ats_enabled)
@@ -2606,7 +2611,8 @@ static int amd_iommu_iotlb_sync_map(struct iommu_domain *dom,
return 0;
spin_lock_irqsave(&domain->lock, flags);
- amd_iommu_domain_flush_pages(domain, iova, iova + size - 1);
+ amd_iommu_domain_flush_pages(domain, iova, iova + size - 1,
+ CMD_INV_IOMMU_PAGES_PDE_MASK);
spin_unlock_irqrestore(&domain->lock, flags);
return 0;
}
@@ -2628,7 +2634,9 @@ static void amd_iommu_iotlb_sync(struct iommu_domain *domain,
unsigned long flags;
spin_lock_irqsave(&dom->lock, flags);
- amd_iommu_domain_flush_pages(dom, gather->start, gather->end);
+ amd_iommu_domain_flush_pages(dom, gather->start, gather->end,
+ iommu_pages_list_empty(&gather->freelist) ?
+ 0 : CMD_INV_IOMMU_PAGES_PDE_MASK);
spin_unlock_irqrestore(&dom->lock, flags);
iommu_put_pages_list(&gather->freelist);
}
--
2.43.0
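[Editorial note: the hunks above pass through the np_cache splitting loop in amd_iommu_domain_flush_pages(), which carves the inclusive [address, last] range into naturally aligned power-of-two blocks. The following is a standalone C sketch of that split, not the driver code itself: fls64_model()/ffs64_model() are invented stand-ins for the kernel's fls64()/__ffs64(), and the full 0..U64_MAX range is assumed to be handled by a separate full-invalidate early path, as in the driver's unlikely() branch.]

```c
#include <assert.h>
#include <stdint.h>

/* Invented stand-ins for the kernel's fls64()/__ffs64() bit helpers */
static unsigned int fls64_model(uint64_t x)
{
	return x ? 64u - (unsigned int)__builtin_clzll(x) : 0u;
}

static unsigned int ffs64_model(uint64_t x)
{
	return (unsigned int)__builtin_ctzll(x); /* caller ensures x != 0 */
}

/*
 * Split the inclusive range [address, last] into naturally aligned
 * power-of-two blocks and invoke flush() on each. 0..U64_MAX is
 * assumed handled elsewhere, so "last - address + 1" cannot wrap.
 */
static void split_flush(uint64_t address, uint64_t last,
			void (*flush)(uint64_t start, uint64_t flush_last))
{
	for (;;) {
		/* largest block that fits in the remaining length... */
		unsigned int sz_lg2 = fls64_model(last - address + 1) - 1;

		/* ...shrunk so the block stays aligned to its own size */
		if (address && ffs64_model(address) < sz_lg2)
			sz_lg2 = ffs64_model(address);

		uint64_t flush_last = address + (1ULL << sz_lg2) - 1;

		flush(address, flush_last);

		if (flush_last == last) /* stands in for check_add_overflow */
			break;
		address = flush_last + 1;
	}
}
```

For example, splitting [0x1000, 0x4fff] yields three blocks: 4K at 0x1000, 8K at 0x2000, and 4K at 0x4000 — each a power of two aligned to its own size, jointly covering the range exactly.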
* Re: [PATCH v2 2/5] iommu/amd: Pass last in through to build_inv_address()
2026-04-03 12:26 ` [PATCH v2 2/5] iommu/amd: Pass last in through to build_inv_address() Jason Gunthorpe
@ 2026-04-06 12:02 ` Wei Wang
0 siblings, 0 replies; 11+ messages in thread
From: Wei Wang @ 2026-04-06 12:02 UTC (permalink / raw)
To: Jason Gunthorpe, iommu, Joerg Roedel, Robin Murphy,
Suravee Suthikulpanit, Will Deacon
Cc: patches
On 4/3/26 8:26 PM, Jason Gunthorpe wrote:
> This is the trivial call chain below amd_iommu_domain_flush_pages().
>
> Cases that are doing a full invalidate will pass a last of U64_MAX.
>
> This avoids converting between size and last, and type confusion with
> size_t, unsigned long and u64 all being used in different places along
> the driver's invalidation path. Consistently use u64 in the internals.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> drivers/iommu/amd/amd_iommu.h | 2 +-
> drivers/iommu/amd/iommu.c | 71 +++++++++++++++++------------------
> drivers/iommu/amd/pasid.c | 2 +-
> 3 files changed, 36 insertions(+), 39 deletions(-)
>
> diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
> index 1342e764a5486d..831d457c8dad13 100644
> --- a/drivers/iommu/amd/amd_iommu.h
> +++ b/drivers/iommu/amd/amd_iommu.h
> @@ -90,7 +90,7 @@ void amd_iommu_flush_all_caches(struct amd_iommu *iommu);
> void amd_iommu_domain_flush_pages(struct protection_domain *domain,
> u64 address, size_t size);
> void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
> - ioasid_t pasid, u64 address, size_t size);
> + ioasid_t pasid, u64 start, u64 last);
>
> #ifdef CONFIG_IRQ_REMAP
> int amd_iommu_create_irq_domain(struct amd_iommu *iommu);
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index e4a88879ca35d2..9ea40ab8398048 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -1266,9 +1266,8 @@ static void build_inv_dte(struct iommu_cmd *cmd, u16 devid)
> * Builds an invalidation address which is suitable for one page or multiple
> * pages. Sets the size bit (S) as needed if more than one page is flushed.
> */
> -static inline u64 build_inv_address(u64 address, size_t size)
> +static inline u64 build_inv_address(u64 address, u64 last)
> {
> - u64 last = address + size - 1;
> unsigned int sz_lg2;
>
> address &= GENMASK_U64(63, 12);
> @@ -1302,11 +1301,10 @@ static inline u64 build_inv_address(u64 address, size_t size)
> return address | CMD_INV_IOMMU_PAGES_SIZE_MASK;
> }
>
> -static void build_inv_iommu_pages(struct iommu_cmd *cmd, u64 address,
> - size_t size, u16 domid,
> - ioasid_t pasid, bool gn)
> +static void build_inv_iommu_pages(struct iommu_cmd *cmd, u64 start, u64 last,
> + u16 domid, ioasid_t pasid, bool gn)
> {
> - u64 inv_address = build_inv_address(address, size);
> + u64 inv_address = build_inv_address(start, last);
>
> memset(cmd, 0, sizeof(*cmd));
>
> @@ -1323,10 +1321,9 @@ static void build_inv_iommu_pages(struct iommu_cmd *cmd, u64 address,
> }
>
> static void build_inv_iotlb_pages(struct iommu_cmd *cmd, u16 devid, int qdep,
> - u64 address, size_t size,
> - ioasid_t pasid, bool gn)
> + u64 start, u64 last, ioasid_t pasid, bool gn)
> {
> - u64 inv_address = build_inv_address(address, size);
> + u64 inv_address = build_inv_address(start, last);
>
> memset(cmd, 0, sizeof(*cmd));
>
> @@ -1524,7 +1521,7 @@ static void amd_iommu_flush_tlb_all(struct amd_iommu *iommu)
>
> for (dom_id = 0; dom_id <= last_bdf; ++dom_id) {
> struct iommu_cmd cmd;
> - build_inv_iommu_pages(&cmd, 0, CMD_INV_IOMMU_ALL_PAGES_ADDRESS,
> + build_inv_iommu_pages(&cmd, 0, U64_MAX,
> dom_id, IOMMU_NO_PASID, false);
> iommu_queue_command(iommu, &cmd);
> }
> @@ -1536,14 +1533,15 @@ static void amd_iommu_flush_tlb_domid(struct amd_iommu *iommu, u32 dom_id)
> {
> struct iommu_cmd cmd;
>
> - build_inv_iommu_pages(&cmd, 0, CMD_INV_IOMMU_ALL_PAGES_ADDRESS,
> + build_inv_iommu_pages(&cmd, 0, U64_MAX,
> dom_id, IOMMU_NO_PASID, false);
> iommu_queue_command(iommu, &cmd);
>
> iommu_completion_wait(iommu);
> }
>
> -static int iommu_flush_pages_v1_hdom_ids(struct protection_domain *pdom, u64 address, size_t size)
> +static int iommu_flush_pages_v1_hdom_ids(struct protection_domain *pdom,
> + u64 start, u64 last)
> {
> int ret = 0;
> struct amd_iommu_viommu *aviommu;
> @@ -1560,7 +1558,7 @@ static int iommu_flush_pages_v1_hdom_ids(struct protection_domain *pdom, u64 add
>
> pr_debug("%s: iommu=%#x, hdom_id=%#x\n", __func__,
> iommu->devid, gdom_info->hdom_id);
> - build_inv_iommu_pages(&cmd, address, size, gdom_info->hdom_id,
> + build_inv_iommu_pages(&cmd, start, last, gdom_info->hdom_id,
> IOMMU_NO_PASID, false);
> ret |= iommu_queue_command(iommu, &cmd);
> }
> @@ -1616,15 +1614,15 @@ void amd_iommu_flush_all_caches(struct amd_iommu *iommu)
> /*
> * Command send function for flushing on-device TLB
> */
> -static int device_flush_iotlb(struct iommu_dev_data *dev_data, u64 address,
> - size_t size, ioasid_t pasid, bool gn)
> +static int device_flush_iotlb(struct iommu_dev_data *dev_data, u64 start,
> + u64 last, ioasid_t pasid, bool gn)
> {
> struct amd_iommu *iommu = get_amd_iommu_from_dev_data(dev_data);
> struct iommu_cmd cmd;
> int qdep = dev_data->ats_qdep;
>
> - build_inv_iotlb_pages(&cmd, dev_data->devid, qdep, address,
> - size, pasid, gn);
> + build_inv_iotlb_pages(&cmd, dev_data->devid, qdep, start,
> + last, pasid, gn);
>
> return iommu_queue_command(iommu, &cmd);
> }
> @@ -1668,15 +1666,15 @@ static int device_flush_dte(struct iommu_dev_data *dev_data)
>
> if (dev_data->ats_enabled) {
> /* Invalidate the entire contents of an IOTLB */
> - ret = device_flush_iotlb(dev_data, 0, ~0UL,
> + ret = device_flush_iotlb(dev_data, 0, U64_MAX,
> IOMMU_NO_PASID, false);
> }
>
> return ret;
> }
>
> -static int domain_flush_pages_v2(struct protection_domain *pdom,
> - u64 address, size_t size)
> +static int domain_flush_pages_v2(struct protection_domain *pdom, u64 start,
> + u64 last)
> {
> struct iommu_dev_data *dev_data;
> struct iommu_cmd cmd;
> @@ -1687,7 +1685,7 @@ static int domain_flush_pages_v2(struct protection_domain *pdom,
> struct amd_iommu *iommu = get_amd_iommu_from_dev(dev_data->dev);
> u16 domid = dev_data->gcr3_info.domid;
>
> - build_inv_iommu_pages(&cmd, address, size,
> + build_inv_iommu_pages(&cmd, start, last,
> domid, IOMMU_NO_PASID, true);
>
> ret |= iommu_queue_command(iommu, &cmd);
> @@ -1696,8 +1694,8 @@ static int domain_flush_pages_v2(struct protection_domain *pdom,
> return ret;
> }
>
> -static int domain_flush_pages_v1(struct protection_domain *pdom,
> - u64 address, size_t size)
> +static int domain_flush_pages_v1(struct protection_domain *pdom, u64 start,
> + u64 last)
> {
> struct pdom_iommu_info *pdom_iommu_info;
> struct iommu_cmd cmd;
> @@ -1706,7 +1704,7 @@ static int domain_flush_pages_v1(struct protection_domain *pdom,
>
> lockdep_assert_held(&pdom->lock);
>
> - build_inv_iommu_pages(&cmd, address, size,
> + build_inv_iommu_pages(&cmd, start, last,
> pdom->id, IOMMU_NO_PASID, false);
>
> xa_for_each(&pdom->iommu_array, i, pdom_iommu_info) {
> @@ -1726,7 +1724,7 @@ static int domain_flush_pages_v1(struct protection_domain *pdom,
> * See drivers/iommu/amd/nested.c: amd_iommu_alloc_domain_nested()
> */
> if (!list_empty(&pdom->viommu_list))
> - ret |= iommu_flush_pages_v1_hdom_ids(pdom, address, size);
> + ret |= iommu_flush_pages_v1_hdom_ids(pdom, start, last);
>
> return ret;
> }
> @@ -1735,8 +1733,8 @@ static int domain_flush_pages_v1(struct protection_domain *pdom,
> * TLB invalidation function which is called from the mapping functions.
> * It flushes range of PTEs of the domain.
> */
> -static void __domain_flush_pages(struct protection_domain *domain,
> - u64 address, size_t size)
> +static void __domain_flush_pages(struct protection_domain *domain, u64 start,
> + u64 last)
> {
> struct iommu_dev_data *dev_data;
> int ret = 0;
> @@ -1747,9 +1745,9 @@ static void __domain_flush_pages(struct protection_domain *domain,
>
> if (pdom_is_v2_pgtbl_mode(domain)) {
> gn = true;
> - ret = domain_flush_pages_v2(domain, address, size);
> + ret = domain_flush_pages_v2(domain, start, last);
> } else {
> - ret = domain_flush_pages_v1(domain, address, size);
> + ret = domain_flush_pages_v1(domain, start, last);
> }
>
> list_for_each_entry(dev_data, &domain->dev_list, list) {
> @@ -1757,7 +1755,7 @@ static void __domain_flush_pages(struct protection_domain *domain,
> if (!dev_data->ats_enabled)
> continue;
>
> - ret |= device_flush_iotlb(dev_data, address, size, pasid, gn);
> + ret |= device_flush_iotlb(dev_data, start, last, pasid, gn);
> }
>
> WARN_ON(ret);
> @@ -1769,7 +1767,7 @@ void amd_iommu_domain_flush_pages(struct protection_domain *domain,
> lockdep_assert_held(&domain->lock);
>
> if (likely(!amd_iommu_np_cache)) {
> - __domain_flush_pages(domain, address, size);
> + __domain_flush_pages(domain, address, address + size - 1);
>
> /* Wait until IOMMU TLB and all device IOTLB flushes are complete */
> domain_flush_complete(domain);
> @@ -1806,7 +1804,7 @@ void amd_iommu_domain_flush_pages(struct protection_domain *domain,
>
> flush_size = 1ul << min_alignment;
>
> - __domain_flush_pages(domain, address, flush_size);
> + __domain_flush_pages(domain, address, address + flush_size - 1);
> address += flush_size;
> size -= flush_size;
> }
> @@ -1823,17 +1821,17 @@ static void amd_iommu_domain_flush_all(struct protection_domain *domain)
> }
>
> void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
> - ioasid_t pasid, u64 address, size_t size)
> + ioasid_t pasid, u64 start, u64 last)
> {
> struct iommu_cmd cmd;
> struct amd_iommu *iommu = get_amd_iommu_from_dev(dev_data->dev);
>
> - build_inv_iommu_pages(&cmd, address, size,
> + build_inv_iommu_pages(&cmd, start, last,
> dev_data->gcr3_info.domid, pasid, true);
> iommu_queue_command(iommu, &cmd);
>
> if (dev_data->ats_enabled)
> - device_flush_iotlb(dev_data, address, size, pasid, true);
> + device_flush_iotlb(dev_data, start, last, pasid, true);
>
> iommu_completion_wait(iommu);
> }
> @@ -1841,8 +1839,7 @@ void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
> static void dev_flush_pasid_all(struct iommu_dev_data *dev_data,
> ioasid_t pasid)
> {
> - amd_iommu_dev_flush_pasid_pages(dev_data, pasid, 0,
> - CMD_INV_IOMMU_ALL_PAGES_ADDRESS);
> + amd_iommu_dev_flush_pasid_pages(dev_data, pasid, 0, U64_MAX);
> }
>
> int amd_iommu_complete_ppr(struct device *dev, u32 pasid, int status, int tag)
> diff --git a/drivers/iommu/amd/pasid.c b/drivers/iommu/amd/pasid.c
> index 67eace9809f16a..d708c6532480a4 100644
> --- a/drivers/iommu/amd/pasid.c
> +++ b/drivers/iommu/amd/pasid.c
> @@ -71,7 +71,7 @@ static void sva_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn,
> for_each_pdom_dev_data(pdom_dev_data, sva_pdom) {
> amd_iommu_dev_flush_pasid_pages(pdom_dev_data->dev_data,
> pdom_dev_data->pasid,
> - start, end - start);
> + start, end - 1);
> }
>
> spin_unlock_irqrestore(&sva_pdom->lock, flags);
Reviewed-by: Wei Wang <wei.w.wang@hotmail.com>
* Re: [PATCH v2 1/5] iommu/amd: Simplify build_inv_address()
2026-04-03 12:26 ` [PATCH v2 1/5] iommu/amd: Simplify build_inv_address() Jason Gunthorpe
@ 2026-04-06 13:18 ` Wei Wang
2026-04-06 13:30 ` Jason Gunthorpe
0 siblings, 1 reply; 11+ messages in thread
From: Wei Wang @ 2026-04-06 13:18 UTC (permalink / raw)
To: Jason Gunthorpe, iommu, Joerg Roedel, Robin Murphy,
Suravee Suthikulpanit, Will Deacon
Cc: patches
On 4/3/26 8:26 PM, Jason Gunthorpe wrote:
> This function is doing more work than it needs to:
>
> - iommu_num_pages() is pointless, the fls() is going to compute the
> required page size already.
>
> - It is easier to understand as sz_lg2, which is 12 if size is 4K,
> than msb_diff which is 11 if size is 4K.
>
> - Simplify the control flow to early exit on the out of range cases.
>
> - Use the usual last instead of end to signify an inclusive last
> address.
>
> - Use GENMASK to compute the 1's mask.
>
> - Use GENMASK to compute the address mask for the command layout,
> not PAGE_MASK.
>
> - Directly reference the spec language that defines the 52 bit
> limit.
>
> No functional change intended.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> drivers/iommu/amd/iommu.c | 50 +++++++++++++++++++--------------------
> 1 file changed, 24 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 01171361f9bc22..e4a88879ca35d2 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -1268,39 +1268,37 @@ static void build_inv_dte(struct iommu_cmd *cmd, u16 devid)
> */
> static inline u64 build_inv_address(u64 address, size_t size)
> {
> - u64 pages, end, msb_diff;
> + u64 last = address + size - 1;
> + unsigned int sz_lg2;
>
> - pages = iommu_num_pages(address, size, PAGE_SIZE);
> -
> - if (pages == 1)
> - return address & PAGE_MASK;
> -
> - end = address + size - 1;
> + address &= GENMASK_U64(63, 12);
> + sz_lg2 = fls64(address ^ last);
> + if (sz_lg2 <= 12)
> + return address;
>
> /*
> - * msb_diff would hold the index of the most significant bit that
> - * flipped between the start and end.
> + * Encode sz_lg2 according to Table 14: Example Page Size Encodings
> + *
> + * See "Note *":
> + * Address bits 51:32 can be used to encode page sizes greater
> + * than 4 Gbytes.
> + * Which we take to mean that the highest page size has bit
> + * [51]=0, [50:12]=1
> + * and that coding happens when sz_lg2 is 52. Fall back to full
> + * invalidation if the size is too big.
> + *
> */
> - msb_diff = fls64(end ^ address) - 1;
> + if (unlikely(sz_lg2 > 52))
> + return (CMD_INV_IOMMU_ALL_PAGES_ADDRESS & PAGE_MASK) |
> + CMD_INV_IOMMU_PAGES_SIZE_MASK;
>
> /*
> - * Bits 63:52 are sign extended. If for some reason bit 51 is different
> - * between the start and the end, invalidate everything.
> + * The sz_lg2 calculation with fls() ensures that:
> + * address & BIT(sz_lg2 - 1) == 0
> + * Therefore only the 1's need to be added. 8KB requires no 1's
> */
> - if (unlikely(msb_diff > 51)) {
> - address = CMD_INV_IOMMU_ALL_PAGES_ADDRESS;
> - } else {
> - /*
> - * The msb-bit must be clear on the address. Just set all the
> - * lower bits.
> - */
> - address |= (1ull << msb_diff) - 1;
> - }
> -
> - /* Clear bits 11:0 */
> - address &= PAGE_MASK;
> -
> - /* Set the size bit - we flush more than one 4kb page */
> + if (sz_lg2 > 13)
> + address |= GENMASK_U64(sz_lg2 - 2, 12);
> return address | CMD_INV_IOMMU_PAGES_SIZE_MASK;
> }
>
Reviewed-by: Wei Wang <wei.w.wang@hotmail.com>
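[Editorial note: the bit math in this patch can be exercised outside the kernel. The sketch below is a standalone model, not the driver function: the S bit is modeled as bit 0 (per CMD_INV_IOMMU_PAGES_SIZE_MASK), the all-pages constant uses the pre-patch-4 value as a placeholder, and genmask64() stands in for GENMASK_U64() — all assumptions for illustration only.]

```c
#include <assert.h>
#include <stdint.h>

/* Modeled command-field constants; values are illustrative assumptions
 * (the driver uses CMD_INV_IOMMU_PAGES_SIZE_MASK and
 * CMD_INV_IOMMU_ALL_PAGES_ADDRESS). */
#define S_BIT		0x1ULL
#define ALL_PAGES	0x7fffffffffffffffULL

static unsigned int fls64_model(uint64_t x)
{
	return x ? 64u - (unsigned int)__builtin_clzll(x) : 0u;
}

/* Stand-in for GENMASK_U64(h, l): bits l..h set */
static uint64_t genmask64(unsigned int h, unsigned int l)
{
	return (~0ULL << l) & (~0ULL >> (63 - h));
}

static uint64_t build_inv_address_model(uint64_t address, uint64_t last)
{
	unsigned int sz_lg2;

	address &= genmask64(63, 12);
	sz_lg2 = fls64_model(address ^ last);
	if (sz_lg2 <= 12)
		return address;		/* single 4K page, S bit clear */

	/* size exceeds the 52-bit encoding: ask for full invalidation */
	if (sz_lg2 > 52)
		return (ALL_PAGES & genmask64(63, 12)) | S_BIT;

	/*
	 * fls64() guarantees bit (sz_lg2 - 1) of address is clear, so
	 * ORing 1's into the bits below it yields the size encoding;
	 * 8K (sz_lg2 == 13) needs no 1's at all.
	 */
	if (sz_lg2 > 13)
		address |= genmask64(sz_lg2 - 2, 12);
	return address | S_BIT;
}
```

Tracing a few cases: a single 4K page returns the bare address with S clear; an 8K range sets only S; a 16K range sets bit 12 plus S; and the largest encodable size (sz_lg2 == 52) produces bit [51]=0 with [50:12] all 1's, matching the "Note *" reading quoted in the patch.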
* Re: [PATCH v2 1/5] iommu/amd: Simplify build_inv_address()
2026-04-06 13:18 ` Wei Wang
@ 2026-04-06 13:30 ` Jason Gunthorpe
2026-04-07 1:15 ` Wei Wang
0 siblings, 1 reply; 11+ messages in thread
From: Jason Gunthorpe @ 2026-04-06 13:30 UTC (permalink / raw)
To: Wei Wang
Cc: iommu, Joerg Roedel, Robin Murphy, Suravee Suthikulpanit,
Will Deacon, patches
On Mon, Apr 06, 2026 at 09:18:19PM +0800, Wei Wang wrote:
> > - msb_diff = fls64(end ^ address) - 1;
> > + if (unlikely(sz_lg2 > 52))
> > + return (CMD_INV_IOMMU_ALL_PAGES_ADDRESS & PAGE_MASK) |
> > + CMD_INV_IOMMU_PAGES_SIZE_MASK;
I was looking at Ankit's series and thinking this is slightly better as:
+ if (unlikely(sz_lg2 > min(52, BITS_PER_LONG - 1)))
So a ULONG_MAX gather will trigger full invalidation on 32 bit as
well, though it shouldn't be necessary, it seems a tish cleaner and
matches how vt-d ended up
It is probably too late for this cycle, I'll repost it after the merge
window.
Thanks,
Jason
* Re: [PATCH v2 1/5] iommu/amd: Simplify build_inv_address()
2026-04-06 13:30 ` Jason Gunthorpe
@ 2026-04-07 1:15 ` Wei Wang
2026-04-07 1:26 ` Jason Gunthorpe
0 siblings, 1 reply; 11+ messages in thread
From: Wei Wang @ 2026-04-07 1:15 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, Robin Murphy, Suravee Suthikulpanit,
Will Deacon, patches
On 4/6/26 9:30 PM, Jason Gunthorpe wrote:
> On Mon, Apr 06, 2026 at 09:18:19PM +0800, Wei Wang wrote:
>
>>> - msb_diff = fls64(end ^ address) - 1;
>>> + if (unlikely(sz_lg2 > 52))
>>> + return (CMD_INV_IOMMU_ALL_PAGES_ADDRESS & PAGE_MASK) |
>>> + CMD_INV_IOMMU_PAGES_SIZE_MASK;
>
> I was looking at Ankit's series and thinking this is slightly better as:
>
> + if (unlikely(sz_lg2 > min(52, BITS_PER_LONG - 1)))
>
> So a ULONG_MAX gather will trigger full invalidation on 32 bit as
> well, though it shouldn't be necessary it seems a tish cleaner and
> matches how vt-d ended up
>
I see the value in matching how VT‑d handles this. I’m slightly
concerned that adding the BITS_PER_LONG check might imply that AMD‑IOMMU
supports 32‑bit kernels.
Unlike VT‑d, the AMD IOMMU driver doesn’t build on 32‑bit at all. From
Kconfig:
config AMD_IOMMU
depends on X86_64 && PCI && ACPI && HAVE_CMPXCHG_DOUBLE
config INTEL_IOMMU
depends on PCI_MSI && ACPI && X86
Given that the 32‑bit case is unreachable for AMD‑IOMMU, keeping the
existing check might actually be clearer?
* Re: [PATCH v2 1/5] iommu/amd: Simplify build_inv_address()
2026-04-07 1:15 ` Wei Wang
@ 2026-04-07 1:26 ` Jason Gunthorpe
0 siblings, 0 replies; 11+ messages in thread
From: Jason Gunthorpe @ 2026-04-07 1:26 UTC (permalink / raw)
To: Wei Wang
Cc: iommu, Joerg Roedel, Robin Murphy, Suravee Suthikulpanit,
Will Deacon, patches
On Tue, Apr 07, 2026 at 09:15:40AM +0800, Wei Wang wrote:
> On 4/6/26 9:30 PM, Jason Gunthorpe wrote:
> > On Mon, Apr 06, 2026 at 09:18:19PM +0800, Wei Wang wrote:
> >
> > > > - msb_diff = fls64(end ^ address) - 1;
> > > > + if (unlikely(sz_lg2 > 52))
> > > > + return (CMD_INV_IOMMU_ALL_PAGES_ADDRESS & PAGE_MASK) |
> > > > + CMD_INV_IOMMU_PAGES_SIZE_MASK;
> >
> > I was looking at Ankit's series and thinking this is slightly better as:
> >
> > + if (unlikely(sz_lg2 > min(52, BITS_PER_LONG - 1)))
> >
> > So a ULONG_MAX gather will trigger full invalidation on 32 bit as
> > well, though it shouldn't be necessary it seems a tish cleaner and
> > matches how vt-d ended up
> >
>
> I see the value in matching how VT‑d handles this. I’m slightly concerned
> that adding the BITS_PER_LONG check might imply that AMD‑IOMMU supports
> 32‑bit kernels.
>
> Unlike VT‑d, the AMD IOMMU driver doesn’t build on 32‑bit at all. From
> Kconfig:
>
> config AMD_IOMMU
> depends on X86_64 && PCI && ACPI && HAVE_CMPXCHG_DOUBLE
>
> config INTEL_IOMMU
> depends on PCI_MSI && ACPI && X86
>
> Given that the 32‑bit case is unreachable for AMD‑IOMMU, keeping the
> existing check might actually be clearer?
Ah, I didn't remember that, it makes sense. Let's leave it then
Jason
Thread overview: 11+ messages
2026-04-03 12:26 [PATCH v2 0/5] Improve the invalidation path in AMD Jason Gunthorpe
2026-04-03 12:26 ` [PATCH v2 1/5] iommu/amd: Simplify build_inv_address() Jason Gunthorpe
2026-04-06 13:18 ` Wei Wang
2026-04-06 13:30 ` Jason Gunthorpe
2026-04-07 1:15 ` Wei Wang
2026-04-07 1:26 ` Jason Gunthorpe
2026-04-03 12:26 ` [PATCH v2 2/5] iommu/amd: Pass last in through to build_inv_address() Jason Gunthorpe
2026-04-06 12:02 ` Wei Wang
2026-04-03 12:26 ` [PATCH v2 3/5] iommu/amd: Have amd_iommu_domain_flush_pages() use last Jason Gunthorpe
2026-04-03 12:26 ` [PATCH v2 4/5] iommu/amd: Make CMD_INV_IOMMU_ALL_PAGES_ADDRESS match the spec Jason Gunthorpe
2026-04-03 12:26 ` [PATCH v2 5/5] iommu/amd: Control INVALIDATE_IOMMU_PAGES PDE from the gather Jason Gunthorpe