public inbox for linux-kernel@vger.kernel.org
* [PATCH v2 0/1] [PULL REQUEST] Intel IOMMU updates for v7.1
@ 2026-04-07  6:45 Lu Baolu
  2026-04-07  6:45 ` [PATCH v2 1/1] iommu/vt-d: Simplify calculate_psi_aligned_address() Lu Baolu
  0 siblings, 1 reply; 2+ messages in thread
From: Lu Baolu @ 2026-04-07  6:45 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Jason Gunthorpe, iommu, linux-kernel

Hi Joerg,

The following remaining change is ready for v7.1-rc1. It aims to:

- Simplify calculate_psi_aligned_address()

This patch was originally included in the v1 pull request but was
removed due to an issue identified during review. That issue has now
been fixed, and the patch is ready for iommu/vt-d. Please consider it
for inclusion.

Best regards,
baolu

Jason Gunthorpe (1):
  iommu/vt-d: Simplify calculate_psi_aligned_address()

 drivers/iommu/intel/cache.c | 49 ++++++++++++++-----------------------
 1 file changed, 18 insertions(+), 31 deletions(-)

-- 
2.43.0



* [PATCH v2 1/1] iommu/vt-d: Simplify calculate_psi_aligned_address()
  2026-04-07  6:45 [PATCH v2 0/1] [PULL REQUEST] Intel IOMMU updates for v7.1 Lu Baolu
@ 2026-04-07  6:45 ` Lu Baolu
  0 siblings, 0 replies; 2+ messages in thread
From: Lu Baolu @ 2026-04-07  6:45 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Jason Gunthorpe, iommu, linux-kernel

From: Jason Gunthorpe <jgg@nvidia.com>

This does far too much math for the simple task of finding a power
of 2 that fully spans the given range. Use fls() directly on the XOR
of the two addresses, which locates the first bit where they differ,
i.e. the end of their common binary prefix.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/0-v2-895748900b39+5303-iommupt_inv_vtd_jgg@nvidia.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel/cache.c | 49 ++++++++++++++-----------------------
 1 file changed, 18 insertions(+), 31 deletions(-)

diff --git a/drivers/iommu/intel/cache.c b/drivers/iommu/intel/cache.c
index be8410f0e841..fdc88817709f 100644
--- a/drivers/iommu/intel/cache.c
+++ b/drivers/iommu/intel/cache.c
@@ -254,37 +254,29 @@ void cache_tag_unassign_domain(struct dmar_domain *domain,
 }
 
 static unsigned long calculate_psi_aligned_address(unsigned long start,
-						   unsigned long end,
-						   unsigned long *_mask)
+						   unsigned long last,
+						   unsigned long *size_order)
 {
-	unsigned long pages = aligned_nrpages(start, end - start + 1);
-	unsigned long aligned_pages = __roundup_pow_of_two(pages);
-	unsigned long bitmask = aligned_pages - 1;
-	unsigned long mask = ilog2(aligned_pages);
-	unsigned long pfn = IOVA_PFN(start);
-
-	/*
-	 * PSI masks the low order bits of the base address. If the
-	 * address isn't aligned to the mask, then compute a mask value
-	 * needed to ensure the target range is flushed.
-	 */
-	if (unlikely(bitmask & pfn)) {
-		unsigned long end_pfn = pfn + pages - 1, shared_bits;
+	unsigned int sz_lg2;
 
+	/* Compute a sz_lg2 that spans start and last */
+	start &= GENMASK(BITS_PER_LONG - 1, VTD_PAGE_SHIFT);
+	sz_lg2 = fls_long(start ^ last);
+	if (sz_lg2 <= 12) {
+		*size_order = 0;
+		return start;
+	}
+	if (unlikely(sz_lg2 >= BITS_PER_LONG)) {
 		/*
-		 * Since end_pfn <= pfn + bitmask, the only way bits
-		 * higher than bitmask can differ in pfn and end_pfn is
-		 * by carrying. This means after masking out bitmask,
-		 * high bits starting with the first set bit in
-		 * shared_bits are all equal in both pfn and end_pfn.
+		 * MAX_AGAW_PFN_WIDTH triggers full invalidation in all
+		 * downstream users.
 		 */
-		shared_bits = ~(pfn ^ end_pfn) & ~bitmask;
-		mask = shared_bits ? __ffs(shared_bits) : MAX_AGAW_PFN_WIDTH;
+		*size_order = MAX_AGAW_PFN_WIDTH;
+		return 0;
 	}
 
-	*_mask = mask;
-
-	return ALIGN_DOWN(start, VTD_PAGE_SIZE << mask);
+	*size_order = sz_lg2 - VTD_PAGE_SHIFT;
+	return start & GENMASK(BITS_PER_LONG - 1, sz_lg2);
 }
 
 static void qi_batch_flush_descs(struct intel_iommu *iommu, struct qi_batch *batch)
@@ -441,12 +433,7 @@ void cache_tag_flush_range(struct dmar_domain *domain, unsigned long start,
 	struct cache_tag *tag;
 	unsigned long flags;
 
-	if (start == 0 && end == ULONG_MAX) {
-		addr = 0;
-		mask = MAX_AGAW_PFN_WIDTH;
-	} else {
-		addr = calculate_psi_aligned_address(start, end, &mask);
-	}
+	addr = calculate_psi_aligned_address(start, end, &mask);
 
 	spin_lock_irqsave(&domain->cache_lock, flags);
 	list_for_each_entry(tag, &domain->cache_tags, node) {
-- 
2.43.0


