* [PATCH v2 0/7] Reorganize Intel VT-D to be ready for iommupt
@ 2025-06-12 14:31 Jason Gunthorpe
  2025-06-12 14:31 ` [PATCH v2 1/7] iommu/vt-d: Lift the __pa to domain_setup_first_level/intel_svm_set_dev_pasid() Jason Gunthorpe
                   ` (6 more replies)
  0 siblings, 7 replies; 18+ messages in thread
From: Jason Gunthorpe @ 2025-06-12 14:31 UTC (permalink / raw)
  To: Lu Baolu, David Woodhouse, iommu, Joerg Roedel, Robin Murphy,
	Will Deacon
  Cc: patches, Wei Wang

This is similar to a series I sent for AMD.

Lightly reorganize some of the page table related code so there is greater
isolation between the first and second stage logic. iommupt will introduce
distinct types and code flows for the first and second stages; having them
all combined together is difficult.

Splitting actually makes things generally more understandable anyhow, as
the two stages share only a little code. Most of the registers and logic
are in fact different, which was being obscured by small helper functions
that combine the first and second stage information.

This would replace some of the patches recently sent by Wei.

The full branch including the iommupt conversion is here:

    https://github.com/jgunthorpe/linux/commits/iommu_pt_vtd

Jason Gunthorpe (7):
  iommu/vt-d: Lift the __pa to
    domain_setup_first_level/intel_svm_set_dev_pasid()
  iommu/vt-d: Fold domain_exit() into intel_iommu_domain_free()
  iommu/vt-d: Do not wipe out the page table NID when devices detach
  iommu/vt-d: Split intel_iommu_domain_alloc_paging_flags()
  iommu/vt-d: Create unique domain ops for each stage
  iommu/vt-d: Split intel_iommu_enforce_cache_coherency()
  iommu/vt-d: Split paging_domain_compatible()

 drivers/iommu/intel/cache.c  |   5 +-
 drivers/iommu/intel/iommu.c  | 326 +++++++++++++++++++++--------------
 drivers/iommu/intel/iommu.h  |   9 +-
 drivers/iommu/intel/nested.c |   3 +-
 drivers/iommu/intel/pasid.c  |  17 +-
 drivers/iommu/intel/pasid.h  |  11 +-
 drivers/iommu/intel/svm.c    |   3 +-
 7 files changed, 224 insertions(+), 150 deletions(-)


base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
-- 
2.43.0



* [PATCH v2 1/7] iommu/vt-d: Lift the __pa to domain_setup_first_level/intel_svm_set_dev_pasid()
  2025-06-12 14:31 [PATCH v2 0/7] Reorganize Intel VT-D to be ready for iommupt Jason Gunthorpe
@ 2025-06-12 14:31 ` Jason Gunthorpe
  2025-06-13  9:31   ` Tian, Kevin
  2025-06-12 14:31 ` [PATCH v2 2/7] iommu/vt-d: Fold domain_exit() into intel_iommu_domain_free() Jason Gunthorpe
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Jason Gunthorpe @ 2025-06-12 14:31 UTC (permalink / raw)
  To: Lu Baolu, David Woodhouse, iommu, Joerg Roedel, Robin Murphy,
	Will Deacon
  Cc: patches, Wei Wang

Pass the phys_addr_t down through the call chain from the top instead of
passing a pgd_t * KVA. This moves the __pa() into
domain_setup_first_level(), which is the first function in this call
chain to obtain the pgd from the IOMMU page table.

The SVA flow is also adjusted to get the physical address of mm->pgd.

iommupt will move the __pa() into the iommupt code, which never shares
the KVA of the page table with the driver.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/intel/iommu.c | 15 +++++++--------
 drivers/iommu/intel/iommu.h |  7 +++----
 drivers/iommu/intel/pasid.c | 17 +++++++++--------
 drivers/iommu/intel/pasid.h | 11 +++++------
 drivers/iommu/intel/svm.c   |  2 +-
 5 files changed, 25 insertions(+), 27 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 7aa3932251b2fd..b42bb091088ff5 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1736,15 +1736,14 @@ static void domain_context_clear_one(struct device_domain_info *info, u8 bus, u8
 	intel_context_flush_no_pasid(info, context, did);
 }
 
-int __domain_setup_first_level(struct intel_iommu *iommu,
-			       struct device *dev, ioasid_t pasid,
-			       u16 did, pgd_t *pgd, int flags,
-			       struct iommu_domain *old)
+int __domain_setup_first_level(struct intel_iommu *iommu, struct device *dev,
+			       ioasid_t pasid, u16 did, phys_addr_t fsptptr,
+			       int flags, struct iommu_domain *old)
 {
 	if (!old)
-		return intel_pasid_setup_first_level(iommu, dev, pgd,
-						     pasid, did, flags);
-	return intel_pasid_replace_first_level(iommu, dev, pgd, pasid, did,
+		return intel_pasid_setup_first_level(iommu, dev, fsptptr, pasid,
+						     did, flags);
+	return intel_pasid_replace_first_level(iommu, dev, fsptptr, pasid, did,
 					       iommu_domain_did(old, iommu),
 					       flags);
 }
@@ -1793,7 +1792,7 @@ static int domain_setup_first_level(struct intel_iommu *iommu,
 
 	return __domain_setup_first_level(iommu, dev, pasid,
 					  domain_id_iommu(domain, iommu),
-					  (pgd_t *)pgd, flags, old);
+					  __pa(pgd), flags, old);
 }
 
 static int dmar_domain_attach_device(struct dmar_domain *domain,
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 3ddbcc603de23b..ca6843973d4393 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -1252,10 +1252,9 @@ domain_add_dev_pasid(struct iommu_domain *domain,
 void domain_remove_dev_pasid(struct iommu_domain *domain,
 			     struct device *dev, ioasid_t pasid);
 
-int __domain_setup_first_level(struct intel_iommu *iommu,
-			       struct device *dev, ioasid_t pasid,
-			       u16 did, pgd_t *pgd, int flags,
-			       struct iommu_domain *old);
+int __domain_setup_first_level(struct intel_iommu *iommu, struct device *dev,
+			       ioasid_t pasid, u16 did, phys_addr_t fsptptr,
+			       int flags, struct iommu_domain *old);
 
 int dmar_ir_support(void);
 
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index ac67a056b6c868..52f678975da745 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -348,14 +348,15 @@ static void intel_pasid_flush_present(struct intel_iommu *iommu,
  */
 static void pasid_pte_config_first_level(struct intel_iommu *iommu,
 					 struct pasid_entry *pte,
-					 pgd_t *pgd, u16 did, int flags)
+					 phys_addr_t fsptptr, u16 did,
+					 int flags)
 {
 	lockdep_assert_held(&iommu->lock);
 
 	pasid_clear_entry(pte);
 
 	/* Setup the first level page table pointer: */
-	pasid_set_flptr(pte, (u64)__pa(pgd));
+	pasid_set_flptr(pte, fsptptr);
 
 	if (flags & PASID_FLAG_FL5LP)
 		pasid_set_flpm(pte, 1);
@@ -372,9 +373,9 @@ static void pasid_pte_config_first_level(struct intel_iommu *iommu,
 	pasid_set_present(pte);
 }
 
-int intel_pasid_setup_first_level(struct intel_iommu *iommu,
-				  struct device *dev, pgd_t *pgd,
-				  u32 pasid, u16 did, int flags)
+int intel_pasid_setup_first_level(struct intel_iommu *iommu, struct device *dev,
+				  phys_addr_t fsptptr, u32 pasid, u16 did,
+				  int flags)
 {
 	struct pasid_entry *pte;
 
@@ -402,7 +403,7 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
 		return -EBUSY;
 	}
 
-	pasid_pte_config_first_level(iommu, pte, pgd, did, flags);
+	pasid_pte_config_first_level(iommu, pte, fsptptr, did, flags);
 
 	spin_unlock(&iommu->lock);
 
@@ -412,7 +413,7 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
 }
 
 int intel_pasid_replace_first_level(struct intel_iommu *iommu,
-				    struct device *dev, pgd_t *pgd,
+				    struct device *dev, phys_addr_t fsptptr,
 				    u32 pasid, u16 did, u16 old_did,
 				    int flags)
 {
@@ -430,7 +431,7 @@ int intel_pasid_replace_first_level(struct intel_iommu *iommu,
 		return -EINVAL;
 	}
 
-	pasid_pte_config_first_level(iommu, &new_pte, pgd, did, flags);
+	pasid_pte_config_first_level(iommu, &new_pte, fsptptr, did, flags);
 
 	spin_lock(&iommu->lock);
 	pte = intel_pasid_get_entry(dev, pasid);
diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index fd0fd1a0df84cc..a771a77d4239c4 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -288,9 +288,9 @@ extern unsigned int intel_pasid_max_id;
 int intel_pasid_alloc_table(struct device *dev);
 void intel_pasid_free_table(struct device *dev);
 struct pasid_table *intel_pasid_get_table(struct device *dev);
-int intel_pasid_setup_first_level(struct intel_iommu *iommu,
-				  struct device *dev, pgd_t *pgd,
-				  u32 pasid, u16 did, int flags);
+int intel_pasid_setup_first_level(struct intel_iommu *iommu, struct device *dev,
+				  phys_addr_t fsptptr, u32 pasid, u16 did,
+				  int flags);
 int intel_pasid_setup_second_level(struct intel_iommu *iommu,
 				   struct dmar_domain *domain,
 				   struct device *dev, u32 pasid);
@@ -302,9 +302,8 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
 			     u32 pasid, struct dmar_domain *domain);
 int intel_pasid_replace_first_level(struct intel_iommu *iommu,
-				    struct device *dev, pgd_t *pgd,
-				    u32 pasid, u16 did, u16 old_did,
-				    int flags);
+				    struct device *dev, phys_addr_t fsptptr,
+				    u32 pasid, u16 did, u16 old_did, int flags);
 int intel_pasid_replace_second_level(struct intel_iommu *iommu,
 				     struct dmar_domain *domain,
 				     struct device *dev, u16 old_did,
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index f3da596410b5e5..8c0bed36c58777 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -171,7 +171,7 @@ static int intel_svm_set_dev_pasid(struct iommu_domain *domain,
 	/* Setup the pasid table: */
 	sflags = cpu_feature_enabled(X86_FEATURE_LA57) ? PASID_FLAG_FL5LP : 0;
 	ret = __domain_setup_first_level(iommu, dev, pasid,
-					 FLPT_DEFAULT_DID, mm->pgd,
+					 FLPT_DEFAULT_DID, __pa(mm->pgd),
 					 sflags, old);
 	if (ret)
 		goto out_unwind_iopf;
-- 
2.43.0



* [PATCH v2 2/7] iommu/vt-d: Fold domain_exit() into intel_iommu_domain_free()
  2025-06-12 14:31 [PATCH v2 0/7] Reorganize Intel VT-D to be ready for iommupt Jason Gunthorpe
  2025-06-12 14:31 ` [PATCH v2 1/7] iommu/vt-d: Lift the __pa to domain_setup_first_level/intel_svm_set_dev_pasid() Jason Gunthorpe
@ 2025-06-12 14:31 ` Jason Gunthorpe
  2025-06-13  9:32   ` Tian, Kevin
  2025-06-12 14:31 ` [PATCH v2 3/7] iommu/vt-d: Do not wipe out the page table NID when devices detach Jason Gunthorpe
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Jason Gunthorpe @ 2025-06-12 14:31 UTC (permalink / raw)
  To: Lu Baolu, David Woodhouse, iommu, Joerg Roedel, Robin Murphy,
	Will Deacon
  Cc: patches, Wei Wang

It has only one caller; there is no need for two functions.

Correct the WARN_ON() error handling to leak the entire page table if the
HW is still referencing it, so we don't UAF during WARN_ON recovery.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/intel/iommu.c | 38 ++++++++++++++++++-------------------
 1 file changed, 18 insertions(+), 20 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index b42bb091088ff5..7119f4be1b1393 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1396,23 +1396,6 @@ void domain_detach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
 	}
 }
 
-static void domain_exit(struct dmar_domain *domain)
-{
-	if (domain->pgd) {
-		struct iommu_pages_list freelist =
-			IOMMU_PAGES_LIST_INIT(freelist);
-
-		domain_unmap(domain, 0, DOMAIN_MAX_PFN(domain->gaw), &freelist);
-		iommu_put_pages_list(&freelist);
-	}
-
-	if (WARN_ON(!list_empty(&domain->devices)))
-		return;
-
-	kfree(domain->qi_batch);
-	kfree(domain);
-}
-
 /*
  * For kdump cases, old valid entries may be cached due to the
  * in-flight DMA and copied pgtable, but there is no unmapping
@@ -3392,9 +3375,24 @@ static void intel_iommu_domain_free(struct iommu_domain *domain)
 {
 	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
 
-	WARN_ON(dmar_domain->nested_parent &&
-		!list_empty(&dmar_domain->s1_domains));
-	domain_exit(dmar_domain);
+	if (WARN_ON(dmar_domain->nested_parent &&
+		!list_empty(&dmar_domain->s1_domains)))
+		return;
+
+	if (WARN_ON(!list_empty(&dmar_domain->devices)))
+		return;
+
+	if (dmar_domain->pgd) {
+		struct iommu_pages_list freelist =
+			IOMMU_PAGES_LIST_INIT(freelist);
+
+		domain_unmap(dmar_domain, 0, DOMAIN_MAX_PFN(dmar_domain->gaw),
+			     &freelist);
+		iommu_put_pages_list(&freelist);
+	}
+
+	kfree(dmar_domain->qi_batch);
+	kfree(dmar_domain);
 }
 
 int paging_domain_compatible(struct iommu_domain *domain, struct device *dev)
-- 
2.43.0



* [PATCH v2 3/7] iommu/vt-d: Do not wipe out the page table NID when devices detach
  2025-06-12 14:31 [PATCH v2 0/7] Reorganize Intel VT-D to be ready for iommupt Jason Gunthorpe
  2025-06-12 14:31 ` [PATCH v2 1/7] iommu/vt-d: Lift the __pa to domain_setup_first_level/intel_svm_set_dev_pasid() Jason Gunthorpe
  2025-06-12 14:31 ` [PATCH v2 2/7] iommu/vt-d: Fold domain_exit() into intel_iommu_domain_free() Jason Gunthorpe
@ 2025-06-12 14:31 ` Jason Gunthorpe
  2025-06-13  9:33   ` Tian, Kevin
  2025-06-12 14:31 ` [PATCH v2 4/7] iommu/vt-d: Split intel_iommu_domain_alloc_paging_flags() Jason Gunthorpe
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Jason Gunthorpe @ 2025-06-12 14:31 UTC (permalink / raw)
  To: Lu Baolu, David Woodhouse, iommu, Joerg Roedel, Robin Murphy,
	Will Deacon
  Cc: patches, Wei Wang

The NID is used to control which NUMA node the page table's memory is
allocated from. It should be a permanent property set when the page table
is allocated and should not change during attach/detach of devices.

Reviewed-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/intel/iommu.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 7119f4be1b1393..00079c3e49f078 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1391,7 +1391,6 @@ void domain_detach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
 	if (--info->refcnt == 0) {
 		ida_free(&iommu->domain_ida, info->did);
 		xa_erase(&domain->iommu_array, iommu->seq_id);
-		domain->nid = NUMA_NO_NODE;
 		kfree(info);
 	}
 }
-- 
2.43.0



* [PATCH v2 4/7] iommu/vt-d: Split intel_iommu_domain_alloc_paging_flags()
  2025-06-12 14:31 [PATCH v2 0/7] Reorganize Intel VT-D to be ready for iommupt Jason Gunthorpe
                   ` (2 preceding siblings ...)
  2025-06-12 14:31 ` [PATCH v2 3/7] iommu/vt-d: Do not wipe out the page table NID when devices detach Jason Gunthorpe
@ 2025-06-12 14:31 ` Jason Gunthorpe
  2025-06-13  9:35   ` Tian, Kevin
  2025-06-12 14:31 ` [PATCH v2 5/7] iommu/vt-d: Create unique domain ops for each stage Jason Gunthorpe
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Jason Gunthorpe @ 2025-06-12 14:31 UTC (permalink / raw)
  To: Lu Baolu, David Woodhouse, iommu, Joerg Roedel, Robin Murphy,
	Will Deacon
  Cc: patches, Wei Wang

Create stage-specific functions that check the stage-specific conditions
for whether each stage can be supported.

Have intel_iommu_domain_alloc_paging_flags() call each stage in sequence
until one does not return EOPNOTSUPP, preferring the first stage if it is
available and suitable for the requested flags.

Move second-stage-only operations like nested_parent and dirty_tracking
into the second stage function for clarity.

Move initialization of the iommu_domain members into paging_domain_alloc().

Drop initialization of domain->owner as the callers all do it.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/intel/iommu.c | 98 +++++++++++++++++++++----------------
 1 file changed, 57 insertions(+), 41 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 00079c3e49f078..243e77d5a06aac 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3267,10 +3267,15 @@ static struct dmar_domain *paging_domain_alloc(struct device *dev, bool first_st
 	spin_lock_init(&domain->lock);
 	spin_lock_init(&domain->cache_lock);
 	xa_init(&domain->iommu_array);
+	INIT_LIST_HEAD(&domain->s1_domains);
+	spin_lock_init(&domain->s1_lock);
 
 	domain->nid = dev_to_node(dev);
 	domain->use_first_level = first_stage;
 
+	domain->domain.type = IOMMU_DOMAIN_UNMANAGED;
+	domain->domain.ops = intel_iommu_ops.default_domain_ops;
+
 	/* calculate the address width */
 	addr_width = agaw_to_width(iommu->agaw);
 	if (addr_width > cap_mgaw(iommu->cap))
@@ -3312,62 +3317,73 @@ static struct dmar_domain *paging_domain_alloc(struct device *dev, bool first_st
 }
 
 static struct iommu_domain *
-intel_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
-				      const struct iommu_user_data *user_data)
+intel_iommu_domain_alloc_first_stage(struct device *dev,
+				     struct intel_iommu *iommu, u32 flags)
+{
+	struct dmar_domain *dmar_domain;
+
+	if (flags & (~IOMMU_HWPT_ALLOC_PASID))
+		return ERR_PTR(-EOPNOTSUPP);
+
+	/* Only SL is available in legacy mode */
+	if (!sm_supported(iommu) || !ecap_flts(iommu->ecap))
+		return ERR_PTR(-EOPNOTSUPP);
+
+	dmar_domain = paging_domain_alloc(dev, true);
+	if (IS_ERR(dmar_domain))
+		return ERR_CAST(dmar_domain);
+	return &dmar_domain->domain;
+}
+
+static struct iommu_domain *
+intel_iommu_domain_alloc_second_stage(struct device *dev,
+				      struct intel_iommu *iommu, u32 flags)
 {
-	struct device_domain_info *info = dev_iommu_priv_get(dev);
-	bool dirty_tracking = flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
-	bool nested_parent = flags & IOMMU_HWPT_ALLOC_NEST_PARENT;
-	struct intel_iommu *iommu = info->iommu;
 	struct dmar_domain *dmar_domain;
-	struct iommu_domain *domain;
-	bool first_stage;
 
 	if (flags &
 	    (~(IOMMU_HWPT_ALLOC_NEST_PARENT | IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
 	       IOMMU_HWPT_ALLOC_PASID)))
 		return ERR_PTR(-EOPNOTSUPP);
-	if (nested_parent && !nested_supported(iommu))
-		return ERR_PTR(-EOPNOTSUPP);
-	if (user_data || (dirty_tracking && !ssads_supported(iommu)))
+
+	if (((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) &&
+	     !nested_supported(iommu)) ||
+	    ((flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING) &&
+	     !ssads_supported(iommu)))
 		return ERR_PTR(-EOPNOTSUPP);
 
-	/*
-	 * Always allocate the guest compatible page table unless
-	 * IOMMU_HWPT_ALLOC_NEST_PARENT or IOMMU_HWPT_ALLOC_DIRTY_TRACKING
-	 * is specified.
-	 */
-	if (nested_parent || dirty_tracking) {
-		if (!sm_supported(iommu) || !ecap_slts(iommu->ecap))
-			return ERR_PTR(-EOPNOTSUPP);
-		first_stage = false;
-	} else {
-		first_stage = first_level_by_default(iommu);
-	}
+	/* Legacy mode always supports second stage */
+	if (sm_supported(iommu) && !ecap_slts(iommu->ecap))
+		return ERR_PTR(-EOPNOTSUPP);
 
-	dmar_domain = paging_domain_alloc(dev, first_stage);
+	dmar_domain = paging_domain_alloc(dev, false);
 	if (IS_ERR(dmar_domain))
 		return ERR_CAST(dmar_domain);
-	domain = &dmar_domain->domain;
-	domain->type = IOMMU_DOMAIN_UNMANAGED;
-	domain->owner = &intel_iommu_ops;
-	domain->ops = intel_iommu_ops.default_domain_ops;
 
-	if (nested_parent) {
-		dmar_domain->nested_parent = true;
-		INIT_LIST_HEAD(&dmar_domain->s1_domains);
-		spin_lock_init(&dmar_domain->s1_lock);
-	}
+	dmar_domain->nested_parent = flags & IOMMU_HWPT_ALLOC_NEST_PARENT;
 
-	if (dirty_tracking) {
-		if (dmar_domain->use_first_level) {
-			iommu_domain_free(domain);
-			return ERR_PTR(-EOPNOTSUPP);
-		}
-		domain->dirty_ops = &intel_dirty_ops;
-	}
+	if (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING)
+		dmar_domain->domain.dirty_ops = &intel_dirty_ops;
 
-	return domain;
+	return &dmar_domain->domain;
+}
+
+static struct iommu_domain *
+intel_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
+				      const struct iommu_user_data *user_data)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	struct intel_iommu *iommu = info->iommu;
+	struct iommu_domain *domain;
+
+	if (user_data)
+		return ERR_PTR(-EOPNOTSUPP);
+
+	/* Prefer first stage if possible by default. */
+	domain = intel_iommu_domain_alloc_first_stage(dev, iommu, flags);
+	if (domain != ERR_PTR(-EOPNOTSUPP))
+		return domain;
+	return intel_iommu_domain_alloc_second_stage(dev, iommu, flags);
 }
 
 static void intel_iommu_domain_free(struct iommu_domain *domain)
-- 
2.43.0



* [PATCH v2 5/7] iommu/vt-d: Create unique domain ops for each stage
  2025-06-12 14:31 [PATCH v2 0/7] Reorganize Intel VT-D to be ready for iommupt Jason Gunthorpe
                   ` (3 preceding siblings ...)
  2025-06-12 14:31 ` [PATCH v2 4/7] iommu/vt-d: Split intel_iommu_domain_alloc_paging_flags() Jason Gunthorpe
@ 2025-06-12 14:31 ` Jason Gunthorpe
  2025-06-13  9:38   ` Tian, Kevin
  2025-06-12 14:31 ` [PATCH v2 6/7] iommu/vt-d: Split intel_iommu_enforce_cache_coherency() Jason Gunthorpe
  2025-06-12 14:31 ` [PATCH v2 7/7] iommu/vt-d: Split paging_domain_compatible() Jason Gunthorpe
  6 siblings, 1 reply; 18+ messages in thread
From: Jason Gunthorpe @ 2025-06-12 14:31 UTC (permalink / raw)
  To: Lu Baolu, David Woodhouse, iommu, Joerg Roedel, Robin Murphy,
	Will Deacon
  Cc: patches, Wei Wang

Use the domain ops pointer to tell what kind of domain it is instead of
the internal use_first_level indication. This also protects against
wrongly using an SVA/nested/IDENTITY/BLOCKED domain type in places it
should not be used.

The only remaining uses of use_first_level outside the paging domain are in
paging_domain_compatible() and intel_iommu_enforce_cache_coherency().

Thus, remove the useless sets of use_first_level in
intel_svm_domain_alloc() and intel_iommu_domain_alloc_nested(). None of
the unique ops for these domain types ever reference it in their call
chains.

Add a WARN_ON() check in domain_context_mapping_one() as it only works
with the second stage.

This is preparation for iommupt which will have different ops for each of
the stages.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/intel/cache.c  |  5 ++--
 drivers/iommu/intel/iommu.c  | 58 +++++++++++++++++++++++++-----------
 drivers/iommu/intel/iommu.h  |  2 ++
 drivers/iommu/intel/nested.c |  3 +-
 drivers/iommu/intel/svm.c    |  1 -
 5 files changed, 47 insertions(+), 22 deletions(-)

diff --git a/drivers/iommu/intel/cache.c b/drivers/iommu/intel/cache.c
index fc35cba5914532..29b8af6487512c 100644
--- a/drivers/iommu/intel/cache.c
+++ b/drivers/iommu/intel/cache.c
@@ -371,7 +371,7 @@ static void cache_tag_flush_iotlb(struct dmar_domain *domain, struct cache_tag *
 	struct intel_iommu *iommu = tag->iommu;
 	u64 type = DMA_TLB_PSI_FLUSH;
 
-	if (domain->use_first_level) {
+	if (domain->domain.ops == &intel_fs_paging_domain_ops) {
 		qi_batch_add_piotlb(iommu, tag->domain_id, tag->pasid, addr,
 				    pages, ih, domain->qi_batch);
 		return;
@@ -546,7 +546,8 @@ void cache_tag_flush_range_np(struct dmar_domain *domain, unsigned long start,
 			qi_batch_flush_descs(iommu, domain->qi_batch);
 		iommu = tag->iommu;
 
-		if (!cap_caching_mode(iommu->cap) || domain->use_first_level) {
+		if (!cap_caching_mode(iommu->cap) ||
+		    domain->domain.ops == &intel_fs_paging_domain_ops) {
 			iommu_flush_write_buffer(iommu);
 			continue;
 		}
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 243e77d5a06aac..7a5c794e488614 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1462,6 +1462,9 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
 	struct context_entry *context;
 	int ret;
 
+	if (WARN_ON(domain->domain.ops != &intel_ss_paging_domain_ops))
+		return -EINVAL;
+
 	pr_debug("Set context mapping for %02x:%02x.%d\n",
 		bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
 
@@ -1800,12 +1803,14 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
 
 	if (!sm_supported(iommu))
 		ret = domain_context_mapping(domain, dev);
-	else if (domain->use_first_level)
+	else if (domain->domain.ops == &intel_fs_paging_domain_ops)
 		ret = domain_setup_first_level(iommu, domain, dev,
 					       IOMMU_NO_PASID, NULL);
-	else
+	else if (domain->domain.ops == &intel_ss_paging_domain_ops)
 		ret = domain_setup_second_level(iommu, domain, dev,
 						IOMMU_NO_PASID, NULL);
+	else if (WARN_ON(true))
+		ret = -EINVAL;
 
 	if (ret)
 		goto out_block_translation;
@@ -3274,7 +3279,6 @@ static struct dmar_domain *paging_domain_alloc(struct device *dev, bool first_st
 	domain->use_first_level = first_stage;
 
 	domain->domain.type = IOMMU_DOMAIN_UNMANAGED;
-	domain->domain.ops = intel_iommu_ops.default_domain_ops;
 
 	/* calculate the address width */
 	addr_width = agaw_to_width(iommu->agaw);
@@ -3332,6 +3336,8 @@ intel_iommu_domain_alloc_first_stage(struct device *dev,
 	dmar_domain = paging_domain_alloc(dev, true);
 	if (IS_ERR(dmar_domain))
 		return ERR_CAST(dmar_domain);
+
+	dmar_domain->domain.ops = &intel_fs_paging_domain_ops;
 	return &dmar_domain->domain;
 }
 
@@ -3360,6 +3366,7 @@ intel_iommu_domain_alloc_second_stage(struct device *dev,
 	if (IS_ERR(dmar_domain))
 		return ERR_CAST(dmar_domain);
 
+	dmar_domain->domain.ops = &intel_ss_paging_domain_ops;
 	dmar_domain->nested_parent = flags & IOMMU_HWPT_ALLOC_NEST_PARENT;
 
 	if (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING)
@@ -4081,12 +4088,15 @@ static int intel_iommu_set_dev_pasid(struct iommu_domain *domain,
 	if (ret)
 		goto out_remove_dev_pasid;
 
-	if (dmar_domain->use_first_level)
+	if (domain->ops == &intel_fs_paging_domain_ops)
 		ret = domain_setup_first_level(iommu, dmar_domain,
 					       dev, pasid, old);
-	else
+	else if (domain->ops == &intel_ss_paging_domain_ops)
 		ret = domain_setup_second_level(iommu, dmar_domain,
 						dev, pasid, old);
+	else if (WARN_ON(true))
+		ret = -EINVAL;
+
 	if (ret)
 		goto out_unwind_iopf;
 
@@ -4361,6 +4371,32 @@ static struct iommu_domain identity_domain = {
 	},
 };
 
+const struct iommu_domain_ops intel_fs_paging_domain_ops = {
+	.attach_dev = intel_iommu_attach_device,
+	.set_dev_pasid = intel_iommu_set_dev_pasid,
+	.map_pages = intel_iommu_map_pages,
+	.unmap_pages = intel_iommu_unmap_pages,
+	.iotlb_sync_map = intel_iommu_iotlb_sync_map,
+	.flush_iotlb_all = intel_flush_iotlb_all,
+	.iotlb_sync = intel_iommu_tlb_sync,
+	.iova_to_phys = intel_iommu_iova_to_phys,
+	.free = intel_iommu_domain_free,
+	.enforce_cache_coherency = intel_iommu_enforce_cache_coherency,
+};
+
+const struct iommu_domain_ops intel_ss_paging_domain_ops = {
+	.attach_dev = intel_iommu_attach_device,
+	.set_dev_pasid = intel_iommu_set_dev_pasid,
+	.map_pages = intel_iommu_map_pages,
+	.unmap_pages = intel_iommu_unmap_pages,
+	.iotlb_sync_map = intel_iommu_iotlb_sync_map,
+	.flush_iotlb_all = intel_flush_iotlb_all,
+	.iotlb_sync = intel_iommu_tlb_sync,
+	.iova_to_phys = intel_iommu_iova_to_phys,
+	.free = intel_iommu_domain_free,
+	.enforce_cache_coherency = intel_iommu_enforce_cache_coherency,
+};
+
 const struct iommu_ops intel_iommu_ops = {
 	.blocked_domain		= &blocking_domain,
 	.release_domain		= &blocking_domain,
@@ -4379,18 +4415,6 @@ const struct iommu_ops intel_iommu_ops = {
 	.def_domain_type	= device_def_domain_type,
 	.pgsize_bitmap		= SZ_4K,
 	.page_response		= intel_iommu_page_response,
-	.default_domain_ops = &(const struct iommu_domain_ops) {
-		.attach_dev		= intel_iommu_attach_device,
-		.set_dev_pasid		= intel_iommu_set_dev_pasid,
-		.map_pages		= intel_iommu_map_pages,
-		.unmap_pages		= intel_iommu_unmap_pages,
-		.iotlb_sync_map		= intel_iommu_iotlb_sync_map,
-		.flush_iotlb_all        = intel_flush_iotlb_all,
-		.iotlb_sync		= intel_iommu_tlb_sync,
-		.iova_to_phys		= intel_iommu_iova_to_phys,
-		.free			= intel_iommu_domain_free,
-		.enforce_cache_coherency = intel_iommu_enforce_cache_coherency,
-	}
 };
 
 static void quirk_iommu_igfx(struct pci_dev *dev)
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index ca6843973d4393..9b9e5fde4a3062 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -1375,6 +1375,8 @@ struct context_entry *iommu_context_addr(struct intel_iommu *iommu, u8 bus,
 					 u8 devfn, int alloc);
 
 extern const struct iommu_ops intel_iommu_ops;
+extern const struct iommu_domain_ops intel_fs_paging_domain_ops;
+extern const struct iommu_domain_ops intel_ss_paging_domain_ops;
 
 #ifdef CONFIG_INTEL_IOMMU
 extern int intel_iommu_sm;
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
index fc312f649f9ef0..d976d6b6188d3f 100644
--- a/drivers/iommu/intel/nested.c
+++ b/drivers/iommu/intel/nested.c
@@ -216,7 +216,7 @@ intel_iommu_domain_alloc_nested(struct device *dev, struct iommu_domain *parent,
 	/* Must be nested domain */
 	if (user_data->type != IOMMU_HWPT_DATA_VTD_S1)
 		return ERR_PTR(-EOPNOTSUPP);
-	if (parent->ops != intel_iommu_ops.default_domain_ops ||
+	if (parent->ops != &intel_ss_paging_domain_ops ||
 	    !s2_domain->nested_parent)
 		return ERR_PTR(-EINVAL);
 
@@ -229,7 +229,6 @@ intel_iommu_domain_alloc_nested(struct device *dev, struct iommu_domain *parent,
 	if (!domain)
 		return ERR_PTR(-ENOMEM);
 
-	domain->use_first_level = true;
 	domain->s2_domain = s2_domain;
 	domain->s1_cfg = vtd;
 	domain->domain.ops = &intel_nested_domain_ops;
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 8c0bed36c58777..e147f71f91b722 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -214,7 +214,6 @@ struct iommu_domain *intel_svm_domain_alloc(struct device *dev,
 		return ERR_PTR(-ENOMEM);
 
 	domain->domain.ops = &intel_svm_domain_ops;
-	domain->use_first_level = true;
 	INIT_LIST_HEAD(&domain->dev_pasids);
 	INIT_LIST_HEAD(&domain->cache_tags);
 	spin_lock_init(&domain->cache_lock);
-- 
2.43.0



* [PATCH v2 6/7] iommu/vt-d: Split intel_iommu_enforce_cache_coherency()
  2025-06-12 14:31 [PATCH v2 0/7] Reorganize Intel VT-D to be ready for iommupt Jason Gunthorpe
                   ` (4 preceding siblings ...)
  2025-06-12 14:31 ` [PATCH v2 5/7] iommu/vt-d: Create unique domain ops for each stage Jason Gunthorpe
@ 2025-06-12 14:31 ` Jason Gunthorpe
  2025-06-13  9:56   ` Tian, Kevin
  2025-06-12 14:31 ` [PATCH v2 7/7] iommu/vt-d: Split paging_domain_compatible() Jason Gunthorpe
  6 siblings, 1 reply; 18+ messages in thread
From: Jason Gunthorpe @ 2025-06-12 14:31 UTC (permalink / raw)
  To: Lu Baolu, David Woodhouse, iommu, Joerg Roedel, Robin Murphy,
	Will Deacon
  Cc: patches, Wei Wang

First stage and second stage have very different ways to deny no-snoop.
The first stage uses the PGSNP bit, which is set per PASID entry, so
enabling it requires loading new PASID entries for all the attached
devices.

The second stage uses a bit per PTE, so enabling it just requires telling
future maps to set the bit.

Since we now have two domain ops, we can have two functions that directly
code their required actions instead of a bunch of logic dancing around
use_first_level.

Combine domain_set_force_snooping() into the new functions, since they
are its only callers.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/intel/iommu.c | 56 +++++++++++++++++--------------------
 1 file changed, 26 insertions(+), 30 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 7a5c794e488614..5fd4b82576ca44 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3629,44 +3629,40 @@ static bool domain_support_force_snooping(struct dmar_domain *domain)
 	return support;
 }
 
-static void domain_set_force_snooping(struct dmar_domain *domain)
-{
-	struct device_domain_info *info;
-
-	assert_spin_locked(&domain->lock);
-	/*
-	 * Second level page table supports per-PTE snoop control. The
-	 * iommu_map() interface will handle this by setting SNP bit.
-	 */
-	if (!domain->use_first_level) {
-		domain->set_pte_snp = true;
-		return;
-	}
-
-	list_for_each_entry(info, &domain->devices, link)
-		intel_pasid_setup_page_snoop_control(info->iommu, info->dev,
-						     IOMMU_NO_PASID);
-}
-
-static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain)
+static bool intel_iommu_enforce_cache_coherency_fs(struct iommu_domain *domain)
 {
 	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
-	unsigned long flags;
+	struct device_domain_info *info;
+
+	guard(spinlock_irqsave)(&dmar_domain->lock);
 
 	if (dmar_domain->force_snooping)
 		return true;
 
-	spin_lock_irqsave(&dmar_domain->lock, flags);
-	if (!domain_support_force_snooping(dmar_domain) ||
-	    (!dmar_domain->use_first_level && dmar_domain->has_mappings)) {
-		spin_unlock_irqrestore(&dmar_domain->lock, flags);
+	if (!domain_support_force_snooping(dmar_domain))
 		return false;
-	}
 
-	domain_set_force_snooping(dmar_domain);
 	dmar_domain->force_snooping = true;
-	spin_unlock_irqrestore(&dmar_domain->lock, flags);
+	list_for_each_entry(info, &dmar_domain->devices, link)
+		intel_pasid_setup_page_snoop_control(info->iommu, info->dev,
+						     IOMMU_NO_PASID);
+	return true;
+}
 
+static bool intel_iommu_enforce_cache_coherency_ss(struct iommu_domain *domain)
+{
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+
+	guard(spinlock_irqsave)(&dmar_domain->lock);
+	if (!domain_support_force_snooping(dmar_domain) ||
+	    dmar_domain->has_mappings)
+		return false;
+
+	/*
+	 * Second level page table supports per-PTE snoop control. The
+	 * iommu_map() interface will handle this by setting SNP bit.
+	 */
+	dmar_domain->set_pte_snp = true;
 	return true;
 }
 
@@ -4381,7 +4377,7 @@ const struct iommu_domain_ops intel_fs_paging_domain_ops = {
 	.iotlb_sync = intel_iommu_tlb_sync,
 	.iova_to_phys = intel_iommu_iova_to_phys,
 	.free = intel_iommu_domain_free,
-	.enforce_cache_coherency = intel_iommu_enforce_cache_coherency,
+	.enforce_cache_coherency = intel_iommu_enforce_cache_coherency_fs,
 };
 
 const struct iommu_domain_ops intel_ss_paging_domain_ops = {
@@ -4394,7 +4390,7 @@ const struct iommu_domain_ops intel_ss_paging_domain_ops = {
 	.iotlb_sync = intel_iommu_tlb_sync,
 	.iova_to_phys = intel_iommu_iova_to_phys,
 	.free = intel_iommu_domain_free,
-	.enforce_cache_coherency = intel_iommu_enforce_cache_coherency,
+	.enforce_cache_coherency = intel_iommu_enforce_cache_coherency_ss,
 };
 
 const struct iommu_ops intel_iommu_ops = {
-- 
2.43.0



* [PATCH v2 7/7] iommu/vt-d: Split paging_domain_compatible()
  2025-06-12 14:31 [PATCH v2 0/7] Reorganize Intel VT-D to be ready for iommupt Jason Gunthorpe
                   ` (5 preceding siblings ...)
  2025-06-12 14:31 ` [PATCH v2 6/7] iommu/vt-d: Split intel_iommu_enforce_cache_coherency() Jason Gunthorpe
@ 2025-06-12 14:31 ` Jason Gunthorpe
  6 siblings, 0 replies; 18+ messages in thread
From: Jason Gunthorpe @ 2025-06-12 14:31 UTC (permalink / raw)
  To: Lu Baolu, David Woodhouse, iommu, Joerg Roedel, Robin Murphy,
	Will Deacon
  Cc: patches, Wei Wang

Make first/second stage specific functions that follow the same pattern
as intel_iommu_domain_alloc_first/second_stage() for computing
EOPNOTSUPP. This makes the code easier to understand: if we couldn't
create a domain with these parameters on this IOMMU instance, then we
certainly are not compatible with it.

Check superpage support directly against the per-stage cap bits and the
pgsize_bitmap.

Add a note that force_snooping is read without locking; the locking needs
to cover both the compatibility check and the addition of the device to
the list.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/intel/iommu.c | 66 ++++++++++++++++++++++++++++++-------
 1 file changed, 54 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 5fd4b82576ca44..ce8ae89404f313 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3417,33 +3417,75 @@ static void intel_iommu_domain_free(struct iommu_domain *domain)
 	kfree(dmar_domain);
 }
 
+static int paging_domain_compatible_first_stage(struct dmar_domain *dmar_domain,
+						struct intel_iommu *iommu)
+{
+	if (WARN_ON(dmar_domain->domain.dirty_ops ||
+		    dmar_domain->nested_parent))
+		return -EINVAL;
+
+	/* Only SL is available in legacy mode */
+	if (!sm_supported(iommu) || !ecap_flts(iommu->ecap))
+		return -EINVAL;
+
+	/* Same page size support */
+	if (!cap_fl1gp_support(iommu->cap) &&
+	    (dmar_domain->domain.pgsize_bitmap & SZ_1G))
+		return -EINVAL;
+	return 0;
+}
+
+static int
+paging_domain_compatible_second_stage(struct dmar_domain *dmar_domain,
+				      struct intel_iommu *iommu)
+{
+	unsigned int sslps = cap_super_page_val(iommu->cap);
+
+	if (dmar_domain->domain.dirty_ops && !ssads_supported(iommu))
+		return -EINVAL;
+	if (dmar_domain->nested_parent && !nested_supported(iommu))
+		return -EINVAL;
+
+	/* Legacy mode always supports second stage */
+	if (sm_supported(iommu) && !ecap_slts(iommu->ecap))
+		return -EINVAL;
+
+	/* Same page size support */
+	if (!(sslps & BIT(0)) && (dmar_domain->domain.pgsize_bitmap & SZ_2M))
+		return -EINVAL;
+	if (!(sslps & BIT(1)) && (dmar_domain->domain.pgsize_bitmap & SZ_1G))
+		return -EINVAL;
+	return 0;
+}
+
 int paging_domain_compatible(struct iommu_domain *domain, struct device *dev)
 {
 	struct device_domain_info *info = dev_iommu_priv_get(dev);
 	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
 	struct intel_iommu *iommu = info->iommu;
+	int ret = -EINVAL;
 	int addr_width;
 
-	if (WARN_ON_ONCE(!(domain->type & __IOMMU_DOMAIN_PAGING)))
-		return -EPERM;
+	if (domain->ops == &intel_fs_paging_domain_ops)
+		ret = paging_domain_compatible_first_stage(dmar_domain, iommu);
+	else if (domain->ops == &intel_ss_paging_domain_ops)
+		ret = paging_domain_compatible_second_stage(dmar_domain, iommu);
+	else if (WARN_ON(true))
+		ret = -EINVAL;
+	if (ret)
+		return ret;
 
+	/*
+	 * FIXME this is locked wrong, it needs to be under the
+	 * dmar_domain->lock
+	 */
 	if (dmar_domain->force_snooping && !ecap_sc_support(iommu->ecap))
 		return -EINVAL;
 
-	if (domain->dirty_ops && !ssads_supported(iommu))
-		return -EINVAL;
-
 	if (dmar_domain->iommu_coherency !=
 			iommu_paging_structure_coherency(iommu))
 		return -EINVAL;
 
-	if (dmar_domain->iommu_superpage !=
-			iommu_superpage_capability(iommu, dmar_domain->use_first_level))
-		return -EINVAL;
-
-	if (dmar_domain->use_first_level &&
-	    (!sm_supported(iommu) || !ecap_flts(iommu->ecap)))
-		return -EINVAL;
 
 	/* check if this iommu agaw is sufficient for max mapped address */
 	addr_width = agaw_to_width(iommu->agaw);
-- 
2.43.0



* RE: [PATCH v2 1/7] iommu/vt-d: Lift the __pa to domain_setup_first_level/intel_svm_set_dev_pasid()
  2025-06-12 14:31 ` [PATCH v2 1/7] iommu/vt-d: Lift the __pa to domain_setup_first_level/intel_svm_set_dev_pasid() Jason Gunthorpe
@ 2025-06-13  9:31   ` Tian, Kevin
  0 siblings, 0 replies; 18+ messages in thread
From: Tian, Kevin @ 2025-06-13  9:31 UTC (permalink / raw)
  To: Jason Gunthorpe, Lu Baolu, David Woodhouse, iommu@lists.linux.dev,
	Joerg Roedel, Robin Murphy, Will Deacon
  Cc: patches@lists.linux.dev, Wang, Wei W

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, June 12, 2025 10:32 PM
> 
> Pass the phys_addr_t down through the call chain from the top instead of
> passing a pgd_t * KVA. This moves the __pa() into
> domain_setup_first_level() which is the first function to obtain the pgd
> from the IOMMU page table in this call chain.
> 
> The SVA flow is also adjusted to get the pa of the mm->pgd.
> 
> iommput will move the __pa() into iommupt code, it never shares the KVA of
> the page table with the driver.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>


* RE: [PATCH v2 2/7] iommu/vt-d: Fold domain_exit() into intel_iommu_domain_free()
  2025-06-12 14:31 ` [PATCH v2 2/7] iommu/vt-d: Fold domain_exit() into intel_iommu_domain_free() Jason Gunthorpe
@ 2025-06-13  9:32   ` Tian, Kevin
  0 siblings, 0 replies; 18+ messages in thread
From: Tian, Kevin @ 2025-06-13  9:32 UTC (permalink / raw)
  To: Jason Gunthorpe, Lu Baolu, David Woodhouse, iommu@lists.linux.dev,
	Joerg Roedel, Robin Murphy, Will Deacon
  Cc: patches@lists.linux.dev, Wang, Wei W

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, June 12, 2025 10:32 PM
> 
> It has only one caller, no need for two functions.
> 
> Correct the WARN_ON() error handling to leak the entire page table if the
> HW is still referencing it so we don't UAF during WARN_ON recovery.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>


* RE: [PATCH v2 3/7] iommu/vt-d: Do not wipe out the page table NID when devices detach
  2025-06-12 14:31 ` [PATCH v2 3/7] iommu/vt-d: Do not wipe out the page table NID when devices detach Jason Gunthorpe
@ 2025-06-13  9:33   ` Tian, Kevin
  0 siblings, 0 replies; 18+ messages in thread
From: Tian, Kevin @ 2025-06-13  9:33 UTC (permalink / raw)
  To: Jason Gunthorpe, Lu Baolu, David Woodhouse, iommu@lists.linux.dev,
	Joerg Roedel, Robin Murphy, Will Deacon
  Cc: patches@lists.linux.dev, Wang, Wei W

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, June 12, 2025 10:32 PM
> 
> The NID is used to control which NUMA node memory for the page table is
> allocated it from. It should be a permanent property of the page table
> when it was allocated and not change during attach/detach of devices.
> 
> Reviewed-by: Wei Wang <wei.w.wang@intel.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>


* RE: [PATCH v2 4/7] iommu/vt-d: Split intel_iommu_domain_alloc_paging_flags()
  2025-06-12 14:31 ` [PATCH v2 4/7] iommu/vt-d: Split intel_iommu_domain_alloc_paging_flags() Jason Gunthorpe
@ 2025-06-13  9:35   ` Tian, Kevin
  0 siblings, 0 replies; 18+ messages in thread
From: Tian, Kevin @ 2025-06-13  9:35 UTC (permalink / raw)
  To: Jason Gunthorpe, Lu Baolu, David Woodhouse, iommu@lists.linux.dev,
	Joerg Roedel, Robin Murphy, Will Deacon
  Cc: patches@lists.linux.dev, Wang, Wei W

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, June 12, 2025 10:32 PM
> 
> Create stage specific functions that check the stage specific conditions
> if each stage can be supported.
> 
> Have intel_iommu_domain_alloc_paging_flags() call both stages in sequence
> until one does not return EOPNOTSUPP and prefer to use the first stage if
> available and suitable for the requested flags.
> 
> Move second stage only operations like nested_parent and dirty_tracking
> into the second stage function for clarity.
> 
> Move initialization of the iommu_domain members into
> paging_domain_alloc().
> 
> Drop initialization of domain->owner as the callers all do it.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>


* RE: [PATCH v2 5/7] iommu/vt-d: Create unique domain ops for each stage
  2025-06-12 14:31 ` [PATCH v2 5/7] iommu/vt-d: Create unique domain ops for each stage Jason Gunthorpe
@ 2025-06-13  9:38   ` Tian, Kevin
  2025-06-13 12:58     ` Jason Gunthorpe
  0 siblings, 1 reply; 18+ messages in thread
From: Tian, Kevin @ 2025-06-13  9:38 UTC (permalink / raw)
  To: Jason Gunthorpe, Lu Baolu, David Woodhouse, iommu@lists.linux.dev,
	Joerg Roedel, Robin Murphy, Will Deacon
  Cc: patches@lists.linux.dev, Wang, Wei W

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, June 12, 2025 10:32 PM
> 
> -	if (domain->use_first_level) {
> +	if (domain->domain.ops == &intel_fs_paging_domain_ops) {

It is more readable to have macros:

	domain_is_fs_paging(domain)
	domain_is_ss_paging(domain)

Reviewed-by: Kevin Tian <kevin.tian@intel.com>


* RE: [PATCH v2 6/7] iommu/vt-d: Split intel_iommu_enforce_cache_coherency()
  2025-06-12 14:31 ` [PATCH v2 6/7] iommu/vt-d: Split intel_iommu_enforce_cache_coherency() Jason Gunthorpe
@ 2025-06-13  9:56   ` Tian, Kevin
  2025-06-13 13:12     ` Jason Gunthorpe
  0 siblings, 1 reply; 18+ messages in thread
From: Tian, Kevin @ 2025-06-13  9:56 UTC (permalink / raw)
  To: Jason Gunthorpe, Lu Baolu, David Woodhouse, iommu@lists.linux.dev,
	Joerg Roedel, Robin Murphy, Will Deacon
  Cc: patches@lists.linux.dev, Wang, Wei W

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, June 12, 2025 10:32 PM
>
> -static bool intel_iommu_enforce_cache_coherency(struct iommu_domain
> *domain)
> +static bool intel_iommu_enforce_cache_coherency_fs(struct
> iommu_domain *domain)
>  {
>  	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> -	unsigned long flags;
> +	struct device_domain_info *info;
> +
> +	guard(spinlock_irqsave)(&dmar_domain->lock);
> 
>  	if (dmar_domain->force_snooping)
>  		return true;
> 
> -	spin_lock_irqsave(&dmar_domain->lock, flags);
> -	if (!domain_support_force_snooping(dmar_domain) ||
> -	    (!dmar_domain->use_first_level && dmar_domain-
> >has_mappings)) {
> -		spin_unlock_irqrestore(&dmar_domain->lock, flags);
> +	if (!domain_support_force_snooping(dmar_domain))
>  		return false;
> -	}
> 
> -	domain_set_force_snooping(dmar_domain);
>  	dmar_domain->force_snooping = true;

With this change force_snooping is relevant only to the first level,

but the nested domain checks this flag of the parent domain to set
PGSNP:

pasid_pte_config_nested():
	if (s2_domain->force_snooping)
		pasid_set_pgsnp(pte);	

we either need to set it for the second level or also check set_pte_snp
here.


* Re: [PATCH v2 5/7] iommu/vt-d: Create unique domain ops for each stage
  2025-06-13  9:38   ` Tian, Kevin
@ 2025-06-13 12:58     ` Jason Gunthorpe
  0 siblings, 0 replies; 18+ messages in thread
From: Jason Gunthorpe @ 2025-06-13 12:58 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Lu Baolu, David Woodhouse, iommu@lists.linux.dev, Joerg Roedel,
	Robin Murphy, Will Deacon, patches@lists.linux.dev, Wang, Wei W

On Fri, Jun 13, 2025 at 09:38:30AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Thursday, June 12, 2025 10:32 PM
> > 
> > -	if (domain->use_first_level) {
> > +	if (domain->domain.ops == &intel_fs_paging_domain_ops) {
> 
> It is more readable to have macros:
> 
> 	domain_is_fs_paging(domain)
> 	domain_is_ss_paging(domain)

Done
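
A minimal sketch of what those helpers could look like, wrapping the
ops-pointer checks from the patch (the static inline form and the exact
placement are assumptions, not the posted v3 code):

static inline bool domain_is_fs_paging(struct dmar_domain *domain)
{
	return domain->domain.ops == &intel_fs_paging_domain_ops;
}

static inline bool domain_is_ss_paging(struct dmar_domain *domain)
{
	return domain->domain.ops == &intel_ss_paging_domain_ops;
}

With these, checks like the one in cache_tag_flush_iotlb() read as
"if (domain_is_fs_paging(domain))".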

Thanks
Jason


* Re: [PATCH v2 6/7] iommu/vt-d: Split intel_iommu_enforce_cache_coherency()
  2025-06-13  9:56   ` Tian, Kevin
@ 2025-06-13 13:12     ` Jason Gunthorpe
  2025-06-30  6:11       ` Baolu Lu
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Gunthorpe @ 2025-06-13 13:12 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Lu Baolu, David Woodhouse, iommu@lists.linux.dev, Joerg Roedel,
	Robin Murphy, Will Deacon, patches@lists.linux.dev, Wang, Wei W

On Fri, Jun 13, 2025 at 09:56:44AM +0000, Tian, Kevin wrote:
> pasid_pte_config_nested():
> 	if (s2_domain->force_snooping)
> 		pasid_set_pgsnp(pte);	
> 
> we either need to set it for the second level or also check set_pte_snp
> here.

Okay, I'm deleting set_pte_snp, so let's set force_snooping
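
One possible shape for that, based on the v2 hunk above with set_pte_snp
replaced by setting force_snooping (a sketch of what v3 might do, not
the posted code):

static bool intel_iommu_enforce_cache_coherency_ss(struct iommu_domain *domain)
{
	struct dmar_domain *dmar_domain = to_dmar_domain(domain);

	guard(spinlock_irqsave)(&dmar_domain->lock);
	if (!domain_support_force_snooping(dmar_domain) ||
	    dmar_domain->has_mappings)
		return false;

	/*
	 * Record force_snooping on the second stage domain itself so
	 * that pasid_pte_config_nested() sees it when configuring
	 * PGSNP. Assumes the map path keys the per-PTE SNP bit off
	 * force_snooping once set_pte_snp is gone.
	 */
	dmar_domain->force_snooping = true;
	return true;
}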

Thanks,
Jason


* Re: [PATCH v2 6/7] iommu/vt-d: Split intel_iommu_enforce_cache_coherency()
  2025-06-13 13:12     ` Jason Gunthorpe
@ 2025-06-30  6:11       ` Baolu Lu
  2025-06-30 15:04         ` Jason Gunthorpe
  0 siblings, 1 reply; 18+ messages in thread
From: Baolu Lu @ 2025-06-30  6:11 UTC (permalink / raw)
  To: Jason Gunthorpe, Tian, Kevin
  Cc: David Woodhouse, iommu@lists.linux.dev, Joerg Roedel,
	Robin Murphy, Will Deacon, patches@lists.linux.dev, Wang, Wei W

On 6/13/25 21:12, Jason Gunthorpe wrote:
> On Fri, Jun 13, 2025 at 09:56:44AM +0000, Tian, Kevin wrote:
>> pasid_pte_config_nested():
>> 	if (s2_domain->force_snooping)
>> 		pasid_set_pgsnp(pte);	
>>
>> we either need to set it for the second level or also check set_pte_snp
>> here.
> Okay, I'm deleting set_pte_snp, so let's set force_snooping

Hi Jason, do you have a new version of this series? I'm planning to
queue this for the next merge cycle.

Thanks,
baolu


* Re: [PATCH v2 6/7] iommu/vt-d: Split intel_iommu_enforce_cache_coherency()
  2025-06-30  6:11       ` Baolu Lu
@ 2025-06-30 15:04         ` Jason Gunthorpe
  0 siblings, 0 replies; 18+ messages in thread
From: Jason Gunthorpe @ 2025-06-30 15:04 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Tian, Kevin, David Woodhouse, iommu@lists.linux.dev, Joerg Roedel,
	Robin Murphy, Will Deacon, patches@lists.linux.dev, Wang, Wei W

On Mon, Jun 30, 2025 at 02:11:09PM +0800, Baolu Lu wrote:
> On 6/13/25 21:12, Jason Gunthorpe wrote:
> > On Fri, Jun 13, 2025 at 09:56:44AM +0000, Tian, Kevin wrote:
> > > pasid_pte_config_nested():
> > > 	if (s2_domain->force_snooping)
> > > 		pasid_set_pgsnp(pte);	
> > > 
> > > we either need to set it for the second level or also check set_pte_snp
> > > here.
> > Okay, I'm deleting set_pte_snp, so let's set force_snooping
> 
> Hi Jason, do you have a new version of this series? I'm planning to
> queue this for the next merge cycle.

Yes, sorry, I forgot to repost v3 with the little changes

Jason

