* [PATCH 0/7] Reorganize Intel VT-D to be ready for iommupt
@ 2025-06-09 19:58 Jason Gunthorpe
2025-06-09 19:58 ` [PATCH 1/7] iommu/vtd: Lift the __pa to domain_setup_first_level/intel_svm_set_dev_pasid() Jason Gunthorpe
` (7 more replies)
0 siblings, 8 replies; 24+ messages in thread
From: Jason Gunthorpe @ 2025-06-09 19:58 UTC (permalink / raw)
To: Lu Baolu, David Woodhouse, iommu, Joerg Roedel, Robin Murphy,
Will Deacon
Cc: patches, Wei Wang
This is similar to a series I sent for AMD.
Lightly reorganize some of the page table related code so there is greater
isolation between the first and second stage logic. iommupt will introduce
distinct types and code flows for the first and second stages; having them
all combined together makes that difficult.
Splitting them actually makes things generally more understandable anyhow,
as the two stages share only a little code. Most of the registers and logic
are in fact different; that was being obscured by small helper functions
that combine the first and second stage information.
This would replace some of the patches recently sent by Wei.
The full branch including the iommupt conversion is here:
https://github.com/jgunthorpe/linux/commits/iommu_pt_vtd
Jason Gunthorpe (7):
iommu/vtd: Lift the __pa to
domain_setup_first_level/intel_svm_set_dev_pasid()
iommu/vtd: Fold domain_exit() into intel_iommu_domain_free()
iommu/vtd: Do not wipe out the page table NID when devices detach
iommu/vtd: Split intel_iommu_domain_alloc_paging_flags()
iommu/vtd: Create unique domain ops for each stage
iommu/vtd: Split intel_iommu_enforce_cache_coherency()
iommu/vtd: Split paging_domain_compatible()
drivers/iommu/intel/cache.c | 5 +-
drivers/iommu/intel/iommu.c | 321 +++++++++++++++++++++--------------
drivers/iommu/intel/iommu.h | 9 +-
drivers/iommu/intel/nested.c | 3 +-
drivers/iommu/intel/pasid.c | 17 +-
drivers/iommu/intel/pasid.h | 11 +-
drivers/iommu/intel/svm.c | 3 +-
7 files changed, 221 insertions(+), 148 deletions(-)
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
--
2.43.0
* [PATCH 1/7] iommu/vtd: Lift the __pa to domain_setup_first_level/intel_svm_set_dev_pasid()
2025-06-09 19:58 [PATCH 0/7] Reorganize Intel VT-D to be ready for iommupt Jason Gunthorpe
@ 2025-06-09 19:58 ` Jason Gunthorpe
2025-06-10 18:34 ` Jason Gunthorpe
2025-06-09 19:58 ` [PATCH 2/7] iommu/vtd: Fold domain_exit() into intel_iommu_domain_free() Jason Gunthorpe
` (6 subsequent siblings)
7 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2025-06-09 19:58 UTC (permalink / raw)
To: Lu Baolu, David Woodhouse, iommu, Joerg Roedel, Robin Murphy,
Will Deacon
Cc: patches, Wei Wang
Pass the phys_addr_t down through the call chain from the top instead of
passing a pgd_t * KVA. This moves the __pa() into
domain_setup_first_level(), which is the first function to obtain the pgd
from the IOMMU page table in this call chain.
The SVA flow is also adjusted to get the pa of mm->pgd.
iommupt will move the __pa() into the iommupt code; it never shares the KVA
of the page table with the driver.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
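As a quick illustration (a condensed sketch of the result, not the literal
code; see the diff below), the __pa() now happens at the two top-level
callers that still hold a pgd:

	/* paging domain path: first place the pgd leaves the page table */
	return __domain_setup_first_level(iommu, dev, pasid,
					  domain_id_iommu(domain, iommu),
					  __pa(pgd), flags, old);

	/* SVA path: hand over the physical address of mm->pgd */
	ret = __domain_setup_first_level(iommu, dev, pasid, FLPT_DEFAULT_DID,
					 __pa(mm->pgd), sflags, old);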
drivers/iommu/intel/iommu.c | 15 +++++++--------
drivers/iommu/intel/iommu.h | 7 +++----
drivers/iommu/intel/pasid.c | 17 +++++++++--------
drivers/iommu/intel/pasid.h | 11 +++++------
drivers/iommu/intel/svm.c | 2 +-
5 files changed, 25 insertions(+), 27 deletions(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 7aa3932251b2fd..b42bb091088ff5 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1736,15 +1736,14 @@ static void domain_context_clear_one(struct device_domain_info *info, u8 bus, u8
intel_context_flush_no_pasid(info, context, did);
}
-int __domain_setup_first_level(struct intel_iommu *iommu,
- struct device *dev, ioasid_t pasid,
- u16 did, pgd_t *pgd, int flags,
- struct iommu_domain *old)
+int __domain_setup_first_level(struct intel_iommu *iommu, struct device *dev,
+ ioasid_t pasid, u16 did, phys_addr_t fsptptr,
+ int flags, struct iommu_domain *old)
{
if (!old)
- return intel_pasid_setup_first_level(iommu, dev, pgd,
- pasid, did, flags);
- return intel_pasid_replace_first_level(iommu, dev, pgd, pasid, did,
+ return intel_pasid_setup_first_level(iommu, dev, fsptptr, pasid,
+ did, flags);
+ return intel_pasid_replace_first_level(iommu, dev, fsptptr, pasid, did,
iommu_domain_did(old, iommu),
flags);
}
@@ -1793,7 +1792,7 @@ static int domain_setup_first_level(struct intel_iommu *iommu,
return __domain_setup_first_level(iommu, dev, pasid,
domain_id_iommu(domain, iommu),
- (pgd_t *)pgd, flags, old);
+ __pa(pgd), flags, old);
}
static int dmar_domain_attach_device(struct dmar_domain *domain,
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 3ddbcc603de23b..ca6843973d4393 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -1252,10 +1252,9 @@ domain_add_dev_pasid(struct iommu_domain *domain,
void domain_remove_dev_pasid(struct iommu_domain *domain,
struct device *dev, ioasid_t pasid);
-int __domain_setup_first_level(struct intel_iommu *iommu,
- struct device *dev, ioasid_t pasid,
- u16 did, pgd_t *pgd, int flags,
- struct iommu_domain *old);
+int __domain_setup_first_level(struct intel_iommu *iommu, struct device *dev,
+ ioasid_t pasid, u16 did, phys_addr_t fsptptr,
+ int flags, struct iommu_domain *old);
int dmar_ir_support(void);
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index ac67a056b6c868..26561dbbbaaac2 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -348,14 +348,15 @@ static void intel_pasid_flush_present(struct intel_iommu *iommu,
*/
static void pasid_pte_config_first_level(struct intel_iommu *iommu,
struct pasid_entry *pte,
- pgd_t *pgd, u16 did, int flags)
+ phys_addr_t fsptptr, u16 did,
+ int flags)
{
lockdep_assert_held(&iommu->lock);
pasid_clear_entry(pte);
/* Setup the first level page table pointer: */
- pasid_set_flptr(pte, (u64)__pa(pgd));
+ pasid_set_flptr(pte, fsptptr);
if (flags & PASID_FLAG_FL5LP)
pasid_set_flpm(pte, 1);
@@ -372,9 +373,9 @@ static void pasid_pte_config_first_level(struct intel_iommu *iommu,
pasid_set_present(pte);
}
-int intel_pasid_setup_first_level(struct intel_iommu *iommu,
- struct device *dev, pgd_t *pgd,
- u32 pasid, u16 did, int flags)
+int intel_pasid_setup_first_level(struct intel_iommu *iommu, struct device *dev,
+ phys_addr_t fsptptr, u32 pasid, u16 did,
+ int flags)
{
struct pasid_entry *pte;
@@ -402,7 +403,7 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
return -EBUSY;
}
- pasid_pte_config_first_level(iommu, pte, pgd, did, flags);
+ pasid_pte_config_first_level(iommu, pte, fsptptr, did, flags);
spin_unlock(&iommu->lock);
@@ -412,7 +413,7 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
}
int intel_pasid_replace_first_level(struct intel_iommu *iommu,
- struct device *dev, pgd_t *pgd,
+ struct device *dev, u64 fsptptr,
u32 pasid, u16 did, u16 old_did,
int flags)
{
@@ -430,7 +431,7 @@ int intel_pasid_replace_first_level(struct intel_iommu *iommu,
return -EINVAL;
}
- pasid_pte_config_first_level(iommu, &new_pte, pgd, did, flags);
+ pasid_pte_config_first_level(iommu, &new_pte, fsptptr, did, flags);
spin_lock(&iommu->lock);
pte = intel_pasid_get_entry(dev, pasid);
diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index fd0fd1a0df84cc..a771a77d4239c4 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -288,9 +288,9 @@ extern unsigned int intel_pasid_max_id;
int intel_pasid_alloc_table(struct device *dev);
void intel_pasid_free_table(struct device *dev);
struct pasid_table *intel_pasid_get_table(struct device *dev);
-int intel_pasid_setup_first_level(struct intel_iommu *iommu,
- struct device *dev, pgd_t *pgd,
- u32 pasid, u16 did, int flags);
+int intel_pasid_setup_first_level(struct intel_iommu *iommu, struct device *dev,
+ phys_addr_t fsptptr, u32 pasid, u16 did,
+ int flags);
int intel_pasid_setup_second_level(struct intel_iommu *iommu,
struct dmar_domain *domain,
struct device *dev, u32 pasid);
@@ -302,9 +302,8 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
u32 pasid, struct dmar_domain *domain);
int intel_pasid_replace_first_level(struct intel_iommu *iommu,
- struct device *dev, pgd_t *pgd,
- u32 pasid, u16 did, u16 old_did,
- int flags);
+ struct device *dev, phys_addr_t fsptptr,
+ u32 pasid, u16 did, u16 old_did, int flags);
int intel_pasid_replace_second_level(struct intel_iommu *iommu,
struct dmar_domain *domain,
struct device *dev, u16 old_did,
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index f3da596410b5e5..8c0bed36c58777 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -171,7 +171,7 @@ static int intel_svm_set_dev_pasid(struct iommu_domain *domain,
/* Setup the pasid table: */
sflags = cpu_feature_enabled(X86_FEATURE_LA57) ? PASID_FLAG_FL5LP : 0;
ret = __domain_setup_first_level(iommu, dev, pasid,
- FLPT_DEFAULT_DID, mm->pgd,
+ FLPT_DEFAULT_DID, __pa(mm->pgd),
sflags, old);
if (ret)
goto out_unwind_iopf;
--
2.43.0
* [PATCH 2/7] iommu/vtd: Fold domain_exit() into intel_iommu_domain_free()
2025-06-09 19:58 [PATCH 0/7] Reorganize Intel VT-D to be ready for iommupt Jason Gunthorpe
2025-06-09 19:58 ` [PATCH 1/7] iommu/vtd: Lift the __pa to domain_setup_first_level/intel_svm_set_dev_pasid() Jason Gunthorpe
@ 2025-06-09 19:58 ` Jason Gunthorpe
2025-06-10 9:15 ` Wang, Wei W
2025-06-09 19:58 ` [PATCH 3/7] iommu/vtd: Do not wipe out the page table NID when devices detach Jason Gunthorpe
` (5 subsequent siblings)
7 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2025-06-09 19:58 UTC (permalink / raw)
To: Lu Baolu, David Woodhouse, iommu, Joerg Roedel, Robin Murphy,
Will Deacon
Cc: patches, Wei Wang
It has only one caller; there is no need for two functions.
Correct the WARN_ON() error handling to leak the entire page table if the
HW is still referencing it, so we don't hit a use-after-free during
WARN_ON recovery.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
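In other words, the free path now checks the devices list before tearing
down the page table, so a tripped WARN_ON leaks the table instead of
freeing memory the HW may still reference (condensed from the diff below):

	if (WARN_ON(!list_empty(&dmar_domain->devices)))
		return;	/* leak rather than risk a use-after-free */

	if (dmar_domain->pgd) {
		/* unmap and free the page table via the freelist */
		...
	}
	kfree(dmar_domain->qi_batch);
	kfree(dmar_domain);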
drivers/iommu/intel/iommu.c | 33 +++++++++++++++------------------
1 file changed, 15 insertions(+), 18 deletions(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index b42bb091088ff5..ceb960a796e2ba 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1396,23 +1396,6 @@ void domain_detach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
}
}
-static void domain_exit(struct dmar_domain *domain)
-{
- if (domain->pgd) {
- struct iommu_pages_list freelist =
- IOMMU_PAGES_LIST_INIT(freelist);
-
- domain_unmap(domain, 0, DOMAIN_MAX_PFN(domain->gaw), &freelist);
- iommu_put_pages_list(&freelist);
- }
-
- if (WARN_ON(!list_empty(&domain->devices)))
- return;
-
- kfree(domain->qi_batch);
- kfree(domain);
-}
-
/*
* For kdump cases, old valid entries may be cached due to the
* in-flight DMA and copied pgtable, but there is no unmapping
@@ -3394,7 +3377,21 @@ static void intel_iommu_domain_free(struct iommu_domain *domain)
WARN_ON(dmar_domain->nested_parent &&
!list_empty(&dmar_domain->s1_domains));
- domain_exit(dmar_domain);
+
+ if (WARN_ON(!list_empty(&dmar_domain->devices)))
+ return;
+
+ if (dmar_domain->pgd) {
+ struct iommu_pages_list freelist =
+ IOMMU_PAGES_LIST_INIT(freelist);
+
+ domain_unmap(dmar_domain, 0, DOMAIN_MAX_PFN(dmar_domain->gaw),
+ &freelist);
+ iommu_put_pages_list(&freelist);
+ }
+
+ kfree(dmar_domain->qi_batch);
+ kfree(dmar_domain);
}
int paging_domain_compatible(struct iommu_domain *domain, struct device *dev)
--
2.43.0
* [PATCH 3/7] iommu/vtd: Do not wipe out the page table NID when devices detach
2025-06-09 19:58 [PATCH 0/7] Reorganize Intel VT-D to be ready for iommupt Jason Gunthorpe
2025-06-09 19:58 ` [PATCH 1/7] iommu/vtd: Lift the __pa to domain_setup_first_level/intel_svm_set_dev_pasid() Jason Gunthorpe
2025-06-09 19:58 ` [PATCH 2/7] iommu/vtd: Fold domain_exit() into intel_iommu_domain_free() Jason Gunthorpe
@ 2025-06-09 19:58 ` Jason Gunthorpe
2025-06-10 6:41 ` Baolu Lu
2025-06-10 9:14 ` Wang, Wei W
2025-06-09 19:58 ` [PATCH 4/7] iommu/vtd: Split intel_iommu_domain_alloc_paging_flags() Jason Gunthorpe
` (4 subsequent siblings)
7 siblings, 2 replies; 24+ messages in thread
From: Jason Gunthorpe @ 2025-06-09 19:58 UTC (permalink / raw)
To: Lu Baolu, David Woodhouse, iommu, Joerg Roedel, Robin Murphy,
Will Deacon
Cc: patches, Wei Wang
The NID is used to control which NUMA node the memory for the page table
is allocated from. It should be a permanent property set when the page
table is allocated and should not change during attach/detach of devices.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/intel/iommu.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ceb960a796e2ba..72b0769c391008 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1391,7 +1391,6 @@ void domain_detach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
if (--info->refcnt == 0) {
ida_free(&iommu->domain_ida, info->did);
xa_erase(&domain->iommu_array, iommu->seq_id);
- domain->nid = NUMA_NO_NODE;
kfree(info);
}
}
--
2.43.0
* [PATCH 4/7] iommu/vtd: Split intel_iommu_domain_alloc_paging_flags()
2025-06-09 19:58 [PATCH 0/7] Reorganize Intel VT-D to be ready for iommupt Jason Gunthorpe
` (2 preceding siblings ...)
2025-06-09 19:58 ` [PATCH 3/7] iommu/vtd: Do not wipe out the page table NID when devices detach Jason Gunthorpe
@ 2025-06-09 19:58 ` Jason Gunthorpe
2025-06-10 9:14 ` Wang, Wei W
2025-06-09 19:58 ` [PATCH 5/7] iommu/vtd: Create unique domain ops for each stage Jason Gunthorpe
` (3 subsequent siblings)
7 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2025-06-09 19:58 UTC (permalink / raw)
To: Lu Baolu, David Woodhouse, iommu, Joerg Roedel, Robin Murphy,
Will Deacon
Cc: patches, Wei Wang
Create stage-specific functions that check the stage-specific conditions
to determine whether each stage can be supported.
Have intel_iommu_domain_alloc_paging_flags() call both stage functions in
sequence until one does not return EOPNOTSUPP, preferring the first stage
if it is available and suitable for the requested flags.
Move second-stage-only operations like nested_parent and dirty_tracking
into the second stage function for clarity.
Move initialization of the iommu_domain members into paging_domain_alloc().
Drop initialization of domain->owner as the callers all do it.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
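The resulting allocation flow, roughly (condensed from the diff below):

	/* Prefer the first stage; fall back to the second stage only if
	 * the first stage reports EOPNOTSUPP for this IOMMU and flags. */
	domain = intel_iommu_domain_alloc_first_stage(dev, iommu, flags);
	if (domain != ERR_PTR(-EOPNOTSUPP))
		return domain;
	return intel_iommu_domain_alloc_second_stage(dev, iommu, flags);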
drivers/iommu/intel/iommu.c | 98 +++++++++++++++++++++----------------
1 file changed, 57 insertions(+), 41 deletions(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 72b0769c391008..6987a46c2e8421 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3267,10 +3267,15 @@ static struct dmar_domain *paging_domain_alloc(struct device *dev, bool first_st
spin_lock_init(&domain->lock);
spin_lock_init(&domain->cache_lock);
xa_init(&domain->iommu_array);
+ INIT_LIST_HEAD(&domain->s1_domains);
+ spin_lock_init(&domain->s1_lock);
domain->nid = dev_to_node(dev);
domain->use_first_level = first_stage;
+ domain->domain.type = IOMMU_DOMAIN_UNMANAGED;
+ domain->domain.ops = intel_iommu_ops.default_domain_ops;
+
/* calculate the address width */
addr_width = agaw_to_width(iommu->agaw);
if (addr_width > cap_mgaw(iommu->cap))
@@ -3312,62 +3317,73 @@ static struct dmar_domain *paging_domain_alloc(struct device *dev, bool first_st
}
static struct iommu_domain *
-intel_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
- const struct iommu_user_data *user_data)
+intel_iommu_domain_alloc_first_stage(struct device *dev,
+ struct intel_iommu *iommu, u32 flags)
+{
+ struct dmar_domain *dmar_domain;
+
+ if (flags & (~IOMMU_HWPT_ALLOC_PASID))
+ return ERR_PTR(-EOPNOTSUPP);
+
+ /* Only SL is available in legacy mode */
+ if (!sm_supported(iommu) || !ecap_flts(iommu->ecap))
+ return ERR_PTR(-EOPNOTSUPP);
+
+ dmar_domain = paging_domain_alloc(dev, true);
+ if (IS_ERR(dmar_domain))
+ return ERR_CAST(dmar_domain);
+ return &dmar_domain->domain;
+}
+
+static struct iommu_domain *
+intel_iommu_domain_alloc_second_stage(struct device *dev,
+ struct intel_iommu *iommu, u32 flags)
{
- struct device_domain_info *info = dev_iommu_priv_get(dev);
- bool dirty_tracking = flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
- bool nested_parent = flags & IOMMU_HWPT_ALLOC_NEST_PARENT;
- struct intel_iommu *iommu = info->iommu;
struct dmar_domain *dmar_domain;
- struct iommu_domain *domain;
- bool first_stage;
if (flags &
(~(IOMMU_HWPT_ALLOC_NEST_PARENT | IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
IOMMU_HWPT_ALLOC_PASID)))
return ERR_PTR(-EOPNOTSUPP);
- if (nested_parent && !nested_supported(iommu))
- return ERR_PTR(-EOPNOTSUPP);
- if (user_data || (dirty_tracking && !ssads_supported(iommu)))
+
+ if (((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) &&
+ !nested_supported(iommu)) ||
+ ((flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING) &&
+ !ssads_supported(iommu)))
return ERR_PTR(-EOPNOTSUPP);
- /*
- * Always allocate the guest compatible page table unless
- * IOMMU_HWPT_ALLOC_NEST_PARENT or IOMMU_HWPT_ALLOC_DIRTY_TRACKING
- * is specified.
- */
- if (nested_parent || dirty_tracking) {
- if (!sm_supported(iommu) || !ecap_slts(iommu->ecap))
- return ERR_PTR(-EOPNOTSUPP);
- first_stage = false;
- } else {
- first_stage = first_level_by_default(iommu);
- }
+ /* Legacy mode always supports second stage */
+ if (sm_supported(iommu) && !ecap_slts(iommu->ecap))
+ return ERR_PTR(-EOPNOTSUPP);
- dmar_domain = paging_domain_alloc(dev, first_stage);
+ dmar_domain = paging_domain_alloc(dev, false);
if (IS_ERR(dmar_domain))
return ERR_CAST(dmar_domain);
- domain = &dmar_domain->domain;
- domain->type = IOMMU_DOMAIN_UNMANAGED;
- domain->owner = &intel_iommu_ops;
- domain->ops = intel_iommu_ops.default_domain_ops;
- if (nested_parent) {
- dmar_domain->nested_parent = true;
- INIT_LIST_HEAD(&dmar_domain->s1_domains);
- spin_lock_init(&dmar_domain->s1_lock);
- }
+ dmar_domain->nested_parent = flags & IOMMU_HWPT_ALLOC_NEST_PARENT;
- if (dirty_tracking) {
- if (dmar_domain->use_first_level) {
- iommu_domain_free(domain);
- return ERR_PTR(-EOPNOTSUPP);
- }
- domain->dirty_ops = &intel_dirty_ops;
- }
+ if (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING)
+ dmar_domain->domain.dirty_ops = &intel_dirty_ops;
- return domain;
+ return &dmar_domain->domain;
+}
+
+static struct iommu_domain *
+intel_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
+ const struct iommu_user_data *user_data)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct intel_iommu *iommu = info->iommu;
+ struct iommu_domain *domain;
+
+ if (user_data)
+ return ERR_PTR(-EOPNOTSUPP);
+
+ /* Prefer fist stage if possible by default. */
+ domain = intel_iommu_domain_alloc_first_stage(dev, iommu, flags);
+ if (domain != ERR_PTR(-EOPNOTSUPP))
+ return domain;
+ return intel_iommu_domain_alloc_second_stage(dev, iommu, flags);
}
static void intel_iommu_domain_free(struct iommu_domain *domain)
--
2.43.0
* [PATCH 5/7] iommu/vtd: Create unique domain ops for each stage
2025-06-09 19:58 [PATCH 0/7] Reorganize Intel VT-D to be ready for iommupt Jason Gunthorpe
` (3 preceding siblings ...)
2025-06-09 19:58 ` [PATCH 4/7] iommu/vtd: Split intel_iommu_domain_alloc_paging_flags() Jason Gunthorpe
@ 2025-06-09 19:58 ` Jason Gunthorpe
2025-06-10 9:14 ` Wang, Wei W
2025-06-09 19:58 ` [PATCH 6/7] iommu/vtd: Split intel_iommu_enforce_cache_coherency() Jason Gunthorpe
` (2 subsequent siblings)
7 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2025-06-09 19:58 UTC (permalink / raw)
To: Lu Baolu, David Woodhouse, iommu, Joerg Roedel, Robin Murphy,
Will Deacon
Cc: patches, Wei Wang
Use the domain ops pointer to tell what kind of domain it is instead of
the internal use_first_level indication. This also protects against
wrongly using an SVA/nested/IDENTITY/BLOCKED domain type in places where
those types should not be used.
The only remaining uses of use_first_level outside the paging domain are in
paging_domain_compatible() and intel_iommu_enforce_cache_coherency().
Thus, remove the useless sets of use_first_level in
intel_svm_domain_alloc() and intel_iommu_domain_alloc_nested(). None of
the unique ops for these domain types ever reference it on their call
chains.
Add a WARN_ON() check in domain_context_mapping_one() as it only works
with second stage.
This is preparation for iommupt which will have different ops for each of
the stages.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
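The dispatch pattern this introduces looks roughly like this (condensed
from the diff below):

	if (domain->ops == &intel_fs_paging_domain_ops)
		ret = domain_setup_first_level(iommu, dmar_domain,
					       dev, pasid, old);
	else if (domain->ops == &intel_ss_paging_domain_ops)
		ret = domain_setup_second_level(iommu, dmar_domain,
						dev, pasid, old);
	else if (WARN_ON(true))
		ret = -EINVAL;	/* SVA/nested/identity/blocked never get here */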
drivers/iommu/intel/cache.c | 5 ++--
drivers/iommu/intel/iommu.c | 58 +++++++++++++++++++++++++-----------
drivers/iommu/intel/iommu.h | 2 ++
drivers/iommu/intel/nested.c | 3 +-
drivers/iommu/intel/svm.c | 1 -
5 files changed, 47 insertions(+), 22 deletions(-)
diff --git a/drivers/iommu/intel/cache.c b/drivers/iommu/intel/cache.c
index fc35cba5914532..29b8af6487512c 100644
--- a/drivers/iommu/intel/cache.c
+++ b/drivers/iommu/intel/cache.c
@@ -371,7 +371,7 @@ static void cache_tag_flush_iotlb(struct dmar_domain *domain, struct cache_tag *
struct intel_iommu *iommu = tag->iommu;
u64 type = DMA_TLB_PSI_FLUSH;
- if (domain->use_first_level) {
+ if (domain->domain.ops == &intel_fs_paging_domain_ops) {
qi_batch_add_piotlb(iommu, tag->domain_id, tag->pasid, addr,
pages, ih, domain->qi_batch);
return;
@@ -546,7 +546,8 @@ void cache_tag_flush_range_np(struct dmar_domain *domain, unsigned long start,
qi_batch_flush_descs(iommu, domain->qi_batch);
iommu = tag->iommu;
- if (!cap_caching_mode(iommu->cap) || domain->use_first_level) {
+ if (!cap_caching_mode(iommu->cap) ||
+ domain->domain.ops == &intel_fs_paging_domain_ops) {
iommu_flush_write_buffer(iommu);
continue;
}
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 6987a46c2e8421..820a4ef9396d68 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1462,6 +1462,9 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
struct context_entry *context;
int ret;
+ if (WARN_ON(domain->domain.ops != &intel_ss_paging_domain_ops))
+ return -EINVAL;
+
pr_debug("Set context mapping for %02x:%02x.%d\n",
bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
@@ -1800,12 +1803,14 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
if (!sm_supported(iommu))
ret = domain_context_mapping(domain, dev);
- else if (domain->use_first_level)
+ else if (domain->domain.ops == &intel_fs_paging_domain_ops)
ret = domain_setup_first_level(iommu, domain, dev,
IOMMU_NO_PASID, NULL);
- else
+ else if (domain->domain.ops == &intel_ss_paging_domain_ops)
ret = domain_setup_second_level(iommu, domain, dev,
IOMMU_NO_PASID, NULL);
+ else if (WARN_ON(true))
+ ret = -EINVAL;
if (ret)
goto out_block_translation;
@@ -3274,7 +3279,6 @@ static struct dmar_domain *paging_domain_alloc(struct device *dev, bool first_st
domain->use_first_level = first_stage;
domain->domain.type = IOMMU_DOMAIN_UNMANAGED;
- domain->domain.ops = intel_iommu_ops.default_domain_ops;
/* calculate the address width */
addr_width = agaw_to_width(iommu->agaw);
@@ -3332,6 +3336,8 @@ intel_iommu_domain_alloc_first_stage(struct device *dev,
dmar_domain = paging_domain_alloc(dev, true);
if (IS_ERR(dmar_domain))
return ERR_CAST(dmar_domain);
+
+ dmar_domain->domain.ops = &intel_fs_paging_domain_ops;
return &dmar_domain->domain;
}
@@ -3360,6 +3366,7 @@ intel_iommu_domain_alloc_second_stage(struct device *dev,
if (IS_ERR(dmar_domain))
return ERR_CAST(dmar_domain);
+ dmar_domain->domain.ops = &intel_ss_paging_domain_ops;
dmar_domain->nested_parent = flags & IOMMU_HWPT_ALLOC_NEST_PARENT;
if (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING)
@@ -4080,12 +4087,15 @@ static int intel_iommu_set_dev_pasid(struct iommu_domain *domain,
if (ret)
goto out_remove_dev_pasid;
- if (dmar_domain->use_first_level)
+ if (domain->ops == &intel_fs_paging_domain_ops)
ret = domain_setup_first_level(iommu, dmar_domain,
dev, pasid, old);
- else
+ else if (domain->ops == &intel_ss_paging_domain_ops)
ret = domain_setup_second_level(iommu, dmar_domain,
dev, pasid, old);
+ else if (WARN_ON(true))
+ ret = -EINVAL;
+
if (ret)
goto out_unwind_iopf;
@@ -4360,6 +4370,32 @@ static struct iommu_domain identity_domain = {
},
};
+const struct iommu_domain_ops intel_fs_paging_domain_ops = {
+ .attach_dev = intel_iommu_attach_device,
+ .set_dev_pasid = intel_iommu_set_dev_pasid,
+ .map_pages = intel_iommu_map_pages,
+ .unmap_pages = intel_iommu_unmap_pages,
+ .iotlb_sync_map = intel_iommu_iotlb_sync_map,
+ .flush_iotlb_all = intel_flush_iotlb_all,
+ .iotlb_sync = intel_iommu_tlb_sync,
+ .iova_to_phys = intel_iommu_iova_to_phys,
+ .free = intel_iommu_domain_free,
+ .enforce_cache_coherency = intel_iommu_enforce_cache_coherency,
+};
+
+const struct iommu_domain_ops intel_ss_paging_domain_ops = {
+ .attach_dev = intel_iommu_attach_device,
+ .set_dev_pasid = intel_iommu_set_dev_pasid,
+ .map_pages = intel_iommu_map_pages,
+ .unmap_pages = intel_iommu_unmap_pages,
+ .iotlb_sync_map = intel_iommu_iotlb_sync_map,
+ .flush_iotlb_all = intel_flush_iotlb_all,
+ .iotlb_sync = intel_iommu_tlb_sync,
+ .iova_to_phys = intel_iommu_iova_to_phys,
+ .free = intel_iommu_domain_free,
+ .enforce_cache_coherency = intel_iommu_enforce_cache_coherency,
+};
+
const struct iommu_ops intel_iommu_ops = {
.blocked_domain = &blocking_domain,
.release_domain = &blocking_domain,
@@ -4378,18 +4414,6 @@ const struct iommu_ops intel_iommu_ops = {
.def_domain_type = device_def_domain_type,
.pgsize_bitmap = SZ_4K,
.page_response = intel_iommu_page_response,
- .default_domain_ops = &(const struct iommu_domain_ops) {
- .attach_dev = intel_iommu_attach_device,
- .set_dev_pasid = intel_iommu_set_dev_pasid,
- .map_pages = intel_iommu_map_pages,
- .unmap_pages = intel_iommu_unmap_pages,
- .iotlb_sync_map = intel_iommu_iotlb_sync_map,
- .flush_iotlb_all = intel_flush_iotlb_all,
- .iotlb_sync = intel_iommu_tlb_sync,
- .iova_to_phys = intel_iommu_iova_to_phys,
- .free = intel_iommu_domain_free,
- .enforce_cache_coherency = intel_iommu_enforce_cache_coherency,
- }
};
static void quirk_iommu_igfx(struct pci_dev *dev)
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index ca6843973d4393..9b9e5fde4a3062 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -1375,6 +1375,8 @@ struct context_entry *iommu_context_addr(struct intel_iommu *iommu, u8 bus,
u8 devfn, int alloc);
extern const struct iommu_ops intel_iommu_ops;
+extern const struct iommu_domain_ops intel_fs_paging_domain_ops;
+extern const struct iommu_domain_ops intel_ss_paging_domain_ops;
#ifdef CONFIG_INTEL_IOMMU
extern int intel_iommu_sm;
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
index fc312f649f9ef0..d976d6b6188d3f 100644
--- a/drivers/iommu/intel/nested.c
+++ b/drivers/iommu/intel/nested.c
@@ -216,7 +216,7 @@ intel_iommu_domain_alloc_nested(struct device *dev, struct iommu_domain *parent,
/* Must be nested domain */
if (user_data->type != IOMMU_HWPT_DATA_VTD_S1)
return ERR_PTR(-EOPNOTSUPP);
- if (parent->ops != intel_iommu_ops.default_domain_ops ||
+ if (parent->ops != &intel_ss_paging_domain_ops ||
!s2_domain->nested_parent)
return ERR_PTR(-EINVAL);
@@ -229,7 +229,6 @@ intel_iommu_domain_alloc_nested(struct device *dev, struct iommu_domain *parent,
if (!domain)
return ERR_PTR(-ENOMEM);
- domain->use_first_level = true;
domain->s2_domain = s2_domain;
domain->s1_cfg = vtd;
domain->domain.ops = &intel_nested_domain_ops;
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 8c0bed36c58777..e147f71f91b722 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -214,7 +214,6 @@ struct iommu_domain *intel_svm_domain_alloc(struct device *dev,
return ERR_PTR(-ENOMEM);
domain->domain.ops = &intel_svm_domain_ops;
- domain->use_first_level = true;
INIT_LIST_HEAD(&domain->dev_pasids);
INIT_LIST_HEAD(&domain->cache_tags);
spin_lock_init(&domain->cache_lock);
--
2.43.0
* [PATCH 6/7] iommu/vtd: Split intel_iommu_enforce_cache_coherency()
2025-06-09 19:58 [PATCH 0/7] Reorganize Intel VT-D to be ready for iommupt Jason Gunthorpe
` (4 preceding siblings ...)
2025-06-09 19:58 ` [PATCH 5/7] iommu/vtd: Create unique domain ops for each stage Jason Gunthorpe
@ 2025-06-09 19:58 ` Jason Gunthorpe
2025-06-09 19:58 ` [PATCH 7/7] iommu/vtd: Split paging_domain_compatible() Jason Gunthorpe
2025-06-12 7:18 ` [PATCH 0/7] Reorganize Intel VT-D to be ready for iommupt Joerg Roedel
7 siblings, 0 replies; 24+ messages in thread
From: Jason Gunthorpe @ 2025-06-09 19:58 UTC (permalink / raw)
To: Lu Baolu, David Woodhouse, iommu, Joerg Roedel, Robin Murphy,
Will Deacon
Cc: patches, Wei Wang
First stage and second stage have very different ways to deny
no-snoop. The first stage uses the PGSNP bit, which is global per PASID, so
enabling it requires loading new PASID entries for all the attached
devices. The second stage uses a bit per PTE, so enabling it just requires
telling future maps to set the bit.
Since we now have two sets of domain ops, we can have two functions that
directly code their required actions instead of a bunch of logic dancing
around use_first_level.
Combine domain_set_force_snooping() into the new functions since they are
its only callers.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
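Conceptually, each new function now does only what its stage needs
(condensed from the diff below):

	/* first stage: PGSNP lives in the PASID entry, so reload the
	 * PASID entry of every attached device */
	list_for_each_entry(info, &dmar_domain->devices, link)
		intel_pasid_setup_page_snoop_control(info->iommu, info->dev,
						     IOMMU_NO_PASID);

	/* second stage: per-PTE control, just make future maps set SNP */
	dmar_domain->set_pte_snp = true;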
drivers/iommu/intel/iommu.c | 56 +++++++++++++++++--------------------
1 file changed, 26 insertions(+), 30 deletions(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 820a4ef9396d68..ab2e9fef75293c 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3628,44 +3628,40 @@ static bool domain_support_force_snooping(struct dmar_domain *domain)
return support;
}
-static void domain_set_force_snooping(struct dmar_domain *domain)
-{
- struct device_domain_info *info;
-
- assert_spin_locked(&domain->lock);
- /*
- * Second level page table supports per-PTE snoop control. The
- * iommu_map() interface will handle this by setting SNP bit.
- */
- if (!domain->use_first_level) {
- domain->set_pte_snp = true;
- return;
- }
-
- list_for_each_entry(info, &domain->devices, link)
- intel_pasid_setup_page_snoop_control(info->iommu, info->dev,
- IOMMU_NO_PASID);
-}
-
-static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain)
+static bool intel_iommu_enforce_cache_coherency_fs(struct iommu_domain *domain)
{
struct dmar_domain *dmar_domain = to_dmar_domain(domain);
- unsigned long flags;
+ struct device_domain_info *info;
+
+ guard(spinlock_irqsave)(&dmar_domain->lock);
if (dmar_domain->force_snooping)
return true;
- spin_lock_irqsave(&dmar_domain->lock, flags);
- if (!domain_support_force_snooping(dmar_domain) ||
- (!dmar_domain->use_first_level && dmar_domain->has_mappings)) {
- spin_unlock_irqrestore(&dmar_domain->lock, flags);
+ if (!domain_support_force_snooping(dmar_domain))
return false;
- }
- domain_set_force_snooping(dmar_domain);
dmar_domain->force_snooping = true;
- spin_unlock_irqrestore(&dmar_domain->lock, flags);
+ list_for_each_entry(info, &dmar_domain->devices, link)
+ intel_pasid_setup_page_snoop_control(info->iommu, info->dev,
+ IOMMU_NO_PASID);
+ return true;
+}
+static bool intel_iommu_enforce_cache_coherency_ss(struct iommu_domain *domain)
+{
+ struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+
+ guard(spinlock_irqsave)(&dmar_domain->lock);
+ if (!domain_support_force_snooping(dmar_domain) ||
+ dmar_domain->has_mappings)
+ return false;
+
+ /*
+ * Second level page table supports per-PTE snoop control. The
+ * iommu_map() interface will handle this by setting SNP bit.
+ */
+ dmar_domain->set_pte_snp = true;
return true;
}
@@ -4380,7 +4376,7 @@ const struct iommu_domain_ops intel_fs_paging_domain_ops = {
.iotlb_sync = intel_iommu_tlb_sync,
.iova_to_phys = intel_iommu_iova_to_phys,
.free = intel_iommu_domain_free,
- .enforce_cache_coherency = intel_iommu_enforce_cache_coherency,
+ .enforce_cache_coherency = intel_iommu_enforce_cache_coherency_fs,
};
const struct iommu_domain_ops intel_ss_paging_domain_ops = {
@@ -4393,7 +4389,7 @@ const struct iommu_domain_ops intel_ss_paging_domain_ops = {
.iotlb_sync = intel_iommu_tlb_sync,
.iova_to_phys = intel_iommu_iova_to_phys,
.free = intel_iommu_domain_free,
- .enforce_cache_coherency = intel_iommu_enforce_cache_coherency,
+ .enforce_cache_coherency = intel_iommu_enforce_cache_coherency_ss,
};
const struct iommu_ops intel_iommu_ops = {
--
2.43.0
* [PATCH 7/7] iommu/vtd: Split paging_domain_compatible()
2025-06-09 19:58 [PATCH 0/7] Reorganize Intel VT-D to be ready for iommupt Jason Gunthorpe
` (5 preceding siblings ...)
2025-06-09 19:58 ` [PATCH 6/7] iommu/vtd: Split intel_iommu_enforce_cache_coherency() Jason Gunthorpe
@ 2025-06-09 19:58 ` Jason Gunthorpe
2025-06-10 7:12 ` Baolu Lu
2025-06-12 7:18 ` [PATCH 0/7] Reorganize Intel VT-D to be ready for iommupt Joerg Roedel
7 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2025-06-09 19:58 UTC (permalink / raw)
To: Lu Baolu, David Woodhouse, iommu, Joerg Roedel, Robin Murphy,
Will Deacon
Cc: patches, Wei Wang
Make first/second stage specific functions that follow the same pattern as
intel_iommu_domain_alloc_first/second_stage() for computing
EOPNOTSUPP. This makes the code easier to understand: if we could not
create a domain with these parameters on this IOMMU instance, then we are
certainly not compatible with it.
Check superpage support directly against the per-stage cap bits and the
pgsize_bitmap.
Add a note that force_snooping is read without locking. The locking needs
to cover both the compatibility check and the addition of the device to
the list.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
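For example, the per-stage superpage checks now compare the stage's cap
bits directly against the domain's pgsize_bitmap (condensed from the diff
below):

	/* first stage: 1G pages require the FL 1GB-page capability */
	if (!cap_fl1gp_support(iommu->cap) &&
	    (dmar_domain->domain.pgsize_bitmap & SZ_1G))
		return -EINVAL;

	/* second stage: sslps = cap_super_page_val(iommu->cap) */
	if (!(sslps & BIT(0)) && (dmar_domain->domain.pgsize_bitmap & SZ_2M))
		return -EINVAL;
	if (!(sslps & BIT(1)) && (dmar_domain->domain.pgsize_bitmap & SZ_1G))
		return -EINVAL;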
drivers/iommu/intel/iommu.c | 66 ++++++++++++++++++++++++++++++-------
1 file changed, 54 insertions(+), 12 deletions(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ab2e9fef75293c..a482d1b77d1203 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3416,33 +3416,75 @@ static void intel_iommu_domain_free(struct iommu_domain *domain)
kfree(dmar_domain);
}
+static int paging_domain_compatible_first_stage(struct dmar_domain *dmar_domain,
+ struct intel_iommu *iommu)
+{
+ if (WARN_ON(dmar_domain->domain.dirty_ops ||
+ dmar_domain->nested_parent))
+ return -EINVAL;
+
+ /* Only SL is available in legacy mode */
+ if (!sm_supported(iommu) || !ecap_flts(iommu->ecap))
+ return -EINVAL;
+
+ /* Same page size support */
+ if (!cap_fl1gp_support(iommu->cap) &&
+ (dmar_domain->domain.pgsize_bitmap & SZ_1G))
+ return -EINVAL;
+ return 0;
+}
+
+static int
+paging_domain_compatible_second_stage(struct dmar_domain *dmar_domain,
+ struct intel_iommu *iommu)
+{
+ unsigned int sslps = cap_super_page_val(iommu->cap);
+
+ if (dmar_domain->domain.dirty_ops && !ssads_supported(iommu))
+ return -EINVAL;
+ if (dmar_domain->nested_parent && !nested_supported(iommu))
+ return -EINVAL;
+
+ /* Legacy mode always supports second stage */
+ if (sm_supported(iommu) && !ecap_slts(iommu->ecap))
+ return -EINVAL;
+
+ /* Same page size support */
+ if (!(sslps & BIT(0)) && (dmar_domain->domain.pgsize_bitmap & SZ_2M))
+ return -EINVAL;
+ if (!(sslps & BIT(1)) && (dmar_domain->domain.pgsize_bitmap & SZ_1G))
+ return -EINVAL;
+ return 0;
+}
+
int paging_domain_compatible(struct iommu_domain *domain, struct device *dev)
{
struct device_domain_info *info = dev_iommu_priv_get(dev);
struct dmar_domain *dmar_domain = to_dmar_domain(domain);
struct intel_iommu *iommu = info->iommu;
+ int ret = -EINVAL;
int addr_width;
- if (WARN_ON_ONCE(!(domain->type & __IOMMU_DOMAIN_PAGING)))
- return -EPERM;
+ if (domain->ops == &intel_fs_paging_domain_ops)
+ ret = paging_domain_compatible_first_stage(dmar_domain, iommu);
+ else if (domain->ops == &intel_ss_paging_domain_ops)
+ ret = paging_domain_compatible_second_stage(dmar_domain, iommu);
+ else if (WARN_ON(true))
+ ret = -EINVAL;
+ if (ret)
+ return ret;
+ /*
+ * FIXME this is locked wrong, it needs to be under the
+ * dmar_domain->lock
+ */
if (dmar_domain->force_snooping && !ecap_sc_support(iommu->ecap))
return -EINVAL;
- if (domain->dirty_ops && !ssads_supported(iommu))
- return -EINVAL;
-
if (dmar_domain->iommu_coherency !=
iommu_paging_structure_coherency(iommu))
return -EINVAL;
- if (dmar_domain->iommu_superpage !=
- iommu_superpage_capability(iommu, dmar_domain->use_first_level))
- return -EINVAL;
-
- if (dmar_domain->use_first_level &&
- (!sm_supported(iommu) || !ecap_flts(iommu->ecap)))
- return -EINVAL;
/* check if this iommu agaw is sufficient for max mapped address */
addr_width = agaw_to_width(iommu->agaw);
--
2.43.0
* Re: [PATCH 3/7] iommu/vtd: Do not wipe out the page table NID when devices detach
2025-06-09 19:58 ` [PATCH 3/7] iommu/vtd: Do not wipe out the page table NID when devices detach Jason Gunthorpe
@ 2025-06-10 6:41 ` Baolu Lu
2025-06-10 13:18 ` Jason Gunthorpe
2025-06-10 9:14 ` Wang, Wei W
1 sibling, 1 reply; 24+ messages in thread
From: Baolu Lu @ 2025-06-10 6:41 UTC (permalink / raw)
To: Jason Gunthorpe, David Woodhouse, iommu, Joerg Roedel,
Robin Murphy, Will Deacon
Cc: patches, Wei Wang
On 6/10/25 03:58, Jason Gunthorpe wrote:
> The NID is used to control which NUMA node memory for the page table is
> allocated it from. It should be a permanent property of the page table
> when it was allocated and not change during attach/detach of devices.
>
> Signed-off-by: Jason Gunthorpe<jgg@nvidia.com>
> ---
> drivers/iommu/intel/iommu.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index ceb960a796e2ba..72b0769c391008 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1391,7 +1391,6 @@ void domain_detach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
> if (--info->refcnt == 0) {
> ida_free(&iommu->domain_ida, info->did);
> xa_erase(&domain->iommu_array, iommu->seq_id);
> - domain->nid = NUMA_NO_NODE;
It appears that this is a fix. Before domain_alloc()'s retirement, no
device pointer was passed to the domain allocation callback. Thus, the
NUMA node ID could only be determined when the domain was attached to a
device. Since we have introduced domain_alloc_paging, this should be
removed. So,
Fixes: 7c204426b818 ("iommu/vt-d: Add domain_alloc_paging support")
?
Thanks,
baolu
* Re: [PATCH 7/7] iommu/vtd: Split paging_domain_compatible()
2025-06-09 19:58 ` [PATCH 7/7] iommu/vtd: Split paging_domain_compatible() Jason Gunthorpe
@ 2025-06-10 7:12 ` Baolu Lu
2025-06-10 23:51 ` Jason Gunthorpe
0 siblings, 1 reply; 24+ messages in thread
From: Baolu Lu @ 2025-06-10 7:12 UTC (permalink / raw)
To: Jason Gunthorpe, David Woodhouse, iommu, Joerg Roedel,
Robin Murphy, Will Deacon
Cc: patches, Wei Wang
On 6/10/25 03:58, Jason Gunthorpe wrote:
> Make First/Second stage specific functions that follow the same pattern in
> intel_iommu_domain_alloc_first/second_stage() for computing
> EOPNOTSUPP. This makes the code easier to understand as if we couldn't
> create a domain with the parameters for this IOMMU instance then we
> certainly are not compatible with it.
>
> Check superpage support directly against the per-stage cap bits and the
> pgsize_bitmap.
>
> Add a note that the force_snooping is read without locking. The locking
> needs to cover the compatible check and the add of the device to the list.
>
> Signed-off-by: Jason Gunthorpe<jgg@nvidia.com>
> ---
> drivers/iommu/intel/iommu.c | 66 ++++++++++++++++++++++++++++++-------
> 1 file changed, 54 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index ab2e9fef75293c..a482d1b77d1203 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -3416,33 +3416,75 @@ static void intel_iommu_domain_free(struct iommu_domain *domain)
> kfree(dmar_domain);
> }
>
> +static int paging_domain_compatible_first_stage(struct dmar_domain *dmar_domain,
> + struct intel_iommu *iommu)
> +{
> + if (WARN_ON(dmar_domain->domain.dirty_ops ||
> + dmar_domain->nested_parent))
> + return -EINVAL;
> +
> + /* Only SL is available in legacy mode */
> + if (!sm_supported(iommu) || !ecap_flts(iommu->ecap))
> + return -EINVAL;
> +
> + /* Same page size support */
> + if (!cap_fl1gp_support(iommu->cap) &&
> + (dmar_domain->domain.pgsize_bitmap & SZ_1G))
> + return -EINVAL;
> + return 0;
> +}
> +
> +static int
> +paging_domain_compatible_second_stage(struct dmar_domain *dmar_domain,
> + struct intel_iommu *iommu)
> +{
> + unsigned int sslps = cap_super_page_val(iommu->cap);
> +
> + if (dmar_domain->domain.dirty_ops && !ssads_supported(iommu))
> + return -EINVAL;
> + if (dmar_domain->nested_parent && !nested_supported(iommu))
> + return -EINVAL;
> +
> + /* Legacy mode always supports second stage */
> + if (sm_supported(iommu) && !ecap_slts(iommu->ecap))
> + return -EINVAL;
> +
> + /* Same page size support */
> + if (!(sslps & BIT(0)) && (dmar_domain->domain.pgsize_bitmap & SZ_2M))
> + return -EINVAL;
> + if (!(sslps & BIT(1)) && (dmar_domain->domain.pgsize_bitmap & SZ_1G))
> + return -EINVAL;
> + return 0;
> +}
> +
> int paging_domain_compatible(struct iommu_domain *domain, struct device *dev)
> {
> struct device_domain_info *info = dev_iommu_priv_get(dev);
> struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> struct intel_iommu *iommu = info->iommu;
> + int ret = -EINVAL;
> int addr_width;
>
> - if (WARN_ON_ONCE(!(domain->type & __IOMMU_DOMAIN_PAGING)))
> - return -EPERM;
> + if (domain->ops == &intel_fs_paging_domain_ops)
> + ret = paging_domain_compatible_first_stage(dmar_domain, iommu);
> + else if (domain->ops == &intel_ss_paging_domain_ops)
> + ret = paging_domain_compatible_second_stage(dmar_domain, iommu);
> + else if (WARN_ON(true))
> + ret = -EINVAL;
> + if (ret)
> + return ret;
>
> + /*
> + * FIXME this is locked wrong, it needs to be under the
> + * dmar_domain->lock
> + */
> if (dmar_domain->force_snooping && !ecap_sc_support(iommu->ecap))
> return -EINVAL;
Perhaps we can use group->mutex to fix this in the future?
paging_domain_compatible() is in the domain attaching path, which is
already synchronized by group->mutex. We can further expose an iommu
interface for cache coherency enforcement, which would also apply
group->mutex.
Thanks,
baolu
* RE: [PATCH 4/7] iommu/vtd: Split intel_iommu_domain_alloc_paging_flags()
2025-06-09 19:58 ` [PATCH 4/7] iommu/vtd: Split intel_iommu_domain_alloc_paging_flags() Jason Gunthorpe
@ 2025-06-10 9:14 ` Wang, Wei W
2025-06-10 13:25 ` Jason Gunthorpe
0 siblings, 1 reply; 24+ messages in thread
From: Wang, Wei W @ 2025-06-10 9:14 UTC (permalink / raw)
To: Jason Gunthorpe, Lu Baolu, David Woodhouse, iommu@lists.linux.dev,
Joerg Roedel, Robin Murphy, Will Deacon
Cc: patches@lists.linux.dev
On Tuesday, June 10, 2025 3:58 AM, Jason Gunthorpe wrote:
> Create stage specific functions that check the stage specific conditions if each
> stage can be supported.
>
> Have intel_iommu_domain_alloc_paging_flags() call both stages in sequence
> until one does not return EOPNOTSUPP and prefer to use the first stage if
> available and suitable for the requested flags.
>
> Move second stage only operations like nested_parent and dirty_tracking
> into the second stage function for clarity.
>
> Move initialization of the iommu_domain members into
> paging_domain_alloc().
>
> Drop initialization of domain->owner as the callers all do it.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> drivers/iommu/intel/iommu.c | 98 +++++++++++++++++++++----------------
> 1 file changed, 57 insertions(+), 41 deletions(-)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 72b0769c391008..6987a46c2e8421 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -3267,10 +3267,15 @@ static struct dmar_domain
> *paging_domain_alloc(struct device *dev, bool first_st
> spin_lock_init(&domain->lock);
> spin_lock_init(&domain->cache_lock);
> xa_init(&domain->iommu_array);
> + INIT_LIST_HEAD(&domain->s1_domains);
> + spin_lock_init(&domain->s1_lock);
>
> domain->nid = dev_to_node(dev);
> domain->use_first_level = first_stage;
>
> + domain->domain.type = IOMMU_DOMAIN_UNMANAGED;
> + domain->domain.ops = intel_iommu_ops.default_domain_ops;
> +
> /* calculate the address width */
> addr_width = agaw_to_width(iommu->agaw);
> if (addr_width > cap_mgaw(iommu->cap)) @@ -3312,62 +3317,73
> @@ static struct dmar_domain *paging_domain_alloc(struct device *dev,
> bool first_st }
>
> static struct iommu_domain *
> -intel_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
> - const struct iommu_user_data *user_data)
> +intel_iommu_domain_alloc_first_stage(struct device *dev,
> + struct intel_iommu *iommu, u32 flags) {
> + struct dmar_domain *dmar_domain;
> +
> + if (flags & (~IOMMU_HWPT_ALLOC_PASID))
> + return ERR_PTR(-EOPNOTSUPP);
> +
> + /* Only SL is available in legacy mode */
> + if (!sm_supported(iommu) || !ecap_flts(iommu->ecap))
> + return ERR_PTR(-EOPNOTSUPP);
> +
> + dmar_domain = paging_domain_alloc(dev, true);
> + if (IS_ERR(dmar_domain))
> + return ERR_CAST(dmar_domain);
> + return &dmar_domain->domain;
> +}
> +
> +static struct iommu_domain *
> +intel_iommu_domain_alloc_second_stage(struct device *dev,
> + struct intel_iommu *iommu, u32 flags)
> {
> - struct device_domain_info *info = dev_iommu_priv_get(dev);
> - bool dirty_tracking = flags &
> IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
> - bool nested_parent = flags & IOMMU_HWPT_ALLOC_NEST_PARENT;
> - struct intel_iommu *iommu = info->iommu;
> struct dmar_domain *dmar_domain;
> - struct iommu_domain *domain;
> - bool first_stage;
>
> if (flags &
> (~(IOMMU_HWPT_ALLOC_NEST_PARENT |
> IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
> IOMMU_HWPT_ALLOC_PASID)))
> return ERR_PTR(-EOPNOTSUPP);
> - if (nested_parent && !nested_supported(iommu))
> - return ERR_PTR(-EOPNOTSUPP);
> - if (user_data || (dirty_tracking && !ssads_supported(iommu)))
> +
> + if (((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) &&
> + !nested_supported(iommu)) ||
> + ((flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING) &&
> + !ssads_supported(iommu)))
> return ERR_PTR(-EOPNOTSUPP);
>
> - /*
> - * Always allocate the guest compatible page table unless
> - * IOMMU_HWPT_ALLOC_NEST_PARENT or
> IOMMU_HWPT_ALLOC_DIRTY_TRACKING
> - * is specified.
> - */
> - if (nested_parent || dirty_tracking) {
> - if (!sm_supported(iommu) || !ecap_slts(iommu->ecap))
> - return ERR_PTR(-EOPNOTSUPP);
> - first_stage = false;
> - } else {
> - first_stage = first_level_by_default(iommu);
> - }
> + /* Legacy mode always supports second stage */
> + if (sm_supported(iommu) && !ecap_slts(iommu->ecap))
> + return ERR_PTR(-EOPNOTSUPP);
>
> - dmar_domain = paging_domain_alloc(dev, first_stage);
> + dmar_domain = paging_domain_alloc(dev, false);
> if (IS_ERR(dmar_domain))
> return ERR_CAST(dmar_domain);
> - domain = &dmar_domain->domain;
> - domain->type = IOMMU_DOMAIN_UNMANAGED;
> - domain->owner = &intel_iommu_ops;
> - domain->ops = intel_iommu_ops.default_domain_ops;
>
> - if (nested_parent) {
> - dmar_domain->nested_parent = true;
> - INIT_LIST_HEAD(&dmar_domain->s1_domains);
> - spin_lock_init(&dmar_domain->s1_lock);
> - }
> + dmar_domain->nested_parent = flags &
> IOMMU_HWPT_ALLOC_NEST_PARENT;
>
> - if (dirty_tracking) {
> - if (dmar_domain->use_first_level) {
> - iommu_domain_free(domain);
> - return ERR_PTR(-EOPNOTSUPP);
> - }
> - domain->dirty_ops = &intel_dirty_ops;
> - }
> + if (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING)
> + dmar_domain->domain.dirty_ops = &intel_dirty_ops;
>
> - return domain;
> + return &dmar_domain->domain;
> +}
> +
> +static struct iommu_domain *
> +intel_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
> + const struct iommu_user_data *user_data)
> {
> + struct device_domain_info *info = dev_iommu_priv_get(dev);
> + struct intel_iommu *iommu = info->iommu;
> + struct iommu_domain *domain;
> +
> + if (user_data)
Curious how this user_data is used. I checked all the drivers
(intel/amd/arm/selftest); it seems nobody supports it. It seems a bit
confusing to have it everywhere without real usage. I had a plan to remove
it from all the interfaces (not sure if I am misunderstanding something).
> + return ERR_PTR(-EOPNOTSUPP);
> +
> + /* Prefer fist stage if possible by default. */
fist --> first
> + domain = intel_iommu_domain_alloc_first_stage(dev, iommu, flags);
> + if (domain != ERR_PTR(-EOPNOTSUPP))
Should be "if (!IS_ERR(domain))" here?
paging_domain_alloc() could also return ERR_PTR(-ENOMEM)
> + return domain;
> + return intel_iommu_domain_alloc_second_stage(dev, iommu, flags);
> }
>
> static void intel_iommu_domain_free(struct iommu_domain *domain)
> --
> 2.43.0
* RE: [PATCH 3/7] iommu/vtd: Do not wipe out the page table NID when devices detach
2025-06-09 19:58 ` [PATCH 3/7] iommu/vtd: Do not wipe out the page table NID when devices detach Jason Gunthorpe
2025-06-10 6:41 ` Baolu Lu
@ 2025-06-10 9:14 ` Wang, Wei W
1 sibling, 0 replies; 24+ messages in thread
From: Wang, Wei W @ 2025-06-10 9:14 UTC (permalink / raw)
To: Jason Gunthorpe, Lu Baolu, David Woodhouse, iommu@lists.linux.dev,
Joerg Roedel, Robin Murphy, Will Deacon
Cc: patches@lists.linux.dev
On Tuesday, June 10, 2025 3:58 AM, Jason Gunthorpe wrote:
> To: Lu Baolu <baolu.lu@linux.intel.com>; David Woodhouse
> <dwmw2@infradead.org>; iommu@lists.linux.dev; Joerg Roedel
> <joro@8bytes.org>; Robin Murphy <robin.murphy@arm.com>; Will Deacon
> <will@kernel.org>
> Cc: patches@lists.linux.dev; Wang, Wei W <wei.w.wang@intel.com>
> Subject: [PATCH 3/7] iommu/vtd: Do not wipe out the page table NID when
> devices detach
>
> The NID is used to control which NUMA node memory for the page table is
> allocated it from. It should be a permanent property of the page table when
> it was allocated and not change during attach/detach of devices.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> drivers/iommu/intel/iommu.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index ceb960a796e2ba..72b0769c391008 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1391,7 +1391,6 @@ void domain_detach_iommu(struct dmar_domain
> *domain, struct intel_iommu *iommu)
> if (--info->refcnt == 0) {
> ida_free(&iommu->domain_ida, info->did);
> xa_erase(&domain->iommu_array, iommu->seq_id);
> - domain->nid = NUMA_NO_NODE;
> kfree(info);
> }
> }
> --
> 2.43.0
Reviewed-by: Wei Wang <wei.w.wang@intel.com>
* RE: [PATCH 5/7] iommu/vtd: Create unique domain ops for each stage
2025-06-09 19:58 ` [PATCH 5/7] iommu/vtd: Create unique domain ops for each stage Jason Gunthorpe
@ 2025-06-10 9:14 ` Wang, Wei W
2025-06-10 13:26 ` Jason Gunthorpe
0 siblings, 1 reply; 24+ messages in thread
From: Wang, Wei W @ 2025-06-10 9:14 UTC (permalink / raw)
To: Jason Gunthorpe, Lu Baolu, David Woodhouse, iommu@lists.linux.dev,
Joerg Roedel, Robin Murphy, Will Deacon
Cc: patches@lists.linux.dev
On Tuesday, June 10, 2025 3:58 AM, Jason Gunthorpe wrote:
> Use the domain ops pointer to tell what kind of domain it is instead of the
> internal use_first_level indication. This also protects against wrongly using a
> SVA/nested/IDENTITY/BLOCKED domain type in places they should not be.
>
> The only remaining uses of use_first_level outside the paging domain are in
> paging_domain_compatible() and intel_iommu_enforce_cache_coherency().
>
> Thus, remove the useless sets of use_first_level in
> intel_svm_domain_alloc() and intel_iommu_domain_alloc_nested(). None of
> the unique ops for these domain types ever reference it on their call chains.
>
> Add a WARN_ON() check in domain_context_mapping_one() as it only works
> with second stage.
>
> This is preparation for iommupt which will have different ops for each of the
> stages.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> drivers/iommu/intel/cache.c | 5 ++--
> drivers/iommu/intel/iommu.c | 58 +++++++++++++++++++++++++-----------
> drivers/iommu/intel/iommu.h | 2 ++
> drivers/iommu/intel/nested.c | 3 +-
> drivers/iommu/intel/svm.c | 1 -
> 5 files changed, 47 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/iommu/intel/cache.c b/drivers/iommu/intel/cache.c index
> fc35cba5914532..29b8af6487512c 100644
> --- a/drivers/iommu/intel/cache.c
> +++ b/drivers/iommu/intel/cache.c
> @@ -371,7 +371,7 @@ static void cache_tag_flush_iotlb(struct
> dmar_domain *domain, struct cache_tag *
> struct intel_iommu *iommu = tag->iommu;
> u64 type = DMA_TLB_PSI_FLUSH;
>
> - if (domain->use_first_level) {
> + if (domain->domain.ops == &intel_fs_paging_domain_ops) {
> qi_batch_add_piotlb(iommu, tag->domain_id, tag->pasid,
> addr,
> pages, ih, domain->qi_batch);
> return;
> @@ -546,7 +546,8 @@ void cache_tag_flush_range_np(struct dmar_domain
> *domain, unsigned long start,
> qi_batch_flush_descs(iommu, domain->qi_batch);
> iommu = tag->iommu;
>
> - if (!cap_caching_mode(iommu->cap) || domain-
> >use_first_level) {
> + if (!cap_caching_mode(iommu->cap) ||
> + domain->domain.ops == &intel_fs_paging_domain_ops) {
> iommu_flush_write_buffer(iommu);
> continue;
> }
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 6987a46c2e8421..820a4ef9396d68 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1462,6 +1462,9 @@ static int domain_context_mapping_one(struct
> dmar_domain *domain,
> struct context_entry *context;
> int ret;
>
> + if (WARN_ON(domain->domain.ops !=
> &intel_ss_paging_domain_ops))
> + return -EINVAL;
> +
> pr_debug("Set context mapping for %02x:%02x.%d\n",
> bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
>
> @@ -1800,12 +1803,14 @@ static int dmar_domain_attach_device(struct
> dmar_domain *domain,
>
> if (!sm_supported(iommu))
> ret = domain_context_mapping(domain, dev);
> - else if (domain->use_first_level)
> + else if (domain->domain.ops == &intel_fs_paging_domain_ops)
> ret = domain_setup_first_level(iommu, domain, dev,
> IOMMU_NO_PASID, NULL);
> - else
> + else if (domain->domain.ops == &intel_ss_paging_domain_ops)
> ret = domain_setup_second_level(iommu, domain, dev,
> IOMMU_NO_PASID, NULL);
> + else if (WARN_ON(true))
> + ret = -EINVAL;
>
> if (ret)
> goto out_block_translation;
> @@ -3274,7 +3279,6 @@ static struct dmar_domain
> *paging_domain_alloc(struct device *dev, bool first_st
> domain->use_first_level = first_stage;
>
> domain->domain.type = IOMMU_DOMAIN_UNMANAGED;
> - domain->domain.ops = intel_iommu_ops.default_domain_ops;
>
> /* calculate the address width */
> addr_width = agaw_to_width(iommu->agaw); @@ -3332,6 +3336,8
> @@ intel_iommu_domain_alloc_first_stage(struct device *dev,
> dmar_domain = paging_domain_alloc(dev, true);
> if (IS_ERR(dmar_domain))
> return ERR_CAST(dmar_domain);
> +
> + dmar_domain->domain.ops = &intel_fs_paging_domain_ops;
> return &dmar_domain->domain;
> }
>
> @@ -3360,6 +3366,7 @@ intel_iommu_domain_alloc_second_stage(struct
> device *dev,
> if (IS_ERR(dmar_domain))
> return ERR_CAST(dmar_domain);
>
> + dmar_domain->domain.ops = &intel_ss_paging_domain_ops;
> dmar_domain->nested_parent = flags &
> IOMMU_HWPT_ALLOC_NEST_PARENT;
>
> if (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING) @@ -4080,12
> +4087,15 @@ static int intel_iommu_set_dev_pasid(struct iommu_domain
> *domain,
> if (ret)
> goto out_remove_dev_pasid;
>
> - if (dmar_domain->use_first_level)
> + if (domain->ops == &intel_fs_paging_domain_ops)
> ret = domain_setup_first_level(iommu, dmar_domain,
> dev, pasid, old);
> - else
> + else if (domain->ops == &intel_ss_paging_domain_ops)
> ret = domain_setup_second_level(iommu, dmar_domain,
> dev, pasid, old);
> + else if (WARN_ON(true))
> + ret = -EINVAL;
> +
> if (ret)
> goto out_unwind_iopf;
>
> @@ -4360,6 +4370,32 @@ static struct iommu_domain identity_domain = {
> },
> };
>
> +const struct iommu_domain_ops intel_fs_paging_domain_ops = {
> + .attach_dev = intel_iommu_attach_device,
> + .set_dev_pasid = intel_iommu_set_dev_pasid,
> + .map_pages = intel_iommu_map_pages,
> + .unmap_pages = intel_iommu_unmap_pages,
> + .iotlb_sync_map = intel_iommu_iotlb_sync_map,
> + .flush_iotlb_all = intel_flush_iotlb_all,
> + .iotlb_sync = intel_iommu_tlb_sync,
> + .iova_to_phys = intel_iommu_iova_to_phys,
> + .free = intel_iommu_domain_free,
> + .enforce_cache_coherency = intel_iommu_enforce_cache_coherency,
> +};
> +
> +const struct iommu_domain_ops intel_ss_paging_domain_ops = {
> + .attach_dev = intel_iommu_attach_device,
> + .set_dev_pasid = intel_iommu_set_dev_pasid,
> + .map_pages = intel_iommu_map_pages,
> + .unmap_pages = intel_iommu_unmap_pages,
> + .iotlb_sync_map = intel_iommu_iotlb_sync_map,
> + .flush_iotlb_all = intel_flush_iotlb_all,
> + .iotlb_sync = intel_iommu_tlb_sync,
> + .iova_to_phys = intel_iommu_iova_to_phys,
> + .free = intel_iommu_domain_free,
> + .enforce_cache_coherency = intel_iommu_enforce_cache_coherency,
> +};
> +
IIUC, the only different callback for fs and ss is .enforce_cache_coherency (from patch 6).
Have you considered splitting them by handling the distinction within
intel_iommu_enforce_cache_coherency() itself:
static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain)
{
	struct dmar_domain *dmar_domain = to_dmar_domain(domain);

	if (dmar_domain->use_first_level)
		return intel_iommu_enforce_cache_coherency_fs(domain);
	return intel_iommu_enforce_cache_coherency_ss(domain);
}
This could avoid the code duplication above.
^ permalink raw reply [flat|nested] 24+ messages in thread
* RE: [PATCH 2/7] iommu/vtd: Fold domain_exit() into intel_iommu_domain_free()
2025-06-09 19:58 ` [PATCH 2/7] iommu/vtd: Fold domain_exit() into intel_iommu_domain_free() Jason Gunthorpe
@ 2025-06-10 9:15 ` Wang, Wei W
2025-06-10 13:29 ` Jason Gunthorpe
0 siblings, 1 reply; 24+ messages in thread
From: Wang, Wei W @ 2025-06-10 9:15 UTC (permalink / raw)
To: Jason Gunthorpe, Lu Baolu, David Woodhouse, iommu@lists.linux.dev,
Joerg Roedel, Robin Murphy, Will Deacon
Cc: patches@lists.linux.dev
On Tuesday, June 10, 2025 3:58 AM, Jason Gunthorpe wrote:
> intel_iommu_domain_free()
>
> It has only one caller, no need for two functions.
>
> Correct the WARN_ON() error handling to leak the entire page table if the
> HW is still referencing it so we don't UAF during WARN_ON recovery.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> drivers/iommu/intel/iommu.c | 33 +++++++++++++++------------------
> 1 file changed, 15 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index b42bb091088ff5..ceb960a796e2ba 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1396,23 +1396,6 @@ void domain_detach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
> }
> }
>
> -static void domain_exit(struct dmar_domain *domain)
> -{
> - if (domain->pgd) {
> -		struct iommu_pages_list freelist = IOMMU_PAGES_LIST_INIT(freelist);
> -
> -		domain_unmap(domain, 0, DOMAIN_MAX_PFN(domain->gaw), &freelist);
> - iommu_put_pages_list(&freelist);
> - }
> -
> - if (WARN_ON(!list_empty(&domain->devices)))
> - return;
> -
> - kfree(domain->qi_batch);
> - kfree(domain);
> -}
> -
> /*
> * For kdump cases, old valid entries may be cached due to the
>   * in-flight DMA and copied pgtable, but there is no unmapping
> @@ -3394,7 +3377,21 @@ static void intel_iommu_domain_free(struct iommu_domain *domain)
>
> WARN_ON(dmar_domain->nested_parent &&
> !list_empty(&dmar_domain->s1_domains));
Not sure if a UAF could also occur here in the nested case, as there
are active s1_domains nested on this domain, but this domain's
paging is going to be freed. If so, we may need to merge the two
warnings and return:
+	bool is_nested_s1_active = dmar_domain->nested_parent &&
+				   !list_empty(&dmar_domain->s1_domains);
-	WARN_ON(dmar_domain->nested_parent &&
-		!list_empty(&dmar_domain->s1_domains));
-
-	if (WARN_ON(!list_empty(&dmar_domain->devices)))
+	if (WARN_ON(is_nested_s1_active ||
+		    !list_empty(&dmar_domain->devices)))
> - domain_exit(dmar_domain);
> +
> + if (WARN_ON(!list_empty(&dmar_domain->devices)))
> + return;
> +
> + if (dmar_domain->pgd) {
> +		struct iommu_pages_list freelist = IOMMU_PAGES_LIST_INIT(freelist);
> +
> +		domain_unmap(dmar_domain, 0, DOMAIN_MAX_PFN(dmar_domain->gaw),
> + &freelist);
> + iommu_put_pages_list(&freelist);
> + }
> +
> + kfree(dmar_domain->qi_batch);
> + kfree(dmar_domain);
> }
>
> int paging_domain_compatible(struct iommu_domain *domain, struct
> device *dev)
> --
> 2.43.0
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 3/7] iommu/vtd: Do not wipe out the page table NID when devices detach
2025-06-10 6:41 ` Baolu Lu
@ 2025-06-10 13:18 ` Jason Gunthorpe
0 siblings, 0 replies; 24+ messages in thread
From: Jason Gunthorpe @ 2025-06-10 13:18 UTC (permalink / raw)
To: Baolu Lu
Cc: David Woodhouse, iommu, Joerg Roedel, Robin Murphy, Will Deacon,
patches, Wei Wang
On Tue, Jun 10, 2025 at 02:41:11PM +0800, Baolu Lu wrote:
> On 6/10/25 03:58, Jason Gunthorpe wrote:
> > The NID is used to control which NUMA node the page table memory is
> > allocated from. It should be a permanent property of the page table
> > when it was allocated and not change during attach/detach of devices.
> >
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > ---
> > drivers/iommu/intel/iommu.c | 1 -
> > 1 file changed, 1 deletion(-)
> >
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index ceb960a796e2ba..72b0769c391008 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -1391,7 +1391,6 @@ void domain_detach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
> > if (--info->refcnt == 0) {
> > ida_free(&iommu->domain_ida, info->did);
> > xa_erase(&domain->iommu_array, iommu->seq_id);
> > - domain->nid = NUMA_NO_NODE;
>
> It appears that this is a fix.
Hard to say, but sure it could be..
> Fixes: 7c204426b818 ("iommu/vt-d: Add domain_alloc_paging support")
Arguably it is before this; the NID should not have been cleared once
set by attach. The above only fixed things to have the NID set from
the creation side..
I'd pick:
Fixes: ba949f4cd4c3 ("iommu/vt-d: Refactor iommu information of each domain")
As the patch that introduced the NUMA_NO_NODE on the detach flow in
the first place..
But that would encourage people to backport this quite far, and I
don't think that's really a good idea since it isn't something anyone
cares about, and maybe there is a good reason the above had it..
So I'd leave it with no fixes line..
Thanks,
Jason
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 4/7] iommu/vtd: Split intel_iommu_domain_alloc_paging_flags()
2025-06-10 9:14 ` Wang, Wei W
@ 2025-06-10 13:25 ` Jason Gunthorpe
0 siblings, 0 replies; 24+ messages in thread
From: Jason Gunthorpe @ 2025-06-10 13:25 UTC (permalink / raw)
To: Wang, Wei W
Cc: Lu Baolu, David Woodhouse, iommu@lists.linux.dev, Joerg Roedel,
Robin Murphy, Will Deacon, patches@lists.linux.dev
On Tue, Jun 10, 2025 at 09:14:05AM +0000, Wang, Wei W wrote:
> > +static struct iommu_domain *
> > +intel_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
> > + const struct iommu_user_data *user_data)
> > {
> > + struct device_domain_info *info = dev_iommu_priv_get(dev);
> > + struct intel_iommu *iommu = info->iommu;
> > + struct iommu_domain *domain;
> > +
> > + if (user_data)
>
> Curious how this user_data is used. I checked all the drivers (intel/amd/arm/selftest);
> it seems nobody supports it.
It is a stand in for drivers to allow userspace to customize the page
table format. Only nesting domain types use it right now and since
they were split out the paging ones became disused.
> It seems a bit confusing to have it everywhere without
> real users. I had planned to remove it from all the interfaces (not sure if I am
> misunderstanding anything).
Please just leave it; it will get used at some point.
> > +
> > + /* Prefer fist stage if possible by default. */
>
> fist --> first
Yep
> > + domain = intel_iommu_domain_alloc_first_stage(dev, iommu, flags);
> > + if (domain != ERR_PTR(-EOPNOTSUPP))
> > + return domain;
>
> Should be "if (!IS_ERR(domain))" here?
> paging_domain_alloc() could also return ERR_PTR(-ENOMEM)
If alloc_first_stage() returns -ENOMEM then the first stage was
supported and we ran out of memory; the entire function should fail
and we don't try the second stage (since first was supported).
If we changed it as you propose then we'd never try the second stage.
EOPNOTSUPP means the first stage is not supported and the second stage
should be tried.
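For reference, a minimal sketch of that ordering, reusing the helper names
from the quoted hunk (the -EOPNOTSUPP return for user_data is my assumption
about the elided body, not quoted code):

	static struct iommu_domain *
	intel_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
					      const struct iommu_user_data *user_data)
	{
		struct device_domain_info *info = dev_iommu_priv_get(dev);
		struct intel_iommu *iommu = info->iommu;
		struct iommu_domain *domain;

		/* Paging domains take no driver-specific user data (assumption). */
		if (user_data)
			return ERR_PTR(-EOPNOTSUPP);

		/* Prefer first stage if possible by default. */
		domain = intel_iommu_domain_alloc_first_stage(dev, iommu, flags);
		if (domain != ERR_PTR(-EOPNOTSUPP))
			return domain;	/* success, or a hard error such as -ENOMEM */

		/* Only "first stage unsupported" falls through to second stage. */
		return intel_iommu_domain_alloc_second_stage(dev, iommu, flags);
	}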
Jason
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 5/7] iommu/vtd: Create unique domain ops for each stage
2025-06-10 9:14 ` Wang, Wei W
@ 2025-06-10 13:26 ` Jason Gunthorpe
0 siblings, 0 replies; 24+ messages in thread
From: Jason Gunthorpe @ 2025-06-10 13:26 UTC (permalink / raw)
To: Wang, Wei W
Cc: Lu Baolu, David Woodhouse, iommu@lists.linux.dev, Joerg Roedel,
Robin Murphy, Will Deacon, patches@lists.linux.dev
On Tue, Jun 10, 2025 at 09:14:55AM +0000, Wang, Wei W wrote:
> IIUC, the only different callback for fs and ss is .enforce_cache_coherency (from patch 6).
> Have you considered splitting them by handling the distinction within
> intel_iommu_enforce_cache_coherency() itself:
The iommupt patches add more different function pointers, which is why
this is being done.
This series also uses the ops function pointer as a kind of 'type' to
tell the difference between sva/first/second/blocking/identity domains.
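For illustration, a minimal sketch of that type-check pattern, based on the
two ops tables in the quoted patch (the helper names are mine; the series
open-codes the comparison as shown in the earlier hunks):

	static bool domain_is_fs_paging(struct iommu_domain *domain)
	{
		return domain->ops == &intel_fs_paging_domain_ops;
	}

	static bool domain_is_ss_paging(struct iommu_domain *domain)
	{
		return domain->ops == &intel_ss_paging_domain_ops;
	}

	/* Any other ops pointer means sva, blocking, identity or nested. */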
Jason
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 2/7] iommu/vtd: Fold domain_exit() into intel_iommu_domain_free()
2025-06-10 9:15 ` Wang, Wei W
@ 2025-06-10 13:29 ` Jason Gunthorpe
0 siblings, 0 replies; 24+ messages in thread
From: Jason Gunthorpe @ 2025-06-10 13:29 UTC (permalink / raw)
To: Wang, Wei W
Cc: Lu Baolu, David Woodhouse, iommu@lists.linux.dev, Joerg Roedel,
Robin Murphy, Will Deacon, patches@lists.linux.dev
On Tue, Jun 10, 2025 at 09:15:14AM +0000, Wang, Wei W wrote:
> > WARN_ON(dmar_domain->nested_parent &&
> > !list_empty(&dmar_domain->s1_domains));
>
> Not sure if a UAF could also occur here in the nested case, as there
> are active s1_domains nested on this domain, but this domain's
> paging is going to be freed. If so, we may need to merge the two
> warnings and return:
Yep, I did it like this:
if (WARN_ON(dmar_domain->nested_parent &&
!list_empty(&dmar_domain->s1_domains)))
return;
if (WARN_ON(!list_empty(&dmar_domain->devices)))
return;
So if the WARN_ON hits each case gets a unique line number.
Thanks,
Jason
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/7] iommu/vtd: Lift the __pa to domain_setup_first_level/intel_svm_set_dev_pasid()
2025-06-09 19:58 ` [PATCH 1/7] iommu/vtd: Lift the __pa to domain_setup_first_level/intel_svm_set_dev_pasid() Jason Gunthorpe
@ 2025-06-10 18:34 ` Jason Gunthorpe
0 siblings, 0 replies; 24+ messages in thread
From: Jason Gunthorpe @ 2025-06-10 18:34 UTC (permalink / raw)
To: Lu Baolu, David Woodhouse, iommu, Joerg Roedel, Robin Murphy,
Will Deacon
Cc: patches, Wei Wang
On Mon, Jun 09, 2025 at 04:58:20PM -0300, Jason Gunthorpe wrote:
> Pass the phys_addr_t down through the call chain from the top instead of
> @@ -412,7 +413,7 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
> }
>
> int intel_pasid_replace_first_level(struct intel_iommu *iommu,
> - struct device *dev, pgd_t *pgd,
> + struct device *dev, u64 fsptptr,
0-day pointed out this should be 'phys_addr_t fsptptr'
Jason
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 7/7] iommu/vtd: Split paging_domain_compatible()
2025-06-10 7:12 ` Baolu Lu
@ 2025-06-10 23:51 ` Jason Gunthorpe
2025-06-11 4:50 ` Baolu Lu
0 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2025-06-10 23:51 UTC (permalink / raw)
To: Baolu Lu
Cc: David Woodhouse, iommu, Joerg Roedel, Robin Murphy, Will Deacon,
patches, Wei Wang
On Tue, Jun 10, 2025 at 03:12:43PM +0800, Baolu Lu wrote:
> > + /*
> > + * FIXME this is locked wrong, it needs to be under the
> > + * dmar_domain->lock
> > + */
> > if (dmar_domain->force_snooping && !ecap_sc_support(iommu->ecap))
> > return -EINVAL;
>
> Perhaps we can use group->mutex to fix this in the future?
It is part of the iommu_domain, there is no clear group->mutex that
is affiliated with a domain, there can be multiple devices attached..
Maybe we should change the enforce coherency away from a function op
and into a domain creation flag. Then it won't change at run time,
fixing this race, and we don't have the issue of possibly having PTEs
with the wrong bit already mapped into the domain.
When this was first done that was harder, but now I think we have
enough infrastructure to do it easily..
Jason
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 7/7] iommu/vtd: Split paging_domain_compatible()
2025-06-10 23:51 ` Jason Gunthorpe
@ 2025-06-11 4:50 ` Baolu Lu
2025-06-12 13:47 ` Jason Gunthorpe
0 siblings, 1 reply; 24+ messages in thread
From: Baolu Lu @ 2025-06-11 4:50 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: David Woodhouse, iommu, Joerg Roedel, Robin Murphy, Will Deacon,
patches, Wei Wang
On 6/11/25 07:51, Jason Gunthorpe wrote:
> On Tue, Jun 10, 2025 at 03:12:43PM +0800, Baolu Lu wrote:
>>> + /*
>>> + * FIXME this is locked wrong, it needs to be under the
>>> + * dmar_domain->lock
>>> + */
>>> if (dmar_domain->force_snooping && !ecap_sc_support(iommu->ecap))
>>> return -EINVAL;
>>
>> Perhaps we can use group->mutex to fix this in the future?
>
> It is part of the iommu_domain, there is no clear group->mutex that
> is affiliated with a domain, there can be multiple devices attached..
Fair enough.
> Maybe we should change the enforce coherency away from a function op
> and into a domain creation flag. Then it won't change at run time,
> fixing this race, and we don't have the issue of possibly having PTEs
> with the wrong bit already mapped into the domain.
That's a better approach. enforce_cache_coherency is only used in vfio
and iommufd, where the cache coherency is enforced just after the domain
is allocated.
> When this was first done that was harder, but now I think we have
> enough infrastructure to do it easily..
Yes. I can follow up with a patch series for broad discussion if you
don't want to include it in this series.
Thanks,
baolu
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/7] Reorganize Intel VT-D to be ready for iommupt
2025-06-09 19:58 [PATCH 0/7] Reorganize Intel VT-D to be ready for iommupt Jason Gunthorpe
` (6 preceding siblings ...)
2025-06-09 19:58 ` [PATCH 7/7] iommu/vtd: Split paging_domain_compatible() Jason Gunthorpe
@ 2025-06-12 7:18 ` Joerg Roedel
7 siblings, 0 replies; 24+ messages in thread
From: Joerg Roedel @ 2025-06-12 7:18 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Lu Baolu, David Woodhouse, iommu, Robin Murphy, Will Deacon,
patches, Wei Wang
Hi Jason,
On Mon, Jun 09, 2025 at 04:58:19PM -0300, Jason Gunthorpe wrote:
> Jason Gunthorpe (7):
> iommu/vtd: Lift the __pa to
> domain_setup_first_level/intel_svm_set_dev_pasid()
> iommu/vtd: Fold domain_exit() into intel_iommu_domain_free()
> iommu/vtd: Do not wipe out the page table NID when devices detach
> iommu/vtd: Split intel_iommu_domain_alloc_paging_flags()
> iommu/vtd: Create unique domain ops for each stage
> iommu/vtd: Split intel_iommu_enforce_cache_coherency()
> iommu/vtd: Split paging_domain_compatible()
Nit: The correct prefix for VT-d patches is "iommu/vt-d".
Regards,
Joerg
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 7/7] iommu/vtd: Split paging_domain_compatible()
2025-06-11 4:50 ` Baolu Lu
@ 2025-06-12 13:47 ` Jason Gunthorpe
2025-06-13 3:15 ` Baolu Lu
0 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2025-06-12 13:47 UTC (permalink / raw)
To: Baolu Lu
Cc: David Woodhouse, iommu, Joerg Roedel, Robin Murphy, Will Deacon,
patches, Wei Wang
On Wed, Jun 11, 2025 at 12:50:11PM +0800, Baolu Lu wrote:
> > When this was first done that was harder, but now I think we have
> > enough infrastructure to do it easily..
>
> Yes. I can follow up with a patch series for broad discussion if you
> don't want to include it in this series.
Please, I'd rather keep this as is.
There is also a problem here with AMD, it turns out only its V1 page
table can support disabling no-snoop, the v2 page table cannot and
there is nothing like the PGSNP there.
The iommufd side assumes that if a device supports coherency then all
paging iommu_domains support it.. Maybe a creation flag can fix that
also?
Jason
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 7/7] iommu/vtd: Split paging_domain_compatible()
2025-06-12 13:47 ` Jason Gunthorpe
@ 2025-06-13 3:15 ` Baolu Lu
0 siblings, 0 replies; 24+ messages in thread
From: Baolu Lu @ 2025-06-13 3:15 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: David Woodhouse, iommu, Joerg Roedel, Robin Murphy, Will Deacon,
patches, Wei Wang
On 6/12/25 21:47, Jason Gunthorpe wrote:
> On Wed, Jun 11, 2025 at 12:50:11PM +0800, Baolu Lu wrote:
>
>>> When this was first done that was harder, but now I think we have
>>> enough infrastructure to do it easily..
>> Yes. I can follow up with a patch series for broad discussion if you
>> don't want to include it in this series.
> Please, I'd rather keep this as is.
>
> There is also a problem here with AMD, it turns out only its V1 page
> table can support disabling no-snoop, the v2 page table cannot and
> there is nothing like the PGSNP there.
>
> The iommufd side assumes that if a device supports coherency then all
> paging iommu_domains support it.. Maybe a creation flag can fix that
> also?
I guess so. For the AMD case, a FORCE_COHERENCY allocation flag
indicates that the IOMMU driver should select the right page table type to
meet this requirement. The requirement will also be enforced in the
domain attach paths, meaning the attach will fail as incompatible if a
FORCE_COHERENCY domain is attached to any device that doesn't support
this feature.
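For illustration, a purely hypothetical sketch of the two checkpoints (the
flag name below is made up, not an existing uAPI flag; the ecap_sc_support()
check is the one already quoted from paging_domain_compatible()):

	/* Allocation time: refuse a page table format that cannot enforce snooping. */
	if (flags & IOMMU_HWPT_ALLOC_FORCE_COHERENCY) {	/* hypothetical flag */
		if (!ecap_sc_support(iommu->ecap))
			return ERR_PTR(-EOPNOTSUPP);
		dmar_domain->force_snooping = true;
	}

	/* Attach time: an IOMMU that cannot honor it fails the compatibility check. */
	if (dmar_domain->force_snooping && !ecap_sc_support(iommu->ecap))
		return -EINVAL;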
Anyway, we can discuss the details when we have that code.
Thanks,
baolu
^ permalink raw reply [flat|nested] 24+ messages in thread
Thread overview: 24+ messages (end of thread, newest: 2025-06-13 3:16 UTC)
2025-06-09 19:58 [PATCH 0/7] Reorganize Intel VT-D to be ready for iommupt Jason Gunthorpe
2025-06-09 19:58 ` [PATCH 1/7] iommu/vtd: Lift the __pa to domain_setup_first_level/intel_svm_set_dev_pasid() Jason Gunthorpe
2025-06-10 18:34 ` Jason Gunthorpe
2025-06-09 19:58 ` [PATCH 2/7] iommu/vtd: Fold domain_exit() into intel_iommu_domain_free() Jason Gunthorpe
2025-06-10 9:15 ` Wang, Wei W
2025-06-10 13:29 ` Jason Gunthorpe
2025-06-09 19:58 ` [PATCH 3/7] iommu/vtd: Do not wipe out the page table NID when devices detach Jason Gunthorpe
2025-06-10 6:41 ` Baolu Lu
2025-06-10 13:18 ` Jason Gunthorpe
2025-06-10 9:14 ` Wang, Wei W
2025-06-09 19:58 ` [PATCH 4/7] iommu/vtd: Split intel_iommu_domain_alloc_paging_flags() Jason Gunthorpe
2025-06-10 9:14 ` Wang, Wei W
2025-06-10 13:25 ` Jason Gunthorpe
2025-06-09 19:58 ` [PATCH 5/7] iommu/vtd: Create unique domain ops for each stage Jason Gunthorpe
2025-06-10 9:14 ` Wang, Wei W
2025-06-10 13:26 ` Jason Gunthorpe
2025-06-09 19:58 ` [PATCH 6/7] iommu/vtd: Split intel_iommu_enforce_cache_coherency() Jason Gunthorpe
2025-06-09 19:58 ` [PATCH 7/7] iommu/vtd: Split paging_domain_compatible() Jason Gunthorpe
2025-06-10 7:12 ` Baolu Lu
2025-06-10 23:51 ` Jason Gunthorpe
2025-06-11 4:50 ` Baolu Lu
2025-06-12 13:47 ` Jason Gunthorpe
2025-06-13 3:15 ` Baolu Lu
2025-06-12 7:18 ` [PATCH 0/7] Reorganize Intel VT-D to be ready for iommupt Joerg Roedel