public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/7] iommu/vt-d: Add domain_alloc_paging support
@ 2024-10-11  4:27 Lu Baolu
  2024-10-11  4:27 ` [PATCH 1/7] " Lu Baolu
                   ` (6 more replies)
  0 siblings, 7 replies; 24+ messages in thread
From: Lu Baolu @ 2024-10-11  4:27 UTC (permalink / raw)
  To: iommu
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Yi Liu, Vasant Hegde, linux-kernel, Lu Baolu

This series converts the Intel iommu driver to the domain_alloc_paging
callback and removes the legacy domain_alloc callback. This ensures that
a valid device pointer is provided whenever a paging domain is allocated,
and that all dmar_domain attributes can be set up at allocation time.

Both first-stage and second-stage page tables can back a paging domain.
Unless IOMMU_HWPT_ALLOC_NEST_PARENT or IOMMU_HWPT_ALLOC_DIRTY_TRACKING
is specified during paging domain allocation, the driver will use
first-stage page tables if the hardware is capable, on the assumption
that first-stage page tables are compatible with both host and guest
kernels.

The whole series is also available on GitHub:
https://github.com/LuBaolu/intel-iommu/commits/vtd-domain_alloc_paging-v1

Please help review and comment.

Lu Baolu (7):
  iommu/vt-d: Add domain_alloc_paging support
  iommu/vt-d: Remove unused domain_alloc callback
  iommu/vt-d: Enhance compatibility check for paging domain attach
  iommu/vt-d: Remove domain_update_iommu_cap()
  iommu/vt-d: Remove domain_update_iommu_superpage()
  iommu/vt-d: Refactor first_level_by_default()
  iommu/vt-d: Refine intel_iommu_domain_alloc_user()

 drivers/iommu/intel/iommu.h |   1 -
 drivers/iommu/intel/iommu.c | 328 +++++++-----------------------------
 drivers/iommu/intel/pasid.c |  28 +--
 3 files changed, 62 insertions(+), 295 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/7] iommu/vt-d: Add domain_alloc_paging support
  2024-10-11  4:27 [PATCH 0/7] iommu/vt-d: Add domain_alloc_paging support Lu Baolu
@ 2024-10-11  4:27 ` Lu Baolu
  2024-10-11 13:22   ` Jason Gunthorpe
  2024-10-11  4:27 ` [PATCH 2/7] iommu/vt-d: Remove unused domain_alloc callback Lu Baolu
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Lu Baolu @ 2024-10-11  4:27 UTC (permalink / raw)
  To: iommu
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Yi Liu, Vasant Hegde, linux-kernel, Lu Baolu

Add the domain_alloc_paging callback for domain allocation using the
iommu_paging_domain_alloc() interface.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel/iommu.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 9f6b0780f2ef..4803e0cb8279 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4590,6 +4590,19 @@ static struct iommu_domain identity_domain = {
 	},
 };
 
+static struct iommu_domain *intel_iommu_domain_alloc_paging(struct device *dev)
+{
+	struct dmar_domain *dmar_domain;
+	bool first_stage;
+
+	first_stage = first_level_by_default(0);
+	dmar_domain = paging_domain_alloc(dev, first_stage);
+	if (IS_ERR(dmar_domain))
+		return ERR_CAST(dmar_domain);
+
+	return &dmar_domain->domain;
+}
+
 const struct iommu_ops intel_iommu_ops = {
 	.blocked_domain		= &blocking_domain,
 	.release_domain		= &blocking_domain,
@@ -4599,6 +4612,7 @@ const struct iommu_ops intel_iommu_ops = {
 	.domain_alloc		= intel_iommu_domain_alloc,
 	.domain_alloc_user	= intel_iommu_domain_alloc_user,
 	.domain_alloc_sva	= intel_svm_domain_alloc,
+	.domain_alloc_paging	= intel_iommu_domain_alloc_paging,
 	.probe_device		= intel_iommu_probe_device,
 	.release_device		= intel_iommu_release_device,
 	.get_resv_regions	= intel_iommu_get_resv_regions,
-- 
2.43.0



* [PATCH 2/7] iommu/vt-d: Remove unused domain_alloc callback
  2024-10-11  4:27 [PATCH 0/7] iommu/vt-d: Add domain_alloc_paging support Lu Baolu
  2024-10-11  4:27 ` [PATCH 1/7] " Lu Baolu
@ 2024-10-11  4:27 ` Lu Baolu
  2024-10-11 16:17   ` Jason Gunthorpe
  2024-10-11  4:27 ` [PATCH 3/7] iommu/vt-d: Enhance compatibility check for paging domain attach Lu Baolu
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Lu Baolu @ 2024-10-11  4:27 UTC (permalink / raw)
  To: iommu
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Yi Liu, Vasant Hegde, linux-kernel, Lu Baolu

With the domain_alloc_paging callback supported, the legacy domain_alloc
callback is no longer used. Remove it, together with its now-unused
helpers, to avoid dead code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel/iommu.c | 90 -------------------------------------
 1 file changed, 90 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 4803e0cb8279..dd158ff5fd45 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1454,27 +1454,6 @@ static bool first_level_by_default(unsigned int type)
 	return type != IOMMU_DOMAIN_UNMANAGED;
 }
 
-static struct dmar_domain *alloc_domain(unsigned int type)
-{
-	struct dmar_domain *domain;
-
-	domain = kzalloc(sizeof(*domain), GFP_KERNEL);
-	if (!domain)
-		return NULL;
-
-	domain->nid = NUMA_NO_NODE;
-	if (first_level_by_default(type))
-		domain->use_first_level = true;
-	INIT_LIST_HEAD(&domain->devices);
-	INIT_LIST_HEAD(&domain->dev_pasids);
-	INIT_LIST_HEAD(&domain->cache_tags);
-	spin_lock_init(&domain->lock);
-	spin_lock_init(&domain->cache_lock);
-	xa_init(&domain->iommu_array);
-
-	return domain;
-}
-
 int domain_attach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
 {
 	struct iommu_domain_info *info, *curr;
@@ -1546,20 +1525,6 @@ void domain_detach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
 	spin_unlock(&iommu->lock);
 }
 
-static int guestwidth_to_adjustwidth(int gaw)
-{
-	int agaw;
-	int r = (gaw - 12) % 9;
-
-	if (r == 0)
-		agaw = gaw;
-	else
-		agaw = gaw + 9 - r;
-	if (agaw > 64)
-		agaw = 64;
-	return agaw;
-}
-
 static void domain_exit(struct dmar_domain *domain)
 {
 	if (domain->pgd) {
@@ -3379,27 +3344,6 @@ void device_block_translation(struct device *dev)
 	info->domain = NULL;
 }
 
-static int md_domain_init(struct dmar_domain *domain, int guest_width)
-{
-	int adjust_width;
-
-	/* calculate AGAW */
-	domain->gaw = guest_width;
-	adjust_width = guestwidth_to_adjustwidth(guest_width);
-	domain->agaw = width_to_agaw(adjust_width);
-
-	domain->iommu_coherency = false;
-	domain->iommu_superpage = 0;
-	domain->max_addr = 0;
-
-	/* always allocate the top pgd */
-	domain->pgd = iommu_alloc_page_node(domain->nid, GFP_ATOMIC);
-	if (!domain->pgd)
-		return -ENOMEM;
-	domain_flush_cache(domain, domain->pgd, PAGE_SIZE);
-	return 0;
-}
-
 static int blocking_domain_attach_dev(struct iommu_domain *domain,
 				      struct device *dev)
 {
@@ -3486,39 +3430,6 @@ static struct dmar_domain *paging_domain_alloc(struct device *dev, bool first_st
 	return domain;
 }
 
-static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
-{
-	struct dmar_domain *dmar_domain;
-	struct iommu_domain *domain;
-
-	switch (type) {
-	case IOMMU_DOMAIN_DMA:
-	case IOMMU_DOMAIN_UNMANAGED:
-		dmar_domain = alloc_domain(type);
-		if (!dmar_domain) {
-			pr_err("Can't allocate dmar_domain\n");
-			return NULL;
-		}
-		if (md_domain_init(dmar_domain, DEFAULT_DOMAIN_ADDRESS_WIDTH)) {
-			pr_err("Domain initialization failed\n");
-			domain_exit(dmar_domain);
-			return NULL;
-		}
-
-		domain = &dmar_domain->domain;
-		domain->geometry.aperture_start = 0;
-		domain->geometry.aperture_end   =
-				__DOMAIN_MAX_ADDR(dmar_domain->gaw);
-		domain->geometry.force_aperture = true;
-
-		return domain;
-	default:
-		return NULL;
-	}
-
-	return NULL;
-}
-
 static struct iommu_domain *
 intel_iommu_domain_alloc_user(struct device *dev, u32 flags,
 			      struct iommu_domain *parent,
@@ -4609,7 +4520,6 @@ const struct iommu_ops intel_iommu_ops = {
 	.identity_domain	= &identity_domain,
 	.capable		= intel_iommu_capable,
 	.hw_info		= intel_iommu_hw_info,
-	.domain_alloc		= intel_iommu_domain_alloc,
 	.domain_alloc_user	= intel_iommu_domain_alloc_user,
 	.domain_alloc_sva	= intel_svm_domain_alloc,
 	.domain_alloc_paging	= intel_iommu_domain_alloc_paging,
-- 
2.43.0



* [PATCH 3/7] iommu/vt-d: Enhance compatibility check for paging domain attach
  2024-10-11  4:27 [PATCH 0/7] iommu/vt-d: Add domain_alloc_paging support Lu Baolu
  2024-10-11  4:27 ` [PATCH 1/7] " Lu Baolu
  2024-10-11  4:27 ` [PATCH 2/7] iommu/vt-d: Remove unused domain_alloc callback Lu Baolu
@ 2024-10-11  4:27 ` Lu Baolu
  2024-10-11 16:27   ` Jason Gunthorpe
  2024-10-11  4:27 ` [PATCH 4/7] iommu/vt-d: Remove domain_update_iommu_cap() Lu Baolu
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Lu Baolu @ 2024-10-11  4:27 UTC (permalink / raw)
  To: iommu
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Yi Liu, Vasant Hegde, linux-kernel, Lu Baolu

The driver now supports domain_alloc_paging, ensuring that a valid device
pointer is provided whenever a paging domain is allocated. Additionally,
the dmar_domain attributes are set up at the time of allocation.

Consistent with the established semantics in the IOMMU core, if a domain
attached to a device is found to be incompatible with the IOMMU hardware
capabilities, the attach operation returns -EINVAL. This implicitly
advises the caller to allocate a new domain for the device and retry the
attachment.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel/iommu.c | 70 ++++++++++++-------------------------
 drivers/iommu/intel/pasid.c | 28 +--------------
 2 files changed, 24 insertions(+), 74 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index dd158ff5fd45..f6a3266b17d4 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1606,7 +1606,7 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
 	int translation = CONTEXT_TT_MULTI_LEVEL;
 	struct dma_pte *pgd = domain->pgd;
 	struct context_entry *context;
-	int agaw, ret;
+	int ret;
 
 	pr_debug("Set context mapping for %02x:%02x.%d\n",
 		bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
@@ -1623,27 +1623,15 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
 
 	copied_context_tear_down(iommu, context, bus, devfn);
 	context_clear_entry(context);
-
 	context_set_domain_id(context, did);
 
-	/*
-	 * Skip top levels of page tables for iommu which has
-	 * less agaw than default. Unnecessary for PT mode.
-	 */
-	for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
-		ret = -ENOMEM;
-		pgd = phys_to_virt(dma_pte_addr(pgd));
-		if (!dma_pte_present(pgd))
-			goto out_unlock;
-	}
-
 	if (info && info->ats_supported)
 		translation = CONTEXT_TT_DEV_IOTLB;
 	else
 		translation = CONTEXT_TT_MULTI_LEVEL;
 
 	context_set_address_root(context, virt_to_phys(pgd));
-	context_set_address_width(context, agaw);
+	context_set_address_width(context, domain->agaw);
 	context_set_translation_type(context, translation);
 	context_set_fault_enable(context);
 	context_set_present(context);
@@ -1876,20 +1864,9 @@ static int domain_setup_first_level(struct intel_iommu *iommu,
 				    u32 pasid)
 {
 	struct dma_pte *pgd = domain->pgd;
-	int agaw, level;
-	int flags = 0;
+	int level, flags = 0;
 
-	/*
-	 * Skip top levels of page tables for iommu which has
-	 * less agaw than default. Unnecessary for PT mode.
-	 */
-	for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
-		pgd = phys_to_virt(dma_pte_addr(pgd));
-		if (!dma_pte_present(pgd))
-			return -ENOMEM;
-	}
-
-	level = agaw_to_level(agaw);
+	level = agaw_to_level(domain->agaw);
 	if (level != 4 && level != 5)
 		return -EINVAL;
 
@@ -3506,27 +3483,26 @@ int prepare_domain_attach_device(struct iommu_domain *domain,
 	if (domain->dirty_ops && !ssads_supported(iommu))
 		return -EINVAL;
 
-	/* check if this iommu agaw is sufficient for max mapped address */
-	addr_width = agaw_to_width(iommu->agaw);
-	if (addr_width > cap_mgaw(iommu->cap))
-		addr_width = cap_mgaw(iommu->cap);
-
-	if (dmar_domain->max_addr > (1LL << addr_width))
+	if (dmar_domain->iommu_coherency !=
+			iommu_paging_structure_coherency(iommu))
 		return -EINVAL;
-	dmar_domain->gaw = addr_width;
-
-	/*
-	 * Knock out extra levels of page tables if necessary
-	 */
-	while (iommu->agaw < dmar_domain->agaw) {
-		struct dma_pte *pte;
-
-		pte = dmar_domain->pgd;
-		if (dma_pte_present(pte)) {
-			dmar_domain->pgd = phys_to_virt(dma_pte_addr(pte));
-			iommu_free_page(pte);
-		}
-		dmar_domain->agaw--;
+
+	if (domain->type & __IOMMU_DOMAIN_PAGING) {
+		if (dmar_domain->iommu_superpage !=
+				iommu_superpage_capability(iommu, dmar_domain->use_first_level))
+			return -EINVAL;
+
+		if (dmar_domain->use_first_level &&
+		    (!sm_supported(iommu) || !ecap_flts(iommu->ecap)))
+			return -EINVAL;
+
+		/* check if this iommu agaw is sufficient for max mapped address */
+		addr_width = agaw_to_width(iommu->agaw);
+		if (addr_width > cap_mgaw(iommu->cap))
+			addr_width = cap_mgaw(iommu->cap);
+
+		if (dmar_domain->gaw > addr_width || dmar_domain->agaw > iommu->agaw)
+			return -EINVAL;
 	}
 
 	if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev) &&
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 2e5fa0a23299..53157e1194f4 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -345,25 +345,6 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
 	return 0;
 }
 
-/*
- * Skip top levels of page tables for iommu which has less agaw
- * than default. Unnecessary for PT mode.
- */
-static int iommu_skip_agaw(struct dmar_domain *domain,
-			   struct intel_iommu *iommu,
-			   struct dma_pte **pgd)
-{
-	int agaw;
-
-	for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
-		*pgd = phys_to_virt(dma_pte_addr(*pgd));
-		if (!dma_pte_present(*pgd))
-			return -EINVAL;
-	}
-
-	return agaw;
-}
-
 /*
  * Set up the scalable mode pasid entry for second only translation type.
  */
@@ -374,7 +355,6 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
 	struct pasid_entry *pte;
 	struct dma_pte *pgd;
 	u64 pgd_val;
-	int agaw;
 	u16 did;
 
 	/*
@@ -388,12 +368,6 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
 	}
 
 	pgd = domain->pgd;
-	agaw = iommu_skip_agaw(domain, iommu, &pgd);
-	if (agaw < 0) {
-		dev_err(dev, "Invalid domain page table\n");
-		return -EINVAL;
-	}
-
 	pgd_val = virt_to_phys(pgd);
 	did = domain_id_iommu(domain, iommu);
 
@@ -412,7 +386,7 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
 	pasid_clear_entry(pte);
 	pasid_set_domain_id(pte, did);
 	pasid_set_slptr(pte, pgd_val);
-	pasid_set_address_width(pte, agaw);
+	pasid_set_address_width(pte, domain->agaw);
 	pasid_set_translation_type(pte, PASID_ENTRY_PGTT_SL_ONLY);
 	pasid_set_fault_enable(pte);
 	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
-- 
2.43.0



* [PATCH 4/7] iommu/vt-d: Remove domain_update_iommu_cap()
  2024-10-11  4:27 [PATCH 0/7] iommu/vt-d: Add domain_alloc_paging support Lu Baolu
                   ` (2 preceding siblings ...)
  2024-10-11  4:27 ` [PATCH 3/7] iommu/vt-d: Enhance compatibility check for paging domain attach Lu Baolu
@ 2024-10-11  4:27 ` Lu Baolu
  2024-10-11 16:30   ` Jason Gunthorpe
  2024-10-11  4:27 ` [PATCH 5/7] iommu/vt-d: Remove domain_update_iommu_superpage() Lu Baolu
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Lu Baolu @ 2024-10-11  4:27 UTC (permalink / raw)
  To: iommu
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Yi Liu, Vasant Hegde, linux-kernel, Lu Baolu

The attributes of a paging domain are initialized at allocation time, and
any attempt to attach an incompatible domain fails. Therefore, there is
no need to update the domain attributes at attach time.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel/iommu.h |  1 -
 drivers/iommu/intel/iommu.c | 83 -------------------------------------
 2 files changed, 84 deletions(-)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 1497f3112b12..a4bff90c2d07 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -1232,7 +1232,6 @@ void domain_detach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu);
 void device_block_translation(struct device *dev);
 int prepare_domain_attach_device(struct iommu_domain *domain,
 				 struct device *dev);
-void domain_update_iommu_cap(struct dmar_domain *domain);
 
 int dmar_ir_support(void);
 
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index f6a3266b17d4..ada271e7be50 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -352,36 +352,6 @@ static bool iommu_paging_structure_coherency(struct intel_iommu *iommu)
 			ecap_smpwc(iommu->ecap) : ecap_coherent(iommu->ecap);
 }
 
-static void domain_update_iommu_coherency(struct dmar_domain *domain)
-{
-	struct iommu_domain_info *info;
-	struct dmar_drhd_unit *drhd;
-	struct intel_iommu *iommu;
-	bool found = false;
-	unsigned long i;
-
-	domain->iommu_coherency = true;
-	xa_for_each(&domain->iommu_array, i, info) {
-		found = true;
-		if (!iommu_paging_structure_coherency(info->iommu)) {
-			domain->iommu_coherency = false;
-			break;
-		}
-	}
-	if (found)
-		return;
-
-	/* No hardware attached; use lowest common denominator */
-	rcu_read_lock();
-	for_each_active_iommu(iommu, drhd) {
-		if (!iommu_paging_structure_coherency(iommu)) {
-			domain->iommu_coherency = false;
-			break;
-		}
-	}
-	rcu_read_unlock();
-}
-
 static int domain_update_iommu_superpage(struct dmar_domain *domain,
 					 struct intel_iommu *skip)
 {
@@ -412,29 +382,6 @@ static int domain_update_iommu_superpage(struct dmar_domain *domain,
 	return fls(mask);
 }
 
-static int domain_update_device_node(struct dmar_domain *domain)
-{
-	struct device_domain_info *info;
-	int nid = NUMA_NO_NODE;
-	unsigned long flags;
-
-	spin_lock_irqsave(&domain->lock, flags);
-	list_for_each_entry(info, &domain->devices, link) {
-		/*
-		 * There could possibly be multiple device numa nodes as devices
-		 * within the same domain may sit behind different IOMMUs. There
-		 * isn't perfect answer in such situation, so we select first
-		 * come first served policy.
-		 */
-		nid = dev_to_node(info->dev);
-		if (nid != NUMA_NO_NODE)
-			break;
-	}
-	spin_unlock_irqrestore(&domain->lock, flags);
-
-	return nid;
-}
-
 /* Return the super pagesize bitmap if supported. */
 static unsigned long domain_super_pgsize_bitmap(struct dmar_domain *domain)
 {
@@ -452,34 +399,6 @@ static unsigned long domain_super_pgsize_bitmap(struct dmar_domain *domain)
 	return bitmap;
 }
 
-/* Some capabilities may be different across iommus */
-void domain_update_iommu_cap(struct dmar_domain *domain)
-{
-	domain_update_iommu_coherency(domain);
-	domain->iommu_superpage = domain_update_iommu_superpage(domain, NULL);
-
-	/*
-	 * If RHSA is missing, we should default to the device numa domain
-	 * as fall back.
-	 */
-	if (domain->nid == NUMA_NO_NODE)
-		domain->nid = domain_update_device_node(domain);
-
-	/*
-	 * First-level translation restricts the input-address to a
-	 * canonical address (i.e., address bits 63:N have the same
-	 * value as address bit [N-1], where N is 48-bits with 4-level
-	 * paging and 57-bits with 5-level paging). Hence, skip bit
-	 * [N-1].
-	 */
-	if (domain->use_first_level)
-		domain->domain.geometry.aperture_end = __DOMAIN_MAX_ADDR(domain->gaw - 1);
-	else
-		domain->domain.geometry.aperture_end = __DOMAIN_MAX_ADDR(domain->gaw);
-
-	domain->domain.pgsize_bitmap |= domain_super_pgsize_bitmap(domain);
-}
-
 struct context_entry *iommu_context_addr(struct intel_iommu *iommu, u8 bus,
 					 u8 devfn, int alloc)
 {
@@ -1493,7 +1412,6 @@ int domain_attach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
 		ret = xa_err(curr) ? : -EBUSY;
 		goto err_clear;
 	}
-	domain_update_iommu_cap(domain);
 
 	spin_unlock(&iommu->lock);
 	return 0;
@@ -1519,7 +1437,6 @@ void domain_detach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
 		clear_bit(info->did, iommu->domain_ids);
 		xa_erase(&domain->iommu_array, iommu->seq_id);
 		domain->nid = NUMA_NO_NODE;
-		domain_update_iommu_cap(domain);
 		kfree(info);
 	}
 	spin_unlock(&iommu->lock);
-- 
2.43.0



* [PATCH 5/7] iommu/vt-d: Remove domain_update_iommu_superpage()
  2024-10-11  4:27 [PATCH 0/7] iommu/vt-d: Add domain_alloc_paging support Lu Baolu
                   ` (3 preceding siblings ...)
  2024-10-11  4:27 ` [PATCH 4/7] iommu/vt-d: Remove domain_update_iommu_cap() Lu Baolu
@ 2024-10-11  4:27 ` Lu Baolu
  2024-10-11 16:30   ` Jason Gunthorpe
  2024-10-11  4:27 ` [PATCH 6/7] iommu/vt-d: Refactor first_level_by_default() Lu Baolu
  2024-10-11  4:27 ` [PATCH 7/7] iommu/vt-d: Refine intel_iommu_domain_alloc_user() Lu Baolu
  6 siblings, 1 reply; 24+ messages in thread
From: Lu Baolu @ 2024-10-11  4:27 UTC (permalink / raw)
  To: iommu
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Yi Liu, Vasant Hegde, linux-kernel, Lu Baolu

The requirement for consistent superpage support across all IOMMU
hardware units in the system has been removed. Previously, if a
hot-added IOMMU lacked consistent superpage capability, the hot-add
process was aborted. With the updated attach semantics, the superpage
capability may now vary among IOMMU hardware units.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel/iommu.c | 39 +------------------------------------
 1 file changed, 1 insertion(+), 38 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ada271e7be50..29b53d955022 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -352,36 +352,6 @@ static bool iommu_paging_structure_coherency(struct intel_iommu *iommu)
 			ecap_smpwc(iommu->ecap) : ecap_coherent(iommu->ecap);
 }
 
-static int domain_update_iommu_superpage(struct dmar_domain *domain,
-					 struct intel_iommu *skip)
-{
-	struct dmar_drhd_unit *drhd;
-	struct intel_iommu *iommu;
-	int mask = 0x3;
-
-	if (!intel_iommu_superpage)
-		return 0;
-
-	/* set iommu_superpage to the smallest common denominator */
-	rcu_read_lock();
-	for_each_active_iommu(iommu, drhd) {
-		if (iommu != skip) {
-			if (domain && domain->use_first_level) {
-				if (!cap_fl1gp_support(iommu->cap))
-					mask = 0x1;
-			} else {
-				mask &= cap_super_page_val(iommu->cap);
-			}
-
-			if (!mask)
-				break;
-		}
-	}
-	rcu_read_unlock();
-
-	return fls(mask);
-}
-
 /* Return the super pagesize bitmap if supported. */
 static unsigned long domain_super_pgsize_bitmap(struct dmar_domain *domain)
 {
@@ -2605,20 +2575,13 @@ int dmar_parse_one_satc(struct acpi_dmar_header *hdr, void *arg)
 
 static int intel_iommu_add(struct dmar_drhd_unit *dmaru)
 {
-	int sp, ret;
 	struct intel_iommu *iommu = dmaru->iommu;
+	int ret;
 
 	ret = intel_cap_audit(CAP_AUDIT_HOTPLUG_DMAR, iommu);
 	if (ret)
 		goto out;
 
-	sp = domain_update_iommu_superpage(NULL, iommu) - 1;
-	if (sp >= 0 && !(cap_super_page_val(iommu->cap) & (1 << sp))) {
-		pr_warn("%s: Doesn't support large page.\n",
-			iommu->name);
-		return -ENXIO;
-	}
-
 	/*
 	 * Disable translation if already enabled prior to OS handover.
 	 */
-- 
2.43.0



* [PATCH 6/7] iommu/vt-d: Refactor first_level_by_default()
  2024-10-11  4:27 [PATCH 0/7] iommu/vt-d: Add domain_alloc_paging support Lu Baolu
                   ` (4 preceding siblings ...)
  2024-10-11  4:27 ` [PATCH 5/7] iommu/vt-d: Remove domain_update_iommu_superpage() Lu Baolu
@ 2024-10-11  4:27 ` Lu Baolu
  2024-10-11 16:31   ` Jason Gunthorpe
  2024-10-11  4:27 ` [PATCH 7/7] iommu/vt-d: Refine intel_iommu_domain_alloc_user() Lu Baolu
  6 siblings, 1 reply; 24+ messages in thread
From: Lu Baolu @ 2024-10-11  4:27 UTC (permalink / raw)
  To: iommu
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Yi Liu, Vasant Hegde, linux-kernel, Lu Baolu

The first stage page table is compatible across host and guest kernels.
Therefore, this driver uses the first stage page table as the default for
paging domains.

The helper first_level_by_default() decides whether the first stage page
table can be used based on a global policy. This policy requires scalable
mode and first stage translation capability to be consistent across all
iommu units. That requirement is unnecessary, as domain allocation,
attachment, and removal all operate on a per-device basis.

The domain type (IOMMU_DOMAIN_DMA vs. IOMMU_DOMAIN_UNMANAGED) should not
be a factor in determining the first stage page table usage. Both types
are for paging domains, and there's no fundamental difference between them.
The driver should not be aware of this distinction unless the core
specifies allocation flags that require special handling.

Convert first_level_by_default() from global to per-iommu and remove the
'type' input.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel/iommu.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 29b53d955022..70f3cbcc3160 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1329,18 +1329,17 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
  * Check and return whether first level is used by default for
  * DMA translation.
  */
-static bool first_level_by_default(unsigned int type)
+static bool first_level_by_default(struct intel_iommu *iommu)
 {
 	/* Only SL is available in legacy mode */
-	if (!scalable_mode_support())
+	if (!sm_supported(iommu))
 		return false;
 
 	/* Only level (either FL or SL) is available, just use it */
-	if (intel_cap_flts_sanity() ^ intel_cap_slts_sanity())
-		return intel_cap_flts_sanity();
+	if (ecap_flts(iommu->ecap) ^ ecap_slts(iommu->ecap))
+		return ecap_flts(iommu->ecap);
 
-	/* Both levels are available, decide it based on domain type */
-	return type != IOMMU_DOMAIN_UNMANAGED;
+	return true;
 }
 
 int domain_attach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
@@ -3110,7 +3109,7 @@ int __init intel_iommu_init(void)
 		 * the virtual and physical IOMMU page-tables.
 		 */
 		if (cap_caching_mode(iommu->cap) &&
-		    !first_level_by_default(IOMMU_DOMAIN_DMA)) {
+		    !first_level_by_default(iommu)) {
 			pr_info_once("IOMMU batching disallowed due to virtualization\n");
 			iommu_set_dma_strict();
 		}
@@ -4359,10 +4358,12 @@ static struct iommu_domain identity_domain = {
 
 static struct iommu_domain *intel_iommu_domain_alloc_paging(struct device *dev)
 {
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	struct intel_iommu *iommu = info->iommu;
 	struct dmar_domain *dmar_domain;
 	bool first_stage;
 
-	first_stage = first_level_by_default(0);
+	first_stage = first_level_by_default(iommu);
 	dmar_domain = paging_domain_alloc(dev, first_stage);
 	if (IS_ERR(dmar_domain))
 		return ERR_CAST(dmar_domain);
-- 
2.43.0



* [PATCH 7/7] iommu/vt-d: Refine intel_iommu_domain_alloc_user()
  2024-10-11  4:27 [PATCH 0/7] iommu/vt-d: Add domain_alloc_paging support Lu Baolu
                   ` (5 preceding siblings ...)
  2024-10-11  4:27 ` [PATCH 6/7] iommu/vt-d: Refactor first_level_by_default() Lu Baolu
@ 2024-10-11  4:27 ` Lu Baolu
  2024-10-11 11:32   ` Jason Gunthorpe
  2024-10-11 16:31   ` Jason Gunthorpe
  6 siblings, 2 replies; 24+ messages in thread
From: Lu Baolu @ 2024-10-11  4:27 UTC (permalink / raw)
  To: iommu
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Yi Liu, Vasant Hegde, linux-kernel, Lu Baolu,
	Jason Gunthorpe

The domain_alloc_user ops should always allocate a guest-compatible page
table unless specific allocation flags are specified.

Currently, IOMMU_HWPT_ALLOC_NEST_PARENT and IOMMU_HWPT_ALLOC_DIRTY_TRACKING
require special handling, as both require hardware support for scalable
mode and second-stage translation. In such cases, the driver should select
a second-stage page table for the paging domain.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel/iommu.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 70f3cbcc3160..02d0dc2c6490 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3297,6 +3297,7 @@ intel_iommu_domain_alloc_user(struct device *dev, u32 flags,
 	struct intel_iommu *iommu = info->iommu;
 	struct dmar_domain *dmar_domain;
 	struct iommu_domain *domain;
+	bool first_stage;
 
 	/* Must be NESTING domain */
 	if (parent) {
@@ -3313,8 +3314,20 @@ intel_iommu_domain_alloc_user(struct device *dev, u32 flags,
 	if (user_data || (dirty_tracking && !ssads_supported(iommu)))
 		return ERR_PTR(-EOPNOTSUPP);
 
-	/* Do not use first stage for user domain translation. */
-	dmar_domain = paging_domain_alloc(dev, false);
+	/*
+	 * Always allocate the guest compatible page table unless
+	 * IOMMU_HWPT_ALLOC_NEST_PARENT or IOMMU_HWPT_ALLOC_DIRTY_TRACKING
+	 * is specified.
+	 */
+	if (nested_parent || dirty_tracking) {
+		if (!sm_supported(iommu) || !ecap_slts(iommu->ecap))
+			return ERR_PTR(-EOPNOTSUPP);
+		first_stage = false;
+	} else {
+		first_stage = first_level_by_default(iommu);
+	}
+
+	dmar_domain = paging_domain_alloc(dev, first_stage);
 	if (IS_ERR(dmar_domain))
 		return ERR_CAST(dmar_domain);
 	domain = &dmar_domain->domain;
-- 
2.43.0



* Re: [PATCH 7/7] iommu/vt-d: Refine intel_iommu_domain_alloc_user()
  2024-10-11  4:27 ` [PATCH 7/7] iommu/vt-d: Refine intel_iommu_domain_alloc_user() Lu Baolu
@ 2024-10-11 11:32   ` Jason Gunthorpe
  2024-10-11 16:31   ` Jason Gunthorpe
  1 sibling, 0 replies; 24+ messages in thread
From: Jason Gunthorpe @ 2024-10-11 11:32 UTC (permalink / raw)
  To: Lu Baolu
  Cc: iommu, Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
	Yi Liu, Vasant Hegde, linux-kernel

On Fri, Oct 11, 2024 at 12:27:22PM +0800, Lu Baolu wrote:
> The domain_alloc_user ops should always allocate a guest-compatible page
> table unless specific allocation flags are specified.
> 
> Currently, IOMMU_HWPT_ALLOC_NEST_PARENT and IOMMU_HWPT_ALLOC_DIRTY_TRACKING
> require special handling, as both require hardware support for scalable
> mode and second-stage translation. In such cases, the driver should select
> a second-stage page table for the paging domain.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel/iommu.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Maybe even for -rc/stable since it fixes uAPI?

Jason


* Re: [PATCH 1/7] iommu/vt-d: Add domain_alloc_paging support
  2024-10-11  4:27 ` [PATCH 1/7] " Lu Baolu
@ 2024-10-11 13:22   ` Jason Gunthorpe
  2024-10-11 16:33     ` Jason Gunthorpe
  2024-10-14  0:53     ` Baolu Lu
  0 siblings, 2 replies; 24+ messages in thread
From: Jason Gunthorpe @ 2024-10-11 13:22 UTC (permalink / raw)
  To: Lu Baolu
  Cc: iommu, Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
	Yi Liu, Vasant Hegde, linux-kernel

On Fri, Oct 11, 2024 at 12:27:16PM +0800, Lu Baolu wrote:

> +static struct iommu_domain *intel_iommu_domain_alloc_paging(struct device *dev)
> +{
> +	struct dmar_domain *dmar_domain;
> +	bool first_stage;
> +
> +	first_stage = first_level_by_default(0);
> +	dmar_domain = paging_domain_alloc(dev, first_stage);
> +	if (IS_ERR(dmar_domain))
> +		return ERR_CAST(dmar_domain);
> +
> +	return &dmar_domain->domain;
> +}

With the direction that Vasant's series is going in, I think this
should be skipped and instead your other patch to make
domain_alloc_user (which we will rename) a full functional replacement
is the right thing.

Jason


* Re: [PATCH 2/7] iommu/vt-d: Remove unused domain_alloc callback
  2024-10-11  4:27 ` [PATCH 2/7] iommu/vt-d: Remove unused domain_alloc callback Lu Baolu
@ 2024-10-11 16:17   ` Jason Gunthorpe
  0 siblings, 0 replies; 24+ messages in thread
From: Jason Gunthorpe @ 2024-10-11 16:17 UTC (permalink / raw)
  To: Lu Baolu
  Cc: iommu, Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
	Yi Liu, Vasant Hegde, linux-kernel

On Fri, Oct 11, 2024 at 12:27:17PM +0800, Lu Baolu wrote:
> With domain_alloc_paging callback supported, the legacy domain_alloc
> callback will never be used anymore. Remove it to avoid dead code.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel/iommu.c | 90 -------------------------------------
>  1 file changed, 90 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason


* Re: [PATCH 3/7] iommu/vt-d: Enhance compatibility check for paging domain attach
  2024-10-11  4:27 ` [PATCH 3/7] iommu/vt-d: Enhance compatibility check for paging domain attach Lu Baolu
@ 2024-10-11 16:27   ` Jason Gunthorpe
  2024-10-14  1:25     ` Baolu Lu
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2024-10-11 16:27 UTC (permalink / raw)
  To: Lu Baolu
  Cc: iommu, Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
	Yi Liu, Vasant Hegde, linux-kernel

On Fri, Oct 11, 2024 at 12:27:18PM +0800, Lu Baolu wrote:

> @@ -1623,27 +1623,15 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
>  
>  	copied_context_tear_down(iommu, context, bus, devfn);
>  	context_clear_entry(context);
> -
>  	context_set_domain_id(context, did);
>  
> -	/*
> -	 * Skip top levels of page tables for iommu which has
> -	 * less agaw than default. Unnecessary for PT mode.
> -	 */
> -	for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
> -		ret = -ENOMEM;
> -		pgd = phys_to_virt(dma_pte_addr(pgd));
> -		if (!dma_pte_present(pgd))
> -			goto out_unlock;
> -	}

Yikes, this is nasty racy stuff, glad to see it go

But should the agaw stuff be in its own patch?

> @@ -3506,27 +3483,26 @@ int prepare_domain_attach_device(struct iommu_domain *domain,
>  	if (domain->dirty_ops && !ssads_supported(iommu))
>  		return -EINVAL;
>  
> -	/* check if this iommu agaw is sufficient for max mapped address */
> -	addr_width = agaw_to_width(iommu->agaw);
> -	if (addr_width > cap_mgaw(iommu->cap))
> -		addr_width = cap_mgaw(iommu->cap);
> -
> -	if (dmar_domain->max_addr > (1LL << addr_width))
> +	if (dmar_domain->iommu_coherency !=
> +			iommu_paging_structure_coherency(iommu))
>  		return -EINVAL;
> -	dmar_domain->gaw = addr_width;
> -
> -	/*
> -	 * Knock out extra levels of page tables if necessary
> -	 */
> -	while (iommu->agaw < dmar_domain->agaw) {
> -		struct dma_pte *pte;
> -
> -		pte = dmar_domain->pgd;
> -		if (dma_pte_present(pte)) {
> -			dmar_domain->pgd = phys_to_virt(dma_pte_addr(pte));
> -			iommu_free_page(pte);
> -		}
> -		dmar_domain->agaw--;
> +
> +	if (domain->type & __IOMMU_DOMAIN_PAGING) {

It looks like this entire function is already never called for
anything but paging?

The only three callers are:

	.default_domain_ops = &(const struct iommu_domain_ops) {
		.attach_dev		= intel_iommu_attach_device,
		.set_dev_pasid		= intel_iommu_set_dev_pasid,

and

static const struct iommu_domain_ops intel_nested_domain_ops = {
	.attach_dev		= intel_nested_attach_dev,

And none of those cases can be anything except a paging domain by
definition.

So this if should go away, or be turned into a WARN_ON.

Jason


* Re: [PATCH 4/7] iommu/vt-d: Remove domain_update_iommu_cap()
  2024-10-11  4:27 ` [PATCH 4/7] iommu/vt-d: Remove domain_update_iommu_cap() Lu Baolu
@ 2024-10-11 16:30   ` Jason Gunthorpe
  0 siblings, 0 replies; 24+ messages in thread
From: Jason Gunthorpe @ 2024-10-11 16:30 UTC (permalink / raw)
  To: Lu Baolu
  Cc: iommu, Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
	Yi Liu, Vasant Hegde, linux-kernel

On Fri, Oct 11, 2024 at 12:27:19PM +0800, Lu Baolu wrote:
> The attributes of a paging domain are initialized during the allocation
> process, and any attempt to attach a domain that is not compatible will
> result in a failure. Therefore, there is no need to update the domain
> attributes at the time of domain attachment.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel/iommu.h |  1 -
>  drivers/iommu/intel/iommu.c | 83 -------------------------------------
>  2 files changed, 84 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason


* Re: [PATCH 5/7] iommu/vt-d: Remove domain_update_iommu_superpage()
  2024-10-11  4:27 ` [PATCH 5/7] iommu/vt-d: Remove domain_update_iommu_superpage() Lu Baolu
@ 2024-10-11 16:30   ` Jason Gunthorpe
  0 siblings, 0 replies; 24+ messages in thread
From: Jason Gunthorpe @ 2024-10-11 16:30 UTC (permalink / raw)
  To: Lu Baolu
  Cc: iommu, Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
	Yi Liu, Vasant Hegde, linux-kernel

On Fri, Oct 11, 2024 at 12:27:20PM +0800, Lu Baolu wrote:
> The requirement for consistent super page support across all the IOMMU
> hardware in the system has been removed. In the past, if a new IOMMU
> was hot-added and lacked consistent super page capability, the hot-add
> process would be aborted. However, with the updated attachment semantics,
> it is now permissible for the super page capability to vary among
> different IOMMU hardware units.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel/iommu.c | 39 +------------------------------------
>  1 file changed, 1 insertion(+), 38 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason


* Re: [PATCH 6/7] iommu/vt-d: Refactor first_level_by_default()
  2024-10-11  4:27 ` [PATCH 6/7] iommu/vt-d: Refactor first_level_by_default() Lu Baolu
@ 2024-10-11 16:31   ` Jason Gunthorpe
  0 siblings, 0 replies; 24+ messages in thread
From: Jason Gunthorpe @ 2024-10-11 16:31 UTC (permalink / raw)
  To: Lu Baolu
  Cc: iommu, Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
	Yi Liu, Vasant Hegde, linux-kernel

On Fri, Oct 11, 2024 at 12:27:21PM +0800, Lu Baolu wrote:
> The first stage page table is compatible across host and guest kernels.
> Therefore, this driver uses the first stage page table as the default for
> paging domains.
> 
> The helper first_level_by_default() determines the feasibility of using
> the first stage page table based on a global policy. This policy requires
> consistency in scalable mode and first stage translation capability among
> all iommu units. However, this is unnecessary as domain allocation,
> attachment, and removal operations are performed on a per-device basis.
> 
> The domain type (IOMMU_DOMAIN_DMA vs. IOMMU_DOMAIN_UNMANAGED) should not
> be a factor in determining the first stage page table usage. Both types
> are for paging domains, and there's no fundamental difference between them.
> The driver should not be aware of this distinction unless the core
> specifies allocation flags that require special handling.
> 
> Convert first_level_by_default() from global to per-iommu and remove the
> 'type' input.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel/iommu.c | 17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason


* Re: [PATCH 7/7] iommu/vt-d: Refine intel_iommu_domain_alloc_user()
  2024-10-11  4:27 ` [PATCH 7/7] iommu/vt-d: Refine intel_iommu_domain_alloc_user() Lu Baolu
  2024-10-11 11:32   ` Jason Gunthorpe
@ 2024-10-11 16:31   ` Jason Gunthorpe
  1 sibling, 0 replies; 24+ messages in thread
From: Jason Gunthorpe @ 2024-10-11 16:31 UTC (permalink / raw)
  To: Lu Baolu
  Cc: iommu, Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
	Yi Liu, Vasant Hegde, linux-kernel

On Fri, Oct 11, 2024 at 12:27:22PM +0800, Lu Baolu wrote:
> The domain_alloc_user ops should always allocate a guest-compatible page
> table unless specific allocation flags are specified.
> 
> Currently, IOMMU_HWPT_ALLOC_NEST_PARENT and IOMMU_HWPT_ALLOC_DIRTY_TRACKING
> require special handling, as both require hardware support for scalable
> mode and second-stage translation. In such cases, the driver should select
> a second-stage page table for the paging domain.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel/iommu.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason


* Re: [PATCH 1/7] iommu/vt-d: Add domain_alloc_paging support
  2024-10-11 13:22   ` Jason Gunthorpe
@ 2024-10-11 16:33     ` Jason Gunthorpe
  2024-10-14  0:53     ` Baolu Lu
  1 sibling, 0 replies; 24+ messages in thread
From: Jason Gunthorpe @ 2024-10-11 16:33 UTC (permalink / raw)
  To: Lu Baolu
  Cc: iommu, Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
	Yi Liu, Vasant Hegde, linux-kernel

On Fri, Oct 11, 2024 at 10:22:52AM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 11, 2024 at 12:27:16PM +0800, Lu Baolu wrote:
> 
> > +static struct iommu_domain *intel_iommu_domain_alloc_paging(struct device *dev)
> > +{
> > +	struct dmar_domain *dmar_domain;
> > +	bool first_stage;
> > +
> > +	first_stage = first_level_by_default(0);
> > +	dmar_domain = paging_domain_alloc(dev, first_stage);
> > +	if (IS_ERR(dmar_domain))
> > +		return ERR_CAST(dmar_domain);
> > +
> > +	return &dmar_domain->domain;
> > +}
> 
> With the direction that Vasant's series is going in, I think this
> should be skipped and instead your other patch to make
> domain_alloc_user (which we will rename) a full functional replacement
> is the right thing.

Though I guess for patch ordering purposes it makes sense to just
leave this and remove it the next cycle

Jason


* Re: [PATCH 1/7] iommu/vt-d: Add domain_alloc_paging support
  2024-10-11 13:22   ` Jason Gunthorpe
  2024-10-11 16:33     ` Jason Gunthorpe
@ 2024-10-14  0:53     ` Baolu Lu
  2024-10-14 19:23       ` Jason Gunthorpe
  1 sibling, 1 reply; 24+ messages in thread
From: Baolu Lu @ 2024-10-14  0:53 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, iommu, Joerg Roedel, Will Deacon, Robin Murphy,
	Kevin Tian, Yi Liu, Vasant Hegde, linux-kernel

On 2024/10/11 21:22, Jason Gunthorpe wrote:
> On Fri, Oct 11, 2024 at 12:27:16PM +0800, Lu Baolu wrote:
> 
>> +static struct iommu_domain *intel_iommu_domain_alloc_paging(struct device *dev)
>> +{
>> +	struct dmar_domain *dmar_domain;
>> +	bool first_stage;
>> +
>> +	first_stage = first_level_by_default(0);
>> +	dmar_domain = paging_domain_alloc(dev, first_stage);
>> +	if (IS_ERR(dmar_domain))
>> +		return ERR_CAST(dmar_domain);
>> +
>> +	return &dmar_domain->domain;
>> +}
> With the direction that Vasant's series is going in, I think this
> should be skipped and instead your other patch to make
> domain_alloc_user (which we will rename) a full functional replacement
> is the right thing.

I think it's too early to remove domain_alloc_paging from the iommu
driver. Vasant's series makes most of the paging domain allocations go
through the domain_alloc_user ops, but non-PASID-related allocations
still go through domain_alloc_paging. So perhaps we can clean up after
both series are merged.

Thanks,
baolu


* Re: [PATCH 3/7] iommu/vt-d: Enhance compatibility check for paging domain attach
  2024-10-11 16:27   ` Jason Gunthorpe
@ 2024-10-14  1:25     ` Baolu Lu
  2024-10-14 19:24       ` Jason Gunthorpe
  0 siblings, 1 reply; 24+ messages in thread
From: Baolu Lu @ 2024-10-14  1:25 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, iommu, Joerg Roedel, Will Deacon, Robin Murphy,
	Kevin Tian, Yi Liu, Vasant Hegde, linux-kernel

On 2024/10/12 0:27, Jason Gunthorpe wrote:
> On Fri, Oct 11, 2024 at 12:27:18PM +0800, Lu Baolu wrote:
> 
>> @@ -1623,27 +1623,15 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
>>   
>>   	copied_context_tear_down(iommu, context, bus, devfn);
>>   	context_clear_entry(context);
>> -
>>   	context_set_domain_id(context, did);
>>   
>> -	/*
>> -	 * Skip top levels of page tables for iommu which has
>> -	 * less agaw than default. Unnecessary for PT mode.
>> -	 */
>> -	for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
>> -		ret = -ENOMEM;
>> -		pgd = phys_to_virt(dma_pte_addr(pgd));
>> -		if (!dma_pte_present(pgd))
>> -			goto out_unlock;
>> -	}
> 
> Yikes, this is nasty racy stuff, glad to see it go
> 
> But should the agaw stuff be in its own patch?

The agaw stuff is now in the paging domain allocation path.

'agaw' represents the number of page table levels that the domain
supports. Since the domain allocation callbacks always have a valid
device pointer, the page table levels of a domain can be determined at
allocation time.

When the domain is attached to a new device, the domain's levels are
checked against the iommu capabilities of that device. If the iommu
cannot support the page table levels, -EINVAL (not compatible) is
returned, suggesting that the caller should use a new domain for the
device.
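
That attach-time check can be sketched in userspace roughly as follows (not driver code; the struct layouts and field names here are assumptions for illustration, not the real dmar_domain):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Attributes are fixed at allocation, so attach only verifies
 * compatibility instead of mutating the domain. */

struct iommu_caps {
	int agaw;		/* max page table levels the unit can walk */
	bool coherent;		/* paging-structure coherency */
};

struct paging_dom {
	int agaw;		/* levels chosen at allocation time */
	bool coherent;
};

static int check_attach_compat(const struct paging_dom *d,
			       const struct iommu_caps *iommu)
{
	/* an iommu with fewer levels cannot walk this page table */
	if (d->agaw > iommu->agaw)
		return -EINVAL;
	if (d->coherent != iommu->coherent)
		return -EINVAL;
	return 0;	/* compatible; otherwise the caller allocates a new domain */
}
```

On -EINVAL the core is expected to fall back to allocating a fresh domain for the device, rather than the driver trimming page table levels at attach time as the removed code did.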

> 
>> @@ -3506,27 +3483,26 @@ int prepare_domain_attach_device(struct iommu_domain *domain,
>>   	if (domain->dirty_ops && !ssads_supported(iommu))
>>   		return -EINVAL;
>>   
>> -	/* check if this iommu agaw is sufficient for max mapped address */
>> -	addr_width = agaw_to_width(iommu->agaw);
>> -	if (addr_width > cap_mgaw(iommu->cap))
>> -		addr_width = cap_mgaw(iommu->cap);
>> -
>> -	if (dmar_domain->max_addr > (1LL << addr_width))
>> +	if (dmar_domain->iommu_coherency !=
>> +			iommu_paging_structure_coherency(iommu))
>>   		return -EINVAL;
>> -	dmar_domain->gaw = addr_width;
>> -
>> -	/*
>> -	 * Knock out extra levels of page tables if necessary
>> -	 */
>> -	while (iommu->agaw < dmar_domain->agaw) {
>> -		struct dma_pte *pte;
>> -
>> -		pte = dmar_domain->pgd;
>> -		if (dma_pte_present(pte)) {
>> -			dmar_domain->pgd = phys_to_virt(dma_pte_addr(pte));
>> -			iommu_free_page(pte);
>> -		}
>> -		dmar_domain->agaw--;
>> +
>> +	if (domain->type & __IOMMU_DOMAIN_PAGING) {
> 
> It looks like this entire function is already never called for
> anything but paging?
> 
> The only three callers are:
> 
> 	.default_domain_ops = &(const struct iommu_domain_ops) {
> 		.attach_dev		= intel_iommu_attach_device,
> 		.set_dev_pasid		= intel_iommu_set_dev_pasid,
> 
> and
> 
> static const struct iommu_domain_ops intel_nested_domain_ops = {
> 	.attach_dev		= intel_nested_attach_dev,
> 
> And none of those cases can be anything except a paging domain by
> definition.

A nested domain is not a paging domain. It represents a user-space page
table that is nested on a parent paging domain. Perhaps I have
overlooked something?

> 
> So this if should go away, or be turned into a WARN_ON.
> 
> Jason

Thanks,
baolu


* Re: [PATCH 1/7] iommu/vt-d: Add domain_alloc_paging support
  2024-10-14  0:53     ` Baolu Lu
@ 2024-10-14 19:23       ` Jason Gunthorpe
  0 siblings, 0 replies; 24+ messages in thread
From: Jason Gunthorpe @ 2024-10-14 19:23 UTC (permalink / raw)
  To: Baolu Lu
  Cc: iommu, Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
	Yi Liu, Vasant Hegde, linux-kernel

On Mon, Oct 14, 2024 at 08:53:36AM +0800, Baolu Lu wrote:
> On 2024/10/11 21:22, Jason Gunthorpe wrote:
> > On Fri, Oct 11, 2024 at 12:27:16PM +0800, Lu Baolu wrote:
> > 
> > > +static struct iommu_domain *intel_iommu_domain_alloc_paging(struct device *dev)
> > > +{
> > > +	struct dmar_domain *dmar_domain;
> > > +	bool first_stage;
> > > +
> > > +	first_stage = first_level_by_default(0);
> > > +	dmar_domain = paging_domain_alloc(dev, first_stage);
> > > +	if (IS_ERR(dmar_domain))
> > > +		return ERR_CAST(dmar_domain);
> > > +
> > > +	return &dmar_domain->domain;
> > > +}
> > With the direction that Vasant's series is going in, I think this
> > should be skipped and instead your other patch to make
> > domain_alloc_user (which we will rename) a full functional replacement
> > is the right thing.
> 
> I think it's too early to remove domain_alloc_paging from the iommu
> driver. Vasant's series makes most of the paging domain allocation go
> through domain_alloc_user ops, but for those that are non-PASID related,
> it still goes through domain_alloc_paging. So perhaps we can clean up
> after both series are merged.

I gave him a few more patches to finish the job, but yes let's not
make them interdependent. Just make sure this series sets things up so
we can delete this and the _user version can do everything

Jason


* Re: [PATCH 3/7] iommu/vt-d: Enhance compatibility check for paging domain attach
  2024-10-14  1:25     ` Baolu Lu
@ 2024-10-14 19:24       ` Jason Gunthorpe
  2024-10-15  2:52         ` Baolu Lu
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2024-10-14 19:24 UTC (permalink / raw)
  To: Baolu Lu
  Cc: iommu, Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
	Yi Liu, Vasant Hegde, linux-kernel

On Mon, Oct 14, 2024 at 09:25:03AM +0800, Baolu Lu wrote:
> > > +	if (domain->type & __IOMMU_DOMAIN_PAGING) {
> > 
> > It looks like this entire function is already never called for
> > anything but paging?
> > 
> > The only three callers are:
> > 
> > 	.default_domain_ops = &(const struct iommu_domain_ops) {
> > 		.attach_dev		= intel_iommu_attach_device,
> > 		.set_dev_pasid		= intel_iommu_set_dev_pasid,
> > 
> > and
> > 
> > static const struct iommu_domain_ops intel_nested_domain_ops = {
> > 	.attach_dev		= intel_nested_attach_dev,
> > 
> > And none of those cases can be anything except a paging domain by
> > definition.
> 
> A nested domain is not a paging domain. It represents a user-space page
> table that is nested on a parent paging domain. Perhaps I have
> overlooked something?

It only calls it on the s2_parent which is always a paging domain?

	ret = prepare_domain_attach_device(&dmar_domain->s2_domain->domain, dev);

Jason


* Re: [PATCH 3/7] iommu/vt-d: Enhance compatibility check for paging domain attach
  2024-10-14 19:24       ` Jason Gunthorpe
@ 2024-10-15  2:52         ` Baolu Lu
  2024-10-15 12:48           ` Jason Gunthorpe
  0 siblings, 1 reply; 24+ messages in thread
From: Baolu Lu @ 2024-10-15  2:52 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, iommu, Joerg Roedel, Will Deacon, Robin Murphy,
	Kevin Tian, Yi Liu, Vasant Hegde, linux-kernel

On 2024/10/15 3:24, Jason Gunthorpe wrote:
> On Mon, Oct 14, 2024 at 09:25:03AM +0800, Baolu Lu wrote:
>>>> +	if (domain->type & __IOMMU_DOMAIN_PAGING) {
>>> It looks like this entire function is already never called for
>>> anything but paging?
>>>
>>> The only three callers are:
>>>
>>> 	.default_domain_ops = &(const struct iommu_domain_ops) {
>>> 		.attach_dev		= intel_iommu_attach_device,
>>> 		.set_dev_pasid		= intel_iommu_set_dev_pasid,
>>>
>>> and
>>>
>>> static const struct iommu_domain_ops intel_nested_domain_ops = {
>>> 	.attach_dev		= intel_nested_attach_dev,
>>>
>>> And none of those cases can be anything except a paging domain by
>>> definition.
>> A nested domain is not a paging domain. It represents a user-space page
>> table that is nested on a parent paging domain. Perhaps I have
>> overlooked something?
> It only calls it on the s2_parent which is always a paging domain?
> 
> 	ret = prepare_domain_attach_device(&dmar_domain->s2_domain->domain, dev);

Yeah, you are right. I overlooked that part. I'll remove the 'if'
statement and use WARN_ON() instead.

I will also rename this function to something more meaningful, such as
paging_domain_is_compatible().
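
A minimal userspace approximation of that WARN_ON() pattern (WARN_ON is emulated here, and the enum and function names are invented for the sketch):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's WARN_ON(): print a warning and
 * evaluate to the condition, so it can gate an early return. */
static bool warn_on(bool cond, const char *expr)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", expr);
	return cond;
}
#define WARN_ON(c) warn_on((c), #c)

enum domain_type { DOMAIN_PAGING, DOMAIN_NESTED, DOMAIN_IDENTITY };

/* Every caller passes a paging domain, so a non-paging type here is a
 * driver bug: warn loudly and fail instead of silently skipping checks. */
static int prepare_attach(enum domain_type t)
{
	if (WARN_ON(t != DOMAIN_PAGING))
		return -EINVAL;
	return 0;
}
```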

Thanks,
baolu


* Re: [PATCH 3/7] iommu/vt-d: Enhance compatibility check for paging domain attach
  2024-10-15  2:52         ` Baolu Lu
@ 2024-10-15 12:48           ` Jason Gunthorpe
  2024-10-16  1:46             ` Baolu Lu
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2024-10-15 12:48 UTC (permalink / raw)
  To: Baolu Lu
  Cc: iommu, Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
	Yi Liu, Vasant Hegde, linux-kernel

On Tue, Oct 15, 2024 at 10:52:19AM +0800, Baolu Lu wrote:
> On 2024/10/15 3:24, Jason Gunthorpe wrote:
> > On Mon, Oct 14, 2024 at 09:25:03AM +0800, Baolu Lu wrote:
> > > > > +	if (domain->type & __IOMMU_DOMAIN_PAGING) {
> > > > It looks like this entire function is already never called for
> > > > anything but paging?
> > > > 
> > > > The only three callers are:
> > > > 
> > > > 	.default_domain_ops = &(const struct iommu_domain_ops) {
> > > > 		.attach_dev		= intel_iommu_attach_device,
> > > > 		.set_dev_pasid		= intel_iommu_set_dev_pasid,
> > > > 
> > > > and
> > > > 
> > > > static const struct iommu_domain_ops intel_nested_domain_ops = {
> > > > 	.attach_dev		= intel_nested_attach_dev,
> > > > 
> > > > And none of those cases can be anything except a paging domain by
> > > > definition.
> > > A nested domain is not a paging domain. It represents a user-space page
> > > table that is nested on a parent paging domain. Perhaps I have
> > > overlooked something?
> > It only calls it on the s2_parent which is always a paging domain?
> > 
> > 	ret = prepare_domain_attach_device(&dmar_domain->s2_domain->domain, dev);
> 
> Yeah, you are right. I overlooked that part. I'll remove the 'if'
> statement and use WARN_ON() instead.
> 
> I will also rename this function to something more meaningful, such as
> paging_domain_is_compatible().

That sounds good too

Ultimately you want to try to structure the driver so that there is a
struct paging_domain that is always the paging domain type and
everything is easy to understand. Don't re-use the same struct for
identity/blocked/nested domains.

Jason


* Re: [PATCH 3/7] iommu/vt-d: Enhance compatibility check for paging domain attach
  2024-10-15 12:48           ` Jason Gunthorpe
@ 2024-10-16  1:46             ` Baolu Lu
  0 siblings, 0 replies; 24+ messages in thread
From: Baolu Lu @ 2024-10-16  1:46 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, iommu, Joerg Roedel, Will Deacon, Robin Murphy,
	Kevin Tian, Yi Liu, Vasant Hegde, linux-kernel

On 2024/10/15 20:48, Jason Gunthorpe wrote:
> On Tue, Oct 15, 2024 at 10:52:19AM +0800, Baolu Lu wrote:
>> On 2024/10/15 3:24, Jason Gunthorpe wrote:
>>> On Mon, Oct 14, 2024 at 09:25:03AM +0800, Baolu Lu wrote:
>>>>>> +	if (domain->type & __IOMMU_DOMAIN_PAGING) {
>>>>> It looks like this entire function is already never called for
>>>>> anything but paging?
>>>>>
>>>>> The only three callers are:
>>>>>
>>>>> 	.default_domain_ops = &(const struct iommu_domain_ops) {
>>>>> 		.attach_dev		= intel_iommu_attach_device,
>>>>> 		.set_dev_pasid		= intel_iommu_set_dev_pasid,
>>>>>
>>>>> and
>>>>>
>>>>> static const struct iommu_domain_ops intel_nested_domain_ops = {
>>>>> 	.attach_dev		= intel_nested_attach_dev,
>>>>>
>>>>> And none of those cases can be anything except a paging domain by
>>>>> definition.
>>>> A nested domain is not a paging domain. It represents a user-space page
>>>> table that is nested on a parent paging domain. Perhaps I have
>>>> overlooked something?
>>> It only calls it on the s2_parent which is always a paging domain?
>>>
>>> 	ret = prepare_domain_attach_device(&dmar_domain->s2_domain->domain, dev);
>> Yeah, you are right. I overlooked that part. I'll remove the 'if'
>> statement and use WARN_ON() instead.
>>
>> I will also rename this function to something more meaningful, such as
>> paging_domain_is_compatible().
> That sounds good too
> 
> Ultimately you want to try to structure the driver so that there is a
> struct paging_domain that is always the paging domain type and
> everything is easy to understand. Don't re-use the same struct for
> identity/blocked/nested domains.

Yes, agreed!

Thanks,
baolu

