public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/12] iommu/amd: Introduce Nested Translation support
@ 2025-10-01  6:09 Suravee Suthikulpanit
  2025-10-01  6:09 ` [PATCH v2 01/12] iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK Suravee Suthikulpanit
                   ` (11 more replies)
  0 siblings, 12 replies; 42+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-01  6:09 UTC (permalink / raw)
  To: jgg, nicolinc
  Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
	vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
	jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
	joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit

This series introduces support for AMD IOMMU nested page table translation
with the host (v1) and guest (v2) page tables.

In this mode, the AMD IOMMU driver configures the Device Table Entry (DTE)
with host page table root pointer, which is configured by allocating domain
with page table type IOMMU_HWPT_ALLOC_NEST_PARENT.

The guest page tables and Guest CR3 (GCR3) tables are managed by Guest OS,
and stored in the guest DTE (gDTE) in guest memory. VMM is responsible for
passing gDTE information to the host IOMMU driver using struct
iommu_hwpt_amd_guest when allocating a domain type IOMMU_DOMAIN_NESTED.
Then, the gDTE is parsed and program onto host DTE by the AMD IOMMU driver.

In addition, this series introduces base code for IOMMUFD vIOMMU for AMD
IOMMU, and implements vIOMMU-based nested domain allocation interface.
The struct nested_domain to store nested domain information, and
set_dte_nested() helper function to handle DTE programing for the nested
domain.

The series is separated into two parts:
 * Patch 1-7 are preparatory patches.
 * Patch 8-12 implement nest-parent and nested domains support
   for IOMMUFD vIOMMU.

Thanks,
Suravee

Note: This series is rebased on top of:
 * [PATCH v5] iommu/amd: Add support for hw_info for iommu capability query
   https://lore.kernel.org/linux-iommu/20250926141901.511313-1-suravee.suthikulpanit@amd.com/T/#u 

Changes from V1:
(https://lore.kernel.org/linux-iommu/20250820113009.5233-1-suravee.suthikulpanit@amd.com/)
 * Patch 10: Introduce struct nested_domain instead of reusing
             struct protection_domain.
 * Patch 11: Introduce set_dte_nested() insted of reusing set_dte_entry().
 * Patch 12: Introduce struct iommu_viommu_amd and base code for IOMMUFD
             vIOMMU. This is mainly to allow nested domain integration with
             struct iommufd_viommu_ops.alloc_domain_nested for vIOMMU-based
             nesting support.

Suravee Suthikulpanit (12):
  iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK
  iommu/amd: Make amd_iommu_pdom_id_alloc() non-static
  iommu/amd: Make amd_iommu_pdom_id_free() non-static
  iommu/amd: Make amd_iommu_device_flush_dte() non-static
  iommu/amd: Make amd_iommu_update_dte256() non-static
  iommu/amd: Make amd_iommu_make_clear_dte() non-static inline
  iommu/amd: Make amd_iommu_completion_wait() non-static
  iommufd: Introduce data struct for AMD nested domain allocation
  iommu/amd: Add support for nest parent domain allocation
  iommu/amd: Add support for nested domain allocation
  iommu/amd: Add support for nested domain attach/detach
  iommu/amd: Introduce IOMMUFD vIOMMU support for AMD

 drivers/iommu/amd/Makefile          |   2 +-
 drivers/iommu/amd/amd_iommu.h       |  33 ++++++
 drivers/iommu/amd/amd_iommu_types.h |  45 +++++--
 drivers/iommu/amd/init.c            |   3 +
 drivers/iommu/amd/iommu.c           | 178 ++++++++++++++++++++--------
 drivers/iommu/amd/iommufd.c         |   8 ++
 drivers/iommu/amd/iommufd.h         |   2 +
 drivers/iommu/amd/nested.c          | 172 +++++++++++++++++++++++++++
 include/uapi/linux/iommufd.h        |  30 +++++
 9 files changed, 413 insertions(+), 60 deletions(-)
 create mode 100644 drivers/iommu/amd/nested.c

-- 
2.34.1


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 01/12] iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK
  2025-10-01  6:09 [PATCH v2 00/12] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
@ 2025-10-01  6:09 ` Suravee Suthikulpanit
  2025-10-02 17:12   ` Nicolin Chen
  2025-10-06 14:36   ` Jason Gunthorpe
  2025-10-01  6:09 ` [PATCH v2 02/12] iommu/amd: Make amd_iommu_pdom_id_alloc() non-static Suravee Suthikulpanit
                   ` (10 subsequent siblings)
  11 siblings, 2 replies; 42+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-01  6:09 UTC (permalink / raw)
  To: jgg, nicolinc
  Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
	vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
	jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
	joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit

Also change the define to use GENMASK_ULL instead.
There is no functional change.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/amd_iommu_types.h | 2 +-
 drivers/iommu/amd/iommu.c           | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index a698a2e7ce2a..556f1df32d53 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -422,7 +422,7 @@
 
 #define DTE_FLAG_IOTLB	BIT_ULL(32)
 #define DTE_FLAG_MASK	(0x3ffULL << 32)
-#define DEV_DOMID_MASK	0xffffULL
+#define DTE_DOMID_MASK	GENMASK_ULL(15, 0)
 
 #define DTE_GCR3_14_12	GENMASK_ULL(60, 58)
 #define DTE_GCR3_30_15	GENMASK_ULL(31, 16)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index b57a6993179d..a9b17d31a969 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2094,7 +2094,7 @@ static void set_dte_entry(struct amd_iommu *iommu,
 	if (dev_data->ats_enabled)
 		new.data[1] |= DTE_FLAG_IOTLB;
 
-	old_domid = READ_ONCE(dte->data[1]) & DEV_DOMID_MASK;
+	old_domid = READ_ONCE(dte->data[1]) & DTE_DOMID_MASK;
 	new.data[1] |= domid;
 
 	/*
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 02/12] iommu/amd: Make amd_iommu_pdom_id_alloc() non-static
  2025-10-01  6:09 [PATCH v2 00/12] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
  2025-10-01  6:09 ` [PATCH v2 01/12] iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK Suravee Suthikulpanit
@ 2025-10-01  6:09 ` Suravee Suthikulpanit
  2025-10-02 17:12   ` Nicolin Chen
  2025-10-01  6:09 ` [PATCH v2 03/12] iommu/amd: Make amd_iommu_pdom_id_free() non-static Suravee Suthikulpanit
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-01  6:09 UTC (permalink / raw)
  To: jgg, nicolinc
  Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
	vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
	jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
	joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit

To allow reuse in other files in subsequent patches.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/amd_iommu.h | 1 +
 drivers/iommu/amd/iommu.c     | 8 ++++----
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 9b4b589a54b5..6ea549816a1f 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -26,6 +26,7 @@ void amd_iommu_set_rlookup_table(struct amd_iommu *iommu, u16 devid);
 void iommu_feature_enable(struct amd_iommu *iommu, u8 bit);
 void *__init iommu_alloc_4k_pages(struct amd_iommu *iommu,
 				  gfp_t gfp, size_t size);
+int amd_iommu_pdom_id_alloc(void);
 
 #ifdef CONFIG_AMD_IOMMU_DEBUGFS
 void amd_iommu_debugfs_setup(void);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a9b17d31a969..78b3e5485006 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1818,7 +1818,7 @@ int amd_iommu_complete_ppr(struct device *dev, u32 pasid, int status, int tag)
  *
  ****************************************************************************/
 
-static int pdom_id_alloc(void)
+int amd_iommu_pdom_id_alloc(void)
 {
 	return ida_alloc_range(&pdom_ids, 1, MAX_DOMAIN_ID - 1, GFP_ATOMIC);
 }
@@ -1906,7 +1906,7 @@ static int setup_gcr3_table(struct gcr3_tbl_info *gcr3_info,
 		return -EBUSY;
 
 	/* Allocate per device domain ID */
-	domid = pdom_id_alloc();
+	domid = amd_iommu_pdom_id_alloc();
 	if (domid <= 0)
 		return -ENOSPC;
 	gcr3_info->domid = domid;
@@ -2489,7 +2489,7 @@ struct protection_domain *protection_domain_alloc(void)
 	if (!domain)
 		return NULL;
 
-	domid = pdom_id_alloc();
+	domid = amd_iommu_pdom_id_alloc();
 	if (domid <= 0) {
 		kfree(domain);
 		return NULL;
@@ -2681,7 +2681,7 @@ void amd_iommu_init_identity_domain(void)
 	domain->ops = &identity_domain_ops;
 	domain->owner = &amd_iommu_ops;
 
-	identity_domain.id = pdom_id_alloc();
+	identity_domain.id = amd_iommu_pdom_id_alloc();
 
 	protection_domain_init(&identity_domain);
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 03/12] iommu/amd: Make amd_iommu_pdom_id_free() non-static
  2025-10-01  6:09 [PATCH v2 00/12] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
  2025-10-01  6:09 ` [PATCH v2 01/12] iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK Suravee Suthikulpanit
  2025-10-01  6:09 ` [PATCH v2 02/12] iommu/amd: Make amd_iommu_pdom_id_alloc() non-static Suravee Suthikulpanit
@ 2025-10-01  6:09 ` Suravee Suthikulpanit
  2025-10-02 17:13   ` Nicolin Chen
  2025-10-06 14:37   ` Jason Gunthorpe
  2025-10-01  6:09 ` [PATCH v2 04/12] iommu/amd: Make amd_iommu_device_flush_dte() non-static Suravee Suthikulpanit
                   ` (8 subsequent siblings)
  11 siblings, 2 replies; 42+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-01  6:09 UTC (permalink / raw)
  To: jgg, nicolinc
  Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
	vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
	jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
	joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit

To allow reuse in other files in subsequent patches.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/amd_iommu.h |  1 +
 drivers/iommu/amd/iommu.c     | 10 +++++-----
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 6ea549816a1f..322c8c73444a 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -27,6 +27,7 @@ void iommu_feature_enable(struct amd_iommu *iommu, u8 bit);
 void *__init iommu_alloc_4k_pages(struct amd_iommu *iommu,
 				  gfp_t gfp, size_t size);
 int amd_iommu_pdom_id_alloc(void);
+void amd_iommu_pdom_id_free(int id);
 
 #ifdef CONFIG_AMD_IOMMU_DEBUGFS
 void amd_iommu_debugfs_setup(void);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 78b3e5485006..0b61059e485d 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1823,7 +1823,7 @@ int amd_iommu_pdom_id_alloc(void)
 	return ida_alloc_range(&pdom_ids, 1, MAX_DOMAIN_ID - 1, GFP_ATOMIC);
 }
 
-static void pdom_id_free(int id)
+void amd_iommu_pdom_id_free(int id)
 {
 	ida_free(&pdom_ids, id);
 }
@@ -1870,7 +1870,7 @@ static void free_gcr3_table(struct gcr3_tbl_info *gcr3_info)
 	gcr3_info->glx = 0;
 
 	/* Free per device domain ID */
-	pdom_id_free(gcr3_info->domid);
+	amd_iommu_pdom_id_free(gcr3_info->domid);
 
 	iommu_free_pages(gcr3_info->gcr3_tbl);
 	gcr3_info->gcr3_tbl = NULL;
@@ -1913,7 +1913,7 @@ static int setup_gcr3_table(struct gcr3_tbl_info *gcr3_info,
 
 	gcr3_info->gcr3_tbl = iommu_alloc_pages_node_sz(nid, GFP_ATOMIC, SZ_4K);
 	if (gcr3_info->gcr3_tbl == NULL) {
-		pdom_id_free(domid);
+		amd_iommu_pdom_id_free(domid);
 		return -ENOMEM;
 	}
 
@@ -2573,7 +2573,7 @@ do_iommu_domain_alloc(struct device *dev, u32 flags,
 	domain->pd_mode = pgtable;
 	ret = pdom_setup_pgtable(domain, dev);
 	if (ret) {
-		pdom_id_free(domain->id);
+		amd_iommu_pdom_id_free(domain->id);
 		kfree(domain);
 		return ERR_PTR(ret);
 	}
@@ -2631,7 +2631,7 @@ void amd_iommu_domain_free(struct iommu_domain *dom)
 	WARN_ON(!list_empty(&domain->dev_list));
 	if (domain->domain.type & __IOMMU_DOMAIN_PAGING)
 		free_io_pgtable_ops(&domain->iop.pgtbl.ops);
-	pdom_id_free(domain->id);
+	amd_iommu_pdom_id_free(domain->id);
 	kfree(domain);
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 04/12] iommu/amd: Make amd_iommu_device_flush_dte() non-static
  2025-10-01  6:09 [PATCH v2 00/12] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
                   ` (2 preceding siblings ...)
  2025-10-01  6:09 ` [PATCH v2 03/12] iommu/amd: Make amd_iommu_pdom_id_free() non-static Suravee Suthikulpanit
@ 2025-10-01  6:09 ` Suravee Suthikulpanit
  2025-10-02 17:14   ` Nicolin Chen
  2025-10-06 14:37   ` Jason Gunthorpe
  2025-10-01  6:09 ` [PATCH v2 05/12] iommu/amd: Make amd_iommu_update_dte256() non-static Suravee Suthikulpanit
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 42+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-01  6:09 UTC (permalink / raw)
  To: jgg, nicolinc
  Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
	vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
	jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
	joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit

To allow reuse in other files in subsequent patches.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/amd_iommu.h | 3 +++
 drivers/iommu/amd/iommu.c     | 8 ++++----
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 322c8c73444a..079fb1d44c00 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -188,4 +188,7 @@ void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
 struct dev_table_entry *get_dev_table(struct amd_iommu *iommu);
 struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid);
 
+/* DTE */
+int amd_iommu_device_flush_dte(struct iommu_dev_data *dev_data);
+
 #endif /* AMD_IOMMU_H */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 0b61059e485d..fad74d2bc1b1 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1562,7 +1562,7 @@ static int device_flush_dte_alias(struct pci_dev *pdev, u16 alias, void *data)
 /*
  * Command send function for invalidating a device table entry
  */
-static int device_flush_dte(struct iommu_dev_data *dev_data)
+int amd_iommu_device_flush_dte(struct iommu_dev_data *dev_data)
 {
 	struct amd_iommu *iommu = get_amd_iommu_from_dev_data(dev_data);
 	struct pci_dev *pdev = NULL;
@@ -1788,7 +1788,7 @@ void amd_iommu_update_and_flush_device_table(struct protection_domain *domain)
 	}
 
 	list_for_each_entry(dev_data, &domain->dev_list, list)
-		device_flush_dte(dev_data);
+		amd_iommu_device_flush_dte(dev_data);
 
 	domain_flush_complete(domain);
 }
@@ -2144,7 +2144,7 @@ static void dev_update_dte(struct iommu_dev_data *dev_data, bool set)
 		clear_dte_entry(iommu, dev_data);
 
 	clone_aliases(iommu, dev_data->dev);
-	device_flush_dte(dev_data);
+	amd_iommu_device_flush_dte(dev_data);
 	iommu_completion_wait(iommu);
 }
 
@@ -2874,7 +2874,7 @@ static int amd_iommu_set_dirty_tracking(struct iommu_domain *domain,
 		spin_unlock(&dev_data->dte_lock);
 
 		/* Flush device DTE */
-		device_flush_dte(dev_data);
+		amd_iommu_device_flush_dte(dev_data);
 		domain_flush = true;
 	}
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 05/12] iommu/amd: Make amd_iommu_update_dte256() non-static
  2025-10-01  6:09 [PATCH v2 00/12] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
                   ` (3 preceding siblings ...)
  2025-10-01  6:09 ` [PATCH v2 04/12] iommu/amd: Make amd_iommu_device_flush_dte() non-static Suravee Suthikulpanit
@ 2025-10-01  6:09 ` Suravee Suthikulpanit
  2025-10-02 17:15   ` Nicolin Chen
  2025-10-01  6:09 ` [PATCH v2 06/12] iommu/amd: Make amd_iommu_make_clear_dte() non-static inline Suravee Suthikulpanit
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-01  6:09 UTC (permalink / raw)
  To: jgg, nicolinc
  Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
	vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
	jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
	joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit

To allow reuse in other files in subsequent patches.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/amd_iommu.h |  3 +++
 drivers/iommu/amd/iommu.c     | 11 ++++++-----
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 079fb1d44c00..eb46e8914eaf 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -190,5 +190,8 @@ struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid);
 
 /* DTE */
 int amd_iommu_device_flush_dte(struct iommu_dev_data *dev_data);
+void amd_iommu_update_dte256(struct amd_iommu *iommu,
+			     struct iommu_dev_data *dev_data,
+			     struct dev_table_entry *new);
 
 #endif /* AMD_IOMMU_H */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index fad74d2bc1b1..3f2c61509b60 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -132,8 +132,9 @@ static void write_dte_lower128(struct dev_table_entry *ptr, struct dev_table_ent
  * This function is used only by code, which updates DMA translation part of the DTE.
  * So, only consider control bits related to DMA when updating the entry.
  */
-static void update_dte256(struct amd_iommu *iommu, struct iommu_dev_data *dev_data,
-			  struct dev_table_entry *new)
+void amd_iommu_update_dte256(struct amd_iommu *iommu,
+			     struct iommu_dev_data *dev_data,
+			     struct dev_table_entry *new)
 {
 	unsigned long flags;
 	struct dev_table_entry *dev_table = get_dev_table(iommu);
@@ -413,7 +414,7 @@ static int clone_alias(struct pci_dev *pdev, u16 alias, void *data)
 		ret = -EINVAL;
 		goto out;
 	}
-	update_dte256(iommu, alias_data, &new);
+	amd_iommu_update_dte256(iommu, alias_data, &new);
 
 	amd_iommu_set_rlookup_table(iommu, alias);
 out:
@@ -2109,7 +2110,7 @@ static void set_dte_entry(struct amd_iommu *iommu,
 
 	set_dte_gcr3_table(iommu, dev_data, &new);
 
-	update_dte256(iommu, dev_data, &new);
+	amd_iommu_update_dte256(iommu, dev_data, &new);
 
 	/*
 	 * A kdump kernel might be replacing a domain ID that was copied from
@@ -2130,7 +2131,7 @@ static void clear_dte_entry(struct amd_iommu *iommu, struct iommu_dev_data *dev_
 	struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
 
 	make_clear_dte(dev_data, dte, &new);
-	update_dte256(iommu, dev_data, &new);
+	amd_iommu_update_dte256(iommu, dev_data, &new);
 }
 
 /* Update and flush DTE for the given device */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 06/12] iommu/amd: Make amd_iommu_make_clear_dte() non-static inline
  2025-10-01  6:09 [PATCH v2 00/12] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
                   ` (4 preceding siblings ...)
  2025-10-01  6:09 ` [PATCH v2 05/12] iommu/amd: Make amd_iommu_update_dte256() non-static Suravee Suthikulpanit
@ 2025-10-01  6:09 ` Suravee Suthikulpanit
  2025-10-02 17:17   ` Nicolin Chen
  2025-10-01  6:09 ` [PATCH v2 07/12] iommu/amd: Make amd_iommu_completion_wait() non-static Suravee Suthikulpanit
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-01  6:09 UTC (permalink / raw)
  To: jgg, nicolinc
  Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
	vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
	jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
	joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit

To allow reuse in other files in subsequent patches.

Also, remove unused function parameter ptr.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/amd_iommu.h |  7 +++++++
 drivers/iommu/amd/iommu.c     | 13 ++-----------
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index eb46e8914eaf..c7cb4a80d44a 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -193,5 +193,12 @@ int amd_iommu_device_flush_dte(struct iommu_dev_data *dev_data);
 void amd_iommu_update_dte256(struct amd_iommu *iommu,
 			     struct iommu_dev_data *dev_data,
 			     struct dev_table_entry *new);
+static inline void
+amd_iommu_make_clear_dte(struct iommu_dev_data *dev_data, struct dev_table_entry *new)
+{
+	/* All existing DTE must have V bit set */
+	new->data128[0] = DTE_FLAG_V;
+	new->data128[1] = 0;
+}
 
 #endif /* AMD_IOMMU_H */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 3f2c61509b60..386ac96b2c02 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2008,14 +2008,6 @@ int amd_iommu_clear_gcr3(struct iommu_dev_data *dev_data, ioasid_t pasid)
 	return ret;
 }
 
-static void make_clear_dte(struct iommu_dev_data *dev_data, struct dev_table_entry *ptr,
-			   struct dev_table_entry *new)
-{
-	/* All existing DTE must have V bit set */
-	new->data128[0] = DTE_FLAG_V;
-	new->data128[1] = 0;
-}
-
 /*
  * Note:
  * The old value for GCR3 table and GPT have been cleared from caller.
@@ -2068,7 +2060,7 @@ static void set_dte_entry(struct amd_iommu *iommu,
 	else
 		domid = domain->id;
 
-	make_clear_dte(dev_data, dte, &new);
+	amd_iommu_make_clear_dte(dev_data, &new);
 
 	if (domain->iop.mode != PAGE_MODE_NONE)
 		new.data[0] |= iommu_virt_to_phys(domain->iop.root);
@@ -2128,9 +2120,8 @@ static void set_dte_entry(struct amd_iommu *iommu,
 static void clear_dte_entry(struct amd_iommu *iommu, struct iommu_dev_data *dev_data)
 {
 	struct dev_table_entry new = {};
-	struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
 
-	make_clear_dte(dev_data, dte, &new);
+	amd_iommu_make_clear_dte(dev_data, &new);
 	amd_iommu_update_dte256(iommu, dev_data, &new);
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 07/12] iommu/amd: Make amd_iommu_completion_wait() non-static
  2025-10-01  6:09 [PATCH v2 00/12] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
                   ` (5 preceding siblings ...)
  2025-10-01  6:09 ` [PATCH v2 06/12] iommu/amd: Make amd_iommu_make_clear_dte() non-static inline Suravee Suthikulpanit
@ 2025-10-01  6:09 ` Suravee Suthikulpanit
  2025-10-02 17:24   ` Nicolin Chen
  2025-10-01  6:09 ` [PATCH v2 08/12] iommufd: Introduce data struct for AMD nested domain allocation Suravee Suthikulpanit
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-01  6:09 UTC (permalink / raw)
  To: jgg, nicolinc
  Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
	vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
	jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
	joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit

To allow reuse in other files in subsequent patches

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/amd_iommu.h |  1 +
 drivers/iommu/amd/iommu.c     | 24 ++++++++++++------------
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index c7cb4a80d44a..d533bb8851ea 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -187,6 +187,7 @@ void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
 				  u64 *root, int mode);
 struct dev_table_entry *get_dev_table(struct amd_iommu *iommu);
 struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid);
+int amd_iommu_completion_wait(struct amd_iommu *iommu);
 
 /* DTE */
 int amd_iommu_device_flush_dte(struct iommu_dev_data *dev_data);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 386ac96b2c02..e0bfcda678a8 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1392,7 +1392,7 @@ static int iommu_queue_command(struct amd_iommu *iommu, struct iommu_cmd *cmd)
  * This function queues a completion wait command into the command
  * buffer of an IOMMU
  */
-static int iommu_completion_wait(struct amd_iommu *iommu)
+int amd_iommu_completion_wait(struct amd_iommu *iommu)
 {
 	struct iommu_cmd cmd;
 	unsigned long flags;
@@ -1431,7 +1431,7 @@ static void domain_flush_complete(struct protection_domain *domain)
 	 * We need to wait for completion of all commands.
 	 */
 	 xa_for_each(&domain->iommu_array, i, pdom_iommu_info)
-		iommu_completion_wait(pdom_iommu_info->iommu);
+		amd_iommu_completion_wait(pdom_iommu_info->iommu);
 }
 
 static int iommu_flush_dte(struct amd_iommu *iommu, u16 devid)
@@ -1449,7 +1449,7 @@ static void iommu_flush_dte_sync(struct amd_iommu *iommu, u16 devid)
 
 	ret = iommu_flush_dte(iommu, devid);
 	if (!ret)
-		iommu_completion_wait(iommu);
+		amd_iommu_completion_wait(iommu);
 }
 
 static void amd_iommu_flush_dte_all(struct amd_iommu *iommu)
@@ -1460,7 +1460,7 @@ static void amd_iommu_flush_dte_all(struct amd_iommu *iommu)
 	for (devid = 0; devid <= last_bdf; ++devid)
 		iommu_flush_dte(iommu, devid);
 
-	iommu_completion_wait(iommu);
+	amd_iommu_completion_wait(iommu);
 }
 
 /*
@@ -1479,7 +1479,7 @@ static void amd_iommu_flush_tlb_all(struct amd_iommu *iommu)
 		iommu_queue_command(iommu, &cmd);
 	}
 
-	iommu_completion_wait(iommu);
+	amd_iommu_completion_wait(iommu);
 }
 
 static void amd_iommu_flush_tlb_domid(struct amd_iommu *iommu, u32 dom_id)
@@ -1490,7 +1490,7 @@ static void amd_iommu_flush_tlb_domid(struct amd_iommu *iommu, u32 dom_id)
 			      dom_id, IOMMU_NO_PASID, false);
 	iommu_queue_command(iommu, &cmd);
 
-	iommu_completion_wait(iommu);
+	amd_iommu_completion_wait(iommu);
 }
 
 static void amd_iommu_flush_all(struct amd_iommu *iommu)
@@ -1500,7 +1500,7 @@ static void amd_iommu_flush_all(struct amd_iommu *iommu)
 	build_inv_all(&cmd);
 
 	iommu_queue_command(iommu, &cmd);
-	iommu_completion_wait(iommu);
+	amd_iommu_completion_wait(iommu);
 }
 
 static void iommu_flush_irt(struct amd_iommu *iommu, u16 devid)
@@ -1523,7 +1523,7 @@ static void amd_iommu_flush_irt_all(struct amd_iommu *iommu)
 	for (devid = 0; devid <= last_bdf; devid++)
 		iommu_flush_irt(iommu, devid);
 
-	iommu_completion_wait(iommu);
+	amd_iommu_completion_wait(iommu);
 }
 
 void amd_iommu_flush_all_caches(struct amd_iommu *iommu)
@@ -1748,7 +1748,7 @@ void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
 	if (dev_data->ats_enabled)
 		device_flush_iotlb(dev_data, address, size, pasid, true);
 
-	iommu_completion_wait(iommu);
+	amd_iommu_completion_wait(iommu);
 }
 
 static void dev_flush_pasid_all(struct iommu_dev_data *dev_data,
@@ -2137,7 +2137,7 @@ static void dev_update_dte(struct iommu_dev_data *dev_data, bool set)
 
 	clone_aliases(iommu, dev_data->dev);
 	amd_iommu_device_flush_dte(dev_data);
-	iommu_completion_wait(iommu);
+	amd_iommu_completion_wait(iommu);
 }
 
 /*
@@ -2421,7 +2421,7 @@ static struct iommu_device *amd_iommu_probe_device(struct device *dev)
 
 out_err:
 
-	iommu_completion_wait(iommu);
+	amd_iommu_completion_wait(iommu);
 
 	if (FEATURE_NUM_INT_REMAP_SUP_2K(amd_iommu_efr2))
 		dev_data->max_irqs = MAX_IRQS_PER_TABLE_2K;
@@ -3255,7 +3255,7 @@ static struct irq_remap_table *alloc_irq_table(struct amd_iommu *iommu,
 		set_remap_table_entry(iommu, alias, table);
 
 out_wait:
-	iommu_completion_wait(iommu);
+	amd_iommu_completion_wait(iommu);
 
 out_unlock:
 	spin_unlock_irqrestore(&iommu_table_lock, flags);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 08/12] iommufd: Introduce data struct for AMD nested domain allocation
  2025-10-01  6:09 [PATCH v2 00/12] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
                   ` (6 preceding siblings ...)
  2025-10-01  6:09 ` [PATCH v2 07/12] iommu/amd: Make amd_iommu_completion_wait() non-static Suravee Suthikulpanit
@ 2025-10-01  6:09 ` Suravee Suthikulpanit
  2025-10-02 17:31   ` Nicolin Chen
  2025-10-01  6:09 ` [PATCH v2 09/12] iommu/amd: Add support for nest parent " Suravee Suthikulpanit
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-01  6:09 UTC (permalink / raw)
  To: jgg, nicolinc
  Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
	vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
	jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
	joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit

Introduce IOMMU_HWPT_DATA_AMD_GUEST data type for IOMMU guest page table,
which is used for stage-1 in nested translation. The data structure
contains information necessary for setting up the AMD HW-vIOMMU support.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 include/uapi/linux/iommufd.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index efb52709c0a2..79d4ba43cd5f 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -455,16 +455,27 @@ struct iommu_hwpt_arm_smmuv3 {
 	__aligned_le64 ste[2];
 };
 
+/**
+ * struct iommu_hwpt_amd_guest - AMD IOMMU specific user-managed
+ *                               guest I/O page table data
+ * @dte: Guest Device Table Entry (DTE)
+ */
+struct iommu_hwpt_amd_guest {
+	__aligned_u64 dte[4];
+};
+
 /**
  * enum iommu_hwpt_data_type - IOMMU HWPT Data Type
  * @IOMMU_HWPT_DATA_NONE: no data
  * @IOMMU_HWPT_DATA_VTD_S1: Intel VT-d stage-1 page table
  * @IOMMU_HWPT_DATA_ARM_SMMUV3: ARM SMMUv3 Context Descriptor Table
+ * @IOMMU_HWPT_DATA_AMD_GUEST: AMD IOMMU guest page table
  */
 enum iommu_hwpt_data_type {
 	IOMMU_HWPT_DATA_NONE = 0,
 	IOMMU_HWPT_DATA_VTD_S1 = 1,
 	IOMMU_HWPT_DATA_ARM_SMMUV3 = 2,
+	IOMMU_HWPT_DATA_AMD_GUEST = 3,
 };
 
 /**
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 09/12] iommu/amd: Add support for nest parent domain allocation
  2025-10-01  6:09 [PATCH v2 00/12] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
                   ` (7 preceding siblings ...)
  2025-10-01  6:09 ` [PATCH v2 08/12] iommufd: Introduce data struct for AMD nested domain allocation Suravee Suthikulpanit
@ 2025-10-01  6:09 ` Suravee Suthikulpanit
  2025-10-02 18:00   ` Nicolin Chen
  2025-10-01  6:09 ` [PATCH v2 10/12] iommu/amd: Add support for nested " Suravee Suthikulpanit
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-01  6:09 UTC (permalink / raw)
  To: jgg, nicolinc
  Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
	vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
	jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
	joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit

To support nested translation, the nest parent domain is allocated with
IOMMU_HWPT_ALLOC_NEST_PARENT flag, and stores information of the v1 page
table for stage 2 (i.e. GPA->SPA).

Also, only support nest parent domain on AMD system, which can support
the Guest CR3 Table (GCR3TRPMode) feature. This feature is required in
order to program DTE[GCR3 Table Root Pointer] with the GPA.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/amd_iommu_types.h |  1 +
 drivers/iommu/amd/iommu.c           | 27 +++++++++++++++++++++++----
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 556f1df32d53..d8c755b2045d 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -107,6 +107,7 @@
 
 
 /* Extended Feature 2 Bits */
+#define FEATURE_GCR3TRPMODE	BIT_ULL(3)
 #define FEATURE_SNPAVICSUP	GENMASK_ULL(7, 5)
 #define FEATURE_SNPAVICSUP_GAM(x) \
 	(FIELD_GET(FEATURE_SNPAVICSUP, x) == 0x1)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index e0bfcda678a8..facee0f7a131 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2584,6 +2584,18 @@ do_iommu_domain_alloc(struct device *dev, u32 flags,
 	return &domain->domain;
 }
 
+static inline bool is_nest_parent_supported(u32 flags)
+{
+	/* Only allow nest parent when these features are supported */
+	if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) &&
+	    (!check_feature(FEATURE_GT) ||
+	     !check_feature(FEATURE_GIOSUP) ||
+	     !check_feature2(FEATURE_GCR3TRPMODE)))
+		return false;
+
+	return true;
+}
+
 static struct iommu_domain *
 amd_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
 				    const struct iommu_user_data *user_data)
@@ -2591,15 +2603,22 @@ amd_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
 {
 	struct amd_iommu *iommu = get_amd_iommu_from_dev(dev);
 	const u32 supported_flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
-						IOMMU_HWPT_ALLOC_PASID;
+						IOMMU_HWPT_ALLOC_PASID |
+						IOMMU_HWPT_ALLOC_NEST_PARENT;
 
-	if ((flags & ~supported_flags) || user_data)
+	if ((flags & ~supported_flags) || user_data || !is_nest_parent_supported(flags))
 		return ERR_PTR(-EOPNOTSUPP);
 
 	switch (flags & supported_flags) {
 	case IOMMU_HWPT_ALLOC_DIRTY_TRACKING:
-		/* Allocate domain with v1 page table for dirty tracking */
-		if (!amd_iommu_hd_support(iommu))
+	case IOMMU_HWPT_ALLOC_NEST_PARENT:
+	case IOMMU_HWPT_ALLOC_DIRTY_TRACKING | IOMMU_HWPT_ALLOC_NEST_PARENT:
+		/*
+		 * Allocate domain with v1 page table for dirty tracking
+		 * and/or Nest parent.
+		 */
+		if ((flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING) &&
+		    !amd_iommu_hd_support(iommu))
 			break;
 		return do_iommu_domain_alloc(dev, flags, PD_MODE_V1);
 	case IOMMU_HWPT_ALLOC_PASID:
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 10/12] iommu/amd: Add support for nested domain allocation
  2025-10-01  6:09 [PATCH v2 00/12] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
                   ` (8 preceding siblings ...)
  2025-10-01  6:09 ` [PATCH v2 09/12] iommu/amd: Add support for nest parent " Suravee Suthikulpanit
@ 2025-10-01  6:09 ` Suravee Suthikulpanit
  2025-10-02 18:29   ` Nicolin Chen
  2025-10-06 14:49   ` Jason Gunthorpe
  2025-10-01  6:09 ` [PATCH v2 11/12] iommu/amd: Add support for nested domain attach/detach Suravee Suthikulpanit
  2025-10-01  6:09 ` [PATCH v2 12/12] iommu/amd: Introduce IOMMUFD vIOMMU support for AMD Suravee Suthikulpanit
  11 siblings, 2 replies; 42+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-01  6:09 UTC (permalink / raw)
  To: jgg, nicolinc
  Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
	vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
	jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
	joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit

The nested domain is allocated with IOMMU_DOMAIN_NESTED type to store
stage-1 translation (i.e. GVA->GPA). This includes the GCR3 root pointer
table along with guest page tables. The struct iommu_hwpt_amd_guest
contains this information, and is passed from user-space as a parameter
of the struct iommu_ops.domain_alloc_nested().

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/Makefile          |  2 +-
 drivers/iommu/amd/amd_iommu.h       |  5 +++
 drivers/iommu/amd/amd_iommu_types.h | 33 +++++++++-----
 drivers/iommu/amd/iommu.c           | 14 +++++-
 drivers/iommu/amd/nested.c          | 67 +++++++++++++++++++++++++++++
 5 files changed, 109 insertions(+), 12 deletions(-)
 create mode 100644 drivers/iommu/amd/nested.c

diff --git a/drivers/iommu/amd/Makefile b/drivers/iommu/amd/Makefile
index 5ae46d99a45b..afa12ca2110e 100644
--- a/drivers/iommu/amd/Makefile
+++ b/drivers/iommu/amd/Makefile
@@ -1,4 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
 obj-y += iommu.o init.o quirks.o io_pgtable.o io_pgtable_v2.o ppr.o pasid.o
-obj-$(CONFIG_AMD_IOMMU_IOMMUFD) += iommufd.o
+obj-$(CONFIG_AMD_IOMMU_IOMMUFD) += iommufd.o nested.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += debugfs.o
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index d533bb8851ea..cc1f14899dfe 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -8,6 +8,7 @@
 #define AMD_IOMMU_H
 
 #include <linux/iommu.h>
+#include <uapi/linux/iommufd.h>
 
 #include "amd_iommu_types.h"
 
@@ -202,4 +203,8 @@ amd_iommu_make_clear_dte(struct iommu_dev_data *dev_data, struct dev_table_entry
 	new->data128[1] = 0;
 }
 
+/* NESTED */
+struct iommu_domain *
+amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
+			      const struct iommu_user_data *user_data);
 #endif /* AMD_IOMMU_H */
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index d8c755b2045d..ba27fad77b57 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -20,6 +20,8 @@
 #include <linux/irqreturn.h>
 #include <linux/io-pgtable.h>
 
+#include <uapi/linux/iommufd.h>
+
 /*
  * Maximum number of IOMMUs supported
  */
@@ -604,6 +606,25 @@ struct protection_domain {
 	struct list_head dev_data_list; /* List of pdom_dev_data */
 };
 
+/*
+ * Structure defining one entry in the device table
+ */
+struct dev_table_entry {
+	union {
+		u64 data[4];
+		u128 data128[2];
+	};
+};
+
+/*
+ * Nested domain is specifically used for nested translation
+ */
+struct nested_domain {
+	struct iommu_domain domain; /* generic domain handle used by iommu core code */
+	u16 id;	                    /* the domain id written to the device table */
+	struct dev_table_entry guest_dte; /* Guest vIOMMU DTE */
+};
+
 /*
  * This structure contains information about one PCI segment in the system.
  */
@@ -857,6 +878,8 @@ struct iommu_dev_data {
 	struct list_head list;		  /* For domain->dev_list */
 	struct llist_node dev_data_list;  /* For global dev_data_list */
 	struct protection_domain *domain; /* Domain the device is bound to */
+	struct protection_domain *parent; /* Parent Domain the device is bound to */
+	struct nested_domain *ndom;       /* Nested Domain the device is bound to */
 	struct gcr3_tbl_info gcr3_info;   /* Per-device GCR3 table */
 	struct device *dev;
 	u16 devid;			  /* PCI Device ID */
@@ -894,16 +917,6 @@ extern struct list_head amd_iommu_pci_seg_list;
  */
 extern struct list_head amd_iommu_list;
 
-/*
- * Structure defining one entry in the device table
- */
-struct dev_table_entry {
-	union {
-		u64 data[4];
-		u128 data128[2];
-	};
-};
-
 /*
  * Structure defining one entry in the command buffer
  */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index facee0f7a131..c1abb06126c1 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2613,6 +2613,9 @@ amd_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
 	case IOMMU_HWPT_ALLOC_DIRTY_TRACKING:
 	case IOMMU_HWPT_ALLOC_NEST_PARENT:
 	case IOMMU_HWPT_ALLOC_DIRTY_TRACKING | IOMMU_HWPT_ALLOC_NEST_PARENT:
+	{
+		struct iommu_domain *dom;
+
 		/*
 		 * Allocate domain with v1 page table for dirty tracking
 		 * and/or Nest parent.
@@ -2620,7 +2623,16 @@ amd_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
 		if ((flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING) &&
 		    !amd_iommu_hd_support(iommu))
 			break;
-		return do_iommu_domain_alloc(dev, flags, PD_MODE_V1);
+
+		dom = do_iommu_domain_alloc(dev, flags, PD_MODE_V1);
+		if (!IS_ERR_VALUE(dom) &&
+		    (flags & IOMMU_HWPT_ALLOC_NEST_PARENT)) {
+			struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
+
+			dev_data->parent = to_pdomain(dom);
+		}
+		return dom;
+	}
 	case IOMMU_HWPT_ALLOC_PASID:
 		/* Allocate domain with v2 page table if IOMMU supports PASID. */
 		if (!amd_iommu_pasid_supported())
diff --git a/drivers/iommu/amd/nested.c b/drivers/iommu/amd/nested.c
new file mode 100644
index 000000000000..11a0237174bb
--- /dev/null
+++ b/drivers/iommu/amd/nested.c
@@ -0,0 +1,67 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025 Advanced Micro Devices, Inc.
+ */
+
+#define pr_fmt(fmt)     "AMD-Vi: " fmt
+#define dev_fmt(fmt)    pr_fmt(fmt)
+
+#include <linux/iommu.h>
+
+#include "amd_iommu.h"
+#include "amd_iommu_types.h"
+
+static const struct iommu_domain_ops nested_domain_ops = {
+	.free = amd_iommu_domain_free,
+};
+
+static inline struct nested_domain *to_ndomain(struct iommu_domain *dom)
+{
+	return container_of(dom, struct nested_domain, domain);
+}
+
+struct iommu_domain *
+amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
+			      const struct iommu_user_data *user_data)
+{
+	int ret;
+	struct iommu_hwpt_amd_guest gdte;
+	struct nested_domain *ndom;
+
+	if (user_data->type != IOMMU_HWPT_DATA_AMD_GUEST)
+		return ERR_PTR(-EOPNOTSUPP);
+
+	/*
+	 * Need to make sure size of struct iommu_hwpt_amd_guest and
+	 * struct dev_table_entry are the same since it will be copied
+	 * from one to the other later on.
+	 */
+	if (WARN_ON(sizeof(struct dev_table_entry) != sizeof(gdte)))
+		return ERR_PTR(-EINVAL);
+
+	ret = iommu_copy_struct_from_user(&gdte, user_data,
+					  IOMMU_HWPT_DATA_AMD_GUEST,
+					  dte);
+	if (ret)
+		return ERR_PTR(ret);
+
+	ndom = kzalloc(sizeof(*ndom), GFP_KERNEL);
+	if (IS_ERR(ndom))
+		return ERR_PTR(-ENOMEM);
+
+	ndom->domain.ops = &nested_domain_ops;
+	ndom->domain.type = IOMMU_DOMAIN_NESTED;
+	memcpy(&ndom->guest_dte, &gdte, sizeof(struct dev_table_entry));
+
+	/* Due to possible aliasing issue use per-device nested domain ID */
+	ndom->id = amd_iommu_pdom_id_alloc();
+	if (ndom->id <= 0) {
+		ret = ndom->id;
+		goto out_err;
+	}
+
+	return &ndom->domain;
+out_err:
+	kfree(ndom);
+	return ERR_PTR(ret);
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 11/12] iommu/amd: Add support for nested domain attach/detach
  2025-10-01  6:09 [PATCH v2 00/12] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
                   ` (9 preceding siblings ...)
  2025-10-01  6:09 ` [PATCH v2 10/12] iommu/amd: Add support for nested " Suravee Suthikulpanit
@ 2025-10-01  6:09 ` Suravee Suthikulpanit
  2025-10-02 19:04   ` Nicolin Chen
  2025-10-06 14:59   ` Jason Gunthorpe
  2025-10-01  6:09 ` [PATCH v2 12/12] iommu/amd: Introduce IOMMUFD vIOMMU support for AMD Suravee Suthikulpanit
  11 siblings, 2 replies; 42+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-01  6:09 UTC (permalink / raw)
  To: jgg, nicolinc
  Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
	vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
	jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
	joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit

Introduce set_dte_nested() to program guest translation settings in
the host DTE when attaches the nested domain to a device.

Also, enable the GCR3TRPMode feature when supported.

Note that nested translation is only supported with the GCR3TRP mode.
When it is enabled, the AMD IOMMU driver programs the GCR3 Table Root
Pointer field of the device table entry with the GPA provided by the guest.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/amd_iommu.h       |   2 +
 drivers/iommu/amd/amd_iommu_types.h |   3 +
 drivers/iommu/amd/init.c            |   3 +
 drivers/iommu/amd/nested.c          | 111 +++++++++++++++++++++++++++-
 4 files changed, 116 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index cc1f14899dfe..924152973d11 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -190,6 +190,8 @@ struct dev_table_entry *get_dev_table(struct amd_iommu *iommu);
 struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid);
 int amd_iommu_completion_wait(struct amd_iommu *iommu);
 
+extern bool amd_iommu_snp_en;
+
 /* DTE */
 int amd_iommu_device_flush_dte(struct iommu_dev_data *dev_data);
 void amd_iommu_update_dte256(struct amd_iommu *iommu,
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index ba27fad77b57..9bc2e0e18978 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -188,6 +188,7 @@
 #define CONTROL_EPH_EN		45
 #define CONTROL_XT_EN		50
 #define CONTROL_INTCAPXT_EN	51
+#define CONTROL_GCR3TRPMODE	58
 #define CONTROL_IRTCACHEDIS	59
 #define CONTROL_SNPAVIC_EN	61
 
@@ -417,6 +418,8 @@
 #define DTE_FLAG_V	BIT_ULL(0)
 #define DTE_FLAG_TV	BIT_ULL(1)
 #define DTE_FLAG_HAD	(3ULL << 7)
+#define DTE_MODE_MASK	GENMASK_ULL(11, 9)
+#define DTE_FLAG_PPR	BIT_ULL(52)
 #define DTE_FLAG_GIOV	BIT_ULL(54)
 #define DTE_FLAG_GV	BIT_ULL(55)
 #define DTE_GLX		GENMASK_ULL(57, 56)
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index f2991c11867c..c45a4bd89569 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1122,6 +1122,9 @@ static void iommu_enable_gt(struct amd_iommu *iommu)
 		return;
 
 	iommu_feature_enable(iommu, CONTROL_GT_EN);
+
+	if (check_feature2(FEATURE_GCR3TRPMODE))
+		iommu_feature_enable(iommu, CONTROL_GCR3TRPMODE);
 }
 
 /* sets a specific bit in the device table entry. */
diff --git a/drivers/iommu/amd/nested.c b/drivers/iommu/amd/nested.c
index 11a0237174bb..5a0c369ba283 100644
--- a/drivers/iommu/amd/nested.c
+++ b/drivers/iommu/amd/nested.c
@@ -11,9 +11,7 @@
 #include "amd_iommu.h"
 #include "amd_iommu_types.h"
 
-static const struct iommu_domain_ops nested_domain_ops = {
-	.free = amd_iommu_domain_free,
-};
+static const struct iommu_domain_ops nested_domain_ops;
 
 static inline struct nested_domain *to_ndomain(struct iommu_domain *dom)
 {
@@ -65,3 +63,110 @@ amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
 	kfree(ndom);
 	return ERR_PTR(ret);
 }
+
+static void set_dte_nested(struct amd_iommu *iommu,
+			   struct dev_table_entry *gdte,
+			   struct nested_domain *ndom,
+			   struct iommu_dev_data *dev_data)
+{
+	struct dev_table_entry *initial_dte;
+	struct dev_table_entry new = {0};
+	struct protection_domain *pdom = dev_data->parent;
+
+	if (WARN_ON(!ndom || !pdom || (pdom->iop.mode == PAGE_MODE_NONE)))
+		return;
+
+	amd_iommu_make_clear_dte(dev_data, &new);
+
+	new.data[0] |= iommu_virt_to_phys(pdom->iop.root);
+	new.data[0] |= FIELD_PREP(DTE_MODE_MASK, pdom->iop.mode);
+	new.data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV;
+	new.data[0] |= (DTE_FLAG_PPR & gdte->data[0]);
+
+	if (pdom->dirty_tracking)
+		new.data[0] |= DTE_FLAG_HAD;
+
+	if (dev_data->ats_enabled)
+		new.data[1] |= DTE_FLAG_IOTLB;
+
+	/*
+	 * Use nested domain ID to program DTE.
+	 * See amd_iommu_alloc_domain_nested().
+	 */
+	new.data[1] |= ndom->id;
+
+	/*
+	 * Restore cached persistent DTE bits, which can be set by information
+	 * in IVRS table. See set_dev_entry_from_acpi().
+	 */
+	initial_dte = amd_iommu_get_ivhd_dte_flags(iommu->pci_seg->id, dev_data->devid);
+	if (initial_dte) {
+		new.data128[0] |= initial_dte->data128[0];
+		new.data128[1] |= initial_dte->data128[1];
+	}
+
+	/* Guest translation stuff */
+	new.data[0] |= (gdte->data[0] &
+		       (DTE_GLX | DTE_FLAG_GV | DTE_FLAG_GIOV));
+
+	/* GCR3 table */
+	new.data[0] |= (gdte->data[0] & DTE_GCR3_14_12);
+	new.data[1] |= (gdte->data[1] & (DTE_GCR3_30_15 | DTE_GCR3_51_31));
+
+	/* Guest paging mode */
+	new.data[2] |= (gdte->data[2] & DTE_GPT_LEVEL_MASK);
+
+	amd_iommu_update_dte256(iommu, dev_data, &new);
+}
+
+static int nested_attach_device(struct iommu_domain *dom, struct device *dev)
+{
+	struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
+	struct amd_iommu *iommu = get_amd_iommu_from_dev_data(dev_data);
+	struct nested_domain *ndom = to_ndomain(dom);
+	struct dev_table_entry *gdte = &ndom->guest_dte;
+	int ret = 0;
+
+	if (dev_data->ndom == ndom)
+		return ret;
+
+	if (!dev_is_pci(dev))
+		return -EINVAL;
+
+	/* Currently only support GCR3TRPMode with nested translation */
+	if (!check_feature2(FEATURE_GCR3TRPMODE))
+		return -EOPNOTSUPP;
+
+	/* We need to check host capability before setting the mode */
+	if ((FIELD_GET(DTE_GPT_LEVEL_MASK, gdte->data[2]) == GUEST_PGTABLE_5_LEVEL) &&
+	    (amd_iommu_gpt_level < PAGE_MODE_5_LEVEL))
+		return -EOPNOTSUPP;
+
+	WARN_ON(dev_data->ndom);
+
+	dev_data->ndom = ndom;
+
+	mutex_lock(&dev_data->mutex);
+
+	/* Update device table entry */
+	set_dte_nested(iommu, gdte, ndom, dev_data);
+	amd_iommu_device_flush_dte(dev_data);
+	amd_iommu_completion_wait(iommu);
+
+	mutex_unlock(&dev_data->mutex);
+
+	return ret;
+}
+
+static void nested_domain_free(struct iommu_domain *dom)
+{
+	struct nested_domain *ndom = to_ndomain(dom);
+
+	amd_iommu_pdom_id_free(ndom->id);
+	kfree(ndom);
+}
+
+static const struct iommu_domain_ops nested_domain_ops = {
+	.attach_dev = nested_attach_device,
+	.free = nested_domain_free,
+};
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 12/12] iommu/amd: Introduce IOMMUFD vIOMMU support for AMD
  2025-10-01  6:09 [PATCH v2 00/12] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
                   ` (10 preceding siblings ...)
  2025-10-01  6:09 ` [PATCH v2 11/12] iommu/amd: Add support for nested domain attach/detach Suravee Suthikulpanit
@ 2025-10-01  6:09 ` Suravee Suthikulpanit
  2025-10-02 20:05   ` Nicolin Chen
  11 siblings, 1 reply; 42+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-01  6:09 UTC (permalink / raw)
  To: jgg, nicolinc
  Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
	vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
	jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
	joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit

Introduce struct amd_iommu_vminfo to store AMD HW-vIOMMU per-IOMMU data,
which is initialized when calling struct iommu_ops.viommu_init().

Currently, the struct amd_iommu_vminfo and amd_iommu_viommu_init() contain
base code to support nested domain allocation for vIOMMU using the struct
iommufd_viommu_ops.alloc_domain_nested.

Additional initialization will be added in subsequent patches.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/amd_iommu.h       | 10 +++++
 drivers/iommu/amd/amd_iommu_types.h |  6 +++
 drivers/iommu/amd/iommu.c           | 61 ++++++++++++++++++++++++++++-
 drivers/iommu/amd/iommufd.c         |  8 ++++
 drivers/iommu/amd/iommufd.h         |  2 +
 include/uapi/linux/iommufd.h        | 19 +++++++++
 6 files changed, 105 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 924152973d11..6cb929b657e4 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -169,6 +169,16 @@ static inline struct amd_iommu *get_amd_iommu_from_dev_data(struct iommu_dev_dat
 	return iommu_get_iommu_dev(dev_data->dev, struct amd_iommu, iommu);
 }
 
+static inline struct amd_iommu *get_amd_iommu_from_devid(u16 devid)
+{
+	struct amd_iommu *iommu;
+
+	for_each_iommu(iommu)
+		if (iommu->devid == devid)
+			return iommu;
+	return NULL;
+}
+
 static inline struct protection_domain *to_pdomain(struct iommu_domain *dom)
 {
 	return container_of(dom, struct protection_domain, domain);
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 9bc2e0e18978..dcecb5df2f72 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -17,6 +17,7 @@
 #include <linux/list.h>
 #include <linux/spinlock.h>
 #include <linux/pci.h>
+#include <linux/iommufd.h>
 #include <linux/irqreturn.h>
 #include <linux/io-pgtable.h>
 
@@ -1117,6 +1118,11 @@ struct amd_irte_ops {
 	void (*clear_allocated)(struct irq_remap_table *, int);
 };
 
+struct amd_iommu_vminfo {
+	struct iommufd_viommu core;
+	u32 iommu_devid;
+};
+
 #ifdef CONFIG_IRQ_REMAP
 extern struct amd_irte_ops irte_32_ops;
 extern struct amd_irte_ops irte_128_ops;
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index c1abb06126c1..e3503091cd65 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3063,6 +3063,61 @@ static const struct iommu_dirty_ops amd_dirty_ops = {
 	.read_and_clear_dirty = amd_iommu_read_and_clear_dirty,
 };
 
+static size_t amd_iommu_get_viommu_size(struct device *dev, enum iommu_viommu_type viommu_type)
+{
+	if (viommu_type != IOMMU_VIOMMU_TYPE_AMD)
+		return 0;
+
+	return VIOMMU_STRUCT_SIZE(struct amd_iommu_vminfo, core);
+}
+
+/*
+ * This is called from the drivers/iommu/iommufd/viommu.c: iommufd_viommu_alloc_ioctl
+ */
+static int amd_iommu_viommu_init(struct iommufd_viommu *viommu,
+				 struct iommu_domain *parent,
+				 const struct iommu_user_data *user_data)
+{
+#if IS_ENABLED(CONFIG_AMD_IOMMU_IOMMUFD)
+	int ret;
+	struct amd_iommu *iommu;
+	struct iommu_viommu_amd data;
+	struct amd_iommu_vminfo *vminfo = container_of(viommu, struct amd_iommu_vminfo, core);
+
+	if (!user_data)
+		return -EINVAL;
+
+	ret = iommu_copy_struct_from_user(&data, user_data,
+					  IOMMU_VIOMMU_TYPE_AMD,
+					  reserved);
+	if (ret)
+		return ret;
+
+	iommu = get_amd_iommu_from_devid(data.iommu_devid);
+	if (!iommu)
+		return -ENODEV;
+
+	vminfo->iommu_devid = data.iommu_devid;
+
+	/* TODO: Add AMD HW-vIOMMU initialization code */
+
+	ret = iommu_copy_struct_to_user(user_data, &data,
+					IOMMU_VIOMMU_TYPE_AMD,
+					reserved);
+	if (ret)
+		goto err_out;
+
+	viommu->ops = &amd_viommu_ops;
+
+	return 0;
+
+err_out:
+	return ret;
+#else
+	return -EOPNOTSUPP;
+#endif /* CONFIG_AMD_IOMMU_IOMMUFD */
+}
+
 const struct iommu_ops amd_iommu_ops = {
 	.capable = amd_iommu_capable,
 	.hw_info = amd_iommufd_hw_info,
@@ -3088,7 +3143,11 @@ const struct iommu_ops amd_iommu_ops = {
 		.iotlb_sync	= amd_iommu_iotlb_sync,
 		.free		= amd_iommu_domain_free,
 		.enforce_cache_coherency = amd_iommu_enforce_cache_coherency,
-	}
+	},
+
+	/* For VIOMMU */
+	.get_viommu_size = amd_iommu_get_viommu_size,
+	.viommu_init = amd_iommu_viommu_init,
 };
 
 #ifdef CONFIG_IRQ_REMAP
diff --git a/drivers/iommu/amd/iommufd.c b/drivers/iommu/amd/iommufd.c
index 72eaaa923d04..d81e4ad17d9d 100644
--- a/drivers/iommu/amd/iommufd.c
+++ b/drivers/iommu/amd/iommufd.c
@@ -29,3 +29,11 @@ void *amd_iommufd_hw_info(struct device *dev, u32 *length, u32 *type)
 
 	return hwinfo;
 }
+
+/*
+ * See include/linux/iommufd.h
+ * struct iommufd_viommu_ops - vIOMMU specific operations
+ */
+const struct iommufd_viommu_ops amd_viommu_ops = {
+	.alloc_domain_nested = amd_iommu_alloc_domain_nested,
+};
diff --git a/drivers/iommu/amd/iommufd.h b/drivers/iommu/amd/iommufd.h
index f880be80a30d..0d59ef160780 100644
--- a/drivers/iommu/amd/iommufd.h
+++ b/drivers/iommu/amd/iommufd.h
@@ -7,6 +7,8 @@
 #define AMD_IOMMUFD_H
 
 #if IS_ENABLED(CONFIG_AMD_IOMMU_IOMMUFD)
+extern const struct iommufd_viommu_ops amd_viommu_ops;
+
 void *amd_iommufd_hw_info(struct device *dev, u32 *length, u32 *type);
 #else
 #define amd_iommufd_hw_info NULL
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 79d4ba43cd5f..e7084dbc5c95 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -1038,11 +1038,13 @@ struct iommu_fault_alloc {
  * @IOMMU_VIOMMU_TYPE_ARM_SMMUV3: ARM SMMUv3 driver specific type
  * @IOMMU_VIOMMU_TYPE_TEGRA241_CMDQV: NVIDIA Tegra241 CMDQV (extension for ARM
  *                                    SMMUv3) enabled ARM SMMUv3 type
+ * @IOMMU_VIOMMU_TYPE_AMD: AMD HW-vIOMMU type
  */
 enum iommu_viommu_type {
 	IOMMU_VIOMMU_TYPE_DEFAULT = 0,
 	IOMMU_VIOMMU_TYPE_ARM_SMMUV3 = 1,
 	IOMMU_VIOMMU_TYPE_TEGRA241_CMDQV = 2,
+	IOMMU_VIOMMU_TYPE_AMD = 3,
 };
 
 /**
@@ -1061,6 +1063,23 @@ struct iommu_viommu_tegra241_cmdqv {
 	__aligned_u64 out_vintf_mmap_length;
 };
 
+/**
+ * struct iommu_viommu_amd - AMD vIOMMU Interface (IOMMU_VIOMMU_TYPE_AMD)
+ * @iommu_devid: Host IOMMU PCI device ID
+ * @viommu_devid: Guest vIOMMU PCI device ID
+ * @trans_devid: GPA->GVA translation device ID (host)
+ * @out_gid: (out) Guest ID
+ * @out_vfmmio_mmap_offset: (out) mmap offset for vIOMMU VF-MMIO
+ */
+struct iommu_viommu_amd {
+	__u32 iommu_devid;
+	__u32 viommu_devid;
+	__u32 trans_devid;
+	__u32 out_gid;
+	__aligned_u64 out_vfmmio_mmap_offset;
+	__u32 reserved; /* must be last */
+};
+
 /**
  * struct iommu_viommu_alloc - ioctl(IOMMU_VIOMMU_ALLOC)
  * @size: sizeof(struct iommu_viommu_alloc)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 01/12] iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK
  2025-10-01  6:09 ` [PATCH v2 01/12] iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK Suravee Suthikulpanit
@ 2025-10-02 17:12   ` Nicolin Chen
  2025-10-06 14:36   ` Jason Gunthorpe
  1 sibling, 0 replies; 42+ messages in thread
From: Nicolin Chen @ 2025-10-02 17:12 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
	vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
	jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
	joao.m.martins, alejandro.j.jimenez

On Wed, Oct 01, 2025 at 06:09:43AM +0000, Suravee Suthikulpanit wrote:
> Also change the define to use GENMASK_ULL instead.
> There is no functional change.
> 
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
 
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 02/12] iommu/amd: Make amd_iommu_pdom_id_alloc() non-static
  2025-10-01  6:09 ` [PATCH v2 02/12] iommu/amd: Make amd_iommu_pdom_id_alloc() non-static Suravee Suthikulpanit
@ 2025-10-02 17:12   ` Nicolin Chen
  0 siblings, 0 replies; 42+ messages in thread
From: Nicolin Chen @ 2025-10-02 17:12 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
	vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
	jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
	joao.m.martins, alejandro.j.jimenez

On Wed, Oct 01, 2025 at 06:09:44AM +0000, Suravee Suthikulpanit wrote:
> To allow reuse in other files in subsequent patches.
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 03/12] iommu/amd: Make amd_iommu_pdom_id_free() non-static
  2025-10-01  6:09 ` [PATCH v2 03/12] iommu/amd: Make amd_iommu_pdom_id_free() non-static Suravee Suthikulpanit
@ 2025-10-02 17:13   ` Nicolin Chen
  2025-10-06 14:37   ` Jason Gunthorpe
  1 sibling, 0 replies; 42+ messages in thread
From: Nicolin Chen @ 2025-10-02 17:13 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
	vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
	jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
	joao.m.martins, alejandro.j.jimenez

On Wed, Oct 01, 2025 at 06:09:45AM +0000, Suravee Suthikulpanit wrote:
> To allow reuse in other files in subsequent patches.
> 
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 04/12] iommu/amd: Make amd_iommu_device_flush_dte() non-static
  2025-10-01  6:09 ` [PATCH v2 04/12] iommu/amd: Make amd_iommu_device_flush_dte() non-static Suravee Suthikulpanit
@ 2025-10-02 17:14   ` Nicolin Chen
  2025-10-06 14:37   ` Jason Gunthorpe
  1 sibling, 0 replies; 42+ messages in thread
From: Nicolin Chen @ 2025-10-02 17:14 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
	vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
	jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
	joao.m.martins, alejandro.j.jimenez

On Wed, Oct 01, 2025 at 06:09:46AM +0000, Suravee Suthikulpanit wrote:
> To allow reuse in other files in subsequent patches.
> 
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 05/12] iommu/amd: Make amd_iommu_update_dte256() non-static
  2025-10-01  6:09 ` [PATCH v2 05/12] iommu/amd: Make amd_iommu_update_dte256() non-static Suravee Suthikulpanit
@ 2025-10-02 17:15   ` Nicolin Chen
  0 siblings, 0 replies; 42+ messages in thread
From: Nicolin Chen @ 2025-10-02 17:15 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
	vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
	jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
	joao.m.martins, alejandro.j.jimenez

On Wed, Oct 01, 2025 at 06:09:47AM +0000, Suravee Suthikulpanit wrote:
> To allow reuse in other files in subsequent patches.
> 
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
 
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 06/12] iommu/amd: Make amd_iommu_make_clear_dte() non-static inline
  2025-10-01  6:09 ` [PATCH v2 06/12] iommu/amd: Make amd_iommu_make_clear_dte() non-static inline Suravee Suthikulpanit
@ 2025-10-02 17:17   ` Nicolin Chen
  0 siblings, 0 replies; 42+ messages in thread
From: Nicolin Chen @ 2025-10-02 17:17 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
	vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
	jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
	joao.m.martins, alejandro.j.jimenez

On Wed, Oct 01, 2025 at 06:09:48AM +0000, Suravee Suthikulpanit wrote:
> To allow reuse in other files in subsequent patches.
> 
> Also, remove unused function parameter ptr.
> 
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 07/12] iommu/amd: Make amd_iommu_completion_wait() non-static
  2025-10-01  6:09 ` [PATCH v2 07/12] iommu/amd: Make amd_iommu_completion_wait() non-static Suravee Suthikulpanit
@ 2025-10-02 17:24   ` Nicolin Chen
  0 siblings, 0 replies; 42+ messages in thread
From: Nicolin Chen @ 2025-10-02 17:24 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
	vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
	jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
	joao.m.martins, alejandro.j.jimenez

On Wed, Oct 01, 2025 at 06:09:49AM +0000, Suravee Suthikulpanit wrote:
> To allow reuse in other files in subsequent patches

Nit: all these patches should probably drop "in subsequent patches"
because they will eventually be commits v.s. patches after getting
applied. Instead, we could just say:

This will be reused in a new iommufd.c file for nested translation.

> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 08/12] iommufd: Introduce data struct for AMD nested domain allocation
  2025-10-01  6:09 ` [PATCH v2 08/12] iommufd: Introduce data struct for AMD nested domain allocation Suravee Suthikulpanit
@ 2025-10-02 17:31   ` Nicolin Chen
  0 siblings, 0 replies; 42+ messages in thread
From: Nicolin Chen @ 2025-10-02 17:31 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
	vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
	jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
	joao.m.martins, alejandro.j.jimenez

On Wed, Oct 01, 2025 at 06:09:50AM +0000, Suravee Suthikulpanit wrote:
> Introduce IOMMU_HWPT_DATA_AMD_GUEST data type for IOMMU guest page table,
> which is used for stage-1 in nested translation. The data structure
> contains information necessary for setting up the AMD HW-vIOMMU support.
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

With a nit:

> @@ -455,16 +455,27 @@ struct iommu_hwpt_arm_smmuv3 {
>  	__aligned_le64 ste[2];
>  };
>  
> +/**
> + * struct iommu_hwpt_amd_guest - AMD IOMMU specific user-managed
> + *                               guest I/O page table data

Following the pattern of the other two types, I'd prefer:

/*
 * struct iommu_hwpt_amd_guest - AMD IOMMU guest I/O page table data
 *                               (IOMMU_HWPT_DATA_AMD_GUEST)

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 09/12] iommu/amd: Add support for nest parent domain allocation
  2025-10-01  6:09 ` [PATCH v2 09/12] iommu/amd: Add support for nest parent " Suravee Suthikulpanit
@ 2025-10-02 18:00   ` Nicolin Chen
  2025-10-06 14:43     ` Jason Gunthorpe
  2025-10-08 14:16     ` Suthikulpanit, Suravee
  0 siblings, 2 replies; 42+ messages in thread
From: Nicolin Chen @ 2025-10-02 18:00 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
	vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
	jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
	joao.m.martins, alejandro.j.jimenez

On Wed, Oct 01, 2025 at 06:09:51AM +0000, Suravee Suthikulpanit wrote:
> To support nested translation, the nest parent domain is allocated with
> IOMMU_HWPT_ALLOC_NEST_PARENT flag, and stores information of the v1 page
> table for stage 2 (i.e. GPA->SPA).
> 
> Also, only support nest parent domain on AMD system, which can support
> the Guest CR3 Table (GCR3TRPMode) feature. This feature is required in
> order to program DTE[GCR3 Table Root Pointer] with the GPA.
> 
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

> +static inline bool is_nest_parent_supported(u32 flags)
> +{
> +	/* Only allow nest parent when these features are supported */
> +	if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) &&
> +	    (!check_feature(FEATURE_GT) ||
> +	     !check_feature(FEATURE_GIOSUP) ||
> +	     !check_feature2(FEATURE_GCR3TRPMODE)))
> +		return false;
> +
> +	return true;

If the "flags" doesn't have IOMMU_HWPT_ALLOC_NEST_PARENT while one
of the features is missing, this would return true, indicating the
HW supports nesting parent, which will be logically wrong although
it does validate the nest parent flag.

Following the existing coding style, I think this could be just:

static inline bool is_nest_parent_supported(void)
{
	/* Only allow nest parent when these features are supported */
	return check_feature(FEATURE_GT) && check_feature(FEATURE_GIOSUP) &&
	       check_feature2(FEATURE_GCR3TRPMODE);
}

> @@ -2591,15 +2603,22 @@ amd_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
>  {
>  	struct amd_iommu *iommu = get_amd_iommu_from_dev(dev);
>  	const u32 supported_flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
> -						IOMMU_HWPT_ALLOC_PASID;
> +						IOMMU_HWPT_ALLOC_PASID |
> +						IOMMU_HWPT_ALLOC_NEST_PARENT;
>  
> -	if ((flags & ~supported_flags) || user_data)
> +	if ((flags & ~supported_flags) || user_data || !is_nest_parent_supported(flags))
>  		return ERR_PTR(-EOPNOTSUPP);

Un-change this, given the code below is already validating flags.

>  	switch (flags & supported_flags) {
>  	case IOMMU_HWPT_ALLOC_DIRTY_TRACKING:
> -		/* Allocate domain with v1 page table for dirty tracking */
> -		if (!amd_iommu_hd_support(iommu))
> +	case IOMMU_HWPT_ALLOC_NEST_PARENT:
> +	case IOMMU_HWPT_ALLOC_DIRTY_TRACKING | IOMMU_HWPT_ALLOC_NEST_PARENT:
> +		/*
> +		 * Allocate domain with v1 page table for dirty tracking
> +		 * and/or Nest parent.
> +		 */
> +		if ((flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING) &&
> +		    !amd_iommu_hd_support(iommu))
>  			break;

And add here:
		if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT)) &&
		    !is_nest_parent_supported())
			    break;

Otherwise, this looks good to me:

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 10/12] iommu/amd: Add support for nested domain allocation
  2025-10-01  6:09 ` [PATCH v2 10/12] iommu/amd: Add support for nested " Suravee Suthikulpanit
@ 2025-10-02 18:29   ` Nicolin Chen
  2025-10-06 14:49   ` Jason Gunthorpe
  1 sibling, 0 replies; 42+ messages in thread
From: Nicolin Chen @ 2025-10-02 18:29 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
	vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
	jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
	joao.m.martins, alejandro.j.jimenez

On Wed, Oct 01, 2025 at 06:09:52AM +0000, Suravee Suthikulpanit wrote:
> The nested domain is allocated with IOMMU_DOMAIN_NESTED type to store
> stage-1 translation (i.e. GVA->GPA). This includes the GCR3 root pointer
> table along with guest page tables. The struct iommu_hwpt_amd_guest
> contains this information, and is passed from user-space as a parameter
> of the struct iommu_ops.domain_alloc_nested().
> 
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

> diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
> index d533bb8851ea..cc1f14899dfe 100644
> --- a/drivers/iommu/amd/amd_iommu.h
> +++ b/drivers/iommu/amd/amd_iommu.h
> @@ -8,6 +8,7 @@
>  #define AMD_IOMMU_H
>  
>  #include <linux/iommu.h>
> +#include <uapi/linux/iommufd.h>
>  
>  #include "amd_iommu_types.h"

amd_iommu_types.h is adding the uAPI header in this patch too. So,
this new "include" here is redundant.

> +++ b/drivers/iommu/amd/nested.c
> @@ -0,0 +1,67 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2025 Advanced Micro Devices, Inc.
> + */
> +
> +#define pr_fmt(fmt)     "AMD-Vi: " fmt
> +#define dev_fmt(fmt)    pr_fmt(fmt)

#define pr_fmt(fmt)     "AMD-Vi: " fmt
#define dev_fmt		pr_fmt

> +
> +#include <linux/iommu.h>
> +
> +#include "amd_iommu.h"
> +#include "amd_iommu_types.h"

amd_iommu.h covers both linux/iommu.h and amd_iommu_types.h.

> +struct iommu_domain *
> +amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
> +			      const struct iommu_user_data *user_data)
> +{
> +	int ret;
> +	struct iommu_hwpt_amd_guest gdte;
> +	struct nested_domain *ndom;
> +
> +	if (user_data->type != IOMMU_HWPT_DATA_AMD_GUEST)
> +		return ERR_PTR(-EOPNOTSUPP);
> +
> +	/*
> +	 * Need to make sure size of struct iommu_hwpt_amd_guest and
> +	 * struct dev_table_entry are the same since it will be copied
> +	 * from one to the other later on.
> +	 */
> +	if (WARN_ON(sizeof(struct dev_table_entry) != sizeof(gdte)))
> +		return ERR_PTR(-EINVAL);

	static_assert(sizeof(struct dev_table_entry) == sizeof(gdte));

> +
> +	ret = iommu_copy_struct_from_user(&gdte, user_data,
> +					  IOMMU_HWPT_DATA_AMD_GUEST,
> +					  dte);
> +	if (ret)
> +		return ERR_PTR(ret);
> +
> +	ndom = kzalloc(sizeof(*ndom), GFP_KERNEL);
> +	if (IS_ERR(ndom))
> +		return ERR_PTR(-ENOMEM);

	if (!ndom)
		return ERR_PTR(-ENOMEM);

> +
> +	ndom->domain.ops = &nested_domain_ops;
> +	ndom->domain.type = IOMMU_DOMAIN_NESTED;
> +	memcpy(&ndom->guest_dte, &gdte, sizeof(struct dev_table_entry));
> +
> +	/* Due to possible aliasing issue use per-device nested domain ID */
> +	ndom->id = amd_iommu_pdom_id_alloc();
> +	if (ndom->id <= 0) {
> +		ret = ndom->id;
> +		goto out_err;

if ndom->id = 0, ret = 0, meaning this function does not fail..

Given that ida_alloc_range has the range [1, MAX_DOMAIN_ID - 1],
0 should never be returned. So just check if (ndom->id < 0) will
be enough.

Or you could override the "ret" entirely like other places do:

	if (ndom->id <= 0) {
		ret = -ENOSPC;
		goto out_err;
	}

With all these fixed,

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 11/12] iommu/amd: Add support for nested domain attach/detach
  2025-10-01  6:09 ` [PATCH v2 11/12] iommu/amd: Add support for nested domain attach/detach Suravee Suthikulpanit
@ 2025-10-02 19:04   ` Nicolin Chen
  2025-10-06 14:59   ` Jason Gunthorpe
  1 sibling, 0 replies; 42+ messages in thread
From: Nicolin Chen @ 2025-10-02 19:04 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
	vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
	jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
	joao.m.martins, alejandro.j.jimenez

On Wed, Oct 01, 2025 at 06:09:53AM +0000, Suravee Suthikulpanit wrote:
>  /* sets a specific bit in the device table entry. */
> diff --git a/drivers/iommu/amd/nested.c b/drivers/iommu/amd/nested.c
> index 11a0237174bb..5a0c369ba283 100644
> --- a/drivers/iommu/amd/nested.c
> +++ b/drivers/iommu/amd/nested.c
> @@ -11,9 +11,7 @@
>  #include "amd_iommu.h"
>  #include "amd_iommu_types.h"
>  
> -static const struct iommu_domain_ops nested_domain_ops = {
> -	.free = amd_iommu_domain_free,
> -};

Oh no, amd_iommu_domain_free() with to_pdomain() won't work. So
you should move the nested_domain_free() to the previous patch,
pairing with amd_iommu_alloc_domain_nested().

> +static void set_dte_nested(struct amd_iommu *iommu,
> +			   struct dev_table_entry *gdte,
> +			   struct nested_domain *ndom,
> +			   struct iommu_dev_data *dev_data)
> +{
> +	struct dev_table_entry *initial_dte;
> +	struct dev_table_entry new = {0};
> +	struct protection_domain *pdom = dev_data->parent;
> +
> +	if (WARN_ON(!ndom || !pdom || (pdom->iop.mode == PAGE_MODE_NONE)))
> +		return;
> +
> +	amd_iommu_make_clear_dte(dev_data, &new);
> +
> +	new.data[0] |= iommu_virt_to_phys(pdom->iop.root);
> +	new.data[0] |= FIELD_PREP(DTE_MODE_MASK, pdom->iop.mode);
> +	new.data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV;
> +	new.data[0] |= (DTE_FLAG_PPR & gdte->data[0]);

	new.data[0] |= DTE_FLAG_PPR & gdte->data[0];

> +	/* Guest translation stuff */
> +	new.data[0] |= (gdte->data[0] & (DTE_GLX | DTE_FLAG_GV | DTE_FLAG_GIOV));

	new.data[0] |= gdte->data[0] & (DTE_GLX | DTE_FLAG_GV | DTE_FLAG_GIOV);
> +
> +	/* GCR3 table */
> +	new.data[0] |= (gdte->data[0] & DTE_GCR3_14_12);
> +	new.data[1] |= (gdte->data[1] & (DTE_GCR3_30_15 | DTE_GCR3_51_31));
> +
> +	/* Guest paging mode */
> +	new.data[2] |= (gdte->data[2] & DTE_GPT_LEVEL_MASK);

All these outer parentheses are redundant.

> +static int nested_attach_device(struct iommu_domain *dom, struct device *dev)
> +{
> +	struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
> +	struct amd_iommu *iommu = get_amd_iommu_from_dev_data(dev_data);
> +	struct nested_domain *ndom = to_ndomain(dom);
> +	struct dev_table_entry *gdte = &ndom->guest_dte;
> +	int ret = 0;
> +
> +	if (dev_data->ndom == ndom)
> +		return ret;
> +
> +	if (!dev_is_pci(dev))
> +		return -EINVAL;
> +
> +	/* Currently only support GCR3TRPMode with nested translation */
> +	if (!check_feature2(FEATURE_GCR3TRPMODE))
> +		return -EOPNOTSUPP;

The amd_iommu_alloc_domain_nested() should probably validate this
feature, so !FEATURE_GCR3TRPMODE wouldn't allocate a nested domain
at the first place, and then no need to revalidate it in attach().

> +
> +	/* We need to check host capability before setting the mode */
> +	if ((FIELD_GET(DTE_GPT_LEVEL_MASK, gdte->data[2]) == GUEST_PGTABLE_5_LEVEL) &&
> +	    (amd_iommu_gpt_level < PAGE_MODE_5_LEVEL))
> +		return -EOPNOTSUPP;

Ditto.

The attach callback function should only check things related to
the compatibility between a device and a domain, while this is a
domain specific validation. Better do it in alloc() IMHO.

> +	WARN_ON(dev_data->ndom);

	return -EBUSY;
?

With these fixed,

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 12/12] iommu/amd: Introduce IOMMUFD vIOMMU support for AMD
  2025-10-01  6:09 ` [PATCH v2 12/12] iommu/amd: Introduce IOMMUFD vIOMMU support for AMD Suravee Suthikulpanit
@ 2025-10-02 20:05   ` Nicolin Chen
  2025-10-06 15:01     ` Jason Gunthorpe
  0 siblings, 1 reply; 42+ messages in thread
From: Nicolin Chen @ 2025-10-02 20:05 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
	vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
	jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
	joao.m.martins, alejandro.j.jimenez

On Wed, Oct 01, 2025 at 06:09:54AM +0000, Suravee Suthikulpanit wrote:
> Introduce struct amd_iommu_vminfo to store AMD HW-vIOMMU per-IOMMU data,
> which is initialized when calling struct iommu_ops.viommu_init().
> 
> Currently, the struct amd_iommu_vminfo and amd_iommu_viommu_init() contain
> base code to support nested domain allocation for vIOMMU using the struct
> iommufd_viommu_ops.alloc_domain_nested.
> 
> Additional initialization will be added in subsequent patches.

This is the last patch in the series. You mean some future patch?

And pls briefly elaborate what is the additional initialization for.

> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index c1abb06126c1..e3503091cd65 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -3063,6 +3063,61 @@ static const struct iommu_dirty_ops amd_dirty_ops = {
>  	.read_and_clear_dirty = amd_iommu_read_and_clear_dirty,
>  };
>  
> +static size_t amd_iommu_get_viommu_size(struct device *dev, enum iommu_viommu_type viommu_type)
> +{
> +	if (viommu_type != IOMMU_VIOMMU_TYPE_AMD)
> +		return 0;
> +
> +	return VIOMMU_STRUCT_SIZE(struct amd_iommu_vminfo, core);
> +}
> +
> +/*
> + * This is called from the drivers/iommu/iommufd/viommu.c: iommufd_viommu_alloc_ioctl
> + */
> +static int amd_iommu_viommu_init(struct iommufd_viommu *viommu,
> +static int amd_iommu_viommu_init(struct iommufd_viommu *viommu,
> +				 struct iommu_domain *parent,
> +				 const struct iommu_user_data *user_data)
> +{
> +#if IS_ENABLED(CONFIG_AMD_IOMMU_IOMMUFD)

These two should be put in the iommufd.c file, and the header
should define the followings (next to amd_iommufd_hw_info):

 #if IS_ENABLED(CONFIG_AMD_IOMMU_IOMMUFD)
 void *amd_iommufd_hw_info(struct device *dev, u32 *length, u32 *type);
+size_t amd_iommu_get_viommu_size(struct device *dev,
+				 enum iommu_viommu_type viommu_type);
+int amd_iommu_viommu_init(struct iommufd_viommu *viommu,
+			  struct iommu_domain *parent,
+			  const struct iommu_user_data *user_data);
 else
 #define amd_iommufd_hw_info NULL
+#define amd_iommu_get_viommu_size NULL
+#define amd_iommu_viommu_init NULL
 #endif

The core would return -EOPNOTSUPP if either of them is NULL.

> +	if (ret)
> +		return ret;
> +
> +	iommu = get_amd_iommu_from_devid(data.iommu_devid);
> +	if (!iommu)
> +		return -ENODEV;
> +
> +	vminfo->iommu_devid = data.iommu_devid;

If you want the struct amd_iommu pointer, do this instead:

	iommu = container_of(viommu->iommu_dev, struct amd_iommu, iommu);

Then iommu_devid and get_amd_iommu_from_devid() aren't used anywhere
else. So both could be dropped.

> +/*
> + * See include/linux/iommufd.h
> + * struct iommufd_viommu_ops - vIOMMU specific operations
> + */
> +const struct iommufd_viommu_ops amd_viommu_ops = {
> +	.alloc_domain_nested = amd_iommu_alloc_domain_nested,
> +};

Unfortunately, a viommu_ops with alloc_domain_nested is incomplete,
IMHO. If this series gets merged alone, it declares that the kernel
now supports AMD IOMMU's virtualization, which actually won't work
without a cache invalidation op (SW) or hw_queue (HW-acceleration).

> diff --git a/drivers/iommu/amd/iommufd.h b/drivers/iommu/amd/iommufd.h
> index f880be80a30d..0d59ef160780 100644
> --- a/drivers/iommu/amd/iommufd.h
> +++ b/drivers/iommu/amd/iommufd.h
> @@ -7,6 +7,8 @@
>  #define AMD_IOMMUFD_H
>  
>  #if IS_ENABLED(CONFIG_AMD_IOMMU_IOMMUFD)
> +extern const struct iommufd_viommu_ops amd_viommu_ops;

You don't need this if defining the amd_iommu_get_viommu_size and
amd_iommu_viommu_init here.

> +/**
> + * struct iommu_viommu_amd - AMD vIOMMU Interface (IOMMU_VIOMMU_TYPE_AMD)
> + * @iommu_devid: Host IOMMU PCI device ID

Though I don't think you need this, just curious, how does user
space know the host PCI device ID?

> + * @viommu_devid: Guest vIOMMU PCI device ID

This isn't used in the patch, so I am not sure. But it sounds like
a vDEVICE virt_id to me. Though it probably works for the AMD case
because AMD IOMMU is per PCI device (IIRC), I think it'd be better
to define a vDEVICE forwarding this ID. I don't know, but for CC,
AMD might end up with a vDEVICE object anyway?

> + * @trans_devid: GPA->GVA translation device ID (host)

This is unclear to me either, and it's not being used, so I cannot
tell what it is for. But it feels like a part of vDEVICE object..

> + * @out_gid: (out) Guest ID

Again, not being used, needs some elaboration.

> + * @out_vfmmio_mmap_offset: (out) mmap offset for vIOMMU VF-MMIO

"(out)" is redudant.

@out_vfmmio_mmap_offset: mmap offset argument for vIOMMU VF-MMIO

And you could have an @out_vfmmio_mmap_length too, even if it is
just for validation in the iommufd core.

> + */
> +struct iommu_viommu_amd {
> +	__u32 iommu_devid;
> +	__u32 viommu_devid;
> +	__u32 trans_devid;
> +	__u32 out_gid;
> +	__aligned_u64 out_vfmmio_mmap_offset;
> +	__u32 reserved; /* must be last */

"reserved" is useless here.

Usually we add reserved field for paddings, and we usually call
it "__reserved".

struct iommu_viommu_amd is perfectly 64-bit aligned without the
last "reserved" field. I would drop it.

Nicolin

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 01/12] iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK
  2025-10-01  6:09 ` [PATCH v2 01/12] iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK Suravee Suthikulpanit
  2025-10-02 17:12   ` Nicolin Chen
@ 2025-10-06 14:36   ` Jason Gunthorpe
  1 sibling, 0 replies; 42+ messages in thread
From: Jason Gunthorpe @ 2025-10-06 14:36 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
	jsnitsel, vasant.hegde, iommu, santosh.shukla,
	sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
	kpsingh, joao.m.martins, alejandro.j.jimenez

On Wed, Oct 01, 2025 at 06:09:43AM +0000, Suravee Suthikulpanit wrote:
> Also change the define to use GENMASK_ULL instead.
> There is no functional change.
> 
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
>  drivers/iommu/amd/amd_iommu_types.h | 2 +-
>  drivers/iommu/amd/iommu.c           | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 03/12] iommu/amd: Make amd_iommu_pdom_id_free() non-static
  2025-10-01  6:09 ` [PATCH v2 03/12] iommu/amd: Make amd_iommu_pdom_id_free() non-static Suravee Suthikulpanit
  2025-10-02 17:13   ` Nicolin Chen
@ 2025-10-06 14:37   ` Jason Gunthorpe
  1 sibling, 0 replies; 42+ messages in thread
From: Jason Gunthorpe @ 2025-10-06 14:37 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
	jsnitsel, vasant.hegde, iommu, santosh.shukla,
	sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
	kpsingh, joao.m.martins, alejandro.j.jimenez

On Wed, Oct 01, 2025 at 06:09:45AM +0000, Suravee Suthikulpanit wrote:
> To allow reuse in other files in subsequent patches.
> 
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
>  drivers/iommu/amd/amd_iommu.h |  1 +
>  drivers/iommu/amd/iommu.c     | 10 +++++-----
>  2 files changed, 6 insertions(+), 5 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 04/12] iommu/amd: Make amd_iommu_device_flush_dte() non-static
  2025-10-01  6:09 ` [PATCH v2 04/12] iommu/amd: Make amd_iommu_device_flush_dte() non-static Suravee Suthikulpanit
  2025-10-02 17:14   ` Nicolin Chen
@ 2025-10-06 14:37   ` Jason Gunthorpe
  1 sibling, 0 replies; 42+ messages in thread
From: Jason Gunthorpe @ 2025-10-06 14:37 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
	jsnitsel, vasant.hegde, iommu, santosh.shukla,
	sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
	kpsingh, joao.m.martins, alejandro.j.jimenez

On Wed, Oct 01, 2025 at 06:09:46AM +0000, Suravee Suthikulpanit wrote:
> To allow reuse in other files in subsequent patches.
> 
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
>  drivers/iommu/amd/amd_iommu.h | 3 +++
>  drivers/iommu/amd/iommu.c     | 8 ++++----
>  2 files changed, 7 insertions(+), 4 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 09/12] iommu/amd: Add support for nest parent domain allocation
  2025-10-02 18:00   ` Nicolin Chen
@ 2025-10-06 14:43     ` Jason Gunthorpe
  2025-10-08 14:16     ` Suthikulpanit, Suravee
  1 sibling, 0 replies; 42+ messages in thread
From: Jason Gunthorpe @ 2025-10-06 14:43 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Suravee Suthikulpanit, linux-kernel, robin.murphy, will, joro,
	kevin.tian, jsnitsel, vasant.hegde, iommu, santosh.shukla,
	sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
	kpsingh, joao.m.martins, alejandro.j.jimenez

On Thu, Oct 02, 2025 at 11:00:14AM -0700, Nicolin Chen wrote:

> >  	switch (flags & supported_flags) {
> >  	case IOMMU_HWPT_ALLOC_DIRTY_TRACKING:
> > -		/* Allocate domain with v1 page table for dirty tracking */
> > -		if (!amd_iommu_hd_support(iommu))
> > +	case IOMMU_HWPT_ALLOC_NEST_PARENT:
> > +	case IOMMU_HWPT_ALLOC_DIRTY_TRACKING | IOMMU_HWPT_ALLOC_NEST_PARENT:
> > +		/*
> > +		 * Allocate domain with v1 page table for dirty tracking
> > +		 * and/or Nest parent.
> > +		 */
> > +		if ((flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING) &&
> > +		    !amd_iommu_hd_support(iommu))
> >  			break;
> 
> And add here:
> 		if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT)) &&
> 		    !is_nest_parent_supported())
> 			    break;

Yeah this looks better

On VT-D we have V1/V2 specific alloc functions and these tests were
put at the top of those functions in this form. This is close enough,
it is helpful to be consistent.

Jason

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 10/12] iommu/amd: Add support for nested domain allocation
  2025-10-01  6:09 ` [PATCH v2 10/12] iommu/amd: Add support for nested " Suravee Suthikulpanit
  2025-10-02 18:29   ` Nicolin Chen
@ 2025-10-06 14:49   ` Jason Gunthorpe
  2025-10-07 20:36     ` Suthikulpanit, Suravee
  1 sibling, 1 reply; 42+ messages in thread
From: Jason Gunthorpe @ 2025-10-06 14:49 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
	jsnitsel, vasant.hegde, iommu, santosh.shukla,
	sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
	kpsingh, joao.m.martins, alejandro.j.jimenez

On Wed, Oct 01, 2025 at 06:09:52AM +0000, Suravee Suthikulpanit wrote:

> @@ -857,6 +878,8 @@ struct iommu_dev_data {
>  	struct list_head list;		  /* For domain->dev_list */
>  	struct llist_node dev_data_list;  /* For global dev_data_list */
>  	struct protection_domain *domain; /* Domain the device is bound to */
> +	struct protection_domain *parent; /* Parent Domain the device is bound to */
> +	struct nested_domain *ndom;       /* Nested Domain the device is bound to */

These two should never be needed. domain should point to the nested
domain and it can get the other information from it.

> @@ -2620,7 +2623,16 @@ amd_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
>  		if ((flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING) &&
>  		    !amd_iommu_hd_support(iommu))
>  			break;
> -		return do_iommu_domain_alloc(dev, flags, PD_MODE_V1);
> +
> +		dom = do_iommu_domain_alloc(dev, flags, PD_MODE_V1);
> +		if (!IS_ERR_VALUE(dom) &&
> +		    (flags & IOMMU_HWPT_ALLOC_NEST_PARENT)) {
> +			struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
> +
> +			dev_data->parent = to_pdomain(dom);

We talked about this many times already. Domain allocation DOES NOT
CHANGE the dev_data.

> +#define pr_fmt(fmt)     "AMD-Vi: " fmt
> +#define dev_fmt(fmt)    pr_fmt(fmt)

I'm not sure these make sense, you should not be using pr_fmt, use a
dev_XX on the iommu instance.

> +struct iommu_domain *
> +amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
> +			      const struct iommu_user_data *user_data)
> +{
> +	int ret;
> +	struct iommu_hwpt_amd_guest gdte;
> +	struct nested_domain *ndom;
> +
> +	if (user_data->type != IOMMU_HWPT_DATA_AMD_GUEST)
> +		return ERR_PTR(-EOPNOTSUPP);
> +
> +	/*
> +	 * Need to make sure size of struct iommu_hwpt_amd_guest and
> +	 * struct dev_table_entry are the same since it will be copied
> +	 * from one to the other later on.
> +	 */
> +	if (WARN_ON(sizeof(struct dev_table_entry) != sizeof(gdte)))
> +		return ERR_PTR(-EINVAL);

use static_assert for something like this.

> +
> +	ret = iommu_copy_struct_from_user(&gdte, user_data,
> +					  IOMMU_HWPT_DATA_AMD_GUEST,
> +					  dte);
> +	if (ret)
> +		return ERR_PTR(ret);
> +
> +	ndom = kzalloc(sizeof(*ndom), GFP_KERNEL);
> +	if (IS_ERR(ndom))
> +		return ERR_PTR(-ENOMEM);
> +
> +	ndom->domain.ops = &nested_domain_ops;
> +	ndom->domain.type = IOMMU_DOMAIN_NESTED;
> +	memcpy(&ndom->guest_dte, &gdte, sizeof(struct dev_table_entry));
> +
> +	/* Due to possible aliasing issue use per-device nested domain ID */
> +	ndom->id = amd_iommu_pdom_id_alloc();

I've forgotten the details, but doesn't the gdet have the virtual
domain ID in side it? Shouldn't this be going to the viommu struct and
mapping virtual domain IDs to physical ones so they can be shared if
the guest says it is safe to share them? I guess that is an
optimization, but it should have a note here explaining it.

Jason

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 11/12] iommu/amd: Add support for nested domain attach/detach
  2025-10-01  6:09 ` [PATCH v2 11/12] iommu/amd: Add support for nested domain attach/detach Suravee Suthikulpanit
  2025-10-02 19:04   ` Nicolin Chen
@ 2025-10-06 14:59   ` Jason Gunthorpe
  2025-10-07 19:22     ` Suthikulpanit, Suravee
  1 sibling, 1 reply; 42+ messages in thread
From: Jason Gunthorpe @ 2025-10-06 14:59 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
	jsnitsel, vasant.hegde, iommu, santosh.shukla,
	sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
	kpsingh, joao.m.martins, alejandro.j.jimenez

On Wed, Oct 01, 2025 at 06:09:53AM +0000, Suravee Suthikulpanit wrote:
> +static void set_dte_nested(struct amd_iommu *iommu,
> +			   struct dev_table_entry *gdte,
> +			   struct nested_domain *ndom,
> +			   struct iommu_dev_data *dev_data)
> +{
> +	struct dev_table_entry *initial_dte;
> +	struct dev_table_entry new = {0};
> +	struct protection_domain *pdom = dev_data->parent;

No, this is ndom->parent.

The parent is NOT required to be attached to the device already.

> +	if (WARN_ON(!ndom || !pdom || (pdom->iop.mode == PAGE_MODE_NONE)))
> +		return;
> +
> +	amd_iommu_make_clear_dte(dev_data, &new);
> +
> +	new.data[0] |= iommu_virt_to_phys(pdom->iop.root);
> +	new.data[0] |= FIELD_PREP(DTE_MODE_MASK, pdom->iop.mode);
> +	new.data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV;
> +	new.data[0] |= (DTE_FLAG_PPR & gdte->data[0]);

> +	if (pdom->dirty_tracking)
> +		new.data[0] |= DTE_FLAG_HAD;
> +
> +	if (dev_data->ats_enabled)
> +		new.data[1] |= DTE_FLAG_IOTLB;

This sequence should be in some set_dte_gcr3() ??

> +	/*
> +	 * Restore cached persistent DTE bits, which can be set by information
> +	 * in IVRS table. See set_dev_entry_from_acpi().
> +	 */
> +	initial_dte = amd_iommu_get_ivhd_dte_flags(iommu->pci_seg->id, dev_data->devid);
> +	if (initial_dte) {
> +		new.data128[0] |= initial_dte->data128[0];
> +		new.data128[1] |= initial_dte->data128[1];
> +	}

This should go into amd_iommu_make_clear_dte() I think, and refactor
it out of iommu_update_dte256() ?

Every created DTE needs these bits set, right?

> +
> +	/* Guest translation stuff */
> +	new.data[0] |= (gdte->data[0] &
> +		       (DTE_GLX | DTE_FLAG_GV | DTE_FLAG_GIOV));
> +
> +	/* GCR3 table */
> +	new.data[0] |= (gdte->data[0] & DTE_GCR3_14_12);
> +	new.data[1] |= (gdte->data[1] & (DTE_GCR3_30_15 | DTE_GCR3_51_31));
> +
> +	/* Guest paging mode */
> +	new.data[2] |= (gdte->data[2] & DTE_GPT_LEVEL_MASK);

I didn't see anything validating gdte has only permitted set bits in
the prior patch?

If this is goint to decode array item by item then why not use struct
iommu_hwpt_amd_guest in the nested_domain ?

> +static int nested_attach_device(struct iommu_domain *dom, struct device *dev)
> +{
> +	struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
> +	struct amd_iommu *iommu = get_amd_iommu_from_dev_data(dev_data);
> +	struct nested_domain *ndom = to_ndomain(dom);
> +	struct dev_table_entry *gdte = &ndom->guest_dte;
> +	int ret = 0;
> +
> +	if (dev_data->ndom == ndom)
> +		return ret;
> +
> +	if (!dev_is_pci(dev))
> +		return -EINVAL;

Why?

> +	/* Currently only support GCR3TRPMode with nested translation */
> +	if (!check_feature2(FEATURE_GCR3TRPMODE))
> +		return -EOPNOTSUPP;

This is impossible since we can't allocate a nest parent. If you want
to make a redundent check then call is_nest_parent_supported()

> +	/* We need to check host capability before setting the mode */
> +	if ((FIELD_GET(DTE_GPT_LEVEL_MASK, gdte->data[2]) == GUEST_PGTABLE_5_LEVEL) &&
> +	    (amd_iommu_gpt_level < PAGE_MODE_5_LEVEL))
> +		return -EOPNOTSUPP;

I wonder if this should be done during alloc

> +	WARN_ON(dev_data->ndom);
> +
> +	dev_data->ndom = ndom;

Useless?

> +	mutex_lock(&dev_data->mutex);
> +
> +	/* Update device table entry */
> +	set_dte_nested(iommu, gdte, ndom, dev_data);
> +	amd_iommu_device_flush_dte(dev_data);
> +	amd_iommu_completion_wait(iommu);

Hurray

Jason

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 12/12] iommu/amd: Introduce IOMMUFD vIOMMU support for AMD
  2025-10-02 20:05   ` Nicolin Chen
@ 2025-10-06 15:01     ` Jason Gunthorpe
  0 siblings, 0 replies; 42+ messages in thread
From: Jason Gunthorpe @ 2025-10-06 15:01 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Suravee Suthikulpanit, linux-kernel, robin.murphy, will, joro,
	kevin.tian, jsnitsel, vasant.hegde, iommu, santosh.shukla,
	sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
	kpsingh, joao.m.martins, alejandro.j.jimenez

On Thu, Oct 02, 2025 at 01:05:32PM -0700, Nicolin Chen wrote:
> > +/*
> > + * See include/linux/iommufd.h
> > + * struct iommufd_viommu_ops - vIOMMU specific operations
> > + */
> > +const struct iommufd_viommu_ops amd_viommu_ops = {
> > +	.alloc_domain_nested = amd_iommu_alloc_domain_nested,
> > +};
> 
> Unfortunately, a viommu_ops with alloc_domain_nested is incomplete,
> IMHO. If this series gets merged alone, it declares that the kernel
> now supports AMD IOMMU's virtualization, which actually won't work
> without a cache invalidation op (SW) or hw_queue (HW-acceleration).

Yeah, I'd move this patch to a series implementing viommu fully.

I'm pretty sure this series is OK as is without it, but it would be
good to at least share a sketch of the viommu support to be sure
before merging it.

Jason

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 11/12] iommu/amd: Add support for nested domain attach/detach
  2025-10-06 14:59   ` Jason Gunthorpe
@ 2025-10-07 19:22     ` Suthikulpanit, Suravee
  2025-10-07 23:43       ` Jason Gunthorpe
  0 siblings, 1 reply; 42+ messages in thread
From: Suthikulpanit, Suravee @ 2025-10-07 19:22 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
	jsnitsel, vasant.hegde, iommu, santosh.shukla,
	sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
	kpsingh, joao.m.martins, alejandro.j.jimenez



On 10/6/2025 9:59 AM, Jason Gunthorpe wrote:
> On Wed, Oct 01, 2025 at 06:09:53AM +0000, Suravee Suthikulpanit wrote:
>> ....
>> +	if (WARN_ON(!ndom || !pdom || (pdom->iop.mode == PAGE_MODE_NONE)))
>> +		return;
>> +
>> +	amd_iommu_make_clear_dte(dev_data, &new);
>> +
>> +	new.data[0] |= iommu_virt_to_phys(pdom->iop.root);
>> +	new.data[0] |= FIELD_PREP(DTE_MODE_MASK, pdom->iop.mode);
>> +	new.data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV;
>> +	new.data[0] |= (DTE_FLAG_PPR & gdte->data[0]);
> 
>> +	if (pdom->dirty_tracking)
>> +		new.data[0] |= DTE_FLAG_HAD;
>> +
>> +	if (dev_data->ats_enabled)
>> +		new.data[1] |= DTE_FLAG_IOTLB;
> 
> This sequence should be in some set_dte_gcr3() ??

Not sure what you mean. This logic was in set_dte_entry(), and 
duplicated here in the set_dte_nested() since we no longer calling 
set_dte_entry() from the nested path. Also, it's not really related to 
GCR3 table.

>> +	/*
>> +	 * Restore cached persistent DTE bits, which can be set by information
>> +	 * in IVRS table. See set_dev_entry_from_acpi().
>> +	 */
>> +	initial_dte = amd_iommu_get_ivhd_dte_flags(iommu->pci_seg->id, dev_data->devid);
>> +	if (initial_dte) {
>> +		new.data128[0] |= initial_dte->data128[0];
>> +		new.data128[1] |= initial_dte->data128[1];
>> +	}
> 
> This should go into amd_iommu_make_clear_dte() I think, and refactor
> it out of iommu_update_dte256() ?
> Every created DTE needs these bits set, right?

Currently, the amd_iommu_make_clear_dte() clears all the DTE bits and 
set the DTE[V] (valid) bit. This is used when preparing the DTE for 
programming, detaching domain, and when setting the blocking domain. 
Putting this logic in the function would change the behavior.

These bits affect Interrupt remapping (Lint0/Lint1/NMI/INIT/ExtInt 
interrupt pass-through) and System management message behavior. It 
should be okay to set these up for the specified devices in the current 
implementation.

>> +
>> +	/* Guest translation stuff */
>> +	new.data[0] |= (gdte->data[0] &
>> +		       (DTE_GLX | DTE_FLAG_GV | DTE_FLAG_GIOV));
>> +
>> +	/* GCR3 table */
>> +	new.data[0] |= (gdte->data[0] & DTE_GCR3_14_12);
>> +	new.data[1] |= (gdte->data[1] & (DTE_GCR3_30_15 | DTE_GCR3_51_31));
>> +
>> +	/* Guest paging mode */
>> +	new.data[2] |= (gdte->data[2] & DTE_GPT_LEVEL_MASK);
> 
> I didn't see anything validating gdte has only permitted set bits in
> the prior patch?

Not sure which on are you referring to.

> If this is goint to decode array item by item then why not use struct
> iommu_hwpt_amd_guest in the nested_domain ?

The struct dev_table_entry *gdte is basically the same information as in 
the struct iommu_hwpt_amd_guest.dte that we copied from the userspace 
into the more appropriate in-kernel data structure type, which is used 
within the driver.

Here, we just select only what we needed for configuring guest page 
table specifically to be programmed onto the host DTE.

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 10/12] iommu/amd: Add support for nested domain allocation
  2025-10-06 14:49   ` Jason Gunthorpe
@ 2025-10-07 20:36     ` Suthikulpanit, Suravee
  2025-10-07 23:39       ` Jason Gunthorpe
  0 siblings, 1 reply; 42+ messages in thread
From: Suthikulpanit, Suravee @ 2025-10-07 20:36 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
	jsnitsel, vasant.hegde, iommu, santosh.shukla,
	sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
	kpsingh, joao.m.martins, alejandro.j.jimenez



On 10/6/2025 9:49 AM, Jason Gunthorpe wrote:
> On Wed, Oct 01, 2025 at 06:09:52AM +0000, Suravee Suthikulpanit wrote:
> ....
>> +
>> +	ret = iommu_copy_struct_from_user(&gdte, user_data,
>> +					  IOMMU_HWPT_DATA_AMD_GUEST,
>> +					  dte);
>> +	if (ret)
>> +		return ERR_PTR(ret);
>> +
>> +	ndom = kzalloc(sizeof(*ndom), GFP_KERNEL);
>> +	if (IS_ERR(ndom))
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	ndom->domain.ops = &nested_domain_ops;
>> +	ndom->domain.type = IOMMU_DOMAIN_NESTED;
>> +	memcpy(&ndom->guest_dte, &gdte, sizeof(struct dev_table_entry));
>> +
>> +	/* Due to possible aliasing issue use per-device nested domain ID */
>> +	ndom->id = amd_iommu_pdom_id_alloc();
> 
> I've forgotten the details, but doesn't the gdet have the virtual
> domain ID in side it? Shouldn't this be going to the viommu struct and
> mapping virtual domain IDs to physical ones so they can be shared if
> the guest says it is safe to share them? I guess that is an
> optimization, but it should have a note here explaining it.

The gDTE[DomainID] field contains guest Domain ID (gDomID). The host 
IOMMU driver uses the gDomId and guest ID (gid) to index the Domain ID 
mapping table, and store the host Domain ID (hDomID) in the table entry. 
This data structure is required by hw to translation gDomID->hDomID to 
virtualize guest invalidation command. This will be part of the upcoming 
series to enable hw-vIOMMU.

This ndom->id is the hDomID, which is currently allocated per-device to 
avoid TLB aliasing i.e. A guest w/ multiple pass-through devices w/ the 
same hDomID (same stage 2 table) and different stage-1 tables with same 
PASID. IOMMU would use the same TLB tag, which results in TLB aliasing 
issue. Therefore, we workaround the issue by allocating per-device 
hDomID for nested domain.

Thanks,
Suravee



Thanks,
Suravee

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 10/12] iommu/amd: Add support for nested domain allocation
  2025-10-07 20:36     ` Suthikulpanit, Suravee
@ 2025-10-07 23:39       ` Jason Gunthorpe
  2025-10-09  6:22         ` Sairaj Kodilkar
  0 siblings, 1 reply; 42+ messages in thread
From: Jason Gunthorpe @ 2025-10-07 23:39 UTC (permalink / raw)
  To: Suthikulpanit, Suravee
  Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
	jsnitsel, vasant.hegde, iommu, santosh.shukla,
	sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
	kpsingh, joao.m.martins, alejandro.j.jimenez

On Tue, Oct 07, 2025 at 03:36:58PM -0500, Suthikulpanit, Suravee wrote:
> The gDTE[DomainID] field contains guest Domain ID (gDomID). The host IOMMU
> driver uses the gDomId and guest ID (gid) to index the Domain ID mapping
> table, and store the host Domain ID (hDomID) in the table entry. This data
> structure is required by hw to translation gDomID->hDomID to virtualize
> guest invalidation command. This will be part of the upcoming series to
> enable hw-vIOMMU.

Sure, this translation is part of viommu

> This ndom->id is the hDomID, which is currently allocated per-device to
> avoid TLB aliasing i.e. A guest w/ multiple pass-through devices w/ the same
> hDomID (same stage 2 table) and different stage-1 tables with same PASID.
> IOMMU would use the same TLB tag, which results in TLB aliasing issue.
> Therefore, we workaround the issue by allocating per-device hDomID for
> nested domain.

But this is what I mean here, the gDomId should be 1:1 with the hDomId
and here you are making it 1:N.

It has to be like this or you cannot manage invalidation.

Given this series is not really functional it is OK to leave a little
hack I guess, but it is worth noting how it is supposed to work.

It also probably means we should see the viommu series pretty quickly
with a goal to merge them all together in one cycle.

Jason

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 11/12] iommu/amd: Add support for nested domain attach/detach
  2025-10-07 19:22     ` Suthikulpanit, Suravee
@ 2025-10-07 23:43       ` Jason Gunthorpe
  2025-10-09  7:18         ` Sairaj Kodilkar
  0 siblings, 1 reply; 42+ messages in thread
From: Jason Gunthorpe @ 2025-10-07 23:43 UTC (permalink / raw)
  To: Suthikulpanit, Suravee
  Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
	jsnitsel, vasant.hegde, iommu, santosh.shukla,
	sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
	kpsingh, joao.m.martins, alejandro.j.jimenez

On Tue, Oct 07, 2025 at 02:22:30PM -0500, Suthikulpanit, Suravee wrote:
> 
> 
> On 10/6/2025 9:59 AM, Jason Gunthorpe wrote:
> > On Wed, Oct 01, 2025 at 06:09:53AM +0000, Suravee Suthikulpanit wrote:
> > > ....
> > > +	if (WARN_ON(!ndom || !pdom || (pdom->iop.mode == PAGE_MODE_NONE)))
> > > +		return;
> > > +
> > > +	amd_iommu_make_clear_dte(dev_data, &new);
> > > +
> > > +	new.data[0] |= iommu_virt_to_phys(pdom->iop.root);
> > > +	new.data[0] |= FIELD_PREP(DTE_MODE_MASK, pdom->iop.mode);
> > > +	new.data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV;
> > > +	new.data[0] |= (DTE_FLAG_PPR & gdte->data[0]);
> > 
> > > +	if (pdom->dirty_tracking)
> > > +		new.data[0] |= DTE_FLAG_HAD;
> > > +
> > > +	if (dev_data->ats_enabled)
> > > +		new.data[1] |= DTE_FLAG_IOTLB;
> > 
> > This sequence should be in some set_dte_gcr3() ??
> 
> Not sure what you mean. This logic was in set_dte_entry(), and duplicated
> here in the set_dte_nested() since we no longer calling set_dte_entry() from
> the nested path. Also, it's not really related to GCR3 table.

Sorry some make_ste_v1()

This logic should be the same for any v1 page table.
> > > +	/*
> > > +	 * Restore cached persistent DTE bits, which can be set by information
> > > +	 * in IVRS table. See set_dev_entry_from_acpi().
> > > +	 */
> > > +	initial_dte = amd_iommu_get_ivhd_dte_flags(iommu->pci_seg->id, dev_data->devid);
> > > +	if (initial_dte) {
> > > +		new.data128[0] |= initial_dte->data128[0];
> > > +		new.data128[1] |= initial_dte->data128[1];
> > > +	}
> > 
> > This should go into amd_iommu_make_clear_dte() I think, and refactor
> > it out of iommu_update_dte256() ?
> > Every created DTE needs these bits set, right?
> 
> Currently, the amd_iommu_make_clear_dte() clears all the DTE bits and set
> the DTE[V] (valid) bit. This is used when preparing the DTE for programming,
> detaching domain, and when setting the blocking domain. Putting this logic
> in the function would change the behavior.

Yes, but it seems like that is the right behavior?

> These bits affect Interrupt remapping (Lint0/Lint1/NMI/INIT/ExtInt interrupt
> pass-through) and System management message behavior. 

Because these bits shouldn't be cleared on a blocking domain!
Interrupts are still expected to work.


> > > +
> > > +	/* Guest translation stuff */
> > > +	new.data[0] |= (gdte->data[0] &
> > > +		       (DTE_GLX | DTE_FLAG_GV | DTE_FLAG_GIOV));
> > > +
> > > +	/* GCR3 table */
> > > +	new.data[0] |= (gdte->data[0] & DTE_GCR3_14_12);
> > > +	new.data[1] |= (gdte->data[1] & (DTE_GCR3_30_15 | DTE_GCR3_51_31));
> > > +
> > > +	/* Guest paging mode */
> > > +	new.data[2] |= (gdte->data[2] & DTE_GPT_LEVEL_MASK);
> > 
> > I didn't see anything validating gdte has only permitted set bits in
> > the prior patch?
> 
> Not sure which on are you referring to.

You have to validate gdte has only valid bits.

> > If this is goint to decode array item by item then why not use struct
> > iommu_hwpt_amd_guest in the nested_domain ?
> 
> The struct dev_table_entry *gdte is basically the same information as in the
> struct iommu_hwpt_amd_guest.dte that we copied from the userspace into the
> more appropriate in-kernel data structure type, which is used within the
> driver.

The appropriate type is the userspace type if you don't actually ever
use anything unique to the kernel type. You can avoid the special
assert/etc since this way of accessing it is not sensitive to layout.

> Here, we just select only what we needed for configuring guest page table
> specifically to be programmed onto the host DTE.

Everything you don't pick up should be checked to be 0. VMM needs to
filter out unsuopported things or generate a bad DTE error.

Jason

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 09/12] iommu/amd: Add support for nest parent domain allocation
  2025-10-02 18:00   ` Nicolin Chen
  2025-10-06 14:43     ` Jason Gunthorpe
@ 2025-10-08 14:16     ` Suthikulpanit, Suravee
  1 sibling, 0 replies; 42+ messages in thread
From: Suthikulpanit, Suravee @ 2025-10-08 14:16 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
	vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
	jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
	joao.m.martins, alejandro.j.jimenez



On 10/2/2025 1:00 PM, Nicolin Chen wrote:
> On Wed, Oct 01, 2025 at 06:09:51AM +0000, Suravee Suthikulpanit wrote:
>> To support nested translation, the nest parent domain is allocated with
>> IOMMU_HWPT_ALLOC_NEST_PARENT flag, and stores information of the v1 page
>> table for stage 2 (i.e. GPA->SPA).
>>
>> Also, only support nest parent domain on AMD system, which can support
>> the Guest CR3 Table (GCR3TRPMode) feature. This feature is required in
>> order to program DTE[GCR3 Table Root Pointer] with the GPA.
>>
>> Signed-off-by: Suravee Suthikulpanit<suravee.suthikulpanit@amd.com>
>> +static inline bool is_nest_parent_supported(u32 flags)
>> +{
>> +	/* Only allow nest parent when these features are supported */
>> +	if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) &&
>> +	    (!check_feature(FEATURE_GT) ||
>> +	     !check_feature(FEATURE_GIOSUP) ||
>> +	     !check_feature2(FEATURE_GCR3TRPMODE)))
>> +		return false;
>> +
>> +	return true;
> If the "flags" doesn't have IOMMU_HWPT_ALLOC_NEST_PARENT while one
> of the features is missing, this would return true, indicating the
> HW supports nesting parent, which will be logically wrong although
> it does validate the nest parent flag.
> 
> Following the existing coding style, I think this could be just:
> 
> static inline bool is_nest_parent_supported(void)
> {
> 	/* Only allow nest parent when these features are supported */
> 	return check_feature(FEATURE_GT) && check_feature(FEATURE_GIOSUP) &&
> 	       check_feature2(FEATURE_GCR3TRPMODE);
> }

Ahh, I missed this part.

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 10/12] iommu/amd: Add support for nested domain allocation
  2025-10-07 23:39       ` Jason Gunthorpe
@ 2025-10-09  6:22         ` Sairaj Kodilkar
  2025-10-09  9:16           ` Sairaj Kodilkar
  2025-10-09 14:34           ` Jason Gunthorpe
  0 siblings, 2 replies; 42+ messages in thread
From: Sairaj Kodilkar @ 2025-10-09  6:22 UTC (permalink / raw)
  To: Jason Gunthorpe, Suthikulpanit, Suravee
  Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
	jsnitsel, vasant.hegde, iommu, santosh.shukla,
	sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
	kpsingh, joao.m.martins, alejandro.j.jimenez



On 10/8/2025 5:09 AM, Jason Gunthorpe wrote:
> On Tue, Oct 07, 2025 at 03:36:58PM -0500, Suthikulpanit, Suravee wrote:
>> The gDTE[DomainID] field contains guest Domain ID (gDomID). The host IOMMU
>> driver uses the gDomId and guest ID (gid) to index the Domain ID mapping
>> table, and store the host Domain ID (hDomID) in the table entry. This data
>> structure is required by hw to translation gDomID->hDomID to virtualize
>> guest invalidation command. This will be part of the upcoming series to
>> enable hw-vIOMMU.
> Sure, this translation is part of viommu
>
>> This ndom->id is the hDomID, which is currently allocated per-device to
>> avoid TLB aliasing i.e. A guest w/ multiple pass-through devices w/ the same
>> hDomID (same stage 2 table) and different stage-1 tables with same PASID.
>> IOMMU would use the same TLB tag, which results in TLB aliasing issue.
>> Therefore, we workaround the issue by allocating per-device hDomID for
>> nested domain.
> But this is what I mean here, the gDomId should be 1:1 with the hDomId
> and here you are making it 1:N.
Hi Jason,
The guest will only see V2 page table when we are using hardware vIOMMU.
Since IOMMU driver allocates per device domains when it is using V2 page 
table,
the mappings are still N:N and invalidations will work similar to V2 
page table mode in
host.

Thanks
Sairaj
>
> It has to be like this or you cannot manage invalidation.
>
> Given this series is not really functional it is OK to leave a little
> hack I guess, but it is worth noting how it is supposed to work.
>
> It also probably means we should see the viommu series pretty quickly
> with a goal to merge them all together in one cycle.
>
> Jason


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 11/12] iommu/amd: Add support for nested domain attach/detach
  2025-10-07 23:43       ` Jason Gunthorpe
@ 2025-10-09  7:18         ` Sairaj Kodilkar
  2025-10-09 14:35           ` Jason Gunthorpe
  0 siblings, 1 reply; 42+ messages in thread
From: Sairaj Kodilkar @ 2025-10-09  7:18 UTC (permalink / raw)
  To: Jason Gunthorpe, Suthikulpanit, Suravee
  Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
	jsnitsel, vasant.hegde, iommu, santosh.shukla,
	sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
	kpsingh, joao.m.martins, alejandro.j.jimenez



On 10/8/2025 5:13 AM, Jason Gunthorpe wrote:
> On Tue, Oct 07, 2025 at 02:22:30PM -0500, Suthikulpanit, Suravee wrote:
>>
>> On 10/6/2025 9:59 AM, Jason Gunthorpe wrote:
>>> On Wed, Oct 01, 2025 at 06:09:53AM +0000, Suravee Suthikulpanit wrote:
>>>> ....
>>>> +	if (WARN_ON(!ndom || !pdom || (pdom->iop.mode == PAGE_MODE_NONE)))
>>>> +		return;
>>>> +
>>>> +	amd_iommu_make_clear_dte(dev_data, &new);
>>>> +
>>>> +	new.data[0] |= iommu_virt_to_phys(pdom->iop.root);
>>>> +	new.data[0] |= FIELD_PREP(DTE_MODE_MASK, pdom->iop.mode);
>>>> +	new.data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV;
>>>> +	new.data[0] |= (DTE_FLAG_PPR & gdte->data[0]);
>>>> +	if (pdom->dirty_tracking)
>>>> +		new.data[0] |= DTE_FLAG_HAD;
>>>> +
>>>> +	if (dev_data->ats_enabled)
>>>> +		new.data[1] |= DTE_FLAG_IOTLB;
>>> This sequence should be in some set_dte_gcr3() ??
>> Not sure what you mean. This logic was in set_dte_entry(), and duplicated
>> here in the set_dte_nested() since we no longer calling set_dte_entry() from
>> the nested path. Also, it's not really related to GCR3 table.
> Sorry some make_ste_v1()
>
> This logic should be the same for any v1 page table.
>>>> +	/*
>>>> +	 * Restore cached persistent DTE bits, which can be set by information
>>>> +	 * in IVRS table. See set_dev_entry_from_acpi().
>>>> +	 */
>>>> +	initial_dte = amd_iommu_get_ivhd_dte_flags(iommu->pci_seg->id, dev_data->devid);
>>>> +	if (initial_dte) {
>>>> +		new.data128[0] |= initial_dte->data128[0];
>>>> +		new.data128[1] |= initial_dte->data128[1];
>>>> +	}
>>> This should go into amd_iommu_make_clear_dte() I think, and refactor
>>> it out of iommu_update_dte256() ?
>>> Every created DTE needs these bits set, right?
>> Currently, the amd_iommu_make_clear_dte() clears all the DTE bits and set
>> the DTE[V] (valid) bit. This is used when preparing the DTE for programming,
>> detaching domain, and when setting the blocking domain. Putting this logic
>> in the function would change the behavior.
> Yes, but it seems like that is the right behavior?
>
>> These bits affect Interrupt remapping (Lint0/Lint1/NMI/INIT/ExtInt interrupt
>> pass-through) and System management message behavior.
> Because these bits shouldn't be cleared on a blocking domain!
> Interrupts are still expected to work.
>
>
>>>> +
>>>> +	/* Guest translation stuff */
>>>> +	new.data[0] |= (gdte->data[0] &
>>>> +		       (DTE_GLX | DTE_FLAG_GV | DTE_FLAG_GIOV));
>>>> +
>>>> +	/* GCR3 table */
>>>> +	new.data[0] |= (gdte->data[0] & DTE_GCR3_14_12);
>>>> +	new.data[1] |= (gdte->data[1] & (DTE_GCR3_30_15 | DTE_GCR3_51_31));
>>>> +
>>>> +	/* Guest paging mode */
>>>> +	new.data[2] |= (gdte->data[2] & DTE_GPT_LEVEL_MASK);
>>> I didn't see anything validating gdte has only permitted set bits in
>>> the prior patch?
>> Not sure which on are you referring to.
> You have to validate gdte has only valid bits.
>
>>> If this is goint to decode array item by item then why not use struct
>>> iommu_hwpt_amd_guest in the nested_domain ?
>> The struct dev_table_entry *gdte is basically the same information as in the
>> struct iommu_hwpt_amd_guest.dte that we copied from the userspace into the
>> more appropriate in-kernel data structure type, which is used within the
>> driver.
> The appropriate type is the userspace type if you don't actually ever
> use anything unique to the kernel type. You can avoid the special
> assert/etc since this way of accessing it is not sensitive to layout.
>
>> Here, we just select only what we needed for configuring guest page table
>> specifically to be programmed onto the host DTE.
> Everything you don't pick up should be checked to be 0. VMM needs to
> filter out unsuopported things or generate a bad DTE error.
An alternative can be to introduce a struct which only contains relevant 
fields.
Something similar to -->

struct iommu_hwpt_amd_guest {
    void *gcr3;
    unsigned char guest_paging_mode;
}

VMM can take care of writing to these fields.

Thanks
Sairaj

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 10/12] iommu/amd: Add support for nested domain allocation
  2025-10-09  6:22         ` Sairaj Kodilkar
@ 2025-10-09  9:16           ` Sairaj Kodilkar
  2025-10-09 14:34           ` Jason Gunthorpe
  1 sibling, 0 replies; 42+ messages in thread
From: Sairaj Kodilkar @ 2025-10-09  9:16 UTC (permalink / raw)
  To: Jason Gunthorpe, Suthikulpanit, Suravee
  Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
	jsnitsel, vasant.hegde, iommu, santosh.shukla,
	sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
	kpsingh, joao.m.martins, alejandro.j.jimenez



On 10/9/2025 11:52 AM, Sairaj Kodilkar wrote:
>
>
> On 10/8/2025 5:09 AM, Jason Gunthorpe wrote:
>> On Tue, Oct 07, 2025 at 03:36:58PM -0500, Suthikulpanit, Suravee wrote:
>>> The gDTE[DomainID] field contains guest Domain ID (gDomID). The host 
>>> IOMMU
>>> driver uses the gDomId and guest ID (gid) to index the Domain ID 
>>> mapping
>>> table, and store the host Domain ID (hDomID) in the table entry. 
>>> This data
>>> structure is required by hw to translation gDomID->hDomID to virtualize
>>> guest invalidation command. This will be part of the upcoming series to
>>> enable hw-vIOMMU.
>> Sure, this translation is part of viommu
>>
>>> This ndom->id is the hDomID, which is currently allocated per-device to
>>> avoid TLB aliasing i.e. A guest w/ multiple pass-through devices w/ 
>>> the same
>>> hDomID (same stage 2 table) and different stage-1 tables with same 
>>> PASID.
>>> IOMMU would use the same TLB tag, which results in TLB aliasing issue.
>>> Therefore, we workaround the issue by allocating per-device hDomID for
>>> nested domain.
>> But this is what I mean here, the gDomId should be 1:1 with the hDomId
>> and here you are making it 1:N.
> Hi Jason,
> The guest will only see V2 page table when we are using hardware vIOMMU.
> Since IOMMU driver allocates per device domains when it is using V2 
> page table,
> the mappings are still N:N and invalidations will work similar to V2 
> page table mode in
> host.
Small correction --> "mappings are still 1:1" because guest IOMMU driver 
allocates per device domain.

Thanks
sairaj

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 10/12] iommu/amd: Add support for nested domain allocation
  2025-10-09  6:22         ` Sairaj Kodilkar
  2025-10-09  9:16           ` Sairaj Kodilkar
@ 2025-10-09 14:34           ` Jason Gunthorpe
  1 sibling, 0 replies; 42+ messages in thread
From: Jason Gunthorpe @ 2025-10-09 14:34 UTC (permalink / raw)
  To: Sairaj Kodilkar
  Cc: Suthikulpanit, Suravee, nicolinc, linux-kernel, robin.murphy,
	will, joro, kevin.tian, jsnitsel, vasant.hegde, iommu,
	santosh.shukla, sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw,
	wnliu, gptran, kpsingh, joao.m.martins, alejandro.j.jimenez

On Thu, Oct 09, 2025 at 11:52:23AM +0530, Sairaj Kodilkar wrote:
> 
> 
> On 10/8/2025 5:09 AM, Jason Gunthorpe wrote:
> > On Tue, Oct 07, 2025 at 03:36:58PM -0500, Suthikulpanit, Suravee wrote:
> > > The gDTE[DomainID] field contains guest Domain ID (gDomID). The host IOMMU
> > > driver uses the gDomId and guest ID (gid) to index the Domain ID mapping
> > > table, and store the host Domain ID (hDomID) in the table entry. This data
> > > structure is required by hw to translation gDomID->hDomID to virtualize
> > > guest invalidation command. This will be part of the upcoming series to
> > > enable hw-vIOMMU.
> > Sure, this translation is part of viommu
> > 
> > > This ndom->id is the hDomID, which is currently allocated per-device to
> > > avoid TLB aliasing i.e. A guest w/ multiple pass-through devices w/ the same
> > > hDomID (same stage 2 table) and different stage-1 tables with same PASID.
> > > IOMMU would use the same TLB tag, which results in TLB aliasing issue.
> > > Therefore, we workaround the issue by allocating per-device hDomID for
> > > nested domain.
> > But this is what I mean here, the gDomId should be 1:1 with the hDomId
> > and here you are making it 1:N.
> Hi Jason,
> The guest will only see V2 page table when we are using hardware vIOMMU.

??

This patch is about adding the gDTE support to the driver and the GDTE
is the mechanism for userspace to inform the kernel about the V2 page
table in the guest.

If the idea at this point is to not support V2 page table then have
this function validate the gDTE to exclude it.

> Since IOMMU driver allocates per device domains when it is using V2
> page table, the mappings are still N:N and invalidations will work
> similar to V2 page table mode in host.

I don't see how this can work. Invalidations will be pushed by the
guest kernel directly to the HW invalidation queue using the
gDOMID. That must translate to a single hDOMID to invalidate the right
stuff.

If there is a hDOMID per device it cannot work unless the guest is
also making per-device IDs.

But we can't make this assumption in the viommu code.

So you must not do this, the gDOMID must be mapped to exactly one
hDOMID, and the viommu object should be managing this. When attaching
a gDTE the kernel should validate that the gDOMID maps to a hDOMID
that has the same V1 page table.

Jason


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 11/12] iommu/amd: Add support for nested domain attach/detach
  2025-10-09  7:18         ` Sairaj Kodilkar
@ 2025-10-09 14:35           ` Jason Gunthorpe
  0 siblings, 0 replies; 42+ messages in thread
From: Jason Gunthorpe @ 2025-10-09 14:35 UTC (permalink / raw)
  To: Sairaj Kodilkar
  Cc: Suthikulpanit, Suravee, nicolinc, linux-kernel, robin.murphy,
	will, joro, kevin.tian, jsnitsel, vasant.hegde, iommu,
	santosh.shukla, sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw,
	wnliu, gptran, kpsingh, joao.m.martins, alejandro.j.jimenez

On Thu, Oct 09, 2025 at 12:48:27PM +0530, Sairaj Kodilkar wrote:
> > > Here, we just select only what we needed for configuring guest page table
> > > specifically to be programmed onto the host DTE.
> > Everything you don't pick up should be checked to be 0. VMM needs to
> > filter out unsuopported things or generate a bad DTE error.
> An alternative can be to introduce a struct which only contains relevant
> fields.

We don't want this as a uAPI, use the normal DTE and have the kernel
check the things it currently support. Future kernels can support new
things through the same ABI.

VMM is reponsible to 0 out things the kernel shouldn't see or it
handles on its own.

Jason

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2025-10-09 14:36 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-01  6:09 [PATCH v2 00/12] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
2025-10-01  6:09 ` [PATCH v2 01/12] iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK Suravee Suthikulpanit
2025-10-02 17:12   ` Nicolin Chen
2025-10-06 14:36   ` Jason Gunthorpe
2025-10-01  6:09 ` [PATCH v2 02/12] iommu/amd: Make amd_iommu_pdom_id_alloc() non-static Suravee Suthikulpanit
2025-10-02 17:12   ` Nicolin Chen
2025-10-01  6:09 ` [PATCH v2 03/12] iommu/amd: Make amd_iommu_pdom_id_free() non-static Suravee Suthikulpanit
2025-10-02 17:13   ` Nicolin Chen
2025-10-06 14:37   ` Jason Gunthorpe
2025-10-01  6:09 ` [PATCH v2 04/12] iommu/amd: Make amd_iommu_device_flush_dte() non-static Suravee Suthikulpanit
2025-10-02 17:14   ` Nicolin Chen
2025-10-06 14:37   ` Jason Gunthorpe
2025-10-01  6:09 ` [PATCH v2 05/12] iommu/amd: Make amd_iommu_update_dte256() non-static Suravee Suthikulpanit
2025-10-02 17:15   ` Nicolin Chen
2025-10-01  6:09 ` [PATCH v2 06/12] iommu/amd: Make amd_iommu_make_clear_dte() non-static inline Suravee Suthikulpanit
2025-10-02 17:17   ` Nicolin Chen
2025-10-01  6:09 ` [PATCH v2 07/12] iommu/amd: Make amd_iommu_completion_wait() non-static Suravee Suthikulpanit
2025-10-02 17:24   ` Nicolin Chen
2025-10-01  6:09 ` [PATCH v2 08/12] iommufd: Introduce data struct for AMD nested domain allocation Suravee Suthikulpanit
2025-10-02 17:31   ` Nicolin Chen
2025-10-01  6:09 ` [PATCH v2 09/12] iommu/amd: Add support for nest parent " Suravee Suthikulpanit
2025-10-02 18:00   ` Nicolin Chen
2025-10-06 14:43     ` Jason Gunthorpe
2025-10-08 14:16     ` Suthikulpanit, Suravee
2025-10-01  6:09 ` [PATCH v2 10/12] iommu/amd: Add support for nested " Suravee Suthikulpanit
2025-10-02 18:29   ` Nicolin Chen
2025-10-06 14:49   ` Jason Gunthorpe
2025-10-07 20:36     ` Suthikulpanit, Suravee
2025-10-07 23:39       ` Jason Gunthorpe
2025-10-09  6:22         ` Sairaj Kodilkar
2025-10-09  9:16           ` Sairaj Kodilkar
2025-10-09 14:34           ` Jason Gunthorpe
2025-10-01  6:09 ` [PATCH v2 11/12] iommu/amd: Add support for nested domain attach/detach Suravee Suthikulpanit
2025-10-02 19:04   ` Nicolin Chen
2025-10-06 14:59   ` Jason Gunthorpe
2025-10-07 19:22     ` Suthikulpanit, Suravee
2025-10-07 23:43       ` Jason Gunthorpe
2025-10-09  7:18         ` Sairaj Kodilkar
2025-10-09 14:35           ` Jason Gunthorpe
2025-10-01  6:09 ` [PATCH v2 12/12] iommu/amd: Introduce IOMMUFD vIOMMU support for AMD Suravee Suthikulpanit
2025-10-02 20:05   ` Nicolin Chen
2025-10-06 15:01     ` Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox