* [PATCH v4 00/16] iommu/amd: Introduce Nested Translation support
@ 2025-10-21 1:43 Suravee Suthikulpanit
2025-10-21 1:43 ` [PATCH v4 01/16] iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK Suravee Suthikulpanit
` (15 more replies)
0 siblings, 16 replies; 43+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-21 1:43 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
This series introduces support for AMD IOMMU nested page table translation
with the host (v1) and guest (v2) page tables.
In this mode, the AMD IOMMU driver configures the Device Table Entry (DTE)
with host page table root pointer, which is configured by allocating domain
with page table type IOMMU_HWPT_ALLOC_NEST_PARENT.
The guest page tables and Guest CR3 (GCR3) tables are managed by Guest OS,
and stored in the guest DTE (gDTE) in guest memory. VMM is responsible for
passing gDTE information to the host IOMMU driver using struct
iommu_hwpt_amd_guest when allocating a domain type IOMMU_DOMAIN_NESTED.
Then, the gDTE is parsed and program onto host DTE by the AMD IOMMU driver.
In addition, this series introduces base code for IOMMUFD vIOMMU for AMD
IOMMU, and implements vIOMMU-based nested domain allocation interface.
The struct nested_domain to store nested domain information, and
set_dte_nested() helper function to handle DTE programing for the nested
domain.
The series is separated into two parts:
* Patch 1-7 are preparatory patches.
* Patch 8-16 implement nest-parent and nested domains support
for IOMMUFD vIOMMU.
Note: This series is rebased on top of:
* [PATCH v5] iommu/amd: Add support for hw_info for iommu capability query
https://lore.kernel.org/linux-iommu/20250926141901.511313-1-suravee.suthikulpanit@amd.com/T/#u
Changes from V3:
(https://lore.kernel.org/linux-iommu/20251009235755.4497-1-suravee.suthikulpanit@amd.com)
* (New) Patch 11 introduces struct amd_iommu_viommu and amd_iommufd_viommu_init(),
which contain minimal code to store reference to nest parent domain for
nested domain attachment. (Per Jason / Nicolin)
* (New) Patch 13 refactor hDomID-related logic, and invalidate host (S2) page
table with host domain IDs stored in an xarray. (Per Jason)
* Patch 15: Clean up logic in set_dte_entry().
Thanks,
Suravee
Suravee Suthikulpanit (16):
iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK
iommu/amd: Make amd_iommu_pdom_id_alloc() non-static
iommu/amd: Make amd_iommu_pdom_id_free() non-static
iommu/amd: Make amd_iommu_device_flush_dte() non-static
iommu/amd: Make amd_iommu_update_dte256() non-static
iommu/amd: Make amd_iommu_make_clear_dte() non-static inline
iommu/amd: Make amd_iommu_completion_wait() non-static
iommufd: Introduce data struct for AMD nested domain allocation
iommu/amd: Always enable GCR3TRPMode when supported.
iommu/amd: Add support for nest parent domain allocation
iommu/amd: Introduce struct amd_iommu_viommu
iommu/amd: Add support for nested domain allocation
iommu/amd: Track host Domain ID mapping for each guest Domain ID
iommu/amd: Refactor persistent DTE bits programming into
amd_iommu_make_clear_dte()
iommu/amd: Refactor logic to program the host page table in DTE
iommu/amd: Add support for nested domain attach/detach
drivers/iommu/amd/Makefile | 2 +-
drivers/iommu/amd/amd_iommu.h | 36 +++++
drivers/iommu/amd/amd_iommu_types.h | 28 +++-
drivers/iommu/amd/init.c | 3 +
drivers/iommu/amd/iommu.c | 200 ++++++++++++++----------
drivers/iommu/amd/iommufd.c | 17 ++
drivers/iommu/amd/iommufd.h | 5 +
drivers/iommu/amd/nested.c | 233 ++++++++++++++++++++++++++++
include/uapi/linux/iommufd.h | 11 ++
9 files changed, 453 insertions(+), 82 deletions(-)
create mode 100644 drivers/iommu/amd/nested.c
--
2.34.1
^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v4 01/16] iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK
2025-10-21 1:43 [PATCH v4 00/16] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
@ 2025-10-21 1:43 ` Suravee Suthikulpanit
2025-11-08 17:25 ` Vasant Hegde
2025-10-21 1:43 ` [PATCH v4 02/16] iommu/amd: Make amd_iommu_pdom_id_alloc() non-static Suravee Suthikulpanit
` (14 subsequent siblings)
15 siblings, 1 reply; 43+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-21 1:43 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
Also change the define to use GENMASK_ULL instead.
There is no functional change.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu_types.h | 2 +-
drivers/iommu/amd/iommu.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index a698a2e7ce2a..556f1df32d53 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -422,7 +422,7 @@
#define DTE_FLAG_IOTLB BIT_ULL(32)
#define DTE_FLAG_MASK (0x3ffULL << 32)
-#define DEV_DOMID_MASK 0xffffULL
+#define DTE_DOMID_MASK GENMASK_ULL(15, 0)
#define DTE_GCR3_14_12 GENMASK_ULL(60, 58)
#define DTE_GCR3_30_15 GENMASK_ULL(31, 16)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index b57a6993179d..a9b17d31a969 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2094,7 +2094,7 @@ static void set_dte_entry(struct amd_iommu *iommu,
if (dev_data->ats_enabled)
new.data[1] |= DTE_FLAG_IOTLB;
- old_domid = READ_ONCE(dte->data[1]) & DEV_DOMID_MASK;
+ old_domid = READ_ONCE(dte->data[1]) & DTE_DOMID_MASK;
new.data[1] |= domid;
/*
--
2.34.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v4 02/16] iommu/amd: Make amd_iommu_pdom_id_alloc() non-static
2025-10-21 1:43 [PATCH v4 00/16] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
2025-10-21 1:43 ` [PATCH v4 01/16] iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK Suravee Suthikulpanit
@ 2025-10-21 1:43 ` Suravee Suthikulpanit
2025-11-08 17:24 ` Vasant Hegde
2025-10-21 1:43 ` [PATCH v4 03/16] iommu/amd: Make amd_iommu_pdom_id_free() non-static Suravee Suthikulpanit
` (13 subsequent siblings)
15 siblings, 1 reply; 43+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-21 1:43 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
This will be reused in a new iommufd.c file for nested translation.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 1 +
drivers/iommu/amd/iommu.c | 8 ++++----
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 9b4b589a54b5..6ea549816a1f 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -26,6 +26,7 @@ void amd_iommu_set_rlookup_table(struct amd_iommu *iommu, u16 devid);
void iommu_feature_enable(struct amd_iommu *iommu, u8 bit);
void *__init iommu_alloc_4k_pages(struct amd_iommu *iommu,
gfp_t gfp, size_t size);
+int amd_iommu_pdom_id_alloc(void);
#ifdef CONFIG_AMD_IOMMU_DEBUGFS
void amd_iommu_debugfs_setup(void);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a9b17d31a969..78b3e5485006 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1818,7 +1818,7 @@ int amd_iommu_complete_ppr(struct device *dev, u32 pasid, int status, int tag)
*
****************************************************************************/
-static int pdom_id_alloc(void)
+int amd_iommu_pdom_id_alloc(void)
{
return ida_alloc_range(&pdom_ids, 1, MAX_DOMAIN_ID - 1, GFP_ATOMIC);
}
@@ -1906,7 +1906,7 @@ static int setup_gcr3_table(struct gcr3_tbl_info *gcr3_info,
return -EBUSY;
/* Allocate per device domain ID */
- domid = pdom_id_alloc();
+ domid = amd_iommu_pdom_id_alloc();
if (domid <= 0)
return -ENOSPC;
gcr3_info->domid = domid;
@@ -2489,7 +2489,7 @@ struct protection_domain *protection_domain_alloc(void)
if (!domain)
return NULL;
- domid = pdom_id_alloc();
+ domid = amd_iommu_pdom_id_alloc();
if (domid <= 0) {
kfree(domain);
return NULL;
@@ -2681,7 +2681,7 @@ void amd_iommu_init_identity_domain(void)
domain->ops = &identity_domain_ops;
domain->owner = &amd_iommu_ops;
- identity_domain.id = pdom_id_alloc();
+ identity_domain.id = amd_iommu_pdom_id_alloc();
protection_domain_init(&identity_domain);
}
--
2.34.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v4 03/16] iommu/amd: Make amd_iommu_pdom_id_free() non-static
2025-10-21 1:43 [PATCH v4 00/16] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
2025-10-21 1:43 ` [PATCH v4 01/16] iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK Suravee Suthikulpanit
2025-10-21 1:43 ` [PATCH v4 02/16] iommu/amd: Make amd_iommu_pdom_id_alloc() non-static Suravee Suthikulpanit
@ 2025-10-21 1:43 ` Suravee Suthikulpanit
2025-11-08 17:26 ` Vasant Hegde
2025-10-21 1:43 ` [PATCH v4 04/16] iommu/amd: Make amd_iommu_device_flush_dte() non-static Suravee Suthikulpanit
` (12 subsequent siblings)
15 siblings, 1 reply; 43+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-21 1:43 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
This will be reused in a new iommufd.c file for nested translation.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 1 +
drivers/iommu/amd/iommu.c | 10 +++++-----
2 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 6ea549816a1f..322c8c73444a 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -27,6 +27,7 @@ void iommu_feature_enable(struct amd_iommu *iommu, u8 bit);
void *__init iommu_alloc_4k_pages(struct amd_iommu *iommu,
gfp_t gfp, size_t size);
int amd_iommu_pdom_id_alloc(void);
+void amd_iommu_pdom_id_free(int id);
#ifdef CONFIG_AMD_IOMMU_DEBUGFS
void amd_iommu_debugfs_setup(void);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 78b3e5485006..0b61059e485d 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1823,7 +1823,7 @@ int amd_iommu_pdom_id_alloc(void)
return ida_alloc_range(&pdom_ids, 1, MAX_DOMAIN_ID - 1, GFP_ATOMIC);
}
-static void pdom_id_free(int id)
+void amd_iommu_pdom_id_free(int id)
{
ida_free(&pdom_ids, id);
}
@@ -1870,7 +1870,7 @@ static void free_gcr3_table(struct gcr3_tbl_info *gcr3_info)
gcr3_info->glx = 0;
/* Free per device domain ID */
- pdom_id_free(gcr3_info->domid);
+ amd_iommu_pdom_id_free(gcr3_info->domid);
iommu_free_pages(gcr3_info->gcr3_tbl);
gcr3_info->gcr3_tbl = NULL;
@@ -1913,7 +1913,7 @@ static int setup_gcr3_table(struct gcr3_tbl_info *gcr3_info,
gcr3_info->gcr3_tbl = iommu_alloc_pages_node_sz(nid, GFP_ATOMIC, SZ_4K);
if (gcr3_info->gcr3_tbl == NULL) {
- pdom_id_free(domid);
+ amd_iommu_pdom_id_free(domid);
return -ENOMEM;
}
@@ -2573,7 +2573,7 @@ do_iommu_domain_alloc(struct device *dev, u32 flags,
domain->pd_mode = pgtable;
ret = pdom_setup_pgtable(domain, dev);
if (ret) {
- pdom_id_free(domain->id);
+ amd_iommu_pdom_id_free(domain->id);
kfree(domain);
return ERR_PTR(ret);
}
@@ -2631,7 +2631,7 @@ void amd_iommu_domain_free(struct iommu_domain *dom)
WARN_ON(!list_empty(&domain->dev_list));
if (domain->domain.type & __IOMMU_DOMAIN_PAGING)
free_io_pgtable_ops(&domain->iop.pgtbl.ops);
- pdom_id_free(domain->id);
+ amd_iommu_pdom_id_free(domain->id);
kfree(domain);
}
--
2.34.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v4 04/16] iommu/amd: Make amd_iommu_device_flush_dte() non-static
2025-10-21 1:43 [PATCH v4 00/16] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (2 preceding siblings ...)
2025-10-21 1:43 ` [PATCH v4 03/16] iommu/amd: Make amd_iommu_pdom_id_free() non-static Suravee Suthikulpanit
@ 2025-10-21 1:43 ` Suravee Suthikulpanit
2025-11-08 17:27 ` Vasant Hegde
2025-10-21 1:43 ` [PATCH v4 05/16] iommu/amd: Make amd_iommu_update_dte256() non-static Suravee Suthikulpanit
` (11 subsequent siblings)
15 siblings, 1 reply; 43+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-21 1:43 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
This will be reused in a new iommufd.c file for nested translation.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 3 +++
drivers/iommu/amd/iommu.c | 8 ++++----
2 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 322c8c73444a..079fb1d44c00 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -188,4 +188,7 @@ void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
struct dev_table_entry *get_dev_table(struct amd_iommu *iommu);
struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid);
+/* DTE */
+int amd_iommu_device_flush_dte(struct iommu_dev_data *dev_data);
+
#endif /* AMD_IOMMU_H */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 0b61059e485d..fad74d2bc1b1 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1562,7 +1562,7 @@ static int device_flush_dte_alias(struct pci_dev *pdev, u16 alias, void *data)
/*
* Command send function for invalidating a device table entry
*/
-static int device_flush_dte(struct iommu_dev_data *dev_data)
+int amd_iommu_device_flush_dte(struct iommu_dev_data *dev_data)
{
struct amd_iommu *iommu = get_amd_iommu_from_dev_data(dev_data);
struct pci_dev *pdev = NULL;
@@ -1788,7 +1788,7 @@ void amd_iommu_update_and_flush_device_table(struct protection_domain *domain)
}
list_for_each_entry(dev_data, &domain->dev_list, list)
- device_flush_dte(dev_data);
+ amd_iommu_device_flush_dte(dev_data);
domain_flush_complete(domain);
}
@@ -2144,7 +2144,7 @@ static void dev_update_dte(struct iommu_dev_data *dev_data, bool set)
clear_dte_entry(iommu, dev_data);
clone_aliases(iommu, dev_data->dev);
- device_flush_dte(dev_data);
+ amd_iommu_device_flush_dte(dev_data);
iommu_completion_wait(iommu);
}
@@ -2874,7 +2874,7 @@ static int amd_iommu_set_dirty_tracking(struct iommu_domain *domain,
spin_unlock(&dev_data->dte_lock);
/* Flush device DTE */
- device_flush_dte(dev_data);
+ amd_iommu_device_flush_dte(dev_data);
domain_flush = true;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v4 05/16] iommu/amd: Make amd_iommu_update_dte256() non-static
2025-10-21 1:43 [PATCH v4 00/16] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (3 preceding siblings ...)
2025-10-21 1:43 ` [PATCH v4 04/16] iommu/amd: Make amd_iommu_device_flush_dte() non-static Suravee Suthikulpanit
@ 2025-10-21 1:43 ` Suravee Suthikulpanit
2025-11-08 17:27 ` Vasant Hegde
2025-10-21 1:43 ` [PATCH v4 06/16] iommu/amd: Make amd_iommu_make_clear_dte() non-static inline Suravee Suthikulpanit
` (10 subsequent siblings)
15 siblings, 1 reply; 43+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-21 1:43 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
This will be reused in a new iommufd.c file for nested translation.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 3 +++
drivers/iommu/amd/iommu.c | 11 ++++++-----
2 files changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 079fb1d44c00..eb46e8914eaf 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -190,5 +190,8 @@ struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid);
/* DTE */
int amd_iommu_device_flush_dte(struct iommu_dev_data *dev_data);
+void amd_iommu_update_dte256(struct amd_iommu *iommu,
+ struct iommu_dev_data *dev_data,
+ struct dev_table_entry *new);
#endif /* AMD_IOMMU_H */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index fad74d2bc1b1..3f2c61509b60 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -132,8 +132,9 @@ static void write_dte_lower128(struct dev_table_entry *ptr, struct dev_table_ent
* This function is used only by code, which updates DMA translation part of the DTE.
* So, only consider control bits related to DMA when updating the entry.
*/
-static void update_dte256(struct amd_iommu *iommu, struct iommu_dev_data *dev_data,
- struct dev_table_entry *new)
+void amd_iommu_update_dte256(struct amd_iommu *iommu,
+ struct iommu_dev_data *dev_data,
+ struct dev_table_entry *new)
{
unsigned long flags;
struct dev_table_entry *dev_table = get_dev_table(iommu);
@@ -413,7 +414,7 @@ static int clone_alias(struct pci_dev *pdev, u16 alias, void *data)
ret = -EINVAL;
goto out;
}
- update_dte256(iommu, alias_data, &new);
+ amd_iommu_update_dte256(iommu, alias_data, &new);
amd_iommu_set_rlookup_table(iommu, alias);
out:
@@ -2109,7 +2110,7 @@ static void set_dte_entry(struct amd_iommu *iommu,
set_dte_gcr3_table(iommu, dev_data, &new);
- update_dte256(iommu, dev_data, &new);
+ amd_iommu_update_dte256(iommu, dev_data, &new);
/*
* A kdump kernel might be replacing a domain ID that was copied from
@@ -2130,7 +2131,7 @@ static void clear_dte_entry(struct amd_iommu *iommu, struct iommu_dev_data *dev_
struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
make_clear_dte(dev_data, dte, &new);
- update_dte256(iommu, dev_data, &new);
+ amd_iommu_update_dte256(iommu, dev_data, &new);
}
/* Update and flush DTE for the given device */
--
2.34.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v4 06/16] iommu/amd: Make amd_iommu_make_clear_dte() non-static inline
2025-10-21 1:43 [PATCH v4 00/16] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (4 preceding siblings ...)
2025-10-21 1:43 ` [PATCH v4 05/16] iommu/amd: Make amd_iommu_update_dte256() non-static Suravee Suthikulpanit
@ 2025-10-21 1:43 ` Suravee Suthikulpanit
2025-11-08 17:27 ` Vasant Hegde
2025-10-21 1:43 ` [PATCH v4 07/16] iommu/amd: Make amd_iommu_completion_wait() non-static Suravee Suthikulpanit
` (9 subsequent siblings)
15 siblings, 1 reply; 43+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-21 1:43 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
This will be reused in a new iommufd.c file for nested translation.
Also, remove unused function parameter ptr.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 7 +++++++
drivers/iommu/amd/iommu.c | 13 ++-----------
2 files changed, 9 insertions(+), 11 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index eb46e8914eaf..c7cb4a80d44a 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -193,5 +193,12 @@ int amd_iommu_device_flush_dte(struct iommu_dev_data *dev_data);
void amd_iommu_update_dte256(struct amd_iommu *iommu,
struct iommu_dev_data *dev_data,
struct dev_table_entry *new);
+static inline void
+amd_iommu_make_clear_dte(struct iommu_dev_data *dev_data, struct dev_table_entry *new)
+{
+ /* All existing DTE must have V bit set */
+ new->data128[0] = DTE_FLAG_V;
+ new->data128[1] = 0;
+}
#endif /* AMD_IOMMU_H */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 3f2c61509b60..386ac96b2c02 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2008,14 +2008,6 @@ int amd_iommu_clear_gcr3(struct iommu_dev_data *dev_data, ioasid_t pasid)
return ret;
}
-static void make_clear_dte(struct iommu_dev_data *dev_data, struct dev_table_entry *ptr,
- struct dev_table_entry *new)
-{
- /* All existing DTE must have V bit set */
- new->data128[0] = DTE_FLAG_V;
- new->data128[1] = 0;
-}
-
/*
* Note:
* The old value for GCR3 table and GPT have been cleared from caller.
@@ -2068,7 +2060,7 @@ static void set_dte_entry(struct amd_iommu *iommu,
else
domid = domain->id;
- make_clear_dte(dev_data, dte, &new);
+ amd_iommu_make_clear_dte(dev_data, &new);
if (domain->iop.mode != PAGE_MODE_NONE)
new.data[0] |= iommu_virt_to_phys(domain->iop.root);
@@ -2128,9 +2120,8 @@ static void set_dte_entry(struct amd_iommu *iommu,
static void clear_dte_entry(struct amd_iommu *iommu, struct iommu_dev_data *dev_data)
{
struct dev_table_entry new = {};
- struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
- make_clear_dte(dev_data, dte, &new);
+ amd_iommu_make_clear_dte(dev_data, &new);
amd_iommu_update_dte256(iommu, dev_data, &new);
}
--
2.34.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v4 07/16] iommu/amd: Make amd_iommu_completion_wait() non-static
2025-10-21 1:43 [PATCH v4 00/16] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (5 preceding siblings ...)
2025-10-21 1:43 ` [PATCH v4 06/16] iommu/amd: Make amd_iommu_make_clear_dte() non-static inline Suravee Suthikulpanit
@ 2025-10-21 1:43 ` Suravee Suthikulpanit
2025-11-08 17:32 ` Vasant Hegde
2025-10-21 1:43 ` [PATCH v4 08/16] iommufd: Introduce data struct for AMD nested domain allocation Suravee Suthikulpanit
` (8 subsequent siblings)
15 siblings, 1 reply; 43+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-21 1:43 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
This will be reused in a new iommufd.c file for nested translation.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 1 +
drivers/iommu/amd/iommu.c | 24 ++++++++++++------------
2 files changed, 13 insertions(+), 12 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index c7cb4a80d44a..d533bb8851ea 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -187,6 +187,7 @@ void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
u64 *root, int mode);
struct dev_table_entry *get_dev_table(struct amd_iommu *iommu);
struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid);
+int amd_iommu_completion_wait(struct amd_iommu *iommu);
/* DTE */
int amd_iommu_device_flush_dte(struct iommu_dev_data *dev_data);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 386ac96b2c02..e0bfcda678a8 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1392,7 +1392,7 @@ static int iommu_queue_command(struct amd_iommu *iommu, struct iommu_cmd *cmd)
* This function queues a completion wait command into the command
* buffer of an IOMMU
*/
-static int iommu_completion_wait(struct amd_iommu *iommu)
+int amd_iommu_completion_wait(struct amd_iommu *iommu)
{
struct iommu_cmd cmd;
unsigned long flags;
@@ -1431,7 +1431,7 @@ static void domain_flush_complete(struct protection_domain *domain)
* We need to wait for completion of all commands.
*/
xa_for_each(&domain->iommu_array, i, pdom_iommu_info)
- iommu_completion_wait(pdom_iommu_info->iommu);
+ amd_iommu_completion_wait(pdom_iommu_info->iommu);
}
static int iommu_flush_dte(struct amd_iommu *iommu, u16 devid)
@@ -1449,7 +1449,7 @@ static void iommu_flush_dte_sync(struct amd_iommu *iommu, u16 devid)
ret = iommu_flush_dte(iommu, devid);
if (!ret)
- iommu_completion_wait(iommu);
+ amd_iommu_completion_wait(iommu);
}
static void amd_iommu_flush_dte_all(struct amd_iommu *iommu)
@@ -1460,7 +1460,7 @@ static void amd_iommu_flush_dte_all(struct amd_iommu *iommu)
for (devid = 0; devid <= last_bdf; ++devid)
iommu_flush_dte(iommu, devid);
- iommu_completion_wait(iommu);
+ amd_iommu_completion_wait(iommu);
}
/*
@@ -1479,7 +1479,7 @@ static void amd_iommu_flush_tlb_all(struct amd_iommu *iommu)
iommu_queue_command(iommu, &cmd);
}
- iommu_completion_wait(iommu);
+ amd_iommu_completion_wait(iommu);
}
static void amd_iommu_flush_tlb_domid(struct amd_iommu *iommu, u32 dom_id)
@@ -1490,7 +1490,7 @@ static void amd_iommu_flush_tlb_domid(struct amd_iommu *iommu, u32 dom_id)
dom_id, IOMMU_NO_PASID, false);
iommu_queue_command(iommu, &cmd);
- iommu_completion_wait(iommu);
+ amd_iommu_completion_wait(iommu);
}
static void amd_iommu_flush_all(struct amd_iommu *iommu)
@@ -1500,7 +1500,7 @@ static void amd_iommu_flush_all(struct amd_iommu *iommu)
build_inv_all(&cmd);
iommu_queue_command(iommu, &cmd);
- iommu_completion_wait(iommu);
+ amd_iommu_completion_wait(iommu);
}
static void iommu_flush_irt(struct amd_iommu *iommu, u16 devid)
@@ -1523,7 +1523,7 @@ static void amd_iommu_flush_irt_all(struct amd_iommu *iommu)
for (devid = 0; devid <= last_bdf; devid++)
iommu_flush_irt(iommu, devid);
- iommu_completion_wait(iommu);
+ amd_iommu_completion_wait(iommu);
}
void amd_iommu_flush_all_caches(struct amd_iommu *iommu)
@@ -1748,7 +1748,7 @@ void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
if (dev_data->ats_enabled)
device_flush_iotlb(dev_data, address, size, pasid, true);
- iommu_completion_wait(iommu);
+ amd_iommu_completion_wait(iommu);
}
static void dev_flush_pasid_all(struct iommu_dev_data *dev_data,
@@ -2137,7 +2137,7 @@ static void dev_update_dte(struct iommu_dev_data *dev_data, bool set)
clone_aliases(iommu, dev_data->dev);
amd_iommu_device_flush_dte(dev_data);
- iommu_completion_wait(iommu);
+ amd_iommu_completion_wait(iommu);
}
/*
@@ -2421,7 +2421,7 @@ static struct iommu_device *amd_iommu_probe_device(struct device *dev)
out_err:
- iommu_completion_wait(iommu);
+ amd_iommu_completion_wait(iommu);
if (FEATURE_NUM_INT_REMAP_SUP_2K(amd_iommu_efr2))
dev_data->max_irqs = MAX_IRQS_PER_TABLE_2K;
@@ -3255,7 +3255,7 @@ static struct irq_remap_table *alloc_irq_table(struct amd_iommu *iommu,
set_remap_table_entry(iommu, alias, table);
out_wait:
- iommu_completion_wait(iommu);
+ amd_iommu_completion_wait(iommu);
out_unlock:
spin_unlock_irqrestore(&iommu_table_lock, flags);
--
2.34.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v4 08/16] iommufd: Introduce data struct for AMD nested domain allocation
2025-10-21 1:43 [PATCH v4 00/16] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (6 preceding siblings ...)
2025-10-21 1:43 ` [PATCH v4 07/16] iommu/amd: Make amd_iommu_completion_wait() non-static Suravee Suthikulpanit
@ 2025-10-21 1:43 ` Suravee Suthikulpanit
2025-11-08 17:30 ` Vasant Hegde
2025-10-21 1:43 ` [PATCH v4 09/16] iommu/amd: Always enable GCR3TRPMode when supported Suravee Suthikulpanit
` (7 subsequent siblings)
15 siblings, 1 reply; 43+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-21 1:43 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
Introduce IOMMU_HWPT_DATA_AMD_GUEST data type for IOMMU guest page table,
which is used for stage-1 in nested translation. The data structure
contains information necessary for setting up the AMD HW-vIOMMU support.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
include/uapi/linux/iommufd.h | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index efb52709c0a2..d111ee1dc572 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -455,16 +455,27 @@ struct iommu_hwpt_arm_smmuv3 {
__aligned_le64 ste[2];
};
+/**
+ * struct iommu_hwpt_amd_guest - AMD IOMMU guest I/O page table data
+ * (IOMMU_HWPT_DATA_AMD_GUEST)
+ * @dte: Guest Device Table Entry (DTE)
+ */
+struct iommu_hwpt_amd_guest {
+ __aligned_u64 dte[4];
+};
+
/**
* enum iommu_hwpt_data_type - IOMMU HWPT Data Type
* @IOMMU_HWPT_DATA_NONE: no data
* @IOMMU_HWPT_DATA_VTD_S1: Intel VT-d stage-1 page table
* @IOMMU_HWPT_DATA_ARM_SMMUV3: ARM SMMUv3 Context Descriptor Table
+ * @IOMMU_HWPT_DATA_AMD_GUEST: AMD IOMMU guest page table
*/
enum iommu_hwpt_data_type {
IOMMU_HWPT_DATA_NONE = 0,
IOMMU_HWPT_DATA_VTD_S1 = 1,
IOMMU_HWPT_DATA_ARM_SMMUV3 = 2,
+ IOMMU_HWPT_DATA_AMD_GUEST = 3,
};
/**
--
2.34.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v4 09/16] iommu/amd: Always enable GCR3TRPMode when supported.
2025-10-21 1:43 [PATCH v4 00/16] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (7 preceding siblings ...)
2025-10-21 1:43 ` [PATCH v4 08/16] iommufd: Introduce data struct for AMD nested domain allocation Suravee Suthikulpanit
@ 2025-10-21 1:43 ` Suravee Suthikulpanit
2025-10-23 2:24 ` Nicolin Chen
2025-11-08 17:39 ` Vasant Hegde
2025-10-21 1:43 ` [PATCH v4 10/16] iommu/amd: Add support for nest parent domain allocation Suravee Suthikulpanit
` (6 subsequent siblings)
15 siblings, 2 replies; 43+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-21 1:43 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
The GCR3TRPMode feature allows the DTE[GCR3TRP] field to be configured
with GPA (instead of SPA). This simplifies the implementation, and is
a pre-requisite for nested translation support.
Therefore, always enable this feature if available.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu_types.h | 1 +
drivers/iommu/amd/init.c | 3 +++
2 files changed, 4 insertions(+)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 556f1df32d53..9226edd8af69 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -185,6 +185,7 @@
#define CONTROL_EPH_EN 45
#define CONTROL_XT_EN 50
#define CONTROL_INTCAPXT_EN 51
+#define CONTROL_GCR3TRPMODE 58
#define CONTROL_IRTCACHEDIS 59
#define CONTROL_SNPAVIC_EN 61
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index f2991c11867c..c45a4bd89569 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1122,6 +1122,9 @@ static void iommu_enable_gt(struct amd_iommu *iommu)
return;
iommu_feature_enable(iommu, CONTROL_GT_EN);
+
+ if (check_feature2(FEATURE_GCR3TRPMODE))
+ iommu_feature_enable(iommu, CONTROL_GCR3TRPMODE);
}
/* sets a specific bit in the device table entry. */
--
2.34.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v4 10/16] iommu/amd: Add support for nest parent domain allocation
2025-10-21 1:43 [PATCH v4 00/16] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (8 preceding siblings ...)
2025-10-21 1:43 ` [PATCH v4 09/16] iommu/amd: Always enable GCR3TRPMode when supported Suravee Suthikulpanit
@ 2025-10-21 1:43 ` Suravee Suthikulpanit
2025-10-23 2:27 ` Nicolin Chen
2025-10-21 1:43 ` [PATCH v4 11/16] iommu/amd: Introduce struct amd_iommu_viommu Suravee Suthikulpanit
` (5 subsequent siblings)
15 siblings, 1 reply; 43+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-21 1:43 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
To support nested translation, the nest parent domain is allocated with
IOMMU_HWPT_ALLOC_NEST_PARENT flag, and stores information of the v1 page
table for stage 2 (i.e. GPA->SPA).
Also, only support nest parent domain on AMD system, which can support
the Guest CR3 Table (GCR3TRPMode) feature. This feature is required in
order to program DTE[GCR3 Table Root Pointer] with the GPA.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu_types.h | 1 +
drivers/iommu/amd/iommu.c | 26 +++++++++++++++++++++++---
2 files changed, 24 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 9226edd8af69..c34604cf1811 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -107,6 +107,7 @@
/* Extended Feature 2 Bits */
+#define FEATURE_GCR3TRPMODE BIT_ULL(3)
#define FEATURE_SNPAVICSUP GENMASK_ULL(7, 5)
#define FEATURE_SNPAVICSUP_GAM(x) \
(FIELD_GET(FEATURE_SNPAVICSUP, x) == 0x1)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index e0bfcda678a8..e489e360bb77 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2584,6 +2584,14 @@ do_iommu_domain_alloc(struct device *dev, u32 flags,
return &domain->domain;
}
+static inline bool is_nest_parent_supported(u32 flags)
+{
+ /* Only allow nest parent when these features are supported */
+ return check_feature(FEATURE_GT) &&
+ check_feature(FEATURE_GIOSUP) &&
+ check_feature2(FEATURE_GCR3TRPMODE);
+}
+
static struct iommu_domain *
amd_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
const struct iommu_user_data *user_data)
@@ -2591,16 +2599,28 @@ amd_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
{
struct amd_iommu *iommu = get_amd_iommu_from_dev(dev);
const u32 supported_flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
- IOMMU_HWPT_ALLOC_PASID;
+ IOMMU_HWPT_ALLOC_PASID |
+ IOMMU_HWPT_ALLOC_NEST_PARENT;
if ((flags & ~supported_flags) || user_data)
return ERR_PTR(-EOPNOTSUPP);
switch (flags & supported_flags) {
case IOMMU_HWPT_ALLOC_DIRTY_TRACKING:
- /* Allocate domain with v1 page table for dirty tracking */
- if (!amd_iommu_hd_support(iommu))
+ case IOMMU_HWPT_ALLOC_NEST_PARENT:
+ case IOMMU_HWPT_ALLOC_DIRTY_TRACKING | IOMMU_HWPT_ALLOC_NEST_PARENT:
+ /*
+ * Allocate domain with v1 page table for dirty tracking
+ * and/or Nest parent.
+ */
+ if ((flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING) &&
+ !amd_iommu_hd_support(iommu))
+ break;
+
+ if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) &&
+ !is_nest_parent_supported(flags))
break;
+
return do_iommu_domain_alloc(dev, flags, PD_MODE_V1);
case IOMMU_HWPT_ALLOC_PASID:
/* Allocate domain with v2 page table if IOMMU supports PASID. */
--
2.34.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v4 11/16] iommu/amd: Introduce struct amd_iommu_viommu
2025-10-21 1:43 [PATCH v4 00/16] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (9 preceding siblings ...)
2025-10-21 1:43 ` [PATCH v4 10/16] iommu/amd: Add support for nest parent domain allocation Suravee Suthikulpanit
@ 2025-10-21 1:43 ` Suravee Suthikulpanit
2025-10-22 20:00 ` Jason Gunthorpe
2025-10-23 2:33 ` Nicolin Chen
2025-10-21 1:43 ` [PATCH v4 12/16] iommu/amd: Add support for nested domain allocation Suravee Suthikulpanit
` (4 subsequent siblings)
15 siblings, 2 replies; 43+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-21 1:43 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
Which stores reference to nested parent domain assigned during the call to
struct iommu_ops.viommu_init(). Information in the nest parent is needed
when setting up the nested translation.
Note that the viommu initialization will be introduced in subsequent
commit.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu_types.h | 6 ++++++
drivers/iommu/amd/iommu.c | 4 +++-
drivers/iommu/amd/iommufd.c | 17 +++++++++++++++++
drivers/iommu/amd/iommufd.h | 5 +++++
4 files changed, 31 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index c34604cf1811..a0c7e7329233 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -17,6 +17,7 @@
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/pci.h>
+#include <linux/iommufd.h>
#include <linux/irqreturn.h>
#include <linux/io-pgtable.h>
@@ -586,6 +587,11 @@ struct pdom_iommu_info {
u32 refcnt; /* Count of attached dev/pasid per domain/IOMMU */
};
+struct amd_iommu_viommu {
+ struct iommufd_viommu core;
+ struct protection_domain *parent; /* nest parent domain for this viommu */
+};
+
/*
* This structure contains generic data for IOMMU protection domains
* independent of their use.
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index e489e360bb77..c4ff18adcf03 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3077,7 +3077,9 @@ const struct iommu_ops amd_iommu_ops = {
.iotlb_sync = amd_iommu_iotlb_sync,
.free = amd_iommu_domain_free,
.enforce_cache_coherency = amd_iommu_enforce_cache_coherency,
- }
+ },
+ .get_viommu_size = amd_iommufd_get_viommu_size,
+ .viommu_init = amd_iommufd_viommu_init,
};
#ifdef CONFIG_IRQ_REMAP
diff --git a/drivers/iommu/amd/iommufd.c b/drivers/iommu/amd/iommufd.c
index 72eaaa923d04..5ba50799a6fa 100644
--- a/drivers/iommu/amd/iommufd.c
+++ b/drivers/iommu/amd/iommufd.c
@@ -29,3 +29,20 @@ void *amd_iommufd_hw_info(struct device *dev, u32 *length, u32 *type)
return hwinfo;
}
+
+size_t amd_iommufd_get_viommu_size(struct device *dev, enum iommu_viommu_type viommu_type)
+{
+ return VIOMMU_STRUCT_SIZE(struct amd_iommu_viommu, core);
+}
+
+int amd_iommufd_viommu_init(struct iommufd_viommu *viommu, struct iommu_domain *parent,
+ const struct iommu_user_data *user_data)
+{
+ struct amd_iommu_viommu *aviommu = container_of(viommu, struct amd_iommu_viommu, core);
+
+ /*
+ */
+ aviommu->parent = to_pdomain(parent);
+
+ return 0;
+}
diff --git a/drivers/iommu/amd/iommufd.h b/drivers/iommu/amd/iommufd.h
index f880be80a30d..f05aad495b5b 100644
--- a/drivers/iommu/amd/iommufd.h
+++ b/drivers/iommu/amd/iommufd.h
@@ -8,8 +8,13 @@
#if IS_ENABLED(CONFIG_AMD_IOMMU_IOMMUFD)
void *amd_iommufd_hw_info(struct device *dev, u32 *length, u32 *type);
+size_t amd_iommufd_get_viommu_size(struct device *dev, enum iommu_viommu_type viommu_type);
+int amd_iommufd_viommu_init(struct iommufd_viommu *viommu, struct iommu_domain *parent,
+ const struct iommu_user_data *user_data);
#else
#define amd_iommufd_hw_info NULL
+#define amd_iommufd_viommu_init NULL
+#define amd_iommufd_get_viommu_size NULL
#endif /* CONFIG_AMD_IOMMU_IOMMUFD */
#endif /* AMD_IOMMUFD_H */
--
2.34.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v4 12/16] iommu/amd: Add support for nested domain allocation
2025-10-21 1:43 [PATCH v4 00/16] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (10 preceding siblings ...)
2025-10-21 1:43 ` [PATCH v4 11/16] iommu/amd: Introduce struct amd_iommu_viommu Suravee Suthikulpanit
@ 2025-10-21 1:43 ` Suravee Suthikulpanit
2025-10-22 20:01 ` Jason Gunthorpe
2025-10-21 1:43 ` [PATCH v4 13/16] iommu/amd: Track host Domain ID mapping for each guest Domain ID Suravee Suthikulpanit
` (3 subsequent siblings)
15 siblings, 1 reply; 43+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-21 1:43 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
The nested domain is allocated with IOMMU_DOMAIN_NESTED type to store
stage-1 translation (i.e. GVA->GPA). This includes the GCR3 root pointer
table along with guest page tables. The struct iommu_hwpt_amd_guest
contains this information, and is passed from user-space as a parameter
of the struct iommu_ops.domain_alloc_nested().
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/Makefile | 2 +-
drivers/iommu/amd/amd_iommu.h | 4 +
drivers/iommu/amd/amd_iommu_types.h | 14 ++++
drivers/iommu/amd/nested.c | 111 ++++++++++++++++++++++++++++
4 files changed, 130 insertions(+), 1 deletion(-)
create mode 100644 drivers/iommu/amd/nested.c
diff --git a/drivers/iommu/amd/Makefile b/drivers/iommu/amd/Makefile
index 5ae46d99a45b..afa12ca2110e 100644
--- a/drivers/iommu/amd/Makefile
+++ b/drivers/iommu/amd/Makefile
@@ -1,4 +1,4 @@
# SPDX-License-Identifier: GPL-2.0-only
obj-y += iommu.o init.o quirks.o io_pgtable.o io_pgtable_v2.o ppr.o pasid.o
-obj-$(CONFIG_AMD_IOMMU_IOMMUFD) += iommufd.o
+obj-$(CONFIG_AMD_IOMMU_IOMMUFD) += iommufd.o nested.o
obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += debugfs.o
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index d533bb8851ea..3730d8bbe6dc 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -202,4 +202,8 @@ amd_iommu_make_clear_dte(struct iommu_dev_data *dev_data, struct dev_table_entry
new->data128[1] = 0;
}
+/* NESTED */
+struct iommu_domain *
+amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
+ const struct iommu_user_data *user_data);
#endif /* AMD_IOMMU_H */
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index a0c7e7329233..e0f0cd3d34f2 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -21,6 +21,8 @@
#include <linux/irqreturn.h>
#include <linux/io-pgtable.h>
+#include <uapi/linux/iommufd.h>
+
/*
* Maximum number of IOMMUs supported
*/
@@ -417,6 +419,8 @@
#define DTE_FLAG_V BIT_ULL(0)
#define DTE_FLAG_TV BIT_ULL(1)
#define DTE_FLAG_HAD (3ULL << 7)
+#define DTE_MODE_MASK GENMASK_ULL(11, 9)
+#define DTE_HOST_TRP GENMASK_ULL(51, 12)
#define DTE_FLAG_GIOV BIT_ULL(54)
#define DTE_FLAG_GV BIT_ULL(55)
#define DTE_GLX GENMASK_ULL(57, 56)
@@ -592,6 +596,16 @@ struct amd_iommu_viommu {
struct protection_domain *parent; /* nest parent domain for this viommu */
};
+/*
+ * Nested domain is specifically used for nested translation
+ */
+struct nested_domain {
+ struct iommu_domain domain; /* generic domain handle used by iommu core code */
+ u16 gdom_id; /* domain ID from gDTE */
+ struct iommu_hwpt_amd_guest gdte; /* Guest vIOMMU DTE */
+ struct amd_iommu_viommu *viommu; /* AMD hw-viommu this nested domain belong to */
+};
+
/*
* This structure contains generic data for IOMMU protection domains
* independent of their use.
diff --git a/drivers/iommu/amd/nested.c b/drivers/iommu/amd/nested.c
new file mode 100644
index 000000000000..e7b6f69a9d0c
--- /dev/null
+++ b/drivers/iommu/amd/nested.c
@@ -0,0 +1,111 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025 Advanced Micro Devices, Inc.
+ */
+
+#define dev_fmt(fmt) "AMD-Vi: " fmt
+
+#include <linux/iommu.h>
+#include <uapi/linux/iommufd.h>
+
+#include "amd_iommu.h"
+
+static const struct iommu_domain_ops nested_domain_ops;
+
+static inline struct nested_domain *to_ndomain(struct iommu_domain *dom)
+{
+ return container_of(dom, struct nested_domain, domain);
+}
+
+/*
+ * Validate guest DTE to make sure that configuration for host (v1)
+ * and guest (v2) page tables are valid when allocating nested domain.
+ */
+static int validate_gdte_nested(struct iommu_hwpt_amd_guest *gdte)
+{
+ u32 gpt_level = FIELD_GET(DTE_GPT_LEVEL_MASK, gdte->dte[2]);
+
+ /* Must be zero: Mode, Host-TPR */
+ if (FIELD_GET(DTE_MODE_MASK, gdte->dte[0]) != 0 ||
+ FIELD_GET(DTE_HOST_TRP, gdte->dte[0]) != 0)
+ return -EINVAL;
+
+ /* Must be non-zero: V, GIOV, GV, GCR3 TRP */
+ if (FIELD_GET(DTE_FLAG_V, gdte->dte[0]) == 0 ||
+ FIELD_GET(DTE_FLAG_GIOV, gdte->dte[0]) == 0 ||
+ FIELD_GET(DTE_FLAG_GV, gdte->dte[0]) == 0 ||
+ (FIELD_GET(DTE_GCR3_14_12, gdte->dte[0]) == 0 &&
+ FIELD_GET(DTE_GCR3_30_15, gdte->dte[1]) == 0 &&
+ FIELD_GET(DTE_GCR3_51_31, gdte->dte[1]) == 0))
+ return -EINVAL;
+
+ /* Valid Guest Paging Mode values are 0 and 1 */
+ if (gpt_level != 0 && gpt_level != 1)
+ return -EINVAL;
+
+ /* GLX = 3 is reserved */
+ if (FIELD_GET(DTE_GLX, gdte->dte[0]) == 3)
+ return -EINVAL;
+
+ /*
+ * We need to check host capability before setting
+ * the Guest Paging Mode
+ */
+ if (gpt_level == GUEST_PGTABLE_5_LEVEL &&
+ amd_iommu_gpt_level < PAGE_MODE_5_LEVEL)
+ return -EOPNOTSUPP;
+
+ return 0;
+}
+
+/*
+ * This function is assigned to struct iommufd_viommu_ops.alloc_domain_nested()
+ * during the call to struct iommu_ops.viommu_init().
+ */
+struct iommu_domain *
+amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
+ const struct iommu_user_data *user_data)
+{
+ int ret;
+ struct nested_domain *ndom;
+ struct amd_iommu_viommu *aviommu = container_of(viommu, struct amd_iommu_viommu, core);
+
+ if (user_data->type != IOMMU_HWPT_DATA_AMD_GUEST)
+ return ERR_PTR(-EOPNOTSUPP);
+
+ ndom = kzalloc(sizeof(*ndom), GFP_KERNEL);
+ if (!ndom)
+ return ERR_PTR(-ENOMEM);
+
+ ret = iommu_copy_struct_from_user(&ndom->gdte, user_data,
+ IOMMU_HWPT_DATA_AMD_GUEST,
+ dte);
+ if (ret)
+ goto out_err;
+
+ ret = validate_gdte_nested(&ndom->gdte);
+ if (ret)
+ goto out_err;
+
+ ndom->gdom_id = FIELD_GET(DTE_DOMID_MASK, ndom->gdte.dte[1]);
+ ndom->domain.ops = &nested_domain_ops;
+ ndom->domain.type = IOMMU_DOMAIN_NESTED;
+ ndom->viommu = aviommu;
+
+ return &ndom->domain;
+out_err:
+ kfree(ndom);
+ return ERR_PTR(ret);
+}
+
+static void nested_domain_free(struct iommu_domain *dom)
+{
+ struct nested_domain *ndom = to_ndomain(dom);
+
+ kfree(ndom);
+}
+
+static const struct iommu_domain_ops nested_domain_ops = {
+ .free = nested_domain_free,
+};
+
--
2.34.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v4 13/16] iommu/amd: Track host Domain ID mapping for each guest Domain ID
2025-10-21 1:43 [PATCH v4 00/16] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (11 preceding siblings ...)
2025-10-21 1:43 ` [PATCH v4 12/16] iommu/amd: Add support for nested domain allocation Suravee Suthikulpanit
@ 2025-10-21 1:43 ` Suravee Suthikulpanit
2025-10-22 20:08 ` Jason Gunthorpe
2025-10-21 1:43 ` [PATCH v4 14/16] iommu/amd: Refactor persistent DTE bits programming into amd_iommu_make_clear_dte() Suravee Suthikulpanit
` (2 subsequent siblings)
15 siblings, 1 reply; 43+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-21 1:43 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
Each nested domain is assigned guest domain ID (gDomID), which guest OS
programs into guest Device Table Entry (gDTE). For each gDomID, the driver
assigns a corresponding host domain ID (hDomID), which will be programmed
into the host Device Table Entry (hDTE).
The gDTE to hDTE 1:1 mapping is stored in the nest parent domain using
an xarray (struct protection_domain.gdomid_array). When invalidate the
nest parent domain, the INVALIDATE_IOMMU_PAGES must be issued for each
hDomID in the gdomid_array.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu_types.h | 3 ++
drivers/iommu/amd/iommu.c | 30 ++++++++++++++++
drivers/iommu/amd/nested.c | 53 ++++++++++++++++++++++++++++-
3 files changed, 85 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index e0f0cd3d34f2..09952b7a256d 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -602,6 +602,7 @@ struct amd_iommu_viommu {
struct nested_domain {
struct iommu_domain domain; /* generic domain handle used by iommu core code */
u16 gdom_id; /* domain ID from gDTE */
+ u16 hdom_id; /* domain ID written to the device table */
struct iommu_hwpt_amd_guest gdte; /* Guest vIOMMU DTE */
struct amd_iommu_viommu *viommu; /* AMD hw-viommu this nested domain belong to */
};
@@ -623,6 +624,8 @@ struct protection_domain {
struct mmu_notifier mn; /* mmu notifier for the SVA domain */
struct list_head dev_data_list; /* List of pdom_dev_data */
+
+ struct xarray gdomid_array; /* gDomID mapping to this nest parent domain */
};
/*
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index c4ff18adcf03..e4db6f304599 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1493,6 +1493,23 @@ static void amd_iommu_flush_tlb_domid(struct amd_iommu *iommu, u32 dom_id)
amd_iommu_completion_wait(iommu);
}
+static int iommu_flush_hdom_ids(struct amd_iommu *iommu,
+ u64 address, size_t size,
+ struct protection_domain *parent)
+{
+ int ret = 0;
+ unsigned long i;
+ struct iommu_cmd cmd;
+ struct nested_domain *ndom;
+
+ xa_for_each(&parent->gdomid_array, i, ndom) {
+ build_inv_iommu_pages(&cmd, address, size, ndom->hdom_id,
+ IOMMU_NO_PASID, false);
+ ret |= iommu_queue_command(iommu, &cmd);
+ }
+ return ret;
+}
+
static void amd_iommu_flush_all(struct amd_iommu *iommu)
{
struct iommu_cmd cmd;
@@ -1639,6 +1656,18 @@ static int domain_flush_pages_v1(struct protection_domain *pdom,
* We need a TLB flush
*/
ret |= iommu_queue_command(pdom_iommu_info->iommu, &cmd);
+
+ /*
+ * A domain w/ v1 table can be a nest parent, which can have
+ * multiple nested domains. Each nested domain has 1:1 mapping
+ * between gDomID and hDomID, which is stored in the gdomid_array.
+ * Therefore, we also need to flush every hDomID associated to this
+ * nest parent domain.
+ *
+ * See drivers/iommu/amd/nested.c: amd_iommu_alloc_domain_nested()
+ */
+ ret |= iommu_flush_hdom_ids(pdom_iommu_info->iommu, address,
+ size, pdom);
}
return ret;
@@ -2470,6 +2499,7 @@ static void protection_domain_init(struct protection_domain *domain)
INIT_LIST_HEAD(&domain->dev_list);
INIT_LIST_HEAD(&domain->dev_data_list);
xa_init(&domain->iommu_array);
+ xa_init(&domain->gdomid_array);
}
struct protection_domain *protection_domain_alloc(void)
diff --git a/drivers/iommu/amd/nested.c b/drivers/iommu/amd/nested.c
index e7b6f69a9d0c..383a3c7b4a91 100644
--- a/drivers/iommu/amd/nested.c
+++ b/drivers/iommu/amd/nested.c
@@ -67,7 +67,7 @@ amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
const struct iommu_user_data *user_data)
{
int ret;
- struct nested_domain *ndom;
+ struct nested_domain *ndom, *curr;
struct amd_iommu_viommu *aviommu = container_of(viommu, struct amd_iommu_viommu, core);
if (user_data->type != IOMMU_HWPT_DATA_AMD_GUEST)
@@ -92,6 +92,49 @@ amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
ndom->domain.type = IOMMU_DOMAIN_NESTED;
ndom->viommu = aviommu;
+ /*
+ * Normally, when a guest has multiple pass-through devices,
+ * the IOMMU driver setup DTEs with the same stage-2 table and
+ * use the same host domain ID (hDomId). In case of nested translation,
+ * if the guest setup different stage-1 tables with same PASID,
+ * IOMMU would use the same TLB tag. This will results in TLB
+ * aliasing issue.
+ *
+ * The guest is assigning gDomIDs based on its own algorithm for managing
+ * cache tags of (DomID, PASID). Within a single viommu, the nest parent domain
+ * (w/ S2 table) is used by all DTEs. But we need to consistently map the gDomID
+ * to a single hDomID. This is done using an xarray in the nest parent domain to
+ * keep track of the gDomID mapping. When the S2 is changed, the INVALIDATE_IOMMU_PAGES
+ * command must be issued for each hDomID in the xarray.
+ *
+ * Since there is no invalidation support and no viommu yet, just always use a
+ * unique hDomID for now.
+ */
+ curr = xa_cmpxchg(&aviommu->parent->gdomid_array,
+ ndom->gdom_id, NULL, ndom, GFP_ATOMIC);
+ if (curr) {
+ if (xa_err(curr)) {
+ ret = -EINVAL;
+ } else {
+ /* The gDomID already exist */
+ pr_debug("%s: Found gdom_id=%#x, hdom_id=%#x\n",
+ __func__, ndom->gdom_id, ndom->hdom_id);
+ ret = -EBUSY;
+ }
+ goto out_err;
+ }
+
+ ndom->hdom_id = amd_iommu_pdom_id_alloc();
+ if (ndom->hdom_id <= 0) {
+ xa_cmpxchg(&aviommu->parent->gdomid_array,
+ ndom->gdom_id, ndom, NULL, GFP_ATOMIC);
+ ret = -ENOSPC;
+ goto out_err;
+ }
+
+ pr_debug("%s: Allocate gdom_id=%#x, hdom_id=%#x\n",
+ __func__, ndom->gdom_id, ndom->hdom_id);
+
return &ndom->domain;
out_err:
kfree(ndom);
@@ -101,7 +144,15 @@ amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
static void nested_domain_free(struct iommu_domain *dom)
{
struct nested_domain *ndom = to_ndomain(dom);
+ struct amd_iommu_viommu *aviommu = ndom->viommu;
+
+ pr_debug("%s: Free gdom_id=%#x, hdom_id=%#x\n",
+ __func__, ndom->gdom_id, ndom->hdom_id);
+
+ xa_cmpxchg(&aviommu->parent->gdomid_array,
+ ndom->gdom_id, ndom, NULL, GFP_ATOMIC);
+ amd_iommu_pdom_id_free(ndom->hdom_id);
kfree(ndom);
}
--
2.34.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v4 14/16] iommu/amd: Refactor persistent DTE bits programming into amd_iommu_make_clear_dte()
2025-10-21 1:43 [PATCH v4 00/16] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (12 preceding siblings ...)
2025-10-21 1:43 ` [PATCH v4 13/16] iommu/amd: Track host Domain ID mapping for each guest Domain ID Suravee Suthikulpanit
@ 2025-10-21 1:43 ` Suravee Suthikulpanit
2025-10-23 2:49 ` Nicolin Chen
2025-10-21 1:43 ` [PATCH v4 15/16] iommu/amd: Refactor logic to program the host page table in DTE Suravee Suthikulpanit
2025-10-21 1:43 ` [PATCH v4 16/16] iommu/amd: Add support for nested domain attach/detach Suravee Suthikulpanit
15 siblings, 1 reply; 43+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-21 1:43 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
To help avoid duplicate logic when programing DTE for nested translation.
Note that this commit changes behavior of when the IOMMU driver is
switching domain during attach and the blocking domain, where DTE bit
fields for interrupt pass-through (i.e. Lint0, Lint1, NMI, INIT, ExtInt)
and System management message could be affected. These DTE bits are
specified in the IVRS table for specific devices, and should be persistent.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 13 +++++++++++++
drivers/iommu/amd/iommu.c | 11 -----------
2 files changed, 13 insertions(+), 11 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 3730d8bbe6dc..cfb63de7732a 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -197,9 +197,22 @@ void amd_iommu_update_dte256(struct amd_iommu *iommu,
static inline void
amd_iommu_make_clear_dte(struct iommu_dev_data *dev_data, struct dev_table_entry *new)
{
+ struct dev_table_entry *initial_dte;
+ struct amd_iommu *iommu = get_amd_iommu_from_dev(dev_data->dev);
+
/* All existing DTE must have V bit set */
new->data128[0] = DTE_FLAG_V;
new->data128[1] = 0;
+
+ /*
+ * Restore cached persistent DTE bits, which can be set by information
+ * in IVRS table. See set_dev_entry_from_acpi().
+ */
+ initial_dte = amd_iommu_get_ivhd_dte_flags(iommu->pci_seg->id, dev_data->devid);
+ if (initial_dte) {
+ new->data128[0] |= initial_dte->data128[0];
+ new->data128[1] |= initial_dte->data128[1];
+ }
}
/* NESTED */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index e4db6f304599..e3330f3b8c14 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2078,7 +2078,6 @@ static void set_dte_entry(struct amd_iommu *iommu,
{
u16 domid;
u32 old_domid;
- struct dev_table_entry *initial_dte;
struct dev_table_entry new = {};
struct protection_domain *domain = dev_data->domain;
struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
@@ -2119,16 +2118,6 @@ static void set_dte_entry(struct amd_iommu *iommu,
old_domid = READ_ONCE(dte->data[1]) & DTE_DOMID_MASK;
new.data[1] |= domid;
- /*
- * Restore cached persistent DTE bits, which can be set by information
- * in IVRS table. See set_dev_entry_from_acpi().
- */
- initial_dte = amd_iommu_get_ivhd_dte_flags(iommu->pci_seg->id, dev_data->devid);
- if (initial_dte) {
- new.data128[0] |= initial_dte->data128[0];
- new.data128[1] |= initial_dte->data128[1];
- }
-
set_dte_gcr3_table(iommu, dev_data, &new);
amd_iommu_update_dte256(iommu, dev_data, &new);
--
2.34.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v4 15/16] iommu/amd: Refactor logic to program the host page table in DTE
2025-10-21 1:43 [PATCH v4 00/16] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (13 preceding siblings ...)
2025-10-21 1:43 ` [PATCH v4 14/16] iommu/amd: Refactor persistent DTE bits programming into amd_iommu_make_clear_dte() Suravee Suthikulpanit
@ 2025-10-21 1:43 ` Suravee Suthikulpanit
2025-10-23 13:08 ` Jason Gunthorpe
2025-10-21 1:43 ` [PATCH v4 16/16] iommu/amd: Add support for nested domain attach/detach Suravee Suthikulpanit
15 siblings, 1 reply; 43+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-21 1:43 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
Introduce the set_dte_v1() helper function to configure IOMMU host (v1)
page table into DTE.
There is no functional change.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 3 ++
drivers/iommu/amd/iommu.c | 57 ++++++++++++++++++++---------------
2 files changed, 35 insertions(+), 25 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index cfb63de7732a..5e61fdb2c6c0 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -190,6 +190,9 @@ struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid);
int amd_iommu_completion_wait(struct amd_iommu *iommu);
/* DTE */
+void amd_iommu_set_dte_v1(struct iommu_dev_data *dev_data,
+ struct protection_domain *domain,
+ struct dev_table_entry *new);
int amd_iommu_device_flush_dte(struct iommu_dev_data *dev_data);
void amd_iommu_update_dte256(struct amd_iommu *iommu,
struct iommu_dev_data *dev_data,
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index e3330f3b8c14..428008cff06a 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2041,16 +2041,12 @@ int amd_iommu_clear_gcr3(struct iommu_dev_data *dev_data, ioasid_t pasid)
* Note:
* The old value for GCR3 table and GPT have been cleared from caller.
*/
-static void set_dte_gcr3_table(struct amd_iommu *iommu,
- struct iommu_dev_data *dev_data,
+static void set_dte_gcr3_table(struct iommu_dev_data *dev_data,
struct dev_table_entry *target)
{
struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
u64 gcr3;
- if (!gcr3_info->gcr3_tbl)
- return;
-
pr_debug("%s: devid=%#x, glx=%#x, gcr3_tbl=%#llx\n",
__func__, dev_data->devid, gcr3_info->glx,
(unsigned long long)gcr3_info->gcr3_tbl);
@@ -2071,6 +2067,26 @@ static void set_dte_gcr3_table(struct amd_iommu *iommu,
target->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_5_LEVEL);
else
target->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_4_LEVEL);
+
+ /* Note: PRI is only supported w/ GCR3 table */
+ if (dev_data->ppr)
+ target->data[0] |= 1ULL << DEV_ENTRY_PPR;
+}
+
+void amd_iommu_set_dte_v1(struct iommu_dev_data *dev_data,
+ struct protection_domain *domain,
+ struct dev_table_entry *new)
+{
+ u64 htrp;
+
+ new->data[0] |= FIELD_PREP(DTE_MODE_MASK, domain->iop.mode);
+
+ htrp = FIELD_GET(GENMASK_ULL(51, 12), iommu_virt_to_phys(domain->iop.root));
+ new->data[0] |= FIELD_PREP(DTE_HOST_TRP, htrp);
+
+ /* Note Dirty tracking is used for v1 table only for now */
+ if (domain->dirty_tracking)
+ new->data[0] |= DTE_FLAG_HAD;
}
static void set_dte_entry(struct amd_iommu *iommu,
@@ -2088,37 +2104,28 @@ static void set_dte_entry(struct amd_iommu *iommu,
else
domid = domain->id;
- amd_iommu_make_clear_dte(dev_data, &new);
-
- if (domain->iop.mode != PAGE_MODE_NONE)
- new.data[0] |= iommu_virt_to_phys(domain->iop.root);
-
- new.data[0] |= (domain->iop.mode & DEV_ENTRY_MODE_MASK)
- << DEV_ENTRY_MODE_SHIFT;
-
- new.data[0] |= DTE_FLAG_IR | DTE_FLAG_IW;
-
/*
* When SNP is enabled, we can only support TV=1 with non-zero domain ID.
* This is prevented by the SNP-enable and IOMMU_DOMAIN_IDENTITY check in
* do_iommu_domain_alloc().
*/
WARN_ON(amd_iommu_snp_en && (domid == 0));
- new.data[0] |= DTE_FLAG_TV;
- if (dev_data->ppr)
- new.data[0] |= 1ULL << DEV_ENTRY_PPR;
+ amd_iommu_make_clear_dte(dev_data, &new);
- if (domain->dirty_tracking)
- new.data[0] |= DTE_FLAG_HAD;
+ old_domid = READ_ONCE(dte->data[1]) & DTE_DOMID_MASK;
- if (dev_data->ats_enabled)
- new.data[1] |= DTE_FLAG_IOTLB;
+ if (gcr3_info && gcr3_info->gcr3_tbl)
+ set_dte_gcr3_table(dev_data, &new);
+ else if (domain->iop.mode != PAGE_MODE_NONE)
+ amd_iommu_set_dte_v1(dev_data, domain, &new);
- old_domid = READ_ONCE(dte->data[1]) & DTE_DOMID_MASK;
- new.data[1] |= domid;
+ /* Note: The IR, IW, TV, DOMID are needed for both v1 and gcr3 table */
+ new.data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV;
+ new.data[1] |= FIELD_PREP(DTE_DOMID_MASK, domid);
- set_dte_gcr3_table(iommu, dev_data, &new);
+ if (dev_data->ats_enabled)
+ new.data[1] |= DTE_FLAG_IOTLB;
amd_iommu_update_dte256(iommu, dev_data, &new);
--
2.34.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v4 16/16] iommu/amd: Add support for nested domain attach/detach
2025-10-21 1:43 [PATCH v4 00/16] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (14 preceding siblings ...)
2025-10-21 1:43 ` [PATCH v4 15/16] iommu/amd: Refactor logic to program the host page table in DTE Suravee Suthikulpanit
@ 2025-10-21 1:43 ` Suravee Suthikulpanit
2025-10-23 13:17 ` Jason Gunthorpe
15 siblings, 1 reply; 43+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-21 1:43 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
Introduce set_dte_nested() to program guest translation settings in
the host DTE when attaches the nested domain to a device.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu_types.h | 1 +
drivers/iommu/amd/nested.c | 71 +++++++++++++++++++++++++++++
2 files changed, 72 insertions(+)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 09952b7a256d..243dc1dfd394 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -421,6 +421,7 @@
#define DTE_FLAG_HAD (3ULL << 7)
#define DTE_MODE_MASK GENMASK_ULL(11, 9)
#define DTE_HOST_TRP GENMASK_ULL(51, 12)
+#define DTE_FLAG_PPR BIT_ULL(52)
#define DTE_FLAG_GIOV BIT_ULL(54)
#define DTE_FLAG_GV BIT_ULL(55)
#define DTE_GLX GENMASK_ULL(57, 56)
diff --git a/drivers/iommu/amd/nested.c b/drivers/iommu/amd/nested.c
index 383a3c7b4a91..911ab331be6c 100644
--- a/drivers/iommu/amd/nested.c
+++ b/drivers/iommu/amd/nested.c
@@ -141,6 +141,76 @@ amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
return ERR_PTR(ret);
}
+static void set_dte_nested(struct amd_iommu *iommu,
+ struct iommu_domain *dom,
+ struct iommu_dev_data *dev_data)
+{
+ struct protection_domain *parent;
+ struct dev_table_entry new = {0};
+ struct nested_domain *ndom = to_ndomain(dom);
+ struct iommu_hwpt_amd_guest *gdte = &ndom->gdte;
+
+ /*
+ * The nest parent domain is attached during the call to the
+ * struct iommu_ops.viommu_init(), which will be stored as part
+ * of the struct amd_iommu_viommu.parent.
+ */
+ if (WARN_ON(!ndom->viommu || !ndom->viommu->parent))
+ return;
+
+ parent = ndom->viommu->parent;
+ amd_iommu_make_clear_dte(dev_data, &new);
+
+ /* Note: The IR, IW, TV, DOMID are needed for both v1 and gcr3 table */
+ new.data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV;
+ new.data[1] |= FIELD_PREP(DTE_DOMID_MASK, ndom->hdom_id);
+
+ if (dev_data->ats_enabled)
+ new.data[1] |= DTE_FLAG_IOTLB;
+
+ /*
+ * Use nested domain ID to program DTE.
+ * See amd_iommu_alloc_domain_nested().
+ */
+ amd_iommu_set_dte_v1(dev_data, parent, &new);
+
+ /* Guest PPR */
+ new.data[0] |= gdte->dte[0] & DTE_FLAG_PPR;
+
+ /* Guest translation stuff */
+ new.data[0] |= gdte->dte[0] & (DTE_GLX | DTE_FLAG_GV | DTE_FLAG_GIOV);
+
+ /* GCR3 table */
+ new.data[0] |= gdte->dte[0] & DTE_GCR3_14_12;
+ new.data[1] |= gdte->dte[1] & (DTE_GCR3_30_15 | DTE_GCR3_51_31);
+
+ /* Guest paging mode */
+ new.data[2] |= gdte->dte[2] & DTE_GPT_LEVEL_MASK;
+
+ amd_iommu_update_dte256(iommu, dev_data, &new);
+}
+
+static int nested_attach_device(struct iommu_domain *dom, struct device *dev)
+{
+ struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
+ struct amd_iommu *iommu = get_amd_iommu_from_dev_data(dev_data);
+ int ret = 0;
+
+ if (WARN_ON(dom->type != IOMMU_DOMAIN_NESTED))
+ return -EINVAL;
+
+ mutex_lock(&dev_data->mutex);
+
+ /* Update device table entry */
+ set_dte_nested(iommu, dom, dev_data);
+ amd_iommu_device_flush_dte(dev_data);
+ amd_iommu_completion_wait(iommu);
+
+ mutex_unlock(&dev_data->mutex);
+
+ return ret;
+}
+
static void nested_domain_free(struct iommu_domain *dom)
{
struct nested_domain *ndom = to_ndomain(dom);
@@ -157,6 +227,7 @@ static void nested_domain_free(struct iommu_domain *dom)
}
static const struct iommu_domain_ops nested_domain_ops = {
+ .attach_dev = nested_attach_device,
.free = nested_domain_free,
};
--
2.34.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v4 11/16] iommu/amd: Introduce struct amd_iommu_viommu
2025-10-21 1:43 ` [PATCH v4 11/16] iommu/amd: Introduce struct amd_iommu_viommu Suravee Suthikulpanit
@ 2025-10-22 20:00 ` Jason Gunthorpe
2025-10-23 2:33 ` Nicolin Chen
1 sibling, 0 replies; 43+ messages in thread
From: Jason Gunthorpe @ 2025-10-22 20:00 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
On Tue, Oct 21, 2025 at 01:43:19AM +0000, Suravee Suthikulpanit wrote:
> +int amd_iommufd_viommu_init(struct iommufd_viommu *viommu, struct iommu_domain *parent,
> + const struct iommu_user_data *user_data)
> +{
> + struct amd_iommu_viommu *aviommu = container_of(viommu, struct amd_iommu_viommu, core);
> +
> + /*
> + */
> + aviommu->parent = to_pdomain(parent);
Drop the /* */ - otherwise looks OK
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 12/16] iommu/amd: Add support for nested domain allocation
2025-10-21 1:43 ` [PATCH v4 12/16] iommu/amd: Add support for nested domain allocation Suravee Suthikulpanit
@ 2025-10-22 20:01 ` Jason Gunthorpe
0 siblings, 0 replies; 43+ messages in thread
From: Jason Gunthorpe @ 2025-10-22 20:01 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
On Tue, Oct 21, 2025 at 01:43:20AM +0000, Suravee Suthikulpanit wrote:
> The nested domain is allocated with IOMMU_DOMAIN_NESTED type to store
> stage-1 translation (i.e. GVA->GPA). This includes the GCR3 root pointer
> table along with guest page tables. The struct iommu_hwpt_amd_guest
> contains this information, and is passed from user-space as a parameter
> of the struct iommu_ops.domain_alloc_nested().
>
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
> drivers/iommu/amd/Makefile | 2 +-
> drivers/iommu/amd/amd_iommu.h | 4 +
> drivers/iommu/amd/amd_iommu_types.h | 14 ++++
> drivers/iommu/amd/nested.c | 111 ++++++++++++++++++++++++++++
Might be nicer to put this in amd/iommufd.c, but up to you
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 13/16] iommu/amd: Track host Domain ID mapping for each guest Domain ID
2025-10-21 1:43 ` [PATCH v4 13/16] iommu/amd: Track host Domain ID mapping for each guest Domain ID Suravee Suthikulpanit
@ 2025-10-22 20:08 ` Jason Gunthorpe
2025-11-05 10:50 ` Suthikulpanit, Suravee
0 siblings, 1 reply; 43+ messages in thread
From: Jason Gunthorpe @ 2025-10-22 20:08 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
On Tue, Oct 21, 2025 at 01:43:21AM +0000, Suravee Suthikulpanit wrote:
> Each nested domain is assigned guest domain ID (gDomID), which guest OS
> programs into guest Device Table Entry (gDTE). For each gDomID, the driver
> assigns a corresponding host domain ID (hDomID), which will be programmed
> into the host Device Table Entry (hDTE).
>
> The gDTE to hDTE 1:1 mapping is stored in the nest parent domain using
> an xarray (struct protection_domain.gdomid_array). When invalidate the
> nest parent domain, the INVALIDATE_IOMMU_PAGES must be issued for each
> hDomID in the gdomid_array.
I think this should be stored in the viommu..
It is a small unrealistic detail but very pedantically the API allows
creating two VIOMMU's from the same NEST PARENT domain and if someone
did this then each of the VIOMMU should have its own private gDomID
number space and own separated xarray.
Allowing two VIOMMUs to share the same hDomID could be problematic
because we don't know the PASID layout is consistent.
> +static int iommu_flush_hdom_ids(struct amd_iommu *iommu,
> + u64 address, size_t size,
> + struct protection_domain *parent)
> +{
> + int ret = 0;
> + unsigned long i;
> + struct iommu_cmd cmd;
> + struct nested_domain *ndom;
> +
> + xa_for_each(&parent->gdomid_array, i, ndom) {
This doesn't seem right.. There could be many nested_domains sharing
the same gDomID..
I expect this xarray to have a struct like
struct gdomid {
refcount_t users;
u32 hdomid;
};
And each nested_domain will go into the viommu and either allocate a
new gdomid or ++users for the existing one. Inverse when destroying a
nested_domain.
> @@ -92,6 +92,49 @@ amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
> ndom->domain.type = IOMMU_DOMAIN_NESTED;
> ndom->viommu = aviommu;
>
> + /*
> + * Normally, when a guest has multiple pass-through devices,
> + * the IOMMU driver setup DTEs with the same stage-2 table and
> + * use the same host domain ID (hDomId). In case of nested translation,
> + * if the guest setup different stage-1 tables with same PASID,
> + * IOMMU would use the same TLB tag. This will results in TLB
> + * aliasing issue.
> + *
> + * The guest is assigning gDomIDs based on its own algorithm for managing
> + * cache tags of (DomID, PASID). Within a single viommu, the nest parent domain
> + * (w/ S2 table) is used by all DTEs. But we need to consistently map the gDomID
> + * to a single hDomID. This is done using an xarray in the nest parent domain to
> + * keep track of the gDomID mapping. When the S2 is changed, the INVALIDATE_IOMMU_PAGES
> + * command must be issued for each hDomID in the xarray.
> + *
> + * Since there is no invalidation support and no viommu yet, just always use a
> + * unique hDomID for now.
It is not "for now" anymore, this is the correct algorithm..
Jason
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 09/16] iommu/amd: Always enable GCR3TRPMode when supported.
2025-10-21 1:43 ` [PATCH v4 09/16] iommu/amd: Always enable GCR3TRPMode when supported Suravee Suthikulpanit
@ 2025-10-23 2:24 ` Nicolin Chen
2025-11-08 17:39 ` Vasant Hegde
1 sibling, 0 replies; 43+ messages in thread
From: Nicolin Chen @ 2025-10-23 2:24 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez
On Tue, Oct 21, 2025 at 01:43:17AM +0000, Suravee Suthikulpanit wrote:
> The GCR3TRPMode feature allows the DTE[GCR3TRP] field to be configured
> with GPA (instead of SPA). This simplifies the implementation, and is
> a pre-requisite for nested translation support.
>
> Therefore, always enable this feature if available.
>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 10/16] iommu/amd: Add support for nest parent domain allocation
2025-10-21 1:43 ` [PATCH v4 10/16] iommu/amd: Add support for nest parent domain allocation Suravee Suthikulpanit
@ 2025-10-23 2:27 ` Nicolin Chen
2025-10-23 20:08 ` Suthikulpanit, Suravee
0 siblings, 1 reply; 43+ messages in thread
From: Nicolin Chen @ 2025-10-23 2:27 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez
On Tue, Oct 21, 2025 at 01:43:18AM +0000, Suravee Suthikulpanit wrote:
> diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
> index 9226edd8af69..c34604cf1811 100644
> --- a/drivers/iommu/amd/amd_iommu_types.h
> +++ b/drivers/iommu/amd/amd_iommu_types.h
> @@ -107,6 +107,7 @@
>
>
> /* Extended Feature 2 Bits */
> +#define FEATURE_GCR3TRPMODE BIT_ULL(3)
> #define FEATURE_SNPAVICSUP GENMASK_ULL(7, 5)
> #define FEATURE_SNPAVICSUP_GAM(x) \
> (FIELD_GET(FEATURE_SNPAVICSUP, x) == 0x1)
This now should be moved to the previous patch?
Nicolin
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 11/16] iommu/amd: Introduce struct amd_iommu_viommu
2025-10-21 1:43 ` [PATCH v4 11/16] iommu/amd: Introduce struct amd_iommu_viommu Suravee Suthikulpanit
2025-10-22 20:00 ` Jason Gunthorpe
@ 2025-10-23 2:33 ` Nicolin Chen
1 sibling, 0 replies; 43+ messages in thread
From: Nicolin Chen @ 2025-10-23 2:33 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez
On Tue, Oct 21, 2025 at 01:43:19AM +0000, Suravee Suthikulpanit wrote:
> Which stores reference to nested parent domain assigned during the call to
> struct iommu_ops.viommu_init(). Information in the nest parent is needed
> when setting up the nested translation.
>
> Note that the viommu initialization will be introduced in subsequent
> commit.
>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 14/16] iommu/amd: Refactor persistent DTE bits programming into amd_iommu_make_clear_dte()
2025-10-21 1:43 ` [PATCH v4 14/16] iommu/amd: Refactor persistent DTE bits programming into amd_iommu_make_clear_dte() Suravee Suthikulpanit
@ 2025-10-23 2:49 ` Nicolin Chen
0 siblings, 0 replies; 43+ messages in thread
From: Nicolin Chen @ 2025-10-23 2:49 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez
On Tue, Oct 21, 2025 at 01:43:22AM +0000, Suravee Suthikulpanit wrote:
> To help avoid duplicate logic when programing DTE for nested translation.
>
> Note that this commit changes behavior of when the IOMMU driver is
> switching domain during attach and the blocking domain, where DTE bit
> fields for interrupt pass-through (i.e. Lint0, Lint1, NMI, INIT, ExtInt)
> and System management message could be affected. These DTE bits are
> specified in the IVRS table for specific devices, and should be persistent.
>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 15/16] iommu/amd: Refactor logic to program the host page table in DTE
2025-10-21 1:43 ` [PATCH v4 15/16] iommu/amd: Refactor logic to program the host page table in DTE Suravee Suthikulpanit
@ 2025-10-23 13:08 ` Jason Gunthorpe
2025-11-08 17:26 ` Vasant Hegde
0 siblings, 1 reply; 43+ messages in thread
From: Jason Gunthorpe @ 2025-10-23 13:08 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
On Tue, Oct 21, 2025 at 01:43:23AM +0000, Suravee Suthikulpanit wrote:
> @@ -2088,37 +2104,28 @@ static void set_dte_entry(struct amd_iommu *iommu,
> else
> domid = domain->id;
>
> /*
> * When SNP is enabled, we can only support TV=1 with non-zero domain ID.
> * This is prevented by the SNP-enable and IOMMU_DOMAIN_IDENTITY check in
> * do_iommu_domain_alloc().
> */
> WARN_ON(amd_iommu_snp_en && (domid == 0));
>
> + amd_iommu_make_clear_dte(dev_data, &new);
>
> + old_domid = READ_ONCE(dte->data[1]) & DTE_DOMID_MASK;
This old_domid stuff doesn't make any sense. I think the commit that
added it is bonkers: 36b7200f67df ("iommu/amd: Flush old domains in kdump kernel")
The problem is that the dom ids that are present in the re-used DTE
table have to be reserved from the domain id alloctor at boot.
If the kdump kernel tries to create new DTEs it MUST NOT re-use any
IDs that are actively being using in DTEs already or you get data
corruption. Randomly flushing IDs is just getting lucky..
Not for this series, but something to think about.
> + if (gcr3_info && gcr3_info->gcr3_tbl)
> + set_dte_gcr3_table(dev_data, &new);
> + else if (domain->iop.mode != PAGE_MODE_NONE)
> + amd_iommu_set_dte_v1(dev_data, domain, &new);
>
> + /* Note: The IR, IW, TV, DOMID are needed for both v1 and gcr3 table */
> + new.data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV;
> + new.data[1] |= FIELD_PREP(DTE_DOMID_MASK, domid);
>
> + if (dev_data->ats_enabled)
> + new.data[1] |= DTE_FLAG_IOTLB;
These three should be merged into the two functions so they stand
alone. These sets have to be made in the next patch, doesn't make
sense to open code them in callers.
Like this, it is simple and readable. It directly answers the question
'what bits are set when the driver creates this kind of DTE'
static void set_dte_gcr3_table(struct amd_iommu *iommu,
struct iommu_dev_data *dev_data,
struct dev_table_entry *target)
{
struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
u64 gcr3 = iommu_virt_to_phys(gcr3_info->gcr3_tbl);
target->data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV |
DTE_FLAG_GV | FIELD_PREP(DTE_GLX, gcr3_info->glx) |
FIELD_PREP(DTE_GCR3_14_12, gcr3 >> 12) |
(dev_data->ats_enabled ? DTE_FLAG_IOTLB : 0) |
(pdom_is_v2_pgtbl_mode(dev_data->domain) ?
target->data[0] |= DTE_FLAG_GIOV :
0);
target->data[1] |= FIELD_PREP(DTE_GCR3_30_15, gcr3 >> 15) |
FIELD_PREP(DTE_GCR3_51_31, gcr3 >> 31) |
FIELD_PREP(DTE_DOMID_MASK, dev_data->gcr3_info.domid);
/* Guest page table can only support 4 and 5 levels */
if (amd_iommu_gpt_level == PAGE_MODE_5_LEVEL)
target->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_5_LEVEL);
else
target->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_4_LEVEL);
}
void amd_iommu_set_dte_v1(struct iommu_dev_data *dev_data,
struct protection_domain *domain,
struct dev_table_entry *new)
{
u64 htrp = iommu_virt_to_phys(domain->iop.root);
new->data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV |
FIELD_PREP(DTE_MODE_MASK, domain->iop.mode) |
FIELD_PREP(DTE_HOST_TRP, htrp >> 12) |
(dev_data->ats_enabled ? DTE_FLAG_IOTLB : 0) |
(domain->dirty_tracking ? DTE_FLAG_HAD : 0);
new.data[1] |= FIELD_PREP(DTE_DOMID_MASK, domain->id);
}
(It is nice to sort the fields by the spec order, I didn't do that)
Looks like an identity one is needed too:
void set_dte_identity(struct dev_table_entry *new)
{
new->data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV |
(dev_data->ats_enabled ? DTE_FLAG_IOTLB : 0);
}
Then the whole function is pretty much just:
amd_iommu_make_clear_dte(dev_data, &new);
if (gcr3_info && gcr3_info->gcr3_tbl)
set_dte_gcr3_table(dev_data, &new);
else if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
set_dte_identity(&new)
else if (domain->domain.type == IOMMU_DOMAIN_PAGING && io.mode == V1)
amd_iommu_set_dte_v1(dev_data, domain, &new);
else
WARN_ON(true);
?
(though how does IDENTITY on a device with a PASID installed work?)
Jason
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 16/16] iommu/amd: Add support for nested domain attach/detach
2025-10-21 1:43 ` [PATCH v4 16/16] iommu/amd: Add support for nested domain attach/detach Suravee Suthikulpanit
@ 2025-10-23 13:17 ` Jason Gunthorpe
0 siblings, 0 replies; 43+ messages in thread
From: Jason Gunthorpe @ 2025-10-23 13:17 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
On Tue, Oct 21, 2025 at 01:43:24AM +0000, Suravee Suthikulpanit wrote:
> +static int nested_attach_device(struct iommu_domain *dom, struct device *dev)
> +{
> + struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
> + struct amd_iommu *iommu = get_amd_iommu_from_dev_data(dev_data);
> + int ret = 0;
> +
> + if (WARN_ON(dom->type != IOMMU_DOMAIN_NESTED))
> + return -EINVAL;
> +
> + mutex_lock(&dev_data->mutex);
> +
> + /* Update device table entry */
> + set_dte_nested(iommu, dom, dev_data);
> + amd_iommu_device_flush_dte(dev_data);
> + amd_iommu_completion_wait(iommu);
The structure is still upside down and it should probably be fixed
rather than duplicate this code (and missing the clone_aliases!)
dev_update_dte() should take in the struct dev_table_entry *:
static void dev_update_dte(struct iommu_dev_data *dev_data,
struct dev_table_entry *dte)
{
struct amd_iommu *iommu = get_amd_iommu_from_dev(dev_data->dev);
amd_iommu_update_dte256(iommu, dev_data, dte);
clone_aliases(iommu, dev_data->dev);
device_flush_dte(dev_data);
iommu_completion_wait(iommu);
}
And this function should be exported and called instead of open coding
it above.
Rework set_dte_entry() to return new instead of calling
update_dte256().
Jason
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 10/16] iommu/amd: Add support for nest parent domain allocation
2025-10-23 2:27 ` Nicolin Chen
@ 2025-10-23 20:08 ` Suthikulpanit, Suravee
0 siblings, 0 replies; 43+ messages in thread
From: Suthikulpanit, Suravee @ 2025-10-23 20:08 UTC (permalink / raw)
To: Nicolin Chen
Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez
On 10/22/2025 9:27 PM, Nicolin Chen wrote:
> On Tue, Oct 21, 2025 at 01:43:18AM +0000, Suravee Suthikulpanit wrote:
>> diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
>> index 9226edd8af69..c34604cf1811 100644
>> --- a/drivers/iommu/amd/amd_iommu_types.h
>> +++ b/drivers/iommu/amd/amd_iommu_types.h
>> @@ -107,6 +107,7 @@
>>
>>
>> /* Extended Feature 2 Bits */
>> +#define FEATURE_GCR3TRPMODE BIT_ULL(3)
>> #define FEATURE_SNPAVICSUP GENMASK_ULL(7, 5)
>> #define FEATURE_SNPAVICSUP_GAM(x) \
>> (FIELD_GET(FEATURE_SNPAVICSUP, x) == 0x1)
>
> This now should be moved to the previous patch?
>
> Nicolin
Ah, I was moving stuff around and missed this.
Thanks,
Suravee
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 13/16] iommu/amd: Track host Domain ID mapping for each guest Domain ID
2025-10-22 20:08 ` Jason Gunthorpe
@ 2025-11-05 10:50 ` Suthikulpanit, Suravee
2025-11-05 13:35 ` Jason Gunthorpe
0 siblings, 1 reply; 43+ messages in thread
From: Suthikulpanit, Suravee @ 2025-11-05 10:50 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
Jason
On 10/23/2025 3:08 AM, Jason Gunthorpe wrote:
> On Tue, Oct 21, 2025 at 01:43:21AM +0000, Suravee Suthikulpanit wrote:
>> Each nested domain is assigned guest domain ID (gDomID), which guest OS
>> programs into guest Device Table Entry (gDTE). For each gDomID, the driver
>> assigns a corresponding host domain ID (hDomID), which will be programmed
>> into the host Device Table Entry (hDTE).
>>
>> The gDTE to hDTE 1:1 mapping is stored in the nest parent domain using
>> an xarray (struct protection_domain.gdomid_array). When invalidate the
>> nest parent domain, the INVALIDATE_IOMMU_PAGES must be issued for each
>> hDomID in the gdomid_array.
>
> I think this should be stored in the viommu..
>
> It is a small unrealistic detail but very pedantically the API allows
> creating two VIOMMU's from the same NEST PARENT domain and if someone
> did this then each of the VIOMMU should have its own private gDomID
> number space and own separated xarray.
Actually, to support nested translation w/ HW-based vIOMMU support in
the guest w/ two VFIO devices on two different physical IOMMUs, it would
require setting up two iommufd_viommu structures (one for each IOMMU)
and share the same parent domain (single GPA-SPA mapping). Also, AMD
HW-vIOMMU use a single domain ID (gDomID-to-hDomID) mapping table per
guest-ID. Since the table is indexed using gDomID, it requires single
gDomID space per guest.
In this case, it makes more sense to store the gDomID-to-hDomID mapping
in the parent domain since:
1. There is one gDomID-space per guest and there is one parent
domain per guest.
2. When host issues invalidation for a parent domain, IOMMU driver
needs to issue an invalidate command for each hDomId used for the same
parent domain (on each IOMMU). We can't do this if we store xarray in
the viommu. Otherwise, we would need to store a list of vIOMMUs per
parent domain.
> Allowing two VIOMMUs to share the same hDomID could be problematic
> because we don't know the PASID layout is consistent.
Not sure why PASID layout matters here?
>> +static int iommu_flush_hdom_ids(struct amd_iommu *iommu,
>> + u64 address, size_t size,
>> + struct protection_domain *parent)
>> +{
>> + int ret = 0;
>> + unsigned long i;
>> + struct iommu_cmd cmd;
>> + struct nested_domain *ndom;
>> +
>> + xa_for_each(&parent->gdomid_array, i, ndom) {
>
> This doesn't seem right.. There could be many nested_domains sharing
> the same gDomID..
>
> I expect this xarray to have a struct like
>
> struct gdomid {
> refcount_t users;
> u32 hdomid;
> };
>
> And each nested_domain will go into the viommu and either allocate a
> new gdomid or ++users for the existing one. Inverse when destroying a
> nested_domain.
Got it. I have new code for this and will send out in v5 soon.
Thanks,
Suravee
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 13/16] iommu/amd: Track host Domain ID mapping for each guest Domain ID
2025-11-05 10:50 ` Suthikulpanit, Suravee
@ 2025-11-05 13:35 ` Jason Gunthorpe
0 siblings, 0 replies; 43+ messages in thread
From: Jason Gunthorpe @ 2025-11-05 13:35 UTC (permalink / raw)
To: Suthikulpanit, Suravee
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
On Wed, Nov 05, 2025 at 05:50:44PM +0700, Suthikulpanit, Suravee wrote:
> Jason
>
> On 10/23/2025 3:08 AM, Jason Gunthorpe wrote:
> > On Tue, Oct 21, 2025 at 01:43:21AM +0000, Suravee Suthikulpanit wrote:
> > > Each nested domain is assigned guest domain ID (gDomID), which guest OS
> > > programs into guest Device Table Entry (gDTE). For each gDomID, the driver
> > > assigns a corresponding host domain ID (hDomID), which will be programmed
> > > into the host Device Table Entry (hDTE).
> > >
> > > The gDTE to hDTE 1:1 mapping is stored in the nest parent domain using
> > > an xarray (struct protection_domain.gdomid_array). When invalidate the
> > > nest parent domain, the INVALIDATE_IOMMU_PAGES must be issued for each
> > > hDomID in the gdomid_array.
> >
> > I think this should be stored in the viommu..
> >
> > It is a small unrealistic detail but very pedantically the API allows
> > creating two VIOMMU's from the same NEST PARENT domain and if someone
> > did this then each of the VIOMMU should have its own private gDomID
> > number space and own separated xarray.
>
> Actually, to support nested translation w/ HW-based vIOMMU support in the
> guest w/ two VFIO devices on two different physical IOMMUs, it would require
> setting up two iommufd_viommu structures (one for each IOMMU) and share the
> same parent domain (single GPA-SPA mapping). Also, AMD HW-vIOMMU use a
> single domain ID (gDomID-to-hDomID) mapping table per guest-ID. Since the
> table is indexed using gDomID, it requires single gDomID space per guest.
Yes, this is why the guest ID needs to come from the viommu object,
not the parent domain.
> In this case, it makes more sense to store the gDomID-to-hDomID mapping in
> the parent domain since:
>
> 1. There is one gDomID-space per guest and there is one parent domain per
> guest.
No. There is one gDomID-space *per viommu*. The driver has no concept
of what a guest is, that isn't part of our API.
For AMD hardware it is the viommu object that must hold the gDomID,
and all the other related HW datastructures for a "guest".
One viommu object per what the AMD spec calls a "guest".
> 2. When host issues invalidation for a parent domain, IOMMU driver needs
> to issue an invalidate command for each hDomId used for the same parent
> domain (on each IOMMU). We can't do this if we store xarray in the viommu.
> Otherwise, we would need to store a list of vIOMMUs per parent domain.
Correct, however this is just showing that the driver invalidation
logic is incorrect. See how Nicolin is working to fix this same
problem in the ARM driver. The driver cannot assume a domain has only
a single domain id in a viommu world.
For now you should still store the xarray in the viommu and then
prevent the parent domain from being used with more than one viommu
somehow so it doesn't get into trouble with the limited invalidation
logic.
Someday maybe we can fix the invalidation logic and allow domain
sharing. But for now the code should be structured along its own
limitations..
> > Allowing two VIOMMUs to share the same hDomID could be problematic
> > because we don't know the PASID layout is consistent.
>
> Not sure why PASID layout matters here?
I'm saying you cannot share hDomID across two VIOMMU objects, it is
not allowed.
Jason
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 02/16] iommu/amd: Make amd_iommu_pdom_id_alloc() non-static
2025-10-21 1:43 ` [PATCH v4 02/16] iommu/amd: Make amd_iommu_pdom_id_alloc() non-static Suravee Suthikulpanit
@ 2025-11-08 17:24 ` Vasant Hegde
0 siblings, 0 replies; 43+ messages in thread
From: Vasant Hegde @ 2025-11-08 17:24 UTC (permalink / raw)
To: Suravee Suthikulpanit, jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
iommu, santosh.shukla, sairaj.arunkodilkar, jon.grimm,
prashanthpra, wvw, wnliu, gptran, kpsingh, joao.m.martins,
alejandro.j.jimenez
On 10/21/2025 7:13 AM, Suravee Suthikulpanit wrote:
> This will be reused in a new iommufd.c file for nested translation.
s/iommufd.c/nested.c/ -OR- something like "it will be used in nested translation
support".
Ditto for next few patches.
>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
-Vasant
> ---
> drivers/iommu/amd/amd_iommu.h | 1 +
> drivers/iommu/amd/iommu.c | 8 ++++----
> 2 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
> index 9b4b589a54b5..6ea549816a1f 100644
> --- a/drivers/iommu/amd/amd_iommu.h
> +++ b/drivers/iommu/amd/amd_iommu.h
> @@ -26,6 +26,7 @@ void amd_iommu_set_rlookup_table(struct amd_iommu *iommu, u16 devid);
> void iommu_feature_enable(struct amd_iommu *iommu, u8 bit);
> void *__init iommu_alloc_4k_pages(struct amd_iommu *iommu,
> gfp_t gfp, size_t size);
> +int amd_iommu_pdom_id_alloc(void);
>
> #ifdef CONFIG_AMD_IOMMU_DEBUGFS
> void amd_iommu_debugfs_setup(void);
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index a9b17d31a969..78b3e5485006 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -1818,7 +1818,7 @@ int amd_iommu_complete_ppr(struct device *dev, u32 pasid, int status, int tag)
> *
> ****************************************************************************/
>
> -static int pdom_id_alloc(void)
> +int amd_iommu_pdom_id_alloc(void)
> {
> return ida_alloc_range(&pdom_ids, 1, MAX_DOMAIN_ID - 1, GFP_ATOMIC);
> }
> @@ -1906,7 +1906,7 @@ static int setup_gcr3_table(struct gcr3_tbl_info *gcr3_info,
> return -EBUSY;
>
> /* Allocate per device domain ID */
> - domid = pdom_id_alloc();
> + domid = amd_iommu_pdom_id_alloc();
> if (domid <= 0)
> return -ENOSPC;
> gcr3_info->domid = domid;
> @@ -2489,7 +2489,7 @@ struct protection_domain *protection_domain_alloc(void)
> if (!domain)
> return NULL;
>
> - domid = pdom_id_alloc();
> + domid = amd_iommu_pdom_id_alloc();
> if (domid <= 0) {
> kfree(domain);
> return NULL;
> @@ -2681,7 +2681,7 @@ void amd_iommu_init_identity_domain(void)
> domain->ops = &identity_domain_ops;
> domain->owner = &amd_iommu_ops;
>
> - identity_domain.id = pdom_id_alloc();
> + identity_domain.id = amd_iommu_pdom_id_alloc();
>
> protection_domain_init(&identity_domain);
> }
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 01/16] iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK
2025-10-21 1:43 ` [PATCH v4 01/16] iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK Suravee Suthikulpanit
@ 2025-11-08 17:25 ` Vasant Hegde
0 siblings, 0 replies; 43+ messages in thread
From: Vasant Hegde @ 2025-11-08 17:25 UTC (permalink / raw)
To: Suravee Suthikulpanit, jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
iommu, santosh.shukla, sairaj.arunkodilkar, jon.grimm,
prashanthpra, wvw, wnliu, gptran, kpsingh, joao.m.martins,
alejandro.j.jimenez
On 10/21/2025 7:13 AM, Suravee Suthikulpanit wrote:
> Also change the define to use GENMASK_ULL instead.
> There is no functional change.
>
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
-Vasant
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 15/16] iommu/amd: Refactor logic to program the host page table in DTE
2025-10-23 13:08 ` Jason Gunthorpe
@ 2025-11-08 17:26 ` Vasant Hegde
2025-11-08 23:03 ` Jason Gunthorpe
0 siblings, 1 reply; 43+ messages in thread
From: Vasant Hegde @ 2025-11-08 17:26 UTC (permalink / raw)
To: Jason Gunthorpe, Suravee Suthikulpanit
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, iommu, santosh.shukla, sairaj.arunkodilkar, jon.grimm,
prashanthpra, wvw, wnliu, gptran, kpsingh, joao.m.martins,
alejandro.j.jimenez
Jason,
On 10/23/2025 6:38 PM, Jason Gunthorpe wrote:
> On Tue, Oct 21, 2025 at 01:43:23AM +0000, Suravee Suthikulpanit wrote:
>> @@ -2088,37 +2104,28 @@ static void set_dte_entry(struct amd_iommu *iommu,
>> else
>> domid = domain->id;
>>
>> /*
>> * When SNP is enabled, we can only support TV=1 with non-zero domain ID.
>> * This is prevented by the SNP-enable and IOMMU_DOMAIN_IDENTITY check in
>> * do_iommu_domain_alloc().
>> */
>> WARN_ON(amd_iommu_snp_en && (domid == 0));
>>
>> + amd_iommu_make_clear_dte(dev_data, &new);
>>
>> + old_domid = READ_ONCE(dte->data[1]) & DTE_DOMID_MASK;
>
> This old_domid stuff doesn't make any sense. I think the commit that
> added it is bonkers: 36b7200f67df ("iommu/amd: Flush old domains in kdump kernel")
>
> The problem is that the dom ids that are present in the re-used DTE
> table have to be reserved from the domain id alloctor at boot.
> > If the kdump kernel tries to create new DTEs it MUST NOT re-use any
> IDs that are actively being using in DTEs already or you get data
> corruption. Randomly flushing IDs is just getting lucky..
Good catch. Thanks!
Looks like commit 38e5f33ee359 ("iommu/amd: Reuse device table for kdump") broke
domain ID reservation in kdump path :-( We will fix it.
>
> Not for this series, but something to think about.
>
>> + if (gcr3_info && gcr3_info->gcr3_tbl)
>> + set_dte_gcr3_table(dev_data, &new);
>> + else if (domain->iop.mode != PAGE_MODE_NONE)
>> + amd_iommu_set_dte_v1(dev_data, domain, &new);
>>
>> + /* Note: The IR, IW, TV, DOMID are needed for both v1 and gcr3 table */
>> + new.data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV;
>> + new.data[1] |= FIELD_PREP(DTE_DOMID_MASK, domid);
>>
>> + if (dev_data->ats_enabled)
>> + new.data[1] |= DTE_FLAG_IOTLB;
>
> These three should be merged into the two functions so they stand
> alone. These sets have to be made in the next patch, doesn't make
> sense to open code them in callers.
>
> Like this, it is simple and readable. It directly answers the question
> 'what bits are set when the driver creates this kind of DTE'
>
> static void set_dte_gcr3_table(struct amd_iommu *iommu,
> struct iommu_dev_data *dev_data,
> struct dev_table_entry *target)
> {
> struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
> u64 gcr3 = iommu_virt_to_phys(gcr3_info->gcr3_tbl);
>
> target->data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV |
> DTE_FLAG_GV | FIELD_PREP(DTE_GLX, gcr3_info->glx) |
> FIELD_PREP(DTE_GCR3_14_12, gcr3 >> 12) |
> (dev_data->ats_enabled ? DTE_FLAG_IOTLB : 0) |
> (pdom_is_v2_pgtbl_mode(dev_data->domain) ?
> target->data[0] |= DTE_FLAG_GIOV :
> 0);
>
> target->data[1] |= FIELD_PREP(DTE_GCR3_30_15, gcr3 >> 15) |
> FIELD_PREP(DTE_GCR3_51_31, gcr3 >> 31) |
> FIELD_PREP(DTE_DOMID_MASK, dev_data->gcr3_info.domid);
>
> /* Guest page table can only support 4 and 5 levels */
> if (amd_iommu_gpt_level == PAGE_MODE_5_LEVEL)
> target->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_5_LEVEL);
> else
> target->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_4_LEVEL);
> }
>
> void amd_iommu_set_dte_v1(struct iommu_dev_data *dev_data,
> struct protection_domain *domain,
> struct dev_table_entry *new)
> {
> u64 htrp = iommu_virt_to_phys(domain->iop.root);
>
> new->data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV |
> FIELD_PREP(DTE_MODE_MASK, domain->iop.mode) |
> FIELD_PREP(DTE_HOST_TRP, htrp >> 12) |
> (dev_data->ats_enabled ? DTE_FLAG_IOTLB : 0) |
> (domain->dirty_tracking ? DTE_FLAG_HAD : 0);
> new.data[1] |= FIELD_PREP(DTE_DOMID_MASK, domain->id);
> }
>
> (It is nice to sort the fields by the spec order, I didn't do that)
>
> Looks like an identity one is needed too:
>
> void set_dte_identity(struct dev_table_entry *new)
> {
> new->data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV |
> (dev_data->ats_enabled ? DTE_FLAG_IOTLB : 0);
> }
>
> Then the whole function is pretty much just:
>
> amd_iommu_make_clear_dte(dev_data, &new);
> if (gcr3_info && gcr3_info->gcr3_tbl)
> set_dte_gcr3_table(dev_data, &new);
> else if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
> set_dte_identity(&new)
> else if (domain->domain.type == IOMMU_DOMAIN_PAGING && io.mode == V1)
> amd_iommu_set_dte_v1(dev_data, domain, &new);
> else
> WARN_ON(true);
>
> ?
>
> (though how does IDENTITY on a device with a PASID installed work?)
Probably set_dte_identity() should call set_dte_gcr3_table () and update
relevant fields.
-Vasant
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 03/16] iommu/amd: Make amd_iommu_pdom_id_free() non-static
2025-10-21 1:43 ` [PATCH v4 03/16] iommu/amd: Make amd_iommu_pdom_id_free() non-static Suravee Suthikulpanit
@ 2025-11-08 17:26 ` Vasant Hegde
0 siblings, 0 replies; 43+ messages in thread
From: Vasant Hegde @ 2025-11-08 17:26 UTC (permalink / raw)
To: Suravee Suthikulpanit, jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
iommu, santosh.shukla, sairaj.arunkodilkar, jon.grimm,
prashanthpra, wvw, wnliu, gptran, kpsingh, joao.m.martins,
alejandro.j.jimenez
On 10/21/2025 7:13 AM, Suravee Suthikulpanit wrote:
> This will be reused in a new iommufd.c file for nested translation.
>
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
-Vasant
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 04/16] iommu/amd: Make amd_iommu_device_flush_dte() non-static
2025-10-21 1:43 ` [PATCH v4 04/16] iommu/amd: Make amd_iommu_device_flush_dte() non-static Suravee Suthikulpanit
@ 2025-11-08 17:27 ` Vasant Hegde
0 siblings, 0 replies; 43+ messages in thread
From: Vasant Hegde @ 2025-11-08 17:27 UTC (permalink / raw)
To: Suravee Suthikulpanit, jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
iommu, santosh.shukla, sairaj.arunkodilkar, jon.grimm,
prashanthpra, wvw, wnliu, gptran, kpsingh, joao.m.martins,
alejandro.j.jimenez
On 10/21/2025 7:13 AM, Suravee Suthikulpanit wrote:
> This will be reused in a new iommufd.c file for nested translation.
>
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
-Vasant
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 05/16] iommu/amd: Make amd_iommu_update_dte256() non-static
2025-10-21 1:43 ` [PATCH v4 05/16] iommu/amd: Make amd_iommu_update_dte256() non-static Suravee Suthikulpanit
@ 2025-11-08 17:27 ` Vasant Hegde
0 siblings, 0 replies; 43+ messages in thread
From: Vasant Hegde @ 2025-11-08 17:27 UTC (permalink / raw)
To: Suravee Suthikulpanit, jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
iommu, santosh.shukla, sairaj.arunkodilkar, jon.grimm,
prashanthpra, wvw, wnliu, gptran, kpsingh, joao.m.martins,
alejandro.j.jimenez
On 10/21/2025 7:13 AM, Suravee Suthikulpanit wrote:
> This will be reused in a new iommufd.c file for nested translation.
>
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
-Vasant
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 06/16] iommu/amd: Make amd_iommu_make_clear_dte() non-static inline
2025-10-21 1:43 ` [PATCH v4 06/16] iommu/amd: Make amd_iommu_make_clear_dte() non-static inline Suravee Suthikulpanit
@ 2025-11-08 17:27 ` Vasant Hegde
0 siblings, 0 replies; 43+ messages in thread
From: Vasant Hegde @ 2025-11-08 17:27 UTC (permalink / raw)
To: Suravee Suthikulpanit, jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
iommu, santosh.shukla, sairaj.arunkodilkar, jon.grimm,
prashanthpra, wvw, wnliu, gptran, kpsingh, joao.m.martins,
alejandro.j.jimenez
On 10/21/2025 7:13 AM, Suravee Suthikulpanit wrote:
> This will be reused in a new iommufd.c file for nested translation.
>
> Also, remove unused function parameter ptr.
>
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
-Vasant
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 08/16] iommufd: Introduce data struct for AMD nested domain allocation
2025-10-21 1:43 ` [PATCH v4 08/16] iommufd: Introduce data struct for AMD nested domain allocation Suravee Suthikulpanit
@ 2025-11-08 17:30 ` Vasant Hegde
0 siblings, 0 replies; 43+ messages in thread
From: Vasant Hegde @ 2025-11-08 17:30 UTC (permalink / raw)
To: Suravee Suthikulpanit, jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
iommu, santosh.shukla, sairaj.arunkodilkar, jon.grimm,
prashanthpra, wvw, wnliu, gptran, kpsingh, joao.m.martins,
alejandro.j.jimenez
On 10/21/2025 7:13 AM, Suravee Suthikulpanit wrote:
> Introduce IOMMU_HWPT_DATA_AMD_GUEST data type for IOMMU guest page table,
> which is used for stage-1 in nested translation. The data structure
> contains information necessary for setting up the AMD HW-vIOMMU support.
>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
-Vasant
> ---
> include/uapi/linux/iommufd.h | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> index efb52709c0a2..d111ee1dc572 100644
> --- a/include/uapi/linux/iommufd.h
> +++ b/include/uapi/linux/iommufd.h
> @@ -455,16 +455,27 @@ struct iommu_hwpt_arm_smmuv3 {
> __aligned_le64 ste[2];
> };
>
> +/**
> + * struct iommu_hwpt_amd_guest - AMD IOMMU guest I/O page table data
> + * (IOMMU_HWPT_DATA_AMD_GUEST)
> + * @dte: Guest Device Table Entry (DTE)
> + */
> +struct iommu_hwpt_amd_guest {
> + __aligned_u64 dte[4];
> +};
> +
> /**
> * enum iommu_hwpt_data_type - IOMMU HWPT Data Type
> * @IOMMU_HWPT_DATA_NONE: no data
> * @IOMMU_HWPT_DATA_VTD_S1: Intel VT-d stage-1 page table
> * @IOMMU_HWPT_DATA_ARM_SMMUV3: ARM SMMUv3 Context Descriptor Table
> + * @IOMMU_HWPT_DATA_AMD_GUEST: AMD IOMMU guest page table
> */
> enum iommu_hwpt_data_type {
> IOMMU_HWPT_DATA_NONE = 0,
> IOMMU_HWPT_DATA_VTD_S1 = 1,
> IOMMU_HWPT_DATA_ARM_SMMUV3 = 2,
> + IOMMU_HWPT_DATA_AMD_GUEST = 3,
> };
>
> /**
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 07/16] iommu/amd: Make amd_iommu_completion_wait() non-static
2025-10-21 1:43 ` [PATCH v4 07/16] iommu/amd: Make amd_iommu_completion_wait() non-static Suravee Suthikulpanit
@ 2025-11-08 17:32 ` Vasant Hegde
0 siblings, 0 replies; 43+ messages in thread
From: Vasant Hegde @ 2025-11-08 17:32 UTC (permalink / raw)
To: Suravee Suthikulpanit, jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
iommu, santosh.shukla, sairaj.arunkodilkar, jon.grimm,
prashanthpra, wvw, wnliu, gptran, kpsingh, joao.m.martins,
alejandro.j.jimenez
On 10/21/2025 7:13 AM, Suravee Suthikulpanit wrote:
> This will be reused in a new iommufd.c file for nested translation.
>
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Patch looks good. But like Jason mentioned in oatch 16, if we use
dev_update_dte() we may not needs this patch.
-Vasant
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 09/16] iommu/amd: Always enable GCR3TRPMode when supported.
2025-10-21 1:43 ` [PATCH v4 09/16] iommu/amd: Always enable GCR3TRPMode when supported Suravee Suthikulpanit
2025-10-23 2:24 ` Nicolin Chen
@ 2025-11-08 17:39 ` Vasant Hegde
2025-11-11 13:59 ` Suthikulpanit, Suravee
1 sibling, 1 reply; 43+ messages in thread
From: Vasant Hegde @ 2025-11-08 17:39 UTC (permalink / raw)
To: Suravee Suthikulpanit, jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
iommu, santosh.shukla, sairaj.arunkodilkar, jon.grimm,
prashanthpra, wvw, wnliu, gptran, kpsingh, joao.m.martins,
alejandro.j.jimenez
On 10/21/2025 7:13 AM, Suravee Suthikulpanit wrote:
> The GCR3TRPMode feature allows the DTE[GCR3TRP] field to be configured
> with GPA (instead of SPA). This simplifies the implementation, and is
> a pre-requisite for nested translation support.
>
> Therefore, always enable this feature if available.
>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
> drivers/iommu/amd/amd_iommu_types.h | 1 +
> drivers/iommu/amd/init.c | 3 +++
> 2 files changed, 4 insertions(+)
>
> diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
> index 556f1df32d53..9226edd8af69 100644
> --- a/drivers/iommu/amd/amd_iommu_types.h
> +++ b/drivers/iommu/amd/amd_iommu_types.h
> @@ -185,6 +185,7 @@
> #define CONTROL_EPH_EN 45
> #define CONTROL_XT_EN 50
> #define CONTROL_INTCAPXT_EN 51
> +#define CONTROL_GCR3TRPMODE 58
> #define CONTROL_IRTCACHEDIS 59
> #define CONTROL_SNPAVIC_EN 61
>
> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
> index f2991c11867c..c45a4bd89569 100644
> --- a/drivers/iommu/amd/init.c
> +++ b/drivers/iommu/amd/init.c
> @@ -1122,6 +1122,9 @@ static void iommu_enable_gt(struct amd_iommu *iommu)
> return;
>
> iommu_feature_enable(iommu, CONTROL_GT_EN);
> +
> + if (check_feature2(FEATURE_GCR3TRPMODE))
> + iommu_feature_enable(iommu, CONTROL_GCR3TRPMODE);
This works for now as iommu_snp_enable() enable is getting called later.
But how about moving it to iommu_init_flags() ? Also we should probably add a
comment here.
Not for this series, but may be we should rename FEATURE -> FEATURE2 so that its
clear that these are coming from EFR2 register.
-Vasant
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 15/16] iommu/amd: Refactor logic to program the host page table in DTE
2025-11-08 17:26 ` Vasant Hegde
@ 2025-11-08 23:03 ` Jason Gunthorpe
2025-11-12 18:40 ` Suthikulpanit, Suravee
0 siblings, 1 reply; 43+ messages in thread
From: Jason Gunthorpe @ 2025-11-08 23:03 UTC (permalink / raw)
To: Vasant Hegde
Cc: Suravee Suthikulpanit, nicolinc, linux-kernel, robin.murphy, will,
joro, kevin.tian, jsnitsel, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
On Sat, Nov 08, 2025 at 10:56:38PM +0530, Vasant Hegde wrote:
> > On Tue, Oct 21, 2025 at 01:43:23AM +0000, Suravee Suthikulpanit wrote:
> >> @@ -2088,37 +2104,28 @@ static void set_dte_entry(struct amd_iommu *iommu,
> >> else
> >> domid = domain->id;
> >>
> >> /*
> >> * When SNP is enabled, we can only support TV=1 with non-zero domain ID.
> >> * This is prevented by the SNP-enable and IOMMU_DOMAIN_IDENTITY check in
> >> * do_iommu_domain_alloc().
> >> */
> >> WARN_ON(amd_iommu_snp_en && (domid == 0));
> >>
> >> + amd_iommu_make_clear_dte(dev_data, &new);
> >>
> >> + old_domid = READ_ONCE(dte->data[1]) & DTE_DOMID_MASK;
> >
> > This old_domid stuff doesn't make any sense. I think the commit that
> > added it is bonkers: 36b7200f67df ("iommu/amd: Flush old domains in kdump kernel")
> >
> > The problem is that the dom ids that are present in the re-used DTE
> > table have to be reserved from the domain id alloctor at boot.
> > > If the kdump kernel tries to create new DTEs it MUST NOT re-use any
> > IDs that are actively being using in DTEs already or you get data
> > corruption. Randomly flushing IDs is just getting lucky..
>
> Good catch. Thanks!
>
> Looks like commit 38e5f33ee359 ("iommu/amd: Reuse device table for kdump") broke
> domain ID reservation in kdump path :-( We will fix it.
I suggest just reading the whole DTE and permanently reserving any
DomIDs from the ida. Just leak them forever, that is good enough for a
kdump kernel. Then get rid of this stuff above.
> > (though how does IDENTITY on a device with a PASID installed work?)
>
> Probably set_dte_identity() should call set_dte_gcr3_table () and update
> relevant fields.
Yeah, maybe so, though it might be best if the caller can detect it
and have it call both functions instead of trying to bury it.
There is a fair number of variations here as you have
blocked/identity, RID/PASID attach and PASID detach to all consider.
It would be nice if it worked properly, identity on RID while PASID is
in use has a strong real use case.
Jason
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 09/16] iommu/amd: Always enable GCR3TRPMode when supported.
2025-11-08 17:39 ` Vasant Hegde
@ 2025-11-11 13:59 ` Suthikulpanit, Suravee
0 siblings, 0 replies; 43+ messages in thread
From: Suthikulpanit, Suravee @ 2025-11-11 13:59 UTC (permalink / raw)
To: Vasant Hegde, jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
iommu, santosh.shukla, sairaj.arunkodilkar, jon.grimm,
prashanthpra, wvw, wnliu, gptran, kpsingh, joao.m.martins,
alejandro.j.jimenez
On 11/9/2025 12:39 AM, Vasant Hegde wrote:
>
>
> On 10/21/2025 7:13 AM, Suravee Suthikulpanit wrote:
>> The GCR3TRPMode feature allows the DTE[GCR3TRP] field to be configured
>> with GPA (instead of SPA). This simplifies the implementation, and is
>> a pre-requisite for nested translation support.
>>
>> Therefore, always enable this feature if available.
>>
>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>> ---
>> drivers/iommu/amd/amd_iommu_types.h | 1 +
>> drivers/iommu/amd/init.c | 3 +++
>> 2 files changed, 4 insertions(+)
>>
>> diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
>> index 556f1df32d53..9226edd8af69 100644
>> --- a/drivers/iommu/amd/amd_iommu_types.h
>> +++ b/drivers/iommu/amd/amd_iommu_types.h
>> @@ -185,6 +185,7 @@
>> #define CONTROL_EPH_EN 45
>> #define CONTROL_XT_EN 50
>> #define CONTROL_INTCAPXT_EN 51
>> +#define CONTROL_GCR3TRPMODE 58
>> #define CONTROL_IRTCACHEDIS 59
>> #define CONTROL_SNPAVIC_EN 61
>>
>> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
>> index f2991c11867c..c45a4bd89569 100644
>> --- a/drivers/iommu/amd/init.c
>> +++ b/drivers/iommu/amd/init.c
>> @@ -1122,6 +1122,9 @@ static void iommu_enable_gt(struct amd_iommu *iommu)
>> return;
>>
>> iommu_feature_enable(iommu, CONTROL_GT_EN);
>> +
>> + if (check_feature2(FEATURE_GCR3TRPMODE))
>> + iommu_feature_enable(iommu, CONTROL_GCR3TRPMODE);
>
> This works for now as iommu_snp_enable() enable is getting called later.
> But how about moving it to iommu_init_flags() ?
I put it here because the GCR3TRP feature depends on GT supported /
enabled, which is checked in this function. This avoid duplicate checks.
> Also we should probably add a comment here.
I will note in the comment regarding this needs to be called prior to
iommu_snp_enable()
Thanks,
Suravee
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 15/16] iommu/amd: Refactor logic to program the host page table in DTE
2025-11-08 23:03 ` Jason Gunthorpe
@ 2025-11-12 18:40 ` Suthikulpanit, Suravee
2025-11-17 18:08 ` Jason Gunthorpe
0 siblings, 1 reply; 43+ messages in thread
From: Suthikulpanit, Suravee @ 2025-11-12 18:40 UTC (permalink / raw)
To: Jason Gunthorpe, Vasant Hegde
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, iommu, santosh.shukla, sairaj.arunkodilkar, jon.grimm,
prashanthpra, wvw, wnliu, gptran, kpsingh, joao.m.martins,
alejandro.j.jimenez
On 11/9/2025 6:03 AM, Jason Gunthorpe wrote:
> On Sat, Nov 08, 2025 at 10:56:38PM +0530, Vasant Hegde wrote:
>> On 10/23/2025 6:38 PM, Jason Gunthorpe wrote:
>>> On Tue, Oct 21, 2025 at 01:43:23AM +0000, Suravee Suthikulpanit wrote:
>>>
>>> (though how does IDENTITY on a device with a PASID installed work?)
>>
>> Probably set_dte_identity() should call set_dte_gcr3_table () and update
>> relevant fields.
Actually, PASID would not work with IDENTITY since it has no page table
(i.e. iommu=pt means DTE[Mode]=0 and does not have host table pointer).
PASID only work with GCR3 table.
Therefore, it does not make sense for set_dte_identity() to call
set_dte_gcr3_table(). Each one should be stand alone.
Thanks,
Suravee
> Yeah, maybe so, though it might be best if the caller can detect it
> and have it call both functions instead of trying to bury it.
>
> There is a fair number of variations here as you have
> blocked/identity, RID/PASID attach and PASID detach to all consider.
>
> It would be nice if it worked properly, identity on RID while PASID is
> in use has a strong real use case.
>
> Jason
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v4 15/16] iommu/amd: Refactor logic to program the host page table in DTE
2025-11-12 18:40 ` Suthikulpanit, Suravee
@ 2025-11-17 18:08 ` Jason Gunthorpe
0 siblings, 0 replies; 43+ messages in thread
From: Jason Gunthorpe @ 2025-11-17 18:08 UTC (permalink / raw)
To: Suthikulpanit, Suravee
Cc: Vasant Hegde, nicolinc, linux-kernel, robin.murphy, will, joro,
kevin.tian, jsnitsel, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez
On Thu, Nov 13, 2025 at 01:40:47AM +0700, Suthikulpanit, Suravee wrote:
>
>
> On 11/9/2025 6:03 AM, Jason Gunthorpe wrote:
> > On Sat, Nov 08, 2025 at 10:56:38PM +0530, Vasant Hegde wrote:
> > > On 10/23/2025 6:38 PM, Jason Gunthorpe wrote:
> > > > On Tue, Oct 21, 2025 at 01:43:23AM +0000, Suravee Suthikulpanit wrote:
> > > >
> > > > (though how does IDENTITY on a device with a PASID installed work?)
> > >
> > > Probably set_dte_identity() should call set_dte_gcr3_table () and update
> > > relevant fields.
>
> Actually, PASID would not work with IDENTITY since it has no page table
> (i.e. iommu=pt means DTE[Mode]=0 and does not have host table pointer).
> PASID only work with GCR3 table.
>
> Therefore, it does not make sense for set_dte_identity() to call
> set_dte_gcr3_table(). Each one should be stand alone.
OK, so the HW cannot do this?
On SMMUv3 there are bits in the "DTE" (S1DSS) that control how
no-PASID, ie PASID 0 transactions are handled which allow it to be set
to indentity. Intel can do the same.
IMHO this is a HW gap that AMD probably should fix.
In the mean time the driver should be blocking this unsupported
combination, I don't remember seeing that code? It will make it more
difficult to use SVA on AMD platforms, but not much that can be done
about that :\
Jason
^ permalink raw reply [flat|nested] 43+ messages in thread
end of thread, other threads:[~2025-11-17 18:08 UTC | newest]
Thread overview: 43+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-21 1:43 [PATCH v4 00/16] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
2025-10-21 1:43 ` [PATCH v4 01/16] iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK Suravee Suthikulpanit
2025-11-08 17:25 ` Vasant Hegde
2025-10-21 1:43 ` [PATCH v4 02/16] iommu/amd: Make amd_iommu_pdom_id_alloc() non-static Suravee Suthikulpanit
2025-11-08 17:24 ` Vasant Hegde
2025-10-21 1:43 ` [PATCH v4 03/16] iommu/amd: Make amd_iommu_pdom_id_free() non-static Suravee Suthikulpanit
2025-11-08 17:26 ` Vasant Hegde
2025-10-21 1:43 ` [PATCH v4 04/16] iommu/amd: Make amd_iommu_device_flush_dte() non-static Suravee Suthikulpanit
2025-11-08 17:27 ` Vasant Hegde
2025-10-21 1:43 ` [PATCH v4 05/16] iommu/amd: Make amd_iommu_update_dte256() non-static Suravee Suthikulpanit
2025-11-08 17:27 ` Vasant Hegde
2025-10-21 1:43 ` [PATCH v4 06/16] iommu/amd: Make amd_iommu_make_clear_dte() non-static inline Suravee Suthikulpanit
2025-11-08 17:27 ` Vasant Hegde
2025-10-21 1:43 ` [PATCH v4 07/16] iommu/amd: Make amd_iommu_completion_wait() non-static Suravee Suthikulpanit
2025-11-08 17:32 ` Vasant Hegde
2025-10-21 1:43 ` [PATCH v4 08/16] iommufd: Introduce data struct for AMD nested domain allocation Suravee Suthikulpanit
2025-11-08 17:30 ` Vasant Hegde
2025-10-21 1:43 ` [PATCH v4 09/16] iommu/amd: Always enable GCR3TRPMode when supported Suravee Suthikulpanit
2025-10-23 2:24 ` Nicolin Chen
2025-11-08 17:39 ` Vasant Hegde
2025-11-11 13:59 ` Suthikulpanit, Suravee
2025-10-21 1:43 ` [PATCH v4 10/16] iommu/amd: Add support for nest parent domain allocation Suravee Suthikulpanit
2025-10-23 2:27 ` Nicolin Chen
2025-10-23 20:08 ` Suthikulpanit, Suravee
2025-10-21 1:43 ` [PATCH v4 11/16] iommu/amd: Introduce struct amd_iommu_viommu Suravee Suthikulpanit
2025-10-22 20:00 ` Jason Gunthorpe
2025-10-23 2:33 ` Nicolin Chen
2025-10-21 1:43 ` [PATCH v4 12/16] iommu/amd: Add support for nested domain allocation Suravee Suthikulpanit
2025-10-22 20:01 ` Jason Gunthorpe
2025-10-21 1:43 ` [PATCH v4 13/16] iommu/amd: Track host Domain ID mapping for each guest Domain ID Suravee Suthikulpanit
2025-10-22 20:08 ` Jason Gunthorpe
2025-11-05 10:50 ` Suthikulpanit, Suravee
2025-11-05 13:35 ` Jason Gunthorpe
2025-10-21 1:43 ` [PATCH v4 14/16] iommu/amd: Refactor persistent DTE bits programming into amd_iommu_make_clear_dte() Suravee Suthikulpanit
2025-10-23 2:49 ` Nicolin Chen
2025-10-21 1:43 ` [PATCH v4 15/16] iommu/amd: Refactor logic to program the host page table in DTE Suravee Suthikulpanit
2025-10-23 13:08 ` Jason Gunthorpe
2025-11-08 17:26 ` Vasant Hegde
2025-11-08 23:03 ` Jason Gunthorpe
2025-11-12 18:40 ` Suthikulpanit, Suravee
2025-11-17 18:08 ` Jason Gunthorpe
2025-10-21 1:43 ` [PATCH v4 16/16] iommu/amd: Add support for nested domain attach/detach Suravee Suthikulpanit
2025-10-23 13:17 ` Jason Gunthorpe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox