* [PATCH v3 00/15] iommu/amd: Introduce Nested Translation support
@ 2025-10-09 23:57 Suravee Suthikulpanit
2025-10-09 23:57 ` [PATCH v3 01/15] iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK Suravee Suthikulpanit
` (14 more replies)
0 siblings, 15 replies; 27+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-09 23:57 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
This series introduces support for AMD IOMMU nested page table translation
with the host (v1) and guest (v2) page tables.
In this mode, the AMD IOMMU driver configures the Device Table Entry (DTE)
with the host page table root pointer, which is set up by allocating a
domain with the IOMMU_HWPT_ALLOC_NEST_PARENT flag.
The guest page tables and Guest CR3 (GCR3) tables are managed by the guest OS,
and stored in the guest DTE (gDTE) in guest memory. The VMM is responsible for
passing gDTE information to the host IOMMU driver using struct
iommu_hwpt_amd_guest when allocating a domain of type IOMMU_DOMAIN_NESTED.
The gDTE is then parsed and programmed into the host DTE by the AMD IOMMU driver.
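For illustration only, the hand-off described above can be sketched from the VMM side as follows. This is a minimal sketch, not part of the series: struct sketch_hwpt_amd_guest is a local mirror of the struct iommu_hwpt_amd_guest proposed in patch 08 (four 64-bit words holding the 256-bit gDTE), SKETCH_HWPT_DATA_AMD_GUEST mirrors the proposed enum value 3, and sketch_fill_guest_dte() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Local mirror of the uAPI struct proposed in patch 08 (assumption: the
 * real definition uses __aligned_u64 dte[4] in include/uapi/linux/iommufd.h).
 */
struct sketch_hwpt_amd_guest {
	uint64_t dte[4];	/* 256-bit guest Device Table Entry */
} __attribute__((aligned(8)));

/* Mirrors IOMMU_HWPT_DATA_AMD_GUEST = 3 as proposed in patch 08. */
#define SKETCH_HWPT_DATA_AMD_GUEST 3

/* The gDTE is 256 bits wide, so the payload must be exactly 32 bytes. */
static_assert(sizeof(struct sketch_hwpt_amd_guest) == 32,
	      "guest DTE payload must be 256 bits");

/*
 * Hypothetical VMM helper: copy a gDTE that the VMM has already read from
 * guest memory into the payload handed to the host IOMMU driver.
 */
static inline void sketch_fill_guest_dte(struct sketch_hwpt_amd_guest *out,
					 const uint64_t gdte[4])
{
	memcpy(out->dte, gdte, sizeof(out->dte));
}
```

A VMM would then pass the filled payload, together with the data type value, when allocating the nested domain; that allocation call is not shown here.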
In addition, this series introduces base code for IOMMUFD vIOMMU for AMD
IOMMU, and implements the vIOMMU-based nested domain allocation interface.
It also adds struct nested_domain to store nested domain information, and
the set_dte_nested() helper function to handle DTE programming for the nested
domain.
The series is separated into two parts:
* Patches 1-7 are preparatory patches.
* Patches 8-15 implement nest-parent and nested domain support
for IOMMUFD vIOMMU.
Note: This series is rebased on top of:
* [PATCH v5] iommu/amd: Add support for hw_info for iommu capability query
https://lore.kernel.org/linux-iommu/20250926141901.511313-1-suravee.suthikulpanit@amd.com/T/#u
Changes from V2:
(https://lore.kernel.org/linux-iommu/20251001060954.5030-1-suravee.suthikulpanit@amd.com)
* Patch 9 (new)
* Patch 10: Update logic per Nicolin
* Patch 11:
- Do not change struct iommu_dev_data (remove parent, ndom)
- Add comment on domain ID allocation
- Move nested_domain_free() here
* Patch 12, 13, 14 (new)
* Patch 15:
- Introduce struct amd_iommu_viommu to store struct iommufd_viommu,
which is used to retrieve the parent domain
- Clean up nested_attach_device() to use the new
amd_iommu_set_dte_v1() instead of duplicating the logic.
Thanks,
Suravee
Suravee Suthikulpanit (15):
iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK
iommu/amd: Make amd_iommu_pdom_id_alloc() non-static
iommu/amd: Make amd_iommu_pdom_id_free() non-static
iommu/amd: Make amd_iommu_device_flush_dte() non-static
iommu/amd: Make amd_iommu_update_dte256() non-static
iommu/amd: Make amd_iommu_make_clear_dte() non-static inline
iommu/amd: Make amd_iommu_completion_wait() non-static
iommufd: Introduce data struct for AMD nested domain allocation
iommu/amd: Always enable GCR3TRPMode when supported.
iommu/amd: Add support for nest parent domain allocation
iommu/amd: Add support for nested domain allocation
iommu/amd: Validate guest DTE for nested translation
iommu/amd: Refactor persistent DTE bits programming into
amd_iommu_make_clear_dte()
iommu/amd: Refactor logic to program the host page table in DTE
iommu/amd: Add support for nested domain attach/detach
drivers/iommu/amd/Makefile | 2 +-
drivers/iommu/amd/amd_iommu.h | 36 ++++++
drivers/iommu/amd/amd_iommu_types.h | 45 +++++--
drivers/iommu/amd/init.c | 3 +
drivers/iommu/amd/iommu.c | 165 ++++++++++++------------
drivers/iommu/amd/nested.c | 191 ++++++++++++++++++++++++++++
include/uapi/linux/iommufd.h | 11 ++
7 files changed, 361 insertions(+), 92 deletions(-)
create mode 100644 drivers/iommu/amd/nested.c
--
2.34.1
* [PATCH v3 01/15] iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK
2025-10-09 23:57 [PATCH v3 00/15] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
@ 2025-10-09 23:57 ` Suravee Suthikulpanit
2025-10-13 10:51 ` Sairaj Kodilkar
2025-10-09 23:57 ` [PATCH v3 02/15] iommu/amd: Make amd_iommu_pdom_id_alloc() non-static Suravee Suthikulpanit
` (13 subsequent siblings)
14 siblings, 1 reply; 27+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-09 23:57 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
Also change the define to use GENMASK_ULL instead.
There is no functional change.
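The "no functional change" claim can be checked with a small stand-alone sketch. SKETCH_GENMASK_ULL is a local stand-in written for this example (an assumption, not the kernel header), and sketch_dte_domid() is a hypothetical helper that applies the mask the same way set_dte_entry() does when reading old_domid.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Local stand-in for the kernel's GENMASK_ULL(h, l): a 64-bit value with
 * bits h..l (inclusive) set. Written for this sketch only.
 */
#define SKETCH_GENMASK_ULL(h, l) \
	(((~0ULL) << (l)) & (~0ULL >> (63 - (h))))

/* Old and new spellings of the domain ID mask from this patch. */
#define OLD_DEV_DOMID_MASK 0xffffULL
#define NEW_DTE_DOMID_MASK SKETCH_GENMASK_ULL(15, 0)

/* Both spellings select DTE data[1] bits 15:0. */
static_assert(OLD_DEV_DOMID_MASK == NEW_DTE_DOMID_MASK,
	      "rename must not change the mask value");

/* Hypothetical helper: extract the domain ID from DTE data[1]. */
static inline uint16_t sketch_dte_domid(uint64_t dte_data1)
{
	return (uint16_t)(dte_data1 & NEW_DTE_DOMID_MASK);
}
```

Using the bit-range form makes the field boundaries explicit, matching the neighbouring DTE_GCR3_* definitions.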
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu_types.h | 2 +-
drivers/iommu/amd/iommu.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index a698a2e7ce2a..556f1df32d53 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -422,7 +422,7 @@
#define DTE_FLAG_IOTLB BIT_ULL(32)
#define DTE_FLAG_MASK (0x3ffULL << 32)
-#define DEV_DOMID_MASK 0xffffULL
+#define DTE_DOMID_MASK GENMASK_ULL(15, 0)
#define DTE_GCR3_14_12 GENMASK_ULL(60, 58)
#define DTE_GCR3_30_15 GENMASK_ULL(31, 16)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index b57a6993179d..a9b17d31a969 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2094,7 +2094,7 @@ static void set_dte_entry(struct amd_iommu *iommu,
if (dev_data->ats_enabled)
new.data[1] |= DTE_FLAG_IOTLB;
- old_domid = READ_ONCE(dte->data[1]) & DEV_DOMID_MASK;
+ old_domid = READ_ONCE(dte->data[1]) & DTE_DOMID_MASK;
new.data[1] |= domid;
/*
--
2.34.1
* [PATCH v3 02/15] iommu/amd: Make amd_iommu_pdom_id_alloc() non-static
2025-10-09 23:57 [PATCH v3 00/15] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
2025-10-09 23:57 ` [PATCH v3 01/15] iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK Suravee Suthikulpanit
@ 2025-10-09 23:57 ` Suravee Suthikulpanit
2025-10-09 23:57 ` [PATCH v3 03/15] iommu/amd: Make amd_iommu_pdom_id_free() non-static Suravee Suthikulpanit
` (12 subsequent siblings)
14 siblings, 0 replies; 27+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-09 23:57 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
This will be reused in a new iommufd.c file for nested translation.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 1 +
drivers/iommu/amd/iommu.c | 8 ++++----
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 9b4b589a54b5..6ea549816a1f 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -26,6 +26,7 @@ void amd_iommu_set_rlookup_table(struct amd_iommu *iommu, u16 devid);
void iommu_feature_enable(struct amd_iommu *iommu, u8 bit);
void *__init iommu_alloc_4k_pages(struct amd_iommu *iommu,
gfp_t gfp, size_t size);
+int amd_iommu_pdom_id_alloc(void);
#ifdef CONFIG_AMD_IOMMU_DEBUGFS
void amd_iommu_debugfs_setup(void);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a9b17d31a969..78b3e5485006 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1818,7 +1818,7 @@ int amd_iommu_complete_ppr(struct device *dev, u32 pasid, int status, int tag)
*
****************************************************************************/
-static int pdom_id_alloc(void)
+int amd_iommu_pdom_id_alloc(void)
{
return ida_alloc_range(&pdom_ids, 1, MAX_DOMAIN_ID - 1, GFP_ATOMIC);
}
@@ -1906,7 +1906,7 @@ static int setup_gcr3_table(struct gcr3_tbl_info *gcr3_info,
return -EBUSY;
/* Allocate per device domain ID */
- domid = pdom_id_alloc();
+ domid = amd_iommu_pdom_id_alloc();
if (domid <= 0)
return -ENOSPC;
gcr3_info->domid = domid;
@@ -2489,7 +2489,7 @@ struct protection_domain *protection_domain_alloc(void)
if (!domain)
return NULL;
- domid = pdom_id_alloc();
+ domid = amd_iommu_pdom_id_alloc();
if (domid <= 0) {
kfree(domain);
return NULL;
@@ -2681,7 +2681,7 @@ void amd_iommu_init_identity_domain(void)
domain->ops = &identity_domain_ops;
domain->owner = &amd_iommu_ops;
- identity_domain.id = pdom_id_alloc();
+ identity_domain.id = amd_iommu_pdom_id_alloc();
protection_domain_init(&identity_domain);
}
--
2.34.1
* [PATCH v3 03/15] iommu/amd: Make amd_iommu_pdom_id_free() non-static
2025-10-09 23:57 [PATCH v3 00/15] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
2025-10-09 23:57 ` [PATCH v3 01/15] iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK Suravee Suthikulpanit
2025-10-09 23:57 ` [PATCH v3 02/15] iommu/amd: Make amd_iommu_pdom_id_alloc() non-static Suravee Suthikulpanit
@ 2025-10-09 23:57 ` Suravee Suthikulpanit
2025-10-09 23:57 ` [PATCH v3 04/15] iommu/amd: Make amd_iommu_device_flush_dte() non-static Suravee Suthikulpanit
` (11 subsequent siblings)
14 siblings, 0 replies; 27+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-09 23:57 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
This will be reused in a new iommufd.c file for nested translation.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 1 +
drivers/iommu/amd/iommu.c | 10 +++++-----
2 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 6ea549816a1f..322c8c73444a 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -27,6 +27,7 @@ void iommu_feature_enable(struct amd_iommu *iommu, u8 bit);
void *__init iommu_alloc_4k_pages(struct amd_iommu *iommu,
gfp_t gfp, size_t size);
int amd_iommu_pdom_id_alloc(void);
+void amd_iommu_pdom_id_free(int id);
#ifdef CONFIG_AMD_IOMMU_DEBUGFS
void amd_iommu_debugfs_setup(void);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 78b3e5485006..0b61059e485d 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1823,7 +1823,7 @@ int amd_iommu_pdom_id_alloc(void)
return ida_alloc_range(&pdom_ids, 1, MAX_DOMAIN_ID - 1, GFP_ATOMIC);
}
-static void pdom_id_free(int id)
+void amd_iommu_pdom_id_free(int id)
{
ida_free(&pdom_ids, id);
}
@@ -1870,7 +1870,7 @@ static void free_gcr3_table(struct gcr3_tbl_info *gcr3_info)
gcr3_info->glx = 0;
/* Free per device domain ID */
- pdom_id_free(gcr3_info->domid);
+ amd_iommu_pdom_id_free(gcr3_info->domid);
iommu_free_pages(gcr3_info->gcr3_tbl);
gcr3_info->gcr3_tbl = NULL;
@@ -1913,7 +1913,7 @@ static int setup_gcr3_table(struct gcr3_tbl_info *gcr3_info,
gcr3_info->gcr3_tbl = iommu_alloc_pages_node_sz(nid, GFP_ATOMIC, SZ_4K);
if (gcr3_info->gcr3_tbl == NULL) {
- pdom_id_free(domid);
+ amd_iommu_pdom_id_free(domid);
return -ENOMEM;
}
@@ -2573,7 +2573,7 @@ do_iommu_domain_alloc(struct device *dev, u32 flags,
domain->pd_mode = pgtable;
ret = pdom_setup_pgtable(domain, dev);
if (ret) {
- pdom_id_free(domain->id);
+ amd_iommu_pdom_id_free(domain->id);
kfree(domain);
return ERR_PTR(ret);
}
@@ -2631,7 +2631,7 @@ void amd_iommu_domain_free(struct iommu_domain *dom)
WARN_ON(!list_empty(&domain->dev_list));
if (domain->domain.type & __IOMMU_DOMAIN_PAGING)
free_io_pgtable_ops(&domain->iop.pgtbl.ops);
- pdom_id_free(domain->id);
+ amd_iommu_pdom_id_free(domain->id);
kfree(domain);
}
--
2.34.1
* [PATCH v3 04/15] iommu/amd: Make amd_iommu_device_flush_dte() non-static
2025-10-09 23:57 [PATCH v3 00/15] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (2 preceding siblings ...)
2025-10-09 23:57 ` [PATCH v3 03/15] iommu/amd: Make amd_iommu_pdom_id_free() non-static Suravee Suthikulpanit
@ 2025-10-09 23:57 ` Suravee Suthikulpanit
2025-10-09 23:57 ` [PATCH v3 05/15] iommu/amd: Make amd_iommu_update_dte256() non-static Suravee Suthikulpanit
` (10 subsequent siblings)
14 siblings, 0 replies; 27+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-09 23:57 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
This will be reused in a new iommufd.c file for nested translation.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 3 +++
drivers/iommu/amd/iommu.c | 8 ++++----
2 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 322c8c73444a..079fb1d44c00 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -188,4 +188,7 @@ void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
struct dev_table_entry *get_dev_table(struct amd_iommu *iommu);
struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid);
+/* DTE */
+int amd_iommu_device_flush_dte(struct iommu_dev_data *dev_data);
+
#endif /* AMD_IOMMU_H */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 0b61059e485d..fad74d2bc1b1 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1562,7 +1562,7 @@ static int device_flush_dte_alias(struct pci_dev *pdev, u16 alias, void *data)
/*
* Command send function for invalidating a device table entry
*/
-static int device_flush_dte(struct iommu_dev_data *dev_data)
+int amd_iommu_device_flush_dte(struct iommu_dev_data *dev_data)
{
struct amd_iommu *iommu = get_amd_iommu_from_dev_data(dev_data);
struct pci_dev *pdev = NULL;
@@ -1788,7 +1788,7 @@ void amd_iommu_update_and_flush_device_table(struct protection_domain *domain)
}
list_for_each_entry(dev_data, &domain->dev_list, list)
- device_flush_dte(dev_data);
+ amd_iommu_device_flush_dte(dev_data);
domain_flush_complete(domain);
}
@@ -2144,7 +2144,7 @@ static void dev_update_dte(struct iommu_dev_data *dev_data, bool set)
clear_dte_entry(iommu, dev_data);
clone_aliases(iommu, dev_data->dev);
- device_flush_dte(dev_data);
+ amd_iommu_device_flush_dte(dev_data);
iommu_completion_wait(iommu);
}
@@ -2874,7 +2874,7 @@ static int amd_iommu_set_dirty_tracking(struct iommu_domain *domain,
spin_unlock(&dev_data->dte_lock);
/* Flush device DTE */
- device_flush_dte(dev_data);
+ amd_iommu_device_flush_dte(dev_data);
domain_flush = true;
}
--
2.34.1
* [PATCH v3 05/15] iommu/amd: Make amd_iommu_update_dte256() non-static
2025-10-09 23:57 [PATCH v3 00/15] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (3 preceding siblings ...)
2025-10-09 23:57 ` [PATCH v3 04/15] iommu/amd: Make amd_iommu_device_flush_dte() non-static Suravee Suthikulpanit
@ 2025-10-09 23:57 ` Suravee Suthikulpanit
2025-10-09 23:57 ` [PATCH v3 06/15] iommu/amd: Make amd_iommu_make_clear_dte() non-static inline Suravee Suthikulpanit
` (9 subsequent siblings)
14 siblings, 0 replies; 27+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-09 23:57 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
This will be reused in a new iommufd.c file for nested translation.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 3 +++
drivers/iommu/amd/iommu.c | 11 ++++++-----
2 files changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 079fb1d44c00..eb46e8914eaf 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -190,5 +190,8 @@ struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid);
/* DTE */
int amd_iommu_device_flush_dte(struct iommu_dev_data *dev_data);
+void amd_iommu_update_dte256(struct amd_iommu *iommu,
+ struct iommu_dev_data *dev_data,
+ struct dev_table_entry *new);
#endif /* AMD_IOMMU_H */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index fad74d2bc1b1..3f2c61509b60 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -132,8 +132,9 @@ static void write_dte_lower128(struct dev_table_entry *ptr, struct dev_table_ent
* This function is used only by code, which updates DMA translation part of the DTE.
* So, only consider control bits related to DMA when updating the entry.
*/
-static void update_dte256(struct amd_iommu *iommu, struct iommu_dev_data *dev_data,
- struct dev_table_entry *new)
+void amd_iommu_update_dte256(struct amd_iommu *iommu,
+ struct iommu_dev_data *dev_data,
+ struct dev_table_entry *new)
{
unsigned long flags;
struct dev_table_entry *dev_table = get_dev_table(iommu);
@@ -413,7 +414,7 @@ static int clone_alias(struct pci_dev *pdev, u16 alias, void *data)
ret = -EINVAL;
goto out;
}
- update_dte256(iommu, alias_data, &new);
+ amd_iommu_update_dte256(iommu, alias_data, &new);
amd_iommu_set_rlookup_table(iommu, alias);
out:
@@ -2109,7 +2110,7 @@ static void set_dte_entry(struct amd_iommu *iommu,
set_dte_gcr3_table(iommu, dev_data, &new);
- update_dte256(iommu, dev_data, &new);
+ amd_iommu_update_dte256(iommu, dev_data, &new);
/*
* A kdump kernel might be replacing a domain ID that was copied from
@@ -2130,7 +2131,7 @@ static void clear_dte_entry(struct amd_iommu *iommu, struct iommu_dev_data *dev_
struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
make_clear_dte(dev_data, dte, &new);
- update_dte256(iommu, dev_data, &new);
+ amd_iommu_update_dte256(iommu, dev_data, &new);
}
/* Update and flush DTE for the given device */
--
2.34.1
* [PATCH v3 06/15] iommu/amd: Make amd_iommu_make_clear_dte() non-static inline
2025-10-09 23:57 [PATCH v3 00/15] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (4 preceding siblings ...)
2025-10-09 23:57 ` [PATCH v3 05/15] iommu/amd: Make amd_iommu_update_dte256() non-static Suravee Suthikulpanit
@ 2025-10-09 23:57 ` Suravee Suthikulpanit
2025-10-09 23:57 ` [PATCH v3 07/15] iommu/amd: Make amd_iommu_completion_wait() non-static Suravee Suthikulpanit
` (8 subsequent siblings)
14 siblings, 0 replies; 27+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-09 23:57 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
This will be reused in a new iommufd.c file for nested translation.
Also, remove unused function parameter ptr.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 7 +++++++
drivers/iommu/amd/iommu.c | 13 ++-----------
2 files changed, 9 insertions(+), 11 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index eb46e8914eaf..c7cb4a80d44a 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -193,5 +193,12 @@ int amd_iommu_device_flush_dte(struct iommu_dev_data *dev_data);
void amd_iommu_update_dte256(struct amd_iommu *iommu,
struct iommu_dev_data *dev_data,
struct dev_table_entry *new);
+static inline void
+amd_iommu_make_clear_dte(struct iommu_dev_data *dev_data, struct dev_table_entry *new)
+{
+ /* All existing DTE must have V bit set */
+ new->data128[0] = DTE_FLAG_V;
+ new->data128[1] = 0;
+}
#endif /* AMD_IOMMU_H */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 3f2c61509b60..386ac96b2c02 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2008,14 +2008,6 @@ int amd_iommu_clear_gcr3(struct iommu_dev_data *dev_data, ioasid_t pasid)
return ret;
}
-static void make_clear_dte(struct iommu_dev_data *dev_data, struct dev_table_entry *ptr,
- struct dev_table_entry *new)
-{
- /* All existing DTE must have V bit set */
- new->data128[0] = DTE_FLAG_V;
- new->data128[1] = 0;
-}
-
/*
* Note:
* The old value for GCR3 table and GPT have been cleared from caller.
@@ -2068,7 +2060,7 @@ static void set_dte_entry(struct amd_iommu *iommu,
else
domid = domain->id;
- make_clear_dte(dev_data, dte, &new);
+ amd_iommu_make_clear_dte(dev_data, &new);
if (domain->iop.mode != PAGE_MODE_NONE)
new.data[0] |= iommu_virt_to_phys(domain->iop.root);
@@ -2128,9 +2120,8 @@ static void set_dte_entry(struct amd_iommu *iommu,
static void clear_dte_entry(struct amd_iommu *iommu, struct iommu_dev_data *dev_data)
{
struct dev_table_entry new = {};
- struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
- make_clear_dte(dev_data, dte, &new);
+ amd_iommu_make_clear_dte(dev_data, &new);
amd_iommu_update_dte256(iommu, dev_data, &new);
}
--
2.34.1
* [PATCH v3 07/15] iommu/amd: Make amd_iommu_completion_wait() non-static
2025-10-09 23:57 [PATCH v3 00/15] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (5 preceding siblings ...)
2025-10-09 23:57 ` [PATCH v3 06/15] iommu/amd: Make amd_iommu_make_clear_dte() non-static inline Suravee Suthikulpanit
@ 2025-10-09 23:57 ` Suravee Suthikulpanit
2025-10-09 23:57 ` [PATCH v3 08/15] iommufd: Introduce data struct for AMD nested domain allocation Suravee Suthikulpanit
` (7 subsequent siblings)
14 siblings, 0 replies; 27+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-09 23:57 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
This will be reused in a new iommufd.c file for nested translation.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 1 +
drivers/iommu/amd/iommu.c | 24 ++++++++++++------------
2 files changed, 13 insertions(+), 12 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index c7cb4a80d44a..d533bb8851ea 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -187,6 +187,7 @@ void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
u64 *root, int mode);
struct dev_table_entry *get_dev_table(struct amd_iommu *iommu);
struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid);
+int amd_iommu_completion_wait(struct amd_iommu *iommu);
/* DTE */
int amd_iommu_device_flush_dte(struct iommu_dev_data *dev_data);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 386ac96b2c02..e0bfcda678a8 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1392,7 +1392,7 @@ static int iommu_queue_command(struct amd_iommu *iommu, struct iommu_cmd *cmd)
* This function queues a completion wait command into the command
* buffer of an IOMMU
*/
-static int iommu_completion_wait(struct amd_iommu *iommu)
+int amd_iommu_completion_wait(struct amd_iommu *iommu)
{
struct iommu_cmd cmd;
unsigned long flags;
@@ -1431,7 +1431,7 @@ static void domain_flush_complete(struct protection_domain *domain)
* We need to wait for completion of all commands.
*/
xa_for_each(&domain->iommu_array, i, pdom_iommu_info)
- iommu_completion_wait(pdom_iommu_info->iommu);
+ amd_iommu_completion_wait(pdom_iommu_info->iommu);
}
static int iommu_flush_dte(struct amd_iommu *iommu, u16 devid)
@@ -1449,7 +1449,7 @@ static void iommu_flush_dte_sync(struct amd_iommu *iommu, u16 devid)
ret = iommu_flush_dte(iommu, devid);
if (!ret)
- iommu_completion_wait(iommu);
+ amd_iommu_completion_wait(iommu);
}
static void amd_iommu_flush_dte_all(struct amd_iommu *iommu)
@@ -1460,7 +1460,7 @@ static void amd_iommu_flush_dte_all(struct amd_iommu *iommu)
for (devid = 0; devid <= last_bdf; ++devid)
iommu_flush_dte(iommu, devid);
- iommu_completion_wait(iommu);
+ amd_iommu_completion_wait(iommu);
}
/*
@@ -1479,7 +1479,7 @@ static void amd_iommu_flush_tlb_all(struct amd_iommu *iommu)
iommu_queue_command(iommu, &cmd);
}
- iommu_completion_wait(iommu);
+ amd_iommu_completion_wait(iommu);
}
static void amd_iommu_flush_tlb_domid(struct amd_iommu *iommu, u32 dom_id)
@@ -1490,7 +1490,7 @@ static void amd_iommu_flush_tlb_domid(struct amd_iommu *iommu, u32 dom_id)
dom_id, IOMMU_NO_PASID, false);
iommu_queue_command(iommu, &cmd);
- iommu_completion_wait(iommu);
+ amd_iommu_completion_wait(iommu);
}
static void amd_iommu_flush_all(struct amd_iommu *iommu)
@@ -1500,7 +1500,7 @@ static void amd_iommu_flush_all(struct amd_iommu *iommu)
build_inv_all(&cmd);
iommu_queue_command(iommu, &cmd);
- iommu_completion_wait(iommu);
+ amd_iommu_completion_wait(iommu);
}
static void iommu_flush_irt(struct amd_iommu *iommu, u16 devid)
@@ -1523,7 +1523,7 @@ static void amd_iommu_flush_irt_all(struct amd_iommu *iommu)
for (devid = 0; devid <= last_bdf; devid++)
iommu_flush_irt(iommu, devid);
- iommu_completion_wait(iommu);
+ amd_iommu_completion_wait(iommu);
}
void amd_iommu_flush_all_caches(struct amd_iommu *iommu)
@@ -1748,7 +1748,7 @@ void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data,
if (dev_data->ats_enabled)
device_flush_iotlb(dev_data, address, size, pasid, true);
- iommu_completion_wait(iommu);
+ amd_iommu_completion_wait(iommu);
}
static void dev_flush_pasid_all(struct iommu_dev_data *dev_data,
@@ -2137,7 +2137,7 @@ static void dev_update_dte(struct iommu_dev_data *dev_data, bool set)
clone_aliases(iommu, dev_data->dev);
amd_iommu_device_flush_dte(dev_data);
- iommu_completion_wait(iommu);
+ amd_iommu_completion_wait(iommu);
}
/*
@@ -2421,7 +2421,7 @@ static struct iommu_device *amd_iommu_probe_device(struct device *dev)
out_err:
- iommu_completion_wait(iommu);
+ amd_iommu_completion_wait(iommu);
if (FEATURE_NUM_INT_REMAP_SUP_2K(amd_iommu_efr2))
dev_data->max_irqs = MAX_IRQS_PER_TABLE_2K;
@@ -3255,7 +3255,7 @@ static struct irq_remap_table *alloc_irq_table(struct amd_iommu *iommu,
set_remap_table_entry(iommu, alias, table);
out_wait:
- iommu_completion_wait(iommu);
+ amd_iommu_completion_wait(iommu);
out_unlock:
spin_unlock_irqrestore(&iommu_table_lock, flags);
--
2.34.1
* [PATCH v3 08/15] iommufd: Introduce data struct for AMD nested domain allocation
2025-10-09 23:57 [PATCH v3 00/15] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (6 preceding siblings ...)
2025-10-09 23:57 ` [PATCH v3 07/15] iommu/amd: Make amd_iommu_completion_wait() non-static Suravee Suthikulpanit
@ 2025-10-09 23:57 ` Suravee Suthikulpanit
2025-10-09 23:57 ` [PATCH v3 09/15] iommu/amd: Always enable GCR3TRPMode when supported Suravee Suthikulpanit
` (6 subsequent siblings)
14 siblings, 0 replies; 27+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-09 23:57 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
Introduce IOMMU_HWPT_DATA_AMD_GUEST data type for IOMMU guest page table,
which is used for stage-1 in nested translation. The data structure
contains information necessary for setting up the AMD HW-vIOMMU support.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
include/uapi/linux/iommufd.h | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index efb52709c0a2..d111ee1dc572 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -455,16 +455,27 @@ struct iommu_hwpt_arm_smmuv3 {
__aligned_le64 ste[2];
};
+/**
+ * struct iommu_hwpt_amd_guest - AMD IOMMU guest I/O page table data
+ * (IOMMU_HWPT_DATA_AMD_GUEST)
+ * @dte: Guest Device Table Entry (DTE)
+ */
+struct iommu_hwpt_amd_guest {
+ __aligned_u64 dte[4];
+};
+
/**
* enum iommu_hwpt_data_type - IOMMU HWPT Data Type
* @IOMMU_HWPT_DATA_NONE: no data
* @IOMMU_HWPT_DATA_VTD_S1: Intel VT-d stage-1 page table
* @IOMMU_HWPT_DATA_ARM_SMMUV3: ARM SMMUv3 Context Descriptor Table
+ * @IOMMU_HWPT_DATA_AMD_GUEST: AMD IOMMU guest page table
*/
enum iommu_hwpt_data_type {
IOMMU_HWPT_DATA_NONE = 0,
IOMMU_HWPT_DATA_VTD_S1 = 1,
IOMMU_HWPT_DATA_ARM_SMMUV3 = 2,
+ IOMMU_HWPT_DATA_AMD_GUEST = 3,
};
/**
--
2.34.1
* [PATCH v3 09/15] iommu/amd: Always enable GCR3TRPMode when supported.
2025-10-09 23:57 [PATCH v3 00/15] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (7 preceding siblings ...)
2025-10-09 23:57 ` [PATCH v3 08/15] iommufd: Introduce data struct for AMD nested domain allocation Suravee Suthikulpanit
@ 2025-10-09 23:57 ` Suravee Suthikulpanit
2025-10-10 22:37 ` Jason Gunthorpe
2025-10-09 23:57 ` [PATCH v3 10/15] iommu/amd: Add support for nest parent domain allocation Suravee Suthikulpanit
` (5 subsequent siblings)
14 siblings, 1 reply; 27+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-09 23:57 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
The GCR3TRPMode feature allows the DTE[GCR3TRP] field to be configured
with a GPA (instead of an SPA). This simplifies the implementation, and is
a prerequisite for nested translation support.
Therefore, always enable this feature if available.
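As a rough illustration of the enable step, the sketch below models the IOMMU control register as a plain 64-bit value. The bit position 58 is the CONTROL_GCR3TRPMODE value added by this patch. Everything else is an assumption made for the example: sketch_enable_gcr3trpmode() is a hypothetical name, and its gt_enabled and supported parameters stand in for the existing guest translation check in iommu_enable_gt() and for check_feature2(FEATURE_GCR3TRPMODE).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Control register bit added by this patch. */
#define SKETCH_CONTROL_GCR3TRPMODE 58

/*
 * Hypothetical model of the enable logic: the bit is set only when guest
 * translation is being enabled and the hardware advertises GCR3TRPMode.
 * Returns the updated control register value.
 */
static inline uint64_t sketch_enable_gcr3trpmode(uint64_t ctrl,
						 bool gt_enabled,
						 bool supported)
{
	if (gt_enabled && supported)
		ctrl |= 1ULL << SKETCH_CONTROL_GCR3TRPMODE;
	return ctrl;
}
```

The real driver performs this as a read-modify-write of the MMIO control register through iommu_feature_enable(); the pure function here only shows which bit changes and under what condition.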
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu_types.h | 1 +
drivers/iommu/amd/init.c | 3 +++
2 files changed, 4 insertions(+)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 556f1df32d53..9226edd8af69 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -185,6 +185,7 @@
#define CONTROL_EPH_EN 45
#define CONTROL_XT_EN 50
#define CONTROL_INTCAPXT_EN 51
+#define CONTROL_GCR3TRPMODE 58
#define CONTROL_IRTCACHEDIS 59
#define CONTROL_SNPAVIC_EN 61
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index f2991c11867c..c45a4bd89569 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1122,6 +1122,9 @@ static void iommu_enable_gt(struct amd_iommu *iommu)
return;
iommu_feature_enable(iommu, CONTROL_GT_EN);
+
+ if (check_feature2(FEATURE_GCR3TRPMODE))
+ iommu_feature_enable(iommu, CONTROL_GCR3TRPMODE);
}
/* sets a specific bit in the device table entry. */
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
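Annotation on patch 09: the conditional enable above can be modelled in user space roughly as follows. This is only a sketch. The two bit positions are taken from the patches in this thread (CONTROL_GCR3TRPMODE from patch 09, FEATURE_GCR3TRPMODE from patch 10); struct model_iommu and model_enable_gcr3trpmode() are hypothetical stand-ins for the MMIO registers and for iommu_feature_enable(), not driver code.

```c
#include <assert.h>
#include <stdint.h>

#define FEATURE_GCR3TRPMODE (1ULL << 3) /* Extended Feature 2 bit (patch 10) */
#define CONTROL_GCR3TRPMODE 58          /* control register bit (patch 09) */

/* Hypothetical model of the two registers involved. */
struct model_iommu {
	uint64_t efr2;    /* Extended Feature 2 register (read-only in hardware) */
	uint64_t control; /* control register */
};

/* Set the control bit only when the hardware advertises the feature. */
void model_enable_gcr3trpmode(struct model_iommu *iommu)
{
	if (iommu->efr2 & FEATURE_GCR3TRPMODE)
		iommu->control |= 1ULL << CONTROL_GCR3TRPMODE;
}
```

On hardware without the feature the control register is left untouched, which matches the "enable if available" wording of the commit message.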
* [PATCH v3 10/15] iommu/amd: Add support for nest parent domain allocation
2025-10-09 23:57 [PATCH v3 00/15] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (8 preceding siblings ...)
2025-10-09 23:57 ` [PATCH v3 09/15] iommu/amd: Always enable GCR3TRPMode when supported Suravee Suthikulpanit
@ 2025-10-09 23:57 ` Suravee Suthikulpanit
2025-10-10 22:38 ` Jason Gunthorpe
2025-10-09 23:57 ` [PATCH v3 11/15] iommu/amd: Add support for nested " Suravee Suthikulpanit
` (4 subsequent siblings)
14 siblings, 1 reply; 27+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-09 23:57 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
To support nested translation, the nest parent domain is allocated with
the IOMMU_HWPT_ALLOC_NEST_PARENT flag and stores the v1 page table used
for stage 2 (i.e. GPA->SPA).
Also, only support the nest parent domain on AMD systems that support
the Guest CR3 Table (GCR3TRPMode) feature. This feature is required in
order to program DTE[GCR3 Table Root Pointer] with the GPA.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu_types.h | 1 +
drivers/iommu/amd/iommu.c | 26 +++++++++++++++++++++++---
2 files changed, 24 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 9226edd8af69..c34604cf1811 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -107,6 +107,7 @@
/* Extended Feature 2 Bits */
+#define FEATURE_GCR3TRPMODE BIT_ULL(3)
#define FEATURE_SNPAVICSUP GENMASK_ULL(7, 5)
#define FEATURE_SNPAVICSUP_GAM(x) \
(FIELD_GET(FEATURE_SNPAVICSUP, x) == 0x1)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index e0bfcda678a8..e489e360bb77 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2584,6 +2584,14 @@ do_iommu_domain_alloc(struct device *dev, u32 flags,
return &domain->domain;
}
+static inline bool is_nest_parent_supported(u32 flags)
+{
+ /* Only allow nest parent when these features are supported */
+ return check_feature(FEATURE_GT) &&
+ check_feature(FEATURE_GIOSUP) &&
+ check_feature2(FEATURE_GCR3TRPMODE);
+}
+
static struct iommu_domain *
amd_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
const struct iommu_user_data *user_data)
@@ -2591,16 +2599,28 @@ amd_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
{
struct amd_iommu *iommu = get_amd_iommu_from_dev(dev);
const u32 supported_flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
- IOMMU_HWPT_ALLOC_PASID;
+ IOMMU_HWPT_ALLOC_PASID |
+ IOMMU_HWPT_ALLOC_NEST_PARENT;
if ((flags & ~supported_flags) || user_data)
return ERR_PTR(-EOPNOTSUPP);
switch (flags & supported_flags) {
case IOMMU_HWPT_ALLOC_DIRTY_TRACKING:
- /* Allocate domain with v1 page table for dirty tracking */
- if (!amd_iommu_hd_support(iommu))
+ case IOMMU_HWPT_ALLOC_NEST_PARENT:
+ case IOMMU_HWPT_ALLOC_DIRTY_TRACKING | IOMMU_HWPT_ALLOC_NEST_PARENT:
+ /*
+ * Allocate domain with v1 page table for dirty tracking
+ * and/or Nest parent.
+ */
+ if ((flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING) &&
+ !amd_iommu_hd_support(iommu))
+ break;
+
+ if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) &&
+ !is_nest_parent_supported(flags))
break;
+
return do_iommu_domain_alloc(dev, flags, PD_MODE_V1);
case IOMMU_HWPT_ALLOC_PASID:
/* Allocate domain with v2 page table if IOMMU supports PASID. */
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
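Annotation on patch 10: the flag dispatch in amd_iommu_domain_alloc_paging_flags() can be sketched as a small stand-alone function. Everything prefixed with model_/MODEL_ is hypothetical: the flag values are illustrative (the real ones live in uapi/linux/iommufd.h), struct model_caps replaces the check_feature()/amd_iommu_hd_support() calls, and flag combinations that the hunk does not show are simply reported as unsupported here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values, not copied from the uapi header. */
#define MODEL_ALLOC_NEST_PARENT    (1u << 0)
#define MODEL_ALLOC_DIRTY_TRACKING (1u << 1)
#define MODEL_ALLOC_PASID          (1u << 3)

enum model_pgtable { MODEL_PGTABLE_NONE, MODEL_PGTABLE_V1, MODEL_PGTABLE_V2 };

/* Hypothetical capability snapshot standing in for the feature checks. */
struct model_caps {
	bool hd_support;  /* host dirty tracking */
	bool nest_parent; /* GT && GIOSUP && GCR3TRPMode */
	bool pasid;
};

enum model_pgtable model_pick_pgtable(uint32_t flags,
				      const struct model_caps *caps)
{
	const uint32_t supported = MODEL_ALLOC_NEST_PARENT |
				   MODEL_ALLOC_DIRTY_TRACKING |
				   MODEL_ALLOC_PASID;

	if (flags & ~supported)
		return MODEL_PGTABLE_NONE;

	switch (flags) {
	case MODEL_ALLOC_DIRTY_TRACKING:
	case MODEL_ALLOC_NEST_PARENT:
	case MODEL_ALLOC_DIRTY_TRACKING | MODEL_ALLOC_NEST_PARENT:
		/* Both uses need the v1 table; check each requested feature. */
		if ((flags & MODEL_ALLOC_DIRTY_TRACKING) && !caps->hd_support)
			return MODEL_PGTABLE_NONE;
		if ((flags & MODEL_ALLOC_NEST_PARENT) && !caps->nest_parent)
			return MODEL_PGTABLE_NONE;
		return MODEL_PGTABLE_V1;
	case MODEL_ALLOC_PASID:
		return caps->pasid ? MODEL_PGTABLE_V2 : MODEL_PGTABLE_NONE;
	default:
		/* Combinations not covered by the hunk above. */
		return MODEL_PGTABLE_NONE;
	}
}
```

The point of the three stacked case labels is that dirty tracking and nest parent may be requested separately or together, and each requested capability is verified on its own before the v1 table is chosen.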
* [PATCH v3 11/15] iommu/amd: Add support for nested domain allocation
2025-10-09 23:57 [PATCH v3 00/15] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (9 preceding siblings ...)
2025-10-09 23:57 ` [PATCH v3 10/15] iommu/amd: Add support for nest parent domain allocation Suravee Suthikulpanit
@ 2025-10-09 23:57 ` Suravee Suthikulpanit
2025-10-10 22:54 ` Jason Gunthorpe
2025-10-09 23:57 ` [PATCH v3 12/15] iommu/amd: Validate guest DTE for nested translation Suravee Suthikulpanit
` (3 subsequent siblings)
14 siblings, 1 reply; 27+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-09 23:57 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
The nested domain is allocated with IOMMU_DOMAIN_NESTED type to store
the stage-1 translation (i.e. GVA->GPA). This includes the GCR3 root
pointer table along with the guest page tables. The struct
iommu_hwpt_amd_guest contains this information, and is passed from
user-space as a parameter of struct
iommufd_viommu_ops.alloc_domain_nested().
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/Makefile | 2 +-
drivers/iommu/amd/amd_iommu.h | 4 ++
drivers/iommu/amd/amd_iommu_types.h | 31 +++++++----
drivers/iommu/amd/nested.c | 84 +++++++++++++++++++++++++++++
4 files changed, 110 insertions(+), 11 deletions(-)
create mode 100644 drivers/iommu/amd/nested.c
diff --git a/drivers/iommu/amd/Makefile b/drivers/iommu/amd/Makefile
index 5ae46d99a45b..afa12ca2110e 100644
--- a/drivers/iommu/amd/Makefile
+++ b/drivers/iommu/amd/Makefile
@@ -1,4 +1,4 @@
# SPDX-License-Identifier: GPL-2.0-only
obj-y += iommu.o init.o quirks.o io_pgtable.o io_pgtable_v2.o ppr.o pasid.o
-obj-$(CONFIG_AMD_IOMMU_IOMMUFD) += iommufd.o
+obj-$(CONFIG_AMD_IOMMU_IOMMUFD) += iommufd.o nested.o
obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += debugfs.o
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index d533bb8851ea..3730d8bbe6dc 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -202,4 +202,8 @@ amd_iommu_make_clear_dte(struct iommu_dev_data *dev_data, struct dev_table_entry
new->data128[1] = 0;
}
+/* NESTED */
+struct iommu_domain *
+amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
+ const struct iommu_user_data *user_data);
#endif /* AMD_IOMMU_H */
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index c34604cf1811..9374e6f7a19d 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -20,6 +20,8 @@
#include <linux/irqreturn.h>
#include <linux/io-pgtable.h>
+#include <uapi/linux/iommufd.h>
+
/*
* Maximum number of IOMMUs supported
*/
@@ -586,6 +588,25 @@ struct pdom_iommu_info {
u32 refcnt; /* Count of attached dev/pasid per domain/IOMMU */
};
+/*
+ * Structure defining one entry in the device table
+ */
+struct dev_table_entry {
+ union {
+ u64 data[4];
+ u128 data128[2];
+ };
+};
+
+/*
+ * Nested domain is specifically used for nested translation
+ */
+struct nested_domain {
+ struct iommu_domain domain; /* generic domain handle used by iommu core code */
+ u16 id; /* the domain id written to the device table */
+ struct iommu_hwpt_amd_guest gdte; /* Guest vIOMMU DTE */
+};
+
/*
* This structure contains generic data for IOMMU protection domains
* independent of their use.
@@ -895,16 +916,6 @@ extern struct list_head amd_iommu_pci_seg_list;
*/
extern struct list_head amd_iommu_list;
-/*
- * Structure defining one entry in the device table
- */
-struct dev_table_entry {
- union {
- u64 data[4];
- u128 data128[2];
- };
-};
-
/*
* Structure defining one entry in the command buffer
*/
diff --git a/drivers/iommu/amd/nested.c b/drivers/iommu/amd/nested.c
new file mode 100644
index 000000000000..0ab5d65ec283
--- /dev/null
+++ b/drivers/iommu/amd/nested.c
@@ -0,0 +1,84 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025 Advanced Micro Devices, Inc.
+ */
+
+#define dev_fmt(fmt) "AMD-Vi: " fmt
+
+#include <linux/iommu.h>
+#include <uapi/linux/iommufd.h>
+
+#include "amd_iommu.h"
+
+static const struct iommu_domain_ops nested_domain_ops;
+
+static inline struct nested_domain *to_ndomain(struct iommu_domain *dom)
+{
+ return container_of(dom, struct nested_domain, domain);
+}
+
+/*
+ * This function is assigned to struct iommufd_viommu_ops.alloc_domain_nested()
+ * during the call to struct iommu_ops.viommu_init().
+ */
+struct iommu_domain *
+amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
+ const struct iommu_user_data *user_data)
+{
+ int ret;
+ struct iommu_hwpt_amd_guest gdte;
+ struct nested_domain *ndom;
+
+ if (user_data->type != IOMMU_HWPT_DATA_AMD_GUEST)
+ return ERR_PTR(-EOPNOTSUPP);
+
+ ret = iommu_copy_struct_from_user(&gdte, user_data,
+ IOMMU_HWPT_DATA_AMD_GUEST,
+ dte);
+ if (ret)
+ return ERR_PTR(ret);
+
+ ndom = kzalloc(sizeof(*ndom), GFP_KERNEL);
+ if (!ndom)
+ return ERR_PTR(-ENOMEM);
+
+ ndom->domain.ops = &nested_domain_ops;
+ ndom->domain.type = IOMMU_DOMAIN_NESTED;
+ memcpy(&ndom->gdte, &gdte, sizeof(gdte));
+
+ /*
+ * Normally, when a guest has multiple pass-through devices,
+ * the IOMMU driver sets up DTEs with the same stage-2 table and
+ * uses the same host domain ID (hDomId). In case of nested translation,
+ * if the guest sets up different stage-1 tables with the same PASID,
+ * the IOMMU would use the same TLB tag. This results in a TLB
+ * aliasing issue.
+ *
+ * Work around the issue by allocating a per-device hDomId for the nested
+ * domain (i.e. ndom->id). This requires per-device IOMMU TLB invalidation
+ * with the corresponding hDomId on the host side when updating the stage-2 table.
+ */
+ ndom->id = amd_iommu_pdom_id_alloc();
+ if (ndom->id <= 0) {
+ ret = -ENOSPC;
+ goto out_err;
+ }
+
+ return &ndom->domain;
+out_err:
+ kfree(ndom);
+ return ERR_PTR(ret);
+}
+
+static void nested_domain_free(struct iommu_domain *dom)
+{
+ struct nested_domain *ndom = to_ndomain(dom);
+
+ amd_iommu_pdom_id_free(ndom->id);
+ kfree(ndom);
+}
+
+static const struct iommu_domain_ops nested_domain_ops = {
+ .free = nested_domain_free,
+};
+
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
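Annotation on patch 11: to_ndomain() relies on the generic struct iommu_domain being embedded inside struct nested_domain, so the driver can recover its private structure from the handle the core passes around. A minimal user-space sketch of that pattern, using hypothetical model_ types and a local container_of-style macro built on offsetof() rather than the kernel's own definition:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-ins; the kernel structures carry many more fields. */
struct model_domain {
	unsigned int type;
};

struct model_nested_domain {
	uint16_t id;                /* per-domain ID */
	struct model_domain domain; /* embedded generic handle */
};

/* Step back from a member pointer to the structure that contains it. */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct model_nested_domain *model_to_ndomain(struct model_domain *dom)
{
	return model_container_of(dom, struct model_nested_domain, domain);
}
```

The embedded member is deliberately not the first field in this sketch, to show that the conversion does not depend on its position in the structure.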
* [PATCH v3 12/15] iommu/amd: Validate guest DTE for nested translation
2025-10-09 23:57 [PATCH v3 00/15] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (10 preceding siblings ...)
2025-10-09 23:57 ` [PATCH v3 11/15] iommu/amd: Add support for nested " Suravee Suthikulpanit
@ 2025-10-09 23:57 ` Suravee Suthikulpanit
2025-10-10 22:55 ` Jason Gunthorpe
2025-10-09 23:57 ` [PATCH v3 13/15] iommu/amd: Refactor persistent DTE bits programming into amd_iommu_make_clear_dte() Suravee Suthikulpanit
` (2 subsequent siblings)
14 siblings, 1 reply; 27+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-09 23:57 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
Validate the guest DTE to make sure that the configuration for the host
(v1) and guest (v2) page tables is valid before allocating the nested
domain.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu_types.h | 2 ++
drivers/iommu/amd/nested.c | 41 +++++++++++++++++++++++++++++
2 files changed, 43 insertions(+)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 9374e6f7a19d..a68b5c2fc0a2 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -418,6 +418,8 @@
#define DTE_FLAG_V BIT_ULL(0)
#define DTE_FLAG_TV BIT_ULL(1)
#define DTE_FLAG_HAD (3ULL << 7)
+#define DTE_MODE_MASK GENMASK_ULL(11, 9)
+#define DTE_HOST_TRP GENMASK_ULL(51, 12)
#define DTE_FLAG_GIOV BIT_ULL(54)
#define DTE_FLAG_GV BIT_ULL(55)
#define DTE_GLX GENMASK_ULL(57, 56)
diff --git a/drivers/iommu/amd/nested.c b/drivers/iommu/amd/nested.c
index 0ab5d65ec283..3307c925d3c1 100644
--- a/drivers/iommu/amd/nested.c
+++ b/drivers/iommu/amd/nested.c
@@ -17,6 +17,43 @@ static inline struct nested_domain *to_ndomain(struct iommu_domain *dom)
return container_of(dom, struct nested_domain, domain);
}
+static int validate_gdte_nested(struct iommu_hwpt_amd_guest *gdte)
+{
+ u32 gpt_level = FIELD_GET(DTE_GPT_LEVEL_MASK, gdte->dte[2]);
+
+ /* Must be zero: Mode, Host-TRP */
+ if (FIELD_GET(DTE_MODE_MASK, gdte->dte[0]) != 0 ||
+ FIELD_GET(DTE_HOST_TRP, gdte->dte[0]) != 0)
+ return -EINVAL;
+
+ /* Must be non-zero: V, GIOV, GV, GCR3 TRP */
+ if (FIELD_GET(DTE_FLAG_V, gdte->dte[0]) == 0 ||
+ FIELD_GET(DTE_FLAG_GIOV, gdte->dte[0]) == 0 ||
+ FIELD_GET(DTE_FLAG_GV, gdte->dte[0]) == 0 ||
+ (FIELD_GET(DTE_GCR3_14_12, gdte->dte[0]) == 0 &&
+ FIELD_GET(DTE_GCR3_30_15, gdte->dte[1]) == 0 &&
+ FIELD_GET(DTE_GCR3_51_31, gdte->dte[1]) == 0))
+ return -EINVAL;
+
+ /* Valid Guest Paging Mode values are 0 and 1 */
+ if (gpt_level != 0 && gpt_level != 1)
+ return -EINVAL;
+
+ /* GLX = 3 is reserved */
+ if (FIELD_GET(DTE_GLX, gdte->dte[0]) == 3)
+ return -EINVAL;
+
+ /*
+ * We need to check host capability before setting
+ * the Guest Paging Mode
+ */
+ if (gpt_level == GUEST_PGTABLE_5_LEVEL &&
+ amd_iommu_gpt_level < PAGE_MODE_5_LEVEL)
+ return -EOPNOTSUPP;
+
+ return 0;
+}
+
/*
* This function is assigned to struct iommufd_viommu_ops.alloc_domain_nested()
* during the call to struct iommu_ops.viommu_init().
@@ -38,6 +75,10 @@ amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
if (ret)
return ERR_PTR(ret);
+ ret = validate_gdte_nested(&gdte);
+ if (ret)
+ return ERR_PTR(ret);
+
ndom = kzalloc(sizeof(*ndom), GFP_KERNEL);
if (!ndom)
return ERR_PTR(-ENOMEM);
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
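Annotation on patch 12: a stand-alone sketch of the first-quadword checks in validate_gdte_nested(). It covers only the fields whose bit positions appear in this thread (V, Mode, Host TRP, GIOV, GV, GLX); the GCR3 root pointer and guest paging level checks are left out because their masks are not shown here. GENMASK64() and model_validate_gdte0() are hypothetical helpers, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Contiguous bit mask from bit l up to bit h, inclusive. */
#define GENMASK64(h, l) (((~0ULL) >> (63 - (h))) & ((~0ULL) << (l)))

/* Bit positions as defined in the patches above. */
#define DTE_FLAG_V    (1ULL << 0)
#define DTE_MODE_MASK GENMASK64(11, 9)
#define DTE_HOST_TRP  GENMASK64(51, 12)
#define DTE_FLAG_GIOV (1ULL << 54)
#define DTE_FLAG_GV   (1ULL << 55)
#define DTE_GLX       GENMASK64(57, 56)

/* Returns 0 when the first guest DTE quadword passes the modelled checks. */
int model_validate_gdte0(uint64_t dte0)
{
	/* Host-owned fields must be zero in the guest DTE. */
	if (dte0 & (DTE_MODE_MASK | DTE_HOST_TRP))
		return -EINVAL;

	/* V, GIOV and GV must all be set. */
	if (!(dte0 & DTE_FLAG_V) || !(dte0 & DTE_FLAG_GIOV) ||
	    !(dte0 & DTE_FLAG_GV))
		return -EINVAL;

	/* GLX = 3 (both bits set) is reserved. */
	if ((dte0 & DTE_GLX) == DTE_GLX)
		return -EINVAL;

	return 0;
}
```

Rejecting non-zero host fields up front means the guest can never influence the stage-2 root pointer or paging mode through the data it passes in.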
* [PATCH v3 13/15] iommu/amd: Refactor persistent DTE bits programming into amd_iommu_make_clear_dte()
2025-10-09 23:57 [PATCH v3 00/15] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (11 preceding siblings ...)
2025-10-09 23:57 ` [PATCH v3 12/15] iommu/amd: Validate guest DTE for nested translation Suravee Suthikulpanit
@ 2025-10-09 23:57 ` Suravee Suthikulpanit
2025-10-10 22:56 ` Jason Gunthorpe
2025-10-09 23:57 ` [PATCH v3 14/15] iommu/amd: Refactor logic to program the host page table in DTE Suravee Suthikulpanit
2025-10-09 23:57 ` [PATCH v3 15/15] iommu/amd: Add support for nested domain attach/detach Suravee Suthikulpanit
14 siblings, 1 reply; 27+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-09 23:57 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
Move the programming of persistent DTE bits into
amd_iommu_make_clear_dte() to avoid duplicating the logic when
programming the DTE for nested translation.
Note that this commit changes behavior of detached and blocking modes,
where DTE bit fields for interrupt pass-through (i.e. Lint0, Lint1, NMI,
INIT, ExtInt) and System management message could be affected.
These DTE bits are specified in the IVRS table for specific devices,
and should be persistent.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 13 +++++++++++++
drivers/iommu/amd/iommu.c | 11 -----------
2 files changed, 13 insertions(+), 11 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 3730d8bbe6dc..cfb63de7732a 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -197,9 +197,22 @@ void amd_iommu_update_dte256(struct amd_iommu *iommu,
static inline void
amd_iommu_make_clear_dte(struct iommu_dev_data *dev_data, struct dev_table_entry *new)
{
+ struct dev_table_entry *initial_dte;
+ struct amd_iommu *iommu = get_amd_iommu_from_dev(dev_data->dev);
+
/* All existing DTE must have V bit set */
new->data128[0] = DTE_FLAG_V;
new->data128[1] = 0;
+
+ /*
+ * Restore cached persistent DTE bits, which can be set by information
+ * in IVRS table. See set_dev_entry_from_acpi().
+ */
+ initial_dte = amd_iommu_get_ivhd_dte_flags(iommu->pci_seg->id, dev_data->devid);
+ if (initial_dte) {
+ new->data128[0] |= initial_dte->data128[0];
+ new->data128[1] |= initial_dte->data128[1];
+ }
}
/* NESTED */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index e489e360bb77..ffb1adfd75c0 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2049,7 +2049,6 @@ static void set_dte_entry(struct amd_iommu *iommu,
{
u16 domid;
u32 old_domid;
- struct dev_table_entry *initial_dte;
struct dev_table_entry new = {};
struct protection_domain *domain = dev_data->domain;
struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
@@ -2090,16 +2089,6 @@ static void set_dte_entry(struct amd_iommu *iommu,
old_domid = READ_ONCE(dte->data[1]) & DTE_DOMID_MASK;
new.data[1] |= domid;
- /*
- * Restore cached persistent DTE bits, which can be set by information
- * in IVRS table. See set_dev_entry_from_acpi().
- */
- initial_dte = amd_iommu_get_ivhd_dte_flags(iommu->pci_seg->id, dev_data->devid);
- if (initial_dte) {
- new.data128[0] |= initial_dte->data128[0];
- new.data128[1] |= initial_dte->data128[1];
- }
-
set_dte_gcr3_table(iommu, dev_data, &new);
amd_iommu_update_dte256(iommu, dev_data, &new);
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
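Annotation on patch 13: after this change every freshly built DTE starts from "V bit plus cached persistent bits". The sketch below models that with four 64-bit words instead of the kernel's data128[] view. struct model_dte and model_make_clear_dte() are hypothetical, and the cached entry is passed in directly rather than looked up with amd_iommu_get_ivhd_dte_flags().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DTE_FLAG_V (1ULL << 0)

/* A 256-bit device table entry modelled as four 64-bit words. */
struct model_dte {
	uint64_t data[4];
};

/*
 * Reset new_dte to "valid, nothing else", then OR in the persistent bits.
 * initial may be NULL when no cached entry exists for the device.
 */
void model_make_clear_dte(struct model_dte *new_dte,
			  const struct model_dte *initial)
{
	size_t i;

	new_dte->data[0] = DTE_FLAG_V;
	for (i = 1; i < 4; i++)
		new_dte->data[i] = 0;

	if (initial) {
		for (i = 0; i < 4; i++)
			new_dte->data[i] |= initial->data[i];
	}
}
```

Because the restore now happens in the common helper, any caller that builds a DTE from scratch (including the detached and blocking cases mentioned in the commit message) picks up the persistent bits as well.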
* [PATCH v3 14/15] iommu/amd: Refactor logic to program the host page table in DTE
2025-10-09 23:57 [PATCH v3 00/15] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (12 preceding siblings ...)
2025-10-09 23:57 ` [PATCH v3 13/15] iommu/amd: Refactor persistent DTE bits programming into amd_iommu_make_clear_dte() Suravee Suthikulpanit
@ 2025-10-09 23:57 ` Suravee Suthikulpanit
2025-10-10 23:09 ` Jason Gunthorpe
2025-10-09 23:57 ` [PATCH v3 15/15] iommu/amd: Add support for nested domain attach/detach Suravee Suthikulpanit
14 siblings, 1 reply; 27+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-09 23:57 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
Introduce the set_dte_v1() helper function to program the IOMMU host (v1)
page table into the DTE.
There is no functional change.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/iommu.c | 54 +++++++++++++++++++++------------------
1 file changed, 29 insertions(+), 25 deletions(-)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index ffb1adfd75c0..2a536d02aeab 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2044,6 +2044,32 @@ static void set_dte_gcr3_table(struct amd_iommu *iommu,
target->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_4_LEVEL);
}
+static void set_dte_v1(struct iommu_dev_data *dev_data,
+ struct protection_domain *domain, u16 domid,
+ struct dev_table_entry *new)
+{
+ /*
+ * When SNP is enabled, we can only support TV=1 with non-zero domain ID.
+ * This is prevented by the SNP-enable and IOMMU_DOMAIN_IDENTITY check in
+ * do_iommu_domain_alloc().
+ */
+ WARN_ON(amd_iommu_snp_en && (domid == 0));
+
+ if (domain->iop.mode != PAGE_MODE_NONE)
+ new->data[0] |= iommu_virt_to_phys(domain->iop.root);
+
+ new->data[0] |= FIELD_PREP(DTE_MODE_MASK, domain->iop.mode);
+ new->data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV;
+
+ if (domain->dirty_tracking)
+ new->data[0] |= DTE_FLAG_HAD;
+
+ if (dev_data->ats_enabled)
+ new->data[1] |= DTE_FLAG_IOTLB;
+
+ new->data[1] |= domid;
+}
+
static void set_dte_entry(struct amd_iommu *iommu,
struct iommu_dev_data *dev_data)
{
@@ -2061,36 +2087,14 @@ static void set_dte_entry(struct amd_iommu *iommu,
amd_iommu_make_clear_dte(dev_data, &new);
- if (domain->iop.mode != PAGE_MODE_NONE)
- new.data[0] |= iommu_virt_to_phys(domain->iop.root);
-
- new.data[0] |= (domain->iop.mode & DEV_ENTRY_MODE_MASK)
- << DEV_ENTRY_MODE_SHIFT;
-
- new.data[0] |= DTE_FLAG_IR | DTE_FLAG_IW;
+ old_domid = READ_ONCE(dte->data[1]) & DTE_DOMID_MASK;
- /*
- * When SNP is enabled, we can only support TV=1 with non-zero domain ID.
- * This is prevented by the SNP-enable and IOMMU_DOMAIN_IDENTITY check in
- * do_iommu_domain_alloc().
- */
- WARN_ON(amd_iommu_snp_en && (domid == 0));
- new.data[0] |= DTE_FLAG_TV;
+ set_dte_v1(dev_data, domain, domid, &new);
+ set_dte_gcr3_table(iommu, dev_data, &new);
if (dev_data->ppr)
new.data[0] |= 1ULL << DEV_ENTRY_PPR;
- if (domain->dirty_tracking)
- new.data[0] |= DTE_FLAG_HAD;
-
- if (dev_data->ats_enabled)
- new.data[1] |= DTE_FLAG_IOTLB;
-
- old_domid = READ_ONCE(dte->data[1]) & DTE_DOMID_MASK;
- new.data[1] |= domid;
-
- set_dte_gcr3_table(iommu, dev_data, &new);
-
amd_iommu_update_dte256(iommu, dev_data, &new);
/*
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v3 15/15] iommu/amd: Add support for nested domain attach/detach
2025-10-09 23:57 [PATCH v3 00/15] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (13 preceding siblings ...)
2025-10-09 23:57 ` [PATCH v3 14/15] iommu/amd: Refactor logic to program the host page table in DTE Suravee Suthikulpanit
@ 2025-10-09 23:57 ` Suravee Suthikulpanit
2025-10-10 23:20 ` Jason Gunthorpe
14 siblings, 1 reply; 27+ messages in thread
From: Suravee Suthikulpanit @ 2025-10-09 23:57 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
Introduce set_dte_nested() to program guest translation settings in
the host DTE when attaching the nested domain to a device.
In addition, introduce struct amd_iommu_viommu, which stores a reference to
the nest parent domain assigned during the call to struct
iommu_ops.viommu_init(). Information in the nest parent domain is needed
when setting up the DTE for nested translation.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 3 ++
drivers/iommu/amd/amd_iommu_types.h | 8 ++++
drivers/iommu/amd/iommu.c | 8 ++--
drivers/iommu/amd/nested.c | 66 +++++++++++++++++++++++++++++
4 files changed, 81 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index cfb63de7732a..98351b0cb9a0 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -190,6 +190,9 @@ struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid);
int amd_iommu_completion_wait(struct amd_iommu *iommu);
/* DTE */
+void amd_iommu_set_dte_v1(struct iommu_dev_data *dev_data,
+ struct protection_domain *domain, u16 domid,
+ struct dev_table_entry *new);
int amd_iommu_device_flush_dte(struct iommu_dev_data *dev_data);
void amd_iommu_update_dte256(struct amd_iommu *iommu,
struct iommu_dev_data *dev_data,
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index a68b5c2fc0a2..683ee288c636 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -17,6 +17,7 @@
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/pci.h>
+#include <linux/iommufd.h>
#include <linux/irqreturn.h>
#include <linux/io-pgtable.h>
@@ -420,6 +421,7 @@
#define DTE_FLAG_HAD (3ULL << 7)
#define DTE_MODE_MASK GENMASK_ULL(11, 9)
#define DTE_HOST_TRP GENMASK_ULL(51, 12)
+#define DTE_FLAG_PPR BIT_ULL(52)
#define DTE_FLAG_GIOV BIT_ULL(54)
#define DTE_FLAG_GV BIT_ULL(55)
#define DTE_GLX GENMASK_ULL(57, 56)
@@ -590,6 +592,11 @@ struct pdom_iommu_info {
u32 refcnt; /* Count of attached dev/pasid per domain/IOMMU */
};
+struct amd_iommu_viommu {
+ struct iommufd_viommu core;
+ struct protection_domain *parent; /* nest parent domain for this viommu */
+};
+
/*
* Structure defining one entry in the device table
*/
@@ -607,6 +614,7 @@ struct nested_domain {
struct iommu_domain domain; /* generic domain handle used by iommu core code */
u16 id; /* the domain id written to the device table */
struct iommu_hwpt_amd_guest gdte; /* Guest vIOMMU DTE */
+ struct amd_iommu_viommu *viommu; /* AMD hw-viommu this nested domain belongs to */
};
/*
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 2a536d02aeab..013914fc8a4f 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2044,9 +2044,9 @@ static void set_dte_gcr3_table(struct amd_iommu *iommu,
target->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_4_LEVEL);
}
-static void set_dte_v1(struct iommu_dev_data *dev_data,
- struct protection_domain *domain, u16 domid,
- struct dev_table_entry *new)
+void amd_iommu_set_dte_v1(struct iommu_dev_data *dev_data,
+ struct protection_domain *domain, u16 domid,
+ struct dev_table_entry *new)
{
/*
* When SNP is enabled, we can only support TV=1 with non-zero domain ID.
@@ -2089,7 +2089,7 @@ static void set_dte_entry(struct amd_iommu *iommu,
old_domid = READ_ONCE(dte->data[1]) & DTE_DOMID_MASK;
- set_dte_v1(dev_data, domain, domid, &new);
+ amd_iommu_set_dte_v1(dev_data, domain, domid, &new);
set_dte_gcr3_table(iommu, dev_data, &new);
if (dev_data->ppr)
diff --git a/drivers/iommu/amd/nested.c b/drivers/iommu/amd/nested.c
index 3307c925d3c1..ca3d3001c87f 100644
--- a/drivers/iommu/amd/nested.c
+++ b/drivers/iommu/amd/nested.c
@@ -63,6 +63,7 @@ amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
const struct iommu_user_data *user_data)
{
int ret;
+ struct amd_iommu_viommu *aviommu = container_of(viommu, struct amd_iommu_viommu, core);
struct iommu_hwpt_amd_guest gdte;
struct nested_domain *ndom;
@@ -85,6 +86,7 @@ amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
ndom->domain.ops = &nested_domain_ops;
ndom->domain.type = IOMMU_DOMAIN_NESTED;
+ ndom->viommu = aviommu;
memcpy(&ndom->gdte, &gdte, sizeof(gdte));
/*
@@ -111,6 +113,69 @@ amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
return ERR_PTR(ret);
}
+static void set_dte_nested(struct amd_iommu *iommu,
+ struct iommu_domain *dom,
+ struct iommu_dev_data *dev_data)
+{
+ struct protection_domain *parent;
+ struct dev_table_entry new = {0};
+ struct nested_domain *ndom = to_ndomain(dom);
+ struct iommu_hwpt_amd_guest *gdte = &ndom->gdte;
+
+ /*
+ * The nest parent domain is assigned during the call to
+ * struct iommu_ops.viommu_init() and is stored in
+ * struct amd_iommu_viommu.parent.
+ */
+ if (WARN_ON(!ndom->viommu || !ndom->viommu->parent))
+ return;
+
+ parent = ndom->viommu->parent;
+ amd_iommu_make_clear_dte(dev_data, &new);
+
+ /*
+ * Use nested domain ID to program DTE.
+ * See amd_iommu_alloc_domain_nested().
+ */
+ amd_iommu_set_dte_v1(dev_data, parent, ndom->id, &new);
+
+ /* Guest PPR */
+ new.data[0] |= gdte->dte[0] & DTE_FLAG_PPR;
+
+ /* Guest translation stuff */
+ new.data[0] |= gdte->dte[0] & (DTE_GLX | DTE_FLAG_GV | DTE_FLAG_GIOV);
+
+ /* GCR3 table */
+ new.data[0] |= gdte->dte[0] & DTE_GCR3_14_12;
+ new.data[1] |= gdte->dte[1] & (DTE_GCR3_30_15 | DTE_GCR3_51_31);
+
+ /* Guest paging mode */
+ new.data[2] |= gdte->dte[2] & DTE_GPT_LEVEL_MASK;
+
+ amd_iommu_update_dte256(iommu, dev_data, &new);
+}
+
+static int nested_attach_device(struct iommu_domain *dom, struct device *dev)
+{
+ struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
+ struct amd_iommu *iommu = get_amd_iommu_from_dev_data(dev_data);
+ int ret = 0;
+
+ if (WARN_ON(dom->type != IOMMU_DOMAIN_NESTED))
+ return -EINVAL;
+
+ mutex_lock(&dev_data->mutex);
+
+ /* Update device table entry */
+ set_dte_nested(iommu, dom, dev_data);
+ amd_iommu_device_flush_dte(dev_data);
+ amd_iommu_completion_wait(iommu);
+
+ mutex_unlock(&dev_data->mutex);
+
+ return ret;
+}
+
static void nested_domain_free(struct iommu_domain *dom)
{
struct nested_domain *ndom = to_ndomain(dom);
@@ -120,6 +185,7 @@ static void nested_domain_free(struct iommu_domain *dom)
}
static const struct iommu_domain_ops nested_domain_ops = {
+ .attach_dev = nested_attach_device,
.free = nested_domain_free,
};
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
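Annotation on patch 15: set_dte_nested() builds the host part of the DTE first and then copies only selected guest-controlled fields from the gDTE. A sketch of that masking for the first quadword, limited to fields whose bit positions appear in this thread (PPR, GIOV, GV, GLX); the GCR3 pointer pieces are omitted because their masks are not shown here. GENMASK64(), DTE_GUEST_OWNED0 and model_merge_guest_dte0() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Contiguous bit mask from bit l up to bit h, inclusive. */
#define GENMASK64(h, l) (((~0ULL) >> (63 - (h))) & ((~0ULL) << (l)))

/* Bit positions as defined in the patches above. */
#define DTE_HOST_TRP  GENMASK64(51, 12)
#define DTE_FLAG_PPR  (1ULL << 52)
#define DTE_FLAG_GIOV (1ULL << 54)
#define DTE_FLAG_GV   (1ULL << 55)
#define DTE_GLX       GENMASK64(57, 56)

/* Guest-controlled bits of quadword 0 that this sketch models. */
#define DTE_GUEST_OWNED0 \
	(DTE_FLAG_PPR | DTE_FLAG_GIOV | DTE_FLAG_GV | DTE_GLX)

/* Combine a host-built quadword with the allowed bits of the guest's. */
uint64_t model_merge_guest_dte0(uint64_t host0, uint64_t guest0)
{
	return host0 | (guest0 & DTE_GUEST_OWNED0);
}
```

Masking on the way in means bits such as the host table root pointer always come from the nest parent domain, whatever the guest DTE contains.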
* Re: [PATCH v3 09/15] iommu/amd: Always enable GCR3TRPMode when supported.
2025-10-09 23:57 ` [PATCH v3 09/15] iommu/amd: Always enable GCR3TRPMode when supported Suravee Suthikulpanit
@ 2025-10-10 22:37 ` Jason Gunthorpe
0 siblings, 0 replies; 27+ messages in thread
From: Jason Gunthorpe @ 2025-10-10 22:37 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
On Thu, Oct 09, 2025 at 11:57:49PM +0000, Suravee Suthikulpanit wrote:
> The GCR3TRPMode feature allows the DTE[GCR3TRP] field to be configured
> with GPA (instead of SPA). This simplifies the implementation, and is
> a pre-requisite for nested translation support.
>
> Therefore, always enable this feature if available.
>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
> drivers/iommu/amd/amd_iommu_types.h | 1 +
> drivers/iommu/amd/init.c | 3 +++
> 2 files changed, 4 insertions(+)
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v3 10/15] iommu/amd: Add support for nest parent domain allocation
2025-10-09 23:57 ` [PATCH v3 10/15] iommu/amd: Add support for nest parent domain allocation Suravee Suthikulpanit
@ 2025-10-10 22:38 ` Jason Gunthorpe
0 siblings, 0 replies; 27+ messages in thread
From: Jason Gunthorpe @ 2025-10-10 22:38 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
On Thu, Oct 09, 2025 at 11:57:50PM +0000, Suravee Suthikulpanit wrote:
> To support nested translation, the nest parent domain is allocated with
> IOMMU_HWPT_ALLOC_NEST_PARENT flag, and stores information of the v1 page
> table for stage 2 (i.e. GPA->SPA).
>
> Also, only support nest parent domain on AMD system, which can support
> the Guest CR3 Table (GCR3TRPMode) feature. This feature is required in
> order to program DTE[GCR3 Table Root Pointer] with the GPA.
>
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
> drivers/iommu/amd/amd_iommu_types.h | 1 +
> drivers/iommu/amd/iommu.c | 26 +++++++++++++++++++++++---
> 2 files changed, 24 insertions(+), 3 deletions(-)
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v3 11/15] iommu/amd: Add support for nested domain allocation
2025-10-09 23:57 ` [PATCH v3 11/15] iommu/amd: Add support for nested " Suravee Suthikulpanit
@ 2025-10-10 22:54 ` Jason Gunthorpe
0 siblings, 0 replies; 27+ messages in thread
From: Jason Gunthorpe @ 2025-10-10 22:54 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
On Thu, Oct 09, 2025 at 11:57:51PM +0000, Suravee Suthikulpanit wrote:
> +/*
> + * Structure defining one entry in the device table
> + */
> +struct dev_table_entry {
> + union {
There is no longer a reason to move this, so these hunks can be dropped?
> +/*
> + * This function is assigned to struct iommufd_viommu_ops.alloc_domain_nested()
> + * during the call to struct iommu_ops.viommu_init().
> + */
> +struct iommu_domain *
> +amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
> + const struct iommu_user_data *user_data)
> +{
> + int ret;
> + struct iommu_hwpt_amd_guest gdte;
> + struct nested_domain *ndom;
> +
> + if (user_data->type != IOMMU_HWPT_DATA_AMD_GUEST)
> + return ERR_PTR(-EOPNOTSUPP);
> +
> + ret = iommu_copy_struct_from_user(&gdte, user_data,
> + IOMMU_HWPT_DATA_AMD_GUEST,
> + dte);
> + if (ret)
> + return ERR_PTR(ret);
> +
> + ndom = kzalloc(sizeof(*ndom), GFP_KERNEL);
> + if (!ndom)
> + return ERR_PTR(-ENOMEM);
> +
> + ndom->domain.ops = &nested_domain_ops;
> + ndom->domain.type = IOMMU_DOMAIN_NESTED;
> + memcpy(&ndom->gdte, &gdte, sizeof(gdte));
You can move the iommu_copy_struct_from_user() to here and avoid this
memcpy and stack allocation.
> + /*
> + * Normally, when a guest has multiple pass-through devices,
> + * the IOMMU driver setup DTEs with the same stage-2 table and
> + * use the same host domain ID (hDomId). In case of nested translation,
> + * if the guest setup different stage-1 tables with same PASID,
> + * IOMMU would use the same TLB tag. This will results in TLB
> + * aliasing issue.
Yes, but if the guest does this then the guest is required to use
different gDomID's.
> + *
> + * Workaround the issue by allocating per-device hDomID for nested
> + * domain (i.e. ndom->id). This require per-device IOMMU TLB invalidation
> + * with corresponded hDomId on the host side when updating stage-2 table.
> + */
This is still missing the point - we cannot work around this with
unique hDomID's because when you implement invalidation there is only
one input gDomID and you need to map it to exactly one hDomID to push
the PASID invalidation.
The only reason it is barely acceptable now is because there is no
invalidation support!
The comment should be:
The guest is assigning gDomIDs based on its own algorithm for managing
cache tags of (DomID, PASID). Within a single viommu the S2 is used
by all DTEs but we need to consistently map the gDomID to a single hDomID.
This should be done by using an xarray in the viommu to keep track of
the gDomID mapping. Since there is no invalidation support and no
viommu yet just always use a unique hDomId for now.
When the S2 is changed all the hDomIDs in the xarray need to have
invalidations pushed.
Other than that it looks OK
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
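The gDomID to hDomID bookkeeping described in the review above can be sketched in userspace C. This is only an illustration under stated assumptions: a flat array stands in for the kernel xarray, and `struct viommu_sketch` and the `viommu_sketch_*` helpers are hypothetical names, not part of the AMD IOMMU driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GDOMID_COUNT 65536u /* DomID is a 16-bit field */

/* Hypothetical per-viommu state; a plain array replaces the xarray. */
struct viommu_sketch {
	uint16_t g2h[GDOMID_COUNT]; /* gDomID -> hDomID, 0 means unmapped */
	uint16_t next_hdomid;       /* next unused host domain ID */
};

static void viommu_sketch_init(struct viommu_sketch *v)
{
	assert(v != NULL);
	for (size_t i = 0; i < GDOMID_COUNT; i++)
		v->g2h[i] = 0;
	v->next_hdomid = 1; /* 0 is reserved as "unmapped" in this sketch */
}

/*
 * Return the single hDomID that backs gdomid, allocating one on first
 * use. Returns 0 when the sketch has run out of host IDs.
 */
static uint16_t viommu_sketch_get_hdomid(struct viommu_sketch *v,
					 uint16_t gdomid)
{
	assert(v != NULL);
	if (v->g2h[gdomid] == 0) {
		if (v->next_hdomid == 0) /* counter wrapped: exhausted */
			return 0;
		v->g2h[gdomid] = v->next_hdomid++;
	}
	return v->g2h[gdomid];
}

/*
 * Collect every hDomID currently in use. When the shared S2 table
 * changes, each of these would need an invalidation pushed.
 */
static size_t viommu_sketch_s2_flush_list(const struct viommu_sketch *v,
					  uint16_t *out, size_t max)
{
	size_t n = 0;

	assert(v != NULL && out != NULL);
	for (size_t g = 0; g < GDOMID_COUNT && n < max; g++)
		if (v->g2h[g])
			out[n++] = v->g2h[g];
	return n;
}
```

The point of the sketch is that a guest invalidation naming one gDomID resolves to exactly one hDomID, while an S2 change walks all mapped entries.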
* Re: [PATCH v3 12/15] iommu/amd: Validate guest DTE for nested translation
2025-10-09 23:57 ` [PATCH v3 12/15] iommu/amd: Validate guest DTE for nested translation Suravee Suthikulpanit
@ 2025-10-10 22:55 ` Jason Gunthorpe
0 siblings, 0 replies; 27+ messages in thread
From: Jason Gunthorpe @ 2025-10-10 22:55 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
On Thu, Oct 09, 2025 at 11:57:52PM +0000, Suravee Suthikulpanit wrote:
> To make sure that configuration for host (v1) and guest (v2) page tables
> are valid before allocate nested domain.
>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
> drivers/iommu/amd/amd_iommu_types.h | 2 ++
> drivers/iommu/amd/nested.c | 41 +++++++++++++++++++++++++++++
> 2 files changed, 43 insertions(+)
I would squash this into the prior patch, but it looks OK. I did not try
to check in detail that this is a complete/correct list of fields.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
* Re: [PATCH v3 13/15] iommu/amd: Refactor persistent DTE bits programming into amd_iommu_make_clear_dte()
2025-10-09 23:57 ` [PATCH v3 13/15] iommu/amd: Refactor persistent DTE bits programming into amd_iommu_make_clear_dte() Suravee Suthikulpanit
@ 2025-10-10 22:56 ` Jason Gunthorpe
0 siblings, 0 replies; 27+ messages in thread
From: Jason Gunthorpe @ 2025-10-10 22:56 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
On Thu, Oct 09, 2025 at 11:57:53PM +0000, Suravee Suthikulpanit wrote:
> To help avoid duplicate logic when programing DTE for nested translation.
>
> Note that this commit changes behavior of detached and blocking
> modes,
We don't actually have "detached". What this changes is the small time
window while the AMD driver is changing domains during attach, plus the
blocking domain.
> where DTE bit fields for interrupt pass-through (i.e. Lint0, Lint1, NMI,
> INIT, ExtInt) and System management message could be affected.
> These DTE bits are specified in the IVRS table for specific devices,
> and should be persistent.
>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
> drivers/iommu/amd/amd_iommu.h | 13 +++++++++++++
> drivers/iommu/amd/iommu.c | 11 -----------
> 2 files changed, 13 insertions(+), 11 deletions(-)
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
* Re: [PATCH v3 14/15] iommu/amd: Refactor logic to program the host page table in DTE
2025-10-09 23:57 ` [PATCH v3 14/15] iommu/amd: Refactor logic to program the host page table in DTE Suravee Suthikulpanit
@ 2025-10-10 23:09 ` Jason Gunthorpe
2025-10-21 1:26 ` Suthikulpanit, Suravee
0 siblings, 1 reply; 27+ messages in thread
From: Jason Gunthorpe @ 2025-10-10 23:09 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
On Thu, Oct 09, 2025 at 11:57:54PM +0000, Suravee Suthikulpanit wrote:
> Introduce the set_dte_v1() helper function to configure IOMMU host (v1)
> page table into DTE.
>
> There is no functional change.
>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
> drivers/iommu/amd/iommu.c | 54 +++++++++++++++++++++------------------
> 1 file changed, 29 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index ffb1adfd75c0..2a536d02aeab 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -2044,6 +2044,32 @@ static void set_dte_gcr3_table(struct amd_iommu *iommu,
> target->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_4_LEVEL);
> }
>
> +static void set_dte_v1(struct iommu_dev_data *dev_data,
> + struct protection_domain *domain, u16 domid,
> + struct dev_table_entry *new)
> +{
> + /*
> + * When SNP is enabled, we can only support TV=1 with non-zero domain ID.
> + * This is prevented by the SNP-enable and IOMMU_DOMAIN_IDENTITY check in
> + * do_iommu_domain_alloc().
> + */
> + WARN_ON(amd_iommu_snp_en && (domid == 0));
> +
> + if (domain->iop.mode != PAGE_MODE_NONE)
> + new->data[0] |= iommu_virt_to_phys(domain->iop.root);
Use a FIELD_PREP here too
> + new->data[0] |= FIELD_PREP(DTE_MODE_MASK, domain->iop.mode);
> + new->data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV;
> +
> + if (domain->dirty_tracking)
> + new->data[0] |= DTE_FLAG_HAD;
> +
> + if (dev_data->ats_enabled)
> + new->data[1] |= DTE_FLAG_IOTLB;
> +
> + new->data[1] |= domid;
new->data[1] |= FIELD_PREP(DTE_DOMID_MASK, domid);
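The FIELD_PREP() style requested in the two comments above can be tried out in userspace. The following is a rough sketch, not kernel code: `SK_GENMASK_ULL()` and `sk_field_prep()` are stand-ins written for this example, the mask values are the ones quoted in this thread, and shifting the root address right by 12 before packing it into DTE_HOST_TRP is an assumption about how the field would be used.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's GENMASK_ULL(h, l). */
#define SK_GENMASK_ULL(h, l) ((~0ULL << (l)) & (~0ULL >> (63 - (h))))

#define SK_DTE_MODE_MASK  SK_GENMASK_ULL(11, 9)
#define SK_DTE_HOST_TRP   SK_GENMASK_ULL(51, 12)
#define SK_DTE_DOMID_MASK SK_GENMASK_ULL(15, 0)

/* Stand-in for FIELD_PREP(): move val into the field selected by mask. */
static inline uint64_t sk_field_prep(uint64_t mask, uint64_t val)
{
	assert(mask != 0);
	/* (mask & -mask) is the lowest set bit; multiplying shifts val up. */
	return (val * (mask & -mask)) & mask;
}

/* Pack the v1 page table root, paging mode and domain ID into a DTE. */
static void sk_set_dte_v1(uint64_t data[4], uint64_t root_phys,
			  unsigned int mode, uint16_t domid)
{
	/* Assumption: the field holds address bits 51:12 of the root. */
	data[0] |= sk_field_prep(SK_DTE_HOST_TRP, root_phys >> 12);
	data[0] |= sk_field_prep(SK_DTE_MODE_MASK, mode);
	data[1] |= sk_field_prep(SK_DTE_DOMID_MASK, domid);
}
```

Going through the mask for every field keeps a value from spilling into neighbouring bits, which a bare `|= domid` does not guarantee.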
> @@ -2061,36 +2087,14 @@ static void set_dte_entry(struct amd_iommu *iommu,
>
> amd_iommu_make_clear_dte(dev_data, &new);
>
> - if (domain->iop.mode != PAGE_MODE_NONE)
> - new.data[0] |= iommu_virt_to_phys(domain->iop.root);
> -
> - new.data[0] |= (domain->iop.mode & DEV_ENTRY_MODE_MASK)
> - << DEV_ENTRY_MODE_SHIFT;
> -
> - new.data[0] |= DTE_FLAG_IR | DTE_FLAG_IW;
> + old_domid = READ_ONCE(dte->data[1]) & DTE_DOMID_MASK;
>
> - /*
> - * When SNP is enabled, we can only support TV=1 with non-zero domain ID.
> - * This is prevented by the SNP-enable and IOMMU_DOMAIN_IDENTITY check in
> - * do_iommu_domain_alloc().
> - */
> - WARN_ON(amd_iommu_snp_en && (domid == 0));
> - new.data[0] |= DTE_FLAG_TV;
> + set_dte_v1(dev_data, domain, domid, &new);
> + set_dte_gcr3_table(iommu, dev_data, &new);
This seems weird, I would expect this to be written:
if (gcr3_info && gcr3_info->gcr3_tbl)
set_dte_gcr3_table(iommu, dev_data, &new);
else
set_dte_v1(dev_data, domain, domid, &new);
It is nonsense to call both gcr3 and v1 in this function that does not
set up two stages.
So, I'd just put this code in both the v1 and gcr3 functions:
+ new->data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV;
+ if (dev_data->ats_enabled)
+ new->data[1] |= DTE_FLAG_IOTLB;
(does IR/IW apply to GCR3?)
And then WARN_ON(domain->iop.mode != PAGE_MODE_NONE) as it should be
illegal to call set_dte_v1() on a domain that is not a v1 domain.
But this is overall the right idea and direction.
Jason
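The either/or structure suggested in this review can be sketched in userspace as follows. Everything here is illustrative: the `sk_*` types and helpers are hypothetical, the IOTLB, GV and mode field positions follow the defines quoted in this thread, and the TV/IR/IW bit positions are assumptions made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_DTE_FLAG_TV    (1ULL << 1)  /* assumed bit position */
#define SK_DTE_FLAG_IR    (1ULL << 61) /* assumed bit position */
#define SK_DTE_FLAG_IW    (1ULL << 62) /* assumed bit position */
#define SK_DTE_FLAG_GV    (1ULL << 55) /* BIT_ULL(55), as quoted */
#define SK_DTE_FLAG_IOTLB (1ULL << 32) /* BIT_ULL(32), as quoted */
#define SK_DTE_MODE_SHIFT 9            /* low bit of GENMASK_ULL(11, 9) */

struct sk_dte { uint64_t data[4]; };
struct sk_dev { bool ats_enabled; const void *gcr3_tbl; };

/* Bits that both the v1 path and the GCR3 path need. */
static void sk_set_common(const struct sk_dev *dev, struct sk_dte *new)
{
	new->data[0] |= SK_DTE_FLAG_IR | SK_DTE_FLAG_IW | SK_DTE_FLAG_TV;
	if (dev->ats_enabled)
		new->data[1] |= SK_DTE_FLAG_IOTLB;
}

static void sk_set_dte_gcr3(const struct sk_dev *dev, struct sk_dte *new)
{
	sk_set_common(dev, new);
	new->data[0] |= SK_DTE_FLAG_GV; /* guest (v2) translation valid */
}

static void sk_set_dte_v1(const struct sk_dev *dev, unsigned int mode,
			  struct sk_dte *new)
{
	sk_set_common(dev, new);
	new->data[0] |= (uint64_t)(mode & 7) << SK_DTE_MODE_SHIFT;
}

/* Exactly one of the two single-stage setups runs, never both. */
static void sk_set_dte_entry(const struct sk_dev *dev, unsigned int mode,
			     struct sk_dte *new)
{
	assert(dev != NULL && new != NULL);
	if (dev->gcr3_tbl)
		sk_set_dte_gcr3(dev, new);
	else
		sk_set_dte_v1(dev, mode, new);
}
```

A shared helper is used here instead of duplicating the common lines in both functions; either arrangement gives each path the same IR/IW/TV and IOTLB handling.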
* Re: [PATCH v3 15/15] iommu/amd: Add support for nested domain attach/detach
2025-10-09 23:57 ` [PATCH v3 15/15] iommu/amd: Add support for nested domain attach/detach Suravee Suthikulpanit
@ 2025-10-10 23:20 ` Jason Gunthorpe
2025-10-20 23:17 ` Suthikulpanit, Suravee
0 siblings, 1 reply; 27+ messages in thread
From: Jason Gunthorpe @ 2025-10-10 23:20 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
On Thu, Oct 09, 2025 at 11:57:55PM +0000, Suravee Suthikulpanit wrote:
> Introduce set_dte_nested() to program guest translation settings in
> the host DTE when attaches the nested domain to a device.
>
> In addition, introduce struct amd_iommu_viommu, which stores reference to
> the nest parent domain assigned during the call to struct
> iommu_ops.viommu_init(). Information in the nest parent domain is needed
> when setting up the DTE for nested translation.
>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
> drivers/iommu/amd/amd_iommu.h | 3 ++
> drivers/iommu/amd/amd_iommu_types.h | 8 ++++
> drivers/iommu/amd/iommu.c | 8 ++--
> drivers/iommu/amd/nested.c | 66 +++++++++++++++++++++++++++++
> 4 files changed, 81 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
> index cfb63de7732a..98351b0cb9a0 100644
> --- a/drivers/iommu/amd/amd_iommu.h
> +++ b/drivers/iommu/amd/amd_iommu.h
> @@ -190,6 +190,9 @@ struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid);
> int amd_iommu_completion_wait(struct amd_iommu *iommu);
>
> /* DTE */
> +void amd_iommu_set_dte_v1(struct iommu_dev_data *dev_data,
> + struct protection_domain *domain, u16 domid,
> + struct dev_table_entry *new);
> int amd_iommu_device_flush_dte(struct iommu_dev_data *dev_data);
> void amd_iommu_update_dte256(struct amd_iommu *iommu,
> struct iommu_dev_data *dev_data,
> diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
> index a68b5c2fc0a2..683ee288c636 100644
> --- a/drivers/iommu/amd/amd_iommu_types.h
> +++ b/drivers/iommu/amd/amd_iommu_types.h
> @@ -17,6 +17,7 @@
> #include <linux/list.h>
> #include <linux/spinlock.h>
> #include <linux/pci.h>
> +#include <linux/iommufd.h>
> #include <linux/irqreturn.h>
> #include <linux/io-pgtable.h>
>
> @@ -420,6 +421,7 @@
> #define DTE_FLAG_HAD (3ULL << 7)
> #define DTE_MODE_MASK GENMASK_ULL(11, 9)
> #define DTE_HOST_TRP GENMASK_ULL(51, 12)
> +#define DTE_FLAG_PPR BIT_ULL(52)
> #define DTE_FLAG_GIOV BIT_ULL(54)
> #define DTE_FLAG_GV BIT_ULL(55)
> #define DTE_GLX GENMASK_ULL(57, 56)
> @@ -590,6 +592,11 @@ struct pdom_iommu_info {
> u32 refcnt; /* Count of attached dev/pasid per domain/IOMMU */
> };
>
> +struct amd_iommu_viommu {
> + struct iommufd_viommu core;
> + struct protection_domain *parent; /* nest parent domain for this viommu */
> +};
This alone is not enough; the core code needs to allocate this
memory too. Make adding the viommu its own patch, placed before the
patch that allocates the nested domain, and move these hunks:
> @@ -607,6 +614,7 @@ struct nested_domain {
> struct iommu_domain domain; /* generic domain handle used by iommu core code */
> u16 id; /* the domain id written to the device table */
> struct iommu_hwpt_amd_guest gdte; /* Guest vIOMMU DTE */
> + struct amd_iommu_viommu *viommu; /* AMD hw-viommu this nested domain belong to */
Into the nested domain allocation patch.
> @@ -2044,9 +2044,9 @@ static void set_dte_gcr3_table(struct amd_iommu *iommu,
> target->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_4_LEVEL);
> }
>
> -static void set_dte_v1(struct iommu_dev_data *dev_data,
> - struct protection_domain *domain, u16 domid,
> - struct dev_table_entry *new)
> +void amd_iommu_set_dte_v1(struct iommu_dev_data *dev_data,
> + struct protection_domain *domain, u16 domid,
> + struct dev_table_entry *new)
> {
Give the function the right name in the patch that adds it?
> diff --git a/drivers/iommu/amd/nested.c b/drivers/iommu/amd/nested.c
> index 3307c925d3c1..ca3d3001c87f 100644
> --- a/drivers/iommu/amd/nested.c
> +++ b/drivers/iommu/amd/nested.c
> @@ -63,6 +63,7 @@ amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
> const struct iommu_user_data *user_data)
> {
> int ret;
> + struct amd_iommu_viommu *aviommu = container_of(viommu, struct amd_iommu_viommu, core);
> struct iommu_hwpt_amd_guest gdte;
> struct nested_domain *ndom;
>
> @@ -85,6 +86,7 @@ amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
>
> ndom->domain.ops = &nested_domain_ops;
> ndom->domain.type = IOMMU_DOMAIN_NESTED;
> + ndom->viommu = aviommu;
> memcpy(&ndom->gdte, &gdte, sizeof(gdte));
These hunks belong in the nested domain allocation patch too.
> +static void set_dte_nested(struct amd_iommu *iommu,
> + struct iommu_domain *dom,
> + struct iommu_dev_data *dev_data)
> +{
> + struct protection_domain *parent;
> + struct dev_table_entry new = {0};
> + struct nested_domain *ndom = to_ndomain(dom);
> + struct iommu_hwpt_amd_guest *gdte = &ndom->gdte;
> +
> + /*
> + * The nest parent domain is attached during the call to the
> + * struct iommu_ops.viommu_init(), which will be stored as part
> + * of the struct amd_iommu_viommu.parent.
> + */
> + if (WARN_ON(!ndom->viommu || !ndom->viommu->parent))
> + return;
> +
> + parent = ndom->viommu->parent;
> + amd_iommu_make_clear_dte(dev_data, &new);
> +
> + /*
> + * Use nested domain ID to program DTE.
> + * See amd_iommu_alloc_domain_nested().
> + */
> + amd_iommu_set_dte_v1(dev_data, parent, ndom->id, &new);
> +
> + /* Guest PPR */
> + new.data[0] |= gdte->dte[0] & DTE_FLAG_PPR;
> +
> + /* Guest translation stuff */
> + new.data[0] |= gdte->dte[0] & (DTE_GLX | DTE_FLAG_GV | DTE_FLAG_GIOV);
> +
> + /* GCR3 table */
> + new.data[0] |= gdte->dte[0] & DTE_GCR3_14_12;
> + new.data[1] |= gdte->dte[1] & (DTE_GCR3_30_15 | DTE_GCR3_51_31);
> +
> + /* Guest paging mode */
> + new.data[2] |= gdte->dte[2] & DTE_GPT_LEVEL_MASK;
> +
> + amd_iommu_update_dte256(iommu, dev_data, &new);
> +}
This looks good
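The masking approach in set_dte_nested() quoted above, where only selected guest DTE fields are copied into the host DTE, can be exercised in a small userspace sketch. The `sk_*` names are hypothetical. Mask values follow the defines quoted in this thread where available; the positions used for DTE_GCR3_51_31 and DTE_GPT_LEVEL_MASK do not appear in the thread and are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_GENMASK_ULL(h, l) ((~0ULL << (l)) & (~0ULL >> (63 - (h))))

#define SK_DTE_FLAG_PPR   (1ULL << 52)
#define SK_DTE_FLAG_GIOV  (1ULL << 54)
#define SK_DTE_FLAG_GV    (1ULL << 55)
#define SK_DTE_GLX        SK_GENMASK_ULL(57, 56)
#define SK_DTE_GCR3_14_12 SK_GENMASK_ULL(60, 58)
#define SK_DTE_GCR3_30_15 SK_GENMASK_ULL(31, 16)
#define SK_DTE_GCR3_51_31 SK_GENMASK_ULL(63, 43) /* assumed position */
#define SK_DTE_GPT_LEVEL  SK_GENMASK_ULL(55, 54) /* assumed position */

/* Guest-controlled bits allowed into host DTE word 0. */
#define SK_GUEST_DATA0 (SK_DTE_FLAG_PPR | SK_DTE_GLX | SK_DTE_FLAG_GV | \
			SK_DTE_FLAG_GIOV | SK_DTE_GCR3_14_12)

/*
 * OR the allowed guest fields into the host DTE. Anything outside the
 * masks, such as the domain ID in data[1] bits 15:0, stays host-owned.
 */
static void sk_merge_guest_dte(uint64_t host[4], const uint64_t guest[4])
{
	assert(host != NULL && guest != NULL);
	host[0] |= guest[0] & SK_GUEST_DATA0;
	host[1] |= guest[1] & (SK_DTE_GCR3_30_15 | SK_DTE_GCR3_51_31);
	host[2] |= guest[2] & SK_DTE_GPT_LEVEL;
}
```

Because every guest word is ANDed with an explicit mask first, even a gDTE of all ones cannot alter the host-owned fields.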
> +static int nested_attach_device(struct iommu_domain *dom, struct device *dev)
> +{
> + struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
> + struct amd_iommu *iommu = get_amd_iommu_from_dev_data(dev_data);
> + int ret = 0;
> +
> + if (WARN_ON(dom->type != IOMMU_DOMAIN_NESTED))
> + return -EINVAL;
> +
> + mutex_lock(&dev_data->mutex);
> +
> + /* Update device table entry */
> + set_dte_nested(iommu, dom, dev_data);
> + amd_iommu_device_flush_dte(dev_data);
> + amd_iommu_completion_wait(iommu);
> +
> + mutex_unlock(&dev_data->mutex);
Where does the code record the ndom->id to push invalidates when the
S2 is changed? Seems like an important thing to be missing!
Shouldn't all this attach related stuff be in here too??
ret = pdom_attach_iommu(iommu, domain);
dev_data->domain = domain;
spin_lock_irqsave(&domain->lock, flags);
list_add(&dev_data->list, &domain->dev_list);
spin_unlock_irqrestore(&domain->lock, flags);
At a bare minimum if the series is going to stop here then it must
also do correct invalidation for any S2 changes.
Given that, I'd suggest also fixing the domain IDs with the xarray so
you don't have to redo the invalidation logic.
Jason
* Re: [PATCH v3 01/15] iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK
2025-10-09 23:57 ` [PATCH v3 01/15] iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK Suravee Suthikulpanit
@ 2025-10-13 10:51 ` Sairaj Kodilkar
2025-10-14 2:08 ` Suthikulpanit, Suravee
0 siblings, 1 reply; 27+ messages in thread
From: Sairaj Kodilkar @ 2025-10-13 10:51 UTC (permalink / raw)
To: Suravee Suthikulpanit, jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez
On 10/10/2025 5:27 AM, Suravee Suthikulpanit wrote:
> Also change the define to use GENMASK_ULL instead.
> There is no functional change.
>
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
> drivers/iommu/amd/amd_iommu_types.h | 2 +-
> drivers/iommu/amd/iommu.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
> index a698a2e7ce2a..556f1df32d53 100644
> --- a/drivers/iommu/amd/amd_iommu_types.h
> +++ b/drivers/iommu/amd/amd_iommu_types.h
> @@ -422,7 +422,7 @@
>
> #define DTE_FLAG_IOTLB BIT_ULL(32)
> #define DTE_FLAG_MASK (0x3ffULL << 32)
> -#define DEV_DOMID_MASK 0xffffULL
> +#define DTE_DOMID_MASK GENMASK_ULL(15, 0)
>
> #define DTE_GCR3_14_12 GENMASK_ULL(60, 58)
> #define DTE_GCR3_30_15 GENMASK_ULL(31, 16)
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index b57a6993179d..a9b17d31a969 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -2094,7 +2094,7 @@ static void set_dte_entry(struct amd_iommu *iommu,
> if (dev_data->ats_enabled)
> new.data[1] |= DTE_FLAG_IOTLB;
>
> - old_domid = READ_ONCE(dte->data[1]) & DEV_DOMID_MASK;
> + old_domid = READ_ONCE(dte->data[1]) & DTE_DOMID_MASK;
> new.data[1] |= domid;
>
> /*
Hi Suravee,
Please also replace the usage of DEV_DOMID_MASK in
init.c:__copy_device_table.
Thanks
Sairaj
* Re: [PATCH v3 01/15] iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK
2025-10-13 10:51 ` Sairaj Kodilkar
@ 2025-10-14 2:08 ` Suthikulpanit, Suravee
0 siblings, 0 replies; 27+ messages in thread
From: Suthikulpanit, Suravee @ 2025-10-14 2:08 UTC (permalink / raw)
To: Sairaj Kodilkar, jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez
On 10/13/2025 5:51 AM, Sairaj Kodilkar wrote:
>
>
> On 10/10/2025 5:27 AM, Suravee Suthikulpanit wrote:
>> Also change the define to use GENMASK_ULL instead.
>> There is no functional change.
>>
>> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>> ---
>> drivers/iommu/amd/amd_iommu_types.h | 2 +-
>> drivers/iommu/amd/iommu.c | 2 +-
>> 2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/iommu/amd/amd_iommu_types.h
>> b/drivers/iommu/amd/amd_iommu_types.h
>> index a698a2e7ce2a..556f1df32d53 100644
>> --- a/drivers/iommu/amd/amd_iommu_types.h
>> +++ b/drivers/iommu/amd/amd_iommu_types.h
>> @@ -422,7 +422,7 @@
>> #define DTE_FLAG_IOTLB BIT_ULL(32)
>> #define DTE_FLAG_MASK (0x3ffULL << 32)
>> -#define DEV_DOMID_MASK 0xffffULL
>> +#define DTE_DOMID_MASK GENMASK_ULL(15, 0)
>> #define DTE_GCR3_14_12 GENMASK_ULL(60, 58)
>> #define DTE_GCR3_30_15 GENMASK_ULL(31, 16)
>> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
>> index b57a6993179d..a9b17d31a969 100644
>> --- a/drivers/iommu/amd/iommu.c
>> +++ b/drivers/iommu/amd/iommu.c
>> @@ -2094,7 +2094,7 @@ static void set_dte_entry(struct amd_iommu *iommu,
>> if (dev_data->ats_enabled)
>> new.data[1] |= DTE_FLAG_IOTLB;
>> - old_domid = READ_ONCE(dte->data[1]) & DEV_DOMID_MASK;
>> + old_domid = READ_ONCE(dte->data[1]) & DTE_DOMID_MASK;
>> new.data[1] |= domid;
>> /*
> Hi suravee
> Please update the replace the usage of DEV_DOMID_MASK in
> init.c:__copy_device_table as well
That logic should already be removed by the following commit in the
iommu next branch.
commit 38e5f33ee3596f37ee8d1e694073a17590904004
Author: Ashish Kalra <ashish.kalra@amd.com>
Date: Mon Aug 25 21:46:15 2025 +0000
iommu/amd: Reuse device table for kdump
https://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux.git/commit/?h=next&id=38e5f33ee3596f37ee8d1e694073a17590904004
Thanks,
Suravee
* Re: [PATCH v3 15/15] iommu/amd: Add support for nested domain attach/detach
2025-10-10 23:20 ` Jason Gunthorpe
@ 2025-10-20 23:17 ` Suthikulpanit, Suravee
0 siblings, 0 replies; 27+ messages in thread
From: Suthikulpanit, Suravee @ 2025-10-20 23:17 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
On 10/10/2025 6:20 PM, Jason Gunthorpe wrote:
> On Thu, Oct 09, 2025 at 11:57:55PM +0000, Suravee Suthikulpanit wrote:
>> Introduce set_dte_nested() to program guest translation settings in
>> the host DTE when attaches the nested domain to a device.
>> .....
>>
>> +static int nested_attach_device(struct iommu_domain *dom, struct device *dev)
>> +{
>> + struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
>> + struct amd_iommu *iommu = get_amd_iommu_from_dev_data(dev_data);
>> + int ret = 0;
>> +
>> + if (WARN_ON(dom->type != IOMMU_DOMAIN_NESTED))
>> + return -EINVAL;
>> +
>> + mutex_lock(&dev_data->mutex);
>> +
>> + /* Update device table entry */
>> + set_dte_nested(iommu, dom, dev_data);
>> + amd_iommu_device_flush_dte(dev_data);
>> + amd_iommu_completion_wait(iommu);
>> +
>> + mutex_unlock(&dev_data->mutex);
>
> Where does the code record the ndom->id to push invalidates when the
> S2 is changed? Seems like an important thing to be missing!
>
> Shouldn't all this attach related stuff be in here too??
>
> ret = pdom_attach_iommu(iommu, domain);
> dev_data->domain = domain;
>
> spin_lock_irqsave(&domain->lock, flags);
> list_add(&dev_data->list, &domain->dev_list);
> spin_unlock_irqrestore(&domain->lock, flags);
>
> At a bare minimum if the series is going to stop here then it must
> also do correct invalidation for any S2 changes.
>
> Given that, I'd suggest to also fix the domain id's with the xarray so
> you don't have to redo the invalidation logic.
>
> Jason
I am reworking this series to include S2 flushing, and will be sending
out v4.
Thanks,
Suravee
* Re: [PATCH v3 14/15] iommu/amd: Refactor logic to program the host page table in DTE
2025-10-10 23:09 ` Jason Gunthorpe
@ 2025-10-21 1:26 ` Suthikulpanit, Suravee
0 siblings, 0 replies; 27+ messages in thread
From: Suthikulpanit, Suravee @ 2025-10-21 1:26 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
On 10/10/2025 6:09 PM, Jason Gunthorpe wrote:
> On Thu, Oct 09, 2025 at 11:57:54PM +0000, Suravee Suthikulpanit wrote:
>> ...
>> @@ -2061,36 +2087,14 @@ static void set_dte_entry(struct amd_iommu *iommu,
>>
>> amd_iommu_make_clear_dte(dev_data, &new);
>>
>> - if (domain->iop.mode != PAGE_MODE_NONE)
>> - new.data[0] |= iommu_virt_to_phys(domain->iop.root);
>> -
>> - new.data[0] |= (domain->iop.mode & DEV_ENTRY_MODE_MASK)
>> - << DEV_ENTRY_MODE_SHIFT;
>> -
>> - new.data[0] |= DTE_FLAG_IR | DTE_FLAG_IW;
>> + old_domid = READ_ONCE(dte->data[1]) & DTE_DOMID_MASK;
>>
>> - /*
>> - * When SNP is enabled, we can only support TV=1 with non-zero domain ID.
>> - * This is prevented by the SNP-enable and IOMMU_DOMAIN_IDENTITY check in
>> - * do_iommu_domain_alloc().
>> - */
>> - WARN_ON(amd_iommu_snp_en && (domid == 0));
>> - new.data[0] |= DTE_FLAG_TV;
>> + set_dte_v1(dev_data, domain, domid, &new);
>> + set_dte_gcr3_table(iommu, dev_data, &new);
>
> This seems weird, I would expect this to be written:
>
> if (gcr3_info && gcr3_info->gcr3_tbl)
> set_dte_gcr3_table(iommu, dev_data, &new);
> else
> set_dte_v1(dev_data, domain, domid, &new);
>
> It is nonsense to call both gcr3 and v1 in this function that does not
> setup two stages.
>
> So, I'd just put this code in both the v1 and gcr3 functions:
>
> + new->data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV;
> + if (dev_data->ats_enabled)
> + new->data[1] |= DTE_FLAG_IOTLB;
>
> (does IR/IW apply to GCR3?)
IR/IW apply to both host and GCR3 tables. I'll add a comment in v4.
> And then WARN_ON(domain->iop.mode != PAGE_MODE_NONE) as it should be
> illegal to call set_dte_v1() on a domain that is not a v1 domain.
I'll rework the logic in set_dte_entry() as you suggested in v4.
Thanks,
Suravee
> But this is overall the right idea and direction.
>
> Jason
Thread overview: 27+ messages
2025-10-09 23:57 [PATCH v3 00/15] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
2025-10-09 23:57 ` [PATCH v3 01/15] iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK Suravee Suthikulpanit
2025-10-13 10:51 ` Sairaj Kodilkar
2025-10-14 2:08 ` Suthikulpanit, Suravee
2025-10-09 23:57 ` [PATCH v3 02/15] iommu/amd: Make amd_iommu_pdom_id_alloc() non-static Suravee Suthikulpanit
2025-10-09 23:57 ` [PATCH v3 03/15] iommu/amd: Make amd_iommu_pdom_id_free() non-static Suravee Suthikulpanit
2025-10-09 23:57 ` [PATCH v3 04/15] iommu/amd: Make amd_iommu_device_flush_dte() non-static Suravee Suthikulpanit
2025-10-09 23:57 ` [PATCH v3 05/15] iommu/amd: Make amd_iommu_update_dte256() non-static Suravee Suthikulpanit
2025-10-09 23:57 ` [PATCH v3 06/15] iommu/amd: Make amd_iommu_make_clear_dte() non-static inline Suravee Suthikulpanit
2025-10-09 23:57 ` [PATCH v3 07/15] iommu/amd: Make amd_iommu_completion_wait() non-static Suravee Suthikulpanit
2025-10-09 23:57 ` [PATCH v3 08/15] iommufd: Introduce data struct for AMD nested domain allocation Suravee Suthikulpanit
2025-10-09 23:57 ` [PATCH v3 09/15] iommu/amd: Always enable GCR3TRPMode when supported Suravee Suthikulpanit
2025-10-10 22:37 ` Jason Gunthorpe
2025-10-09 23:57 ` [PATCH v3 10/15] iommu/amd: Add support for nest parent domain allocation Suravee Suthikulpanit
2025-10-10 22:38 ` Jason Gunthorpe
2025-10-09 23:57 ` [PATCH v3 11/15] iommu/amd: Add support for nested " Suravee Suthikulpanit
2025-10-10 22:54 ` Jason Gunthorpe
2025-10-09 23:57 ` [PATCH v3 12/15] iommu/amd: Validate guest DTE for nested translation Suravee Suthikulpanit
2025-10-10 22:55 ` Jason Gunthorpe
2025-10-09 23:57 ` [PATCH v3 13/15] iommu/amd: Refactor persistent DTE bits programming into amd_iommu_make_clear_dte() Suravee Suthikulpanit
2025-10-10 22:56 ` Jason Gunthorpe
2025-10-09 23:57 ` [PATCH v3 14/15] iommu/amd: Refactor logic to program the host page table in DTE Suravee Suthikulpanit
2025-10-10 23:09 ` Jason Gunthorpe
2025-10-21 1:26 ` Suthikulpanit, Suravee
2025-10-09 23:57 ` [PATCH v3 15/15] iommu/amd: Add support for nested domain attach/detach Suravee Suthikulpanit
2025-10-10 23:20 ` Jason Gunthorpe
2025-10-20 23:17 ` Suthikulpanit, Suravee