* [PATCH v5 01/14] iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK
2025-11-12 18:24 [PATCH v5 00/14] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
@ 2025-11-12 18:24 ` Suravee Suthikulpanit
2025-11-12 18:24 ` [PATCH v5 02/14] iommu/amd: Make amd_iommu_pdom_id_alloc() non-static Suravee Suthikulpanit
` (13 subsequent siblings)
14 siblings, 0 replies; 39+ messages in thread
From: Suravee Suthikulpanit @ 2025-11-12 18:24 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
Also change the define to use GENMASK_ULL instead of the open-coded
0xffffULL constant. There is no functional change.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu_types.h | 2 +-
drivers/iommu/amd/iommu.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 4b4a37fad70e..bdd7d5f3c4a6 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -353,7 +353,7 @@
#define DTE_FLAG_IOTLB BIT_ULL(32)
#define DTE_FLAG_MASK (0x3ffULL << 32)
-#define DEV_DOMID_MASK 0xffffULL
+#define DTE_DOMID_MASK GENMASK_ULL(15, 0)
#define DTE_GCR3_14_12 GENMASK_ULL(60, 58)
#define DTE_GCR3_30_15 GENMASK_ULL(31, 16)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 3505a14a4d2c..02cbe82ffcf1 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2081,7 +2081,7 @@ static void set_dte_entry(struct amd_iommu *iommu,
if (dev_data->ats_enabled)
new.data[1] |= DTE_FLAG_IOTLB;
- old_domid = READ_ONCE(dte->data[1]) & DEV_DOMID_MASK;
+ old_domid = READ_ONCE(dte->data[1]) & DTE_DOMID_MASK;
new.data[1] |= domid;
/*
--
2.34.1
* [PATCH v5 02/14] iommu/amd: Make amd_iommu_pdom_id_alloc() non-static
2025-11-12 18:24 [PATCH v5 00/14] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
2025-11-12 18:24 ` [PATCH v5 01/14] iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK Suravee Suthikulpanit
@ 2025-11-12 18:24 ` Suravee Suthikulpanit
2025-11-12 18:24 ` [PATCH v5 03/14] iommu/amd: Make amd_iommu_pdom_id_free() non-static Suravee Suthikulpanit
` (12 subsequent siblings)
14 siblings, 0 replies; 39+ messages in thread
From: Suravee Suthikulpanit @ 2025-11-12 18:24 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
This will be reused in a new nested.c file for nested translation.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 1 +
drivers/iommu/amd/iommu.c | 8 ++++----
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 25044d28f28a..a9d724d43398 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -26,6 +26,7 @@ void amd_iommu_set_rlookup_table(struct amd_iommu *iommu, u16 devid);
void iommu_feature_enable(struct amd_iommu *iommu, u8 bit);
void *__init iommu_alloc_4k_pages(struct amd_iommu *iommu,
gfp_t gfp, size_t size);
+int amd_iommu_pdom_id_alloc(void);
#ifdef CONFIG_AMD_IOMMU_DEBUGFS
void amd_iommu_debugfs_setup(void);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 02cbe82ffcf1..b42738e9d8ab 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1787,7 +1787,7 @@ int amd_iommu_complete_ppr(struct device *dev, u32 pasid, int status, int tag)
*
****************************************************************************/
-static int pdom_id_alloc(void)
+int amd_iommu_pdom_id_alloc(void)
{
return ida_alloc_range(&pdom_ids, 1, MAX_DOMAIN_ID - 1, GFP_ATOMIC);
}
@@ -1875,7 +1875,7 @@ static int setup_gcr3_table(struct gcr3_tbl_info *gcr3_info,
return -EBUSY;
/* Allocate per device domain ID */
- domid = pdom_id_alloc();
+ domid = amd_iommu_pdom_id_alloc();
if (domid <= 0)
return -ENOSPC;
gcr3_info->domid = domid;
@@ -2478,7 +2478,7 @@ struct protection_domain *protection_domain_alloc(void)
if (!domain)
return NULL;
- domid = pdom_id_alloc();
+ domid = amd_iommu_pdom_id_alloc();
if (domid <= 0) {
kfree(domain);
return NULL;
@@ -2825,7 +2825,7 @@ void amd_iommu_init_identity_domain(void)
domain->ops = &identity_domain_ops;
domain->owner = &amd_iommu_ops;
- identity_domain.id = pdom_id_alloc();
+ identity_domain.id = amd_iommu_pdom_id_alloc();
protection_domain_init(&identity_domain);
}
--
2.34.1
* [PATCH v5 03/14] iommu/amd: Make amd_iommu_pdom_id_free() non-static
2025-11-12 18:24 [PATCH v5 00/14] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
2025-11-12 18:24 ` [PATCH v5 01/14] iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK Suravee Suthikulpanit
2025-11-12 18:24 ` [PATCH v5 02/14] iommu/amd: Make amd_iommu_pdom_id_alloc() non-static Suravee Suthikulpanit
@ 2025-11-12 18:24 ` Suravee Suthikulpanit
2025-11-12 18:24 ` [PATCH v5 04/14] iommu/amd: Make amd_iommu_make_clear_dte() non-static inline Suravee Suthikulpanit
` (11 subsequent siblings)
14 siblings, 0 replies; 39+ messages in thread
From: Suravee Suthikulpanit @ 2025-11-12 18:24 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
This will be reused in a new nested.c file for nested translation.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 1 +
drivers/iommu/amd/iommu.c | 8 ++++----
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index a9d724d43398..2730fd0b55de 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -27,6 +27,7 @@ void iommu_feature_enable(struct amd_iommu *iommu, u8 bit);
void *__init iommu_alloc_4k_pages(struct amd_iommu *iommu,
gfp_t gfp, size_t size);
int amd_iommu_pdom_id_alloc(void);
+void amd_iommu_pdom_id_free(int id);
#ifdef CONFIG_AMD_IOMMU_DEBUGFS
void amd_iommu_debugfs_setup(void);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index b42738e9d8ab..81eb27aa30e1 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1792,7 +1792,7 @@ int amd_iommu_pdom_id_alloc(void)
return ida_alloc_range(&pdom_ids, 1, MAX_DOMAIN_ID - 1, GFP_ATOMIC);
}
-static void pdom_id_free(int id)
+void amd_iommu_pdom_id_free(int id)
{
ida_free(&pdom_ids, id);
}
@@ -1839,7 +1839,7 @@ static void free_gcr3_table(struct gcr3_tbl_info *gcr3_info)
gcr3_info->glx = 0;
/* Free per device domain ID */
- pdom_id_free(gcr3_info->domid);
+ amd_iommu_pdom_id_free(gcr3_info->domid);
iommu_free_pages(gcr3_info->gcr3_tbl);
gcr3_info->gcr3_tbl = NULL;
@@ -1882,7 +1882,7 @@ static int setup_gcr3_table(struct gcr3_tbl_info *gcr3_info,
gcr3_info->gcr3_tbl = iommu_alloc_pages_node_sz(nid, GFP_ATOMIC, SZ_4K);
if (gcr3_info->gcr3_tbl == NULL) {
- pdom_id_free(domid);
+ amd_iommu_pdom_id_free(domid);
return -ENOMEM;
}
@@ -2774,7 +2774,7 @@ void amd_iommu_domain_free(struct iommu_domain *dom)
WARN_ON(!list_empty(&domain->dev_list));
pt_iommu_deinit(&domain->iommu);
- pdom_id_free(domain->id);
+ amd_iommu_pdom_id_free(domain->id);
kfree(domain);
}
--
2.34.1
* [PATCH v5 04/14] iommu/amd: Make amd_iommu_make_clear_dte() non-static inline
2025-11-12 18:24 [PATCH v5 00/14] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (2 preceding siblings ...)
2025-11-12 18:24 ` [PATCH v5 03/14] iommu/amd: Make amd_iommu_pdom_id_free() non-static Suravee Suthikulpanit
@ 2025-11-12 18:24 ` Suravee Suthikulpanit
2025-11-18 23:44 ` Jason Gunthorpe
2025-11-12 18:24 ` [PATCH v5 05/14] iommu/amd: Introduce helper function amd_iommu_update_dte() Suravee Suthikulpanit
` (10 subsequent siblings)
14 siblings, 1 reply; 39+ messages in thread
From: Suravee Suthikulpanit @ 2025-11-12 18:24 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
This will be reused in a new nested.c file for nested translation.
Also, remove the unused function parameter 'ptr'.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 8 ++++++++
drivers/iommu/amd/iommu.c | 13 ++-----------
2 files changed, 10 insertions(+), 11 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 2730fd0b55de..a6ffb1ece2f9 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -187,4 +187,12 @@ void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
struct dev_table_entry *get_dev_table(struct amd_iommu *iommu);
struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid);
+static inline void
+amd_iommu_make_clear_dte(struct iommu_dev_data *dev_data, struct dev_table_entry *new)
+{
+ /* All existing DTE must have V bit set */
+ new->data128[0] = DTE_FLAG_V;
+ new->data128[1] = 0;
+}
+
#endif /* AMD_IOMMU_H */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 81eb27aa30e1..0be2c818504c 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1976,14 +1976,6 @@ int amd_iommu_clear_gcr3(struct iommu_dev_data *dev_data, ioasid_t pasid)
return ret;
}
-static void make_clear_dte(struct iommu_dev_data *dev_data, struct dev_table_entry *ptr,
- struct dev_table_entry *new)
-{
- /* All existing DTE must have V bit set */
- new->data128[0] = DTE_FLAG_V;
- new->data128[1] = 0;
-}
-
/*
* Note:
* The old value for GCR3 table and GPT have been cleared from caller.
@@ -2033,7 +2025,7 @@ static void set_dte_entry(struct amd_iommu *iommu,
struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
struct pt_iommu_amdv1_hw_info pt_info;
- make_clear_dte(dev_data, dte, &new);
+ amd_iommu_make_clear_dte(dev_data, &new);
if (gcr3_info && gcr3_info->gcr3_tbl)
domid = dev_data->gcr3_info.domid;
@@ -2114,9 +2106,8 @@ static void set_dte_entry(struct amd_iommu *iommu,
static void clear_dte_entry(struct amd_iommu *iommu, struct iommu_dev_data *dev_data)
{
struct dev_table_entry new = {};
- struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
- make_clear_dte(dev_data, dte, &new);
+ amd_iommu_make_clear_dte(dev_data, &new);
update_dte256(iommu, dev_data, &new);
}
--
2.34.1
* Re: [PATCH v5 04/14] iommu/amd: Make amd_iommu_make_clear_dte() non-static inline
2025-11-12 18:24 ` [PATCH v5 04/14] iommu/amd: Make amd_iommu_make_clear_dte() non-static inline Suravee Suthikulpanit
@ 2025-11-18 23:44 ` Jason Gunthorpe
0 siblings, 0 replies; 39+ messages in thread
From: Jason Gunthorpe @ 2025-11-18 23:44 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
On Wed, Nov 12, 2025 at 06:24:56PM +0000, Suravee Suthikulpanit wrote:
> This will be reused in a new nested.c file for nested translation.
>
> Also, remove unused function parameter ptr.
>
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
> drivers/iommu/amd/amd_iommu.h | 8 ++++++++
> drivers/iommu/amd/iommu.c | 13 ++-----------
> 2 files changed, 10 insertions(+), 11 deletions(-)
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
* [PATCH v5 05/14] iommu/amd: Introduce helper function amd_iommu_update_dte()
2025-11-12 18:24 [PATCH v5 00/14] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (3 preceding siblings ...)
2025-11-12 18:24 ` [PATCH v5 04/14] iommu/amd: Make amd_iommu_make_clear_dte() non-static inline Suravee Suthikulpanit
@ 2025-11-12 18:24 ` Suravee Suthikulpanit
2025-11-13 19:18 ` Nicolin Chen
2025-11-18 23:50 ` Jason Gunthorpe
2025-11-12 18:24 ` [PATCH v5 06/14] iommufd: Introduce data struct for AMD nested domain allocation Suravee Suthikulpanit
` (9 subsequent siblings)
14 siblings, 2 replies; 39+ messages in thread
From: Suravee Suthikulpanit @ 2025-11-12 18:24 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
Introduce amd_iommu_update_dte(), which combines the DTE update,
clone_aliases, DTE flush, and completion-wait commands, to avoid code
duplication when reused to set up the DTE for nested translation.
Also, make amd_iommu_update_dte() non-static so it can be reused in
a new nested.c file for nested translation.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 4 ++++
drivers/iommu/amd/iommu.c | 24 ++++++++++++++++++------
2 files changed, 22 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index a6ffb1ece2f9..3ad8b5e65a82 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -187,6 +187,10 @@ void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
struct dev_table_entry *get_dev_table(struct amd_iommu *iommu);
struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid);
+void amd_iommu_update_dte(struct amd_iommu *iommu,
+ struct iommu_dev_data *dev_data,
+ struct dev_table_entry *new);
+
static inline void
amd_iommu_make_clear_dte(struct iommu_dev_data *dev_data, struct dev_table_entry *new)
{
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 0be2c818504c..a36426f096d2 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -76,6 +76,8 @@ static void set_dte_entry(struct amd_iommu *iommu,
struct iommu_dev_data *dev_data,
phys_addr_t top_paddr, unsigned int top_level);
+static int device_flush_dte(struct iommu_dev_data *dev_data);
+
static void amd_iommu_change_top(struct pt_iommu *iommu_table,
phys_addr_t top_paddr, unsigned int top_level);
@@ -86,6 +88,10 @@ static bool amd_iommu_enforce_cache_coherency(struct iommu_domain *domain);
static int amd_iommu_set_dirty_tracking(struct iommu_domain *domain,
bool enable);
+static void clone_aliases(struct amd_iommu *iommu, struct device *dev);
+
+static int iommu_completion_wait(struct amd_iommu *iommu);
+
/****************************************************************************
*
* Helper functions
@@ -203,6 +209,16 @@ static void update_dte256(struct amd_iommu *iommu, struct iommu_dev_data *dev_da
spin_unlock_irqrestore(&dev_data->dte_lock, flags);
}
+void amd_iommu_update_dte(struct amd_iommu *iommu,
+ struct iommu_dev_data *dev_data,
+ struct dev_table_entry *new)
+{
+ update_dte256(iommu, dev_data, new);
+ clone_aliases(iommu, dev_data->dev);
+ device_flush_dte(dev_data);
+ iommu_completion_wait(iommu);
+}
+
static void get_dte256(struct amd_iommu *iommu, struct iommu_dev_data *dev_data,
struct dev_table_entry *dte)
{
@@ -2088,7 +2104,7 @@ static void set_dte_entry(struct amd_iommu *iommu,
set_dte_gcr3_table(iommu, dev_data, &new);
- update_dte256(iommu, dev_data, &new);
+ amd_iommu_update_dte(iommu, dev_data, &new);
/*
* A kdump kernel might be replacing a domain ID that was copied from
@@ -2108,7 +2124,7 @@ static void clear_dte_entry(struct amd_iommu *iommu, struct iommu_dev_data *dev_
struct dev_table_entry new = {};
amd_iommu_make_clear_dte(dev_data, &new);
- update_dte256(iommu, dev_data, &new);
+ amd_iommu_update_dte(iommu, dev_data, &new);
}
/* Update and flush DTE for the given device */
@@ -2120,10 +2136,6 @@ static void dev_update_dte(struct iommu_dev_data *dev_data, bool set)
set_dte_entry(iommu, dev_data, 0, 0);
else
clear_dte_entry(iommu, dev_data);
-
- clone_aliases(iommu, dev_data->dev);
- device_flush_dte(dev_data);
- iommu_completion_wait(iommu);
}
/*
--
2.34.1
* Re: [PATCH v5 05/14] iommu/amd: Introduce helper function amd_iommu_update_dte()
2025-11-12 18:24 ` [PATCH v5 05/14] iommu/amd: Introduce helper function amd_iommu_update_dte() Suravee Suthikulpanit
@ 2025-11-13 19:18 ` Nicolin Chen
2026-01-15 9:20 ` Suthikulpanit, Suravee
2025-11-18 23:50 ` Jason Gunthorpe
1 sibling, 1 reply; 39+ messages in thread
From: Nicolin Chen @ 2025-11-13 19:18 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez
On Wed, Nov 12, 2025 at 06:24:57PM +0000, Suravee Suthikulpanit wrote:
> +void amd_iommu_update_dte(struct amd_iommu *iommu,
> + struct iommu_dev_data *dev_data,
> + struct dev_table_entry *new)
> +{
> + update_dte256(iommu, dev_data, new);
> + clone_aliases(iommu, dev_data->dev);
> + device_flush_dte(dev_data);
> + iommu_completion_wait(iommu);
> +}
> +
> static void get_dte256(struct amd_iommu *iommu, struct iommu_dev_data *dev_data,
> struct dev_table_entry *dte)
> {
> @@ -2088,7 +2104,7 @@ static void set_dte_entry(struct amd_iommu *iommu,
>
> set_dte_gcr3_table(iommu, dev_data, &new);
>
> - update_dte256(iommu, dev_data, &new);
> + amd_iommu_update_dte(iommu, dev_data, &new);
>
> /*
> * A kdump kernel might be replacing a domain ID that was copied from
> @@ -2108,7 +2124,7 @@ static void clear_dte_entry(struct amd_iommu *iommu, struct iommu_dev_data *dev_
> struct dev_table_entry new = {};
>
> amd_iommu_make_clear_dte(dev_data, &new);
> - update_dte256(iommu, dev_data, &new);
> + amd_iommu_update_dte(iommu, dev_data, &new);
> }
>
> /* Update and flush DTE for the given device */
> @@ -2120,10 +2136,6 @@ static void dev_update_dte(struct iommu_dev_data *dev_data, bool set)
> set_dte_entry(iommu, dev_data, 0, 0);
> else
> clear_dte_entry(iommu, dev_data);
I found these two are somewhat unnecessary.
set_dte_entry()
{
u32 old_domid;
make_clear_dte(dev_data, dte, &new);
....
amd_iommu_update_dte(iommu, dev_data, &new);
if (old_domid)
amd_iommu_flush_tlb_domid(iommu, old_domid);
}
clear_dte_entry()
{
make_clear_dte(dev_data, dte, &new);
amd_iommu_update_dte(iommu, dev_data, &new);
}
And given that dev_update_dte() now just calls them without doing
anything else, why not just unwrap them:
dev_update_dte(struct iommu_dev_data *dev_data, bool set)
{
u32 old_domid = 0;
make_clear_dte(dev_data, dte, &new);
if (!set)
goto update_dte;
....
update_dte:
amd_iommu_update_dte(iommu, dev_data, &new);
if (old_domid)
amd_iommu_flush_tlb_domid(iommu, old_domid);
}
?
Nicolin
* Re: [PATCH v5 05/14] iommu/amd: Introduce helper function amd_iommu_update_dte()
2025-11-13 19:18 ` Nicolin Chen
@ 2026-01-15 9:20 ` Suthikulpanit, Suravee
0 siblings, 0 replies; 39+ messages in thread
From: Suthikulpanit, Suravee @ 2026-01-15 9:20 UTC (permalink / raw)
To: Nicolin Chen
Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez
On 11/14/2025 2:18 AM, Nicolin Chen wrote:
> On Wed, Nov 12, 2025 at 06:24:57PM +0000, Suravee Suthikulpanit wrote:
>> +void amd_iommu_update_dte(struct amd_iommu *iommu,
>> + struct iommu_dev_data *dev_data,
>> + struct dev_table_entry *new)
>> +{
>> + update_dte256(iommu, dev_data, new);
>> + clone_aliases(iommu, dev_data->dev);
>> + device_flush_dte(dev_data);
>> + iommu_completion_wait(iommu);
>> +}
>> +
>> static void get_dte256(struct amd_iommu *iommu, struct iommu_dev_data *dev_data,
>> struct dev_table_entry *dte)
>> {
>> @@ -2088,7 +2104,7 @@ static void set_dte_entry(struct amd_iommu *iommu,
>>
>> set_dte_gcr3_table(iommu, dev_data, &new);
>>
>> - update_dte256(iommu, dev_data, &new);
>> + amd_iommu_update_dte(iommu, dev_data, &new);
>>
>> /*
>> * A kdump kernel might be replacing a domain ID that was copied from
>> @@ -2108,7 +2124,7 @@ static void clear_dte_entry(struct amd_iommu *iommu, struct iommu_dev_data *dev_
>> struct dev_table_entry new = {};
>>
>> amd_iommu_make_clear_dte(dev_data, &new);
>> - update_dte256(iommu, dev_data, &new);
>> + amd_iommu_update_dte(iommu, dev_data, &new);
>> }
>>
>> /* Update and flush DTE for the given device */
>> @@ -2120,10 +2136,6 @@ static void dev_update_dte(struct iommu_dev_data *dev_data, bool set)
>> set_dte_entry(iommu, dev_data, 0, 0);
>> else
>> clear_dte_entry(iommu, dev_data);
>
> I found these two are somewhat unnecessary.
>
> set_dte_entry()
> {
> u32 old_domid;
>
> make_clear_dte(dev_data, dte, &new);
> ....
> amd_iommu_update_dte(iommu, dev_data, &new);
> if (old_domid)
> amd_iommu_flush_tlb_domid(iommu, old_domid);
> }
>
> clear_dte_entry()
> {
> make_clear_dte(dev_data, dte, &new);
> amd_iommu_update_dte(iommu, dev_data, &new);
> }
>
> And given that dev_update_dte() now are just calling these them
> without any other thing to do. Why not just unwrap them:
>
> dev_update_dte(struct iommu_dev_data *dev_data, bool set)
> {
> u32 old_domid = 0;
>
> make_clear_dte(dev_data, dte, &new);
> if (!set)
> goto update_dte;
> ....
> update_dte:
> amd_iommu_update_dte(iommu, dev_data, &new);
> if (old_domid)
> amd_iommu_flush_tlb_domid(iommu, old_domid);
> }
>
> ?
Currently, set_dte_entry() is called from multiple call-paths.
Therefore, I would like to keep it this way to simplify the code reuse.
clear_dte_entry() is called from only one place, but I feel that having
a separate function is simpler to read.
Thanks,
Suravee
* Re: [PATCH v5 05/14] iommu/amd: Introduce helper function amd_iommu_update_dte()
2025-11-12 18:24 ` [PATCH v5 05/14] iommu/amd: Introduce helper function amd_iommu_update_dte() Suravee Suthikulpanit
2025-11-13 19:18 ` Nicolin Chen
@ 2025-11-18 23:50 ` Jason Gunthorpe
1 sibling, 0 replies; 39+ messages in thread
From: Jason Gunthorpe @ 2025-11-18 23:50 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
On Wed, Nov 12, 2025 at 06:24:57PM +0000, Suravee Suthikulpanit wrote:
> Which includes DTE update, clone_aliases, DTE flush and completion-wait
> commands to avoid code duplication when reuse to setup DTE for nested
> translation.
>
> Also, make amd_iommu_update_dte() non-static to reuse in
> in a new nested.c file for nested translation.
>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
> drivers/iommu/amd/amd_iommu.h | 4 ++++
> drivers/iommu/amd/iommu.c | 24 ++++++++++++++++++------
> 2 files changed, 22 insertions(+), 6 deletions(-)
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> +void amd_iommu_update_dte(struct amd_iommu *iommu,
> + struct iommu_dev_data *dev_data,
> + struct dev_table_entry *new)
> +{
> + update_dte256(iommu, dev_data, new);
> + clone_aliases(iommu, dev_data->dev);
> + device_flush_dte(dev_data);
> + iommu_completion_wait(iommu);
> +}
Something to think about later, but clone_aliases is not optimal
now.. Since we have a "struct dev_table_entry *new" right here we can
pass it directly into clone_aliases() which can avoid doing
get_dte256() entirely.
And I don't think we need setup_aliases() to call clone_aliases() these
days as the core code is now always immediately attaching a domain of
some kind after probe_device which will write a known DTE anyhow.
Finally, it would be an improvement to this alias code if it worked
like arm and just kept an allocated array of the all the DTE table
indexes setup during device probe time and just ran over the list
instead of repeatedly using PCI functions and callbacks..
Jason
* [PATCH v5 06/14] iommufd: Introduce data struct for AMD nested domain allocation
2025-11-12 18:24 [PATCH v5 00/14] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (4 preceding siblings ...)
2025-11-12 18:24 ` [PATCH v5 05/14] iommu/amd: Introduce helper function amd_iommu_update_dte() Suravee Suthikulpanit
@ 2025-11-12 18:24 ` Suravee Suthikulpanit
2025-11-12 18:24 ` [PATCH v5 07/14] iommu/amd: Always enable GCR3TRPMode when supported Suravee Suthikulpanit
` (8 subsequent siblings)
14 siblings, 0 replies; 39+ messages in thread
From: Suravee Suthikulpanit @ 2025-11-12 18:24 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
Introduce the IOMMU_HWPT_DATA_AMD_GUEST data type for the IOMMU guest
page table, which is used for stage 1 in nested translation. The data
structure contains the information necessary for setting up the AMD
HW-vIOMMU support.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
include/uapi/linux/iommufd.h | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index efb52709c0a2..d111ee1dc572 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -455,16 +455,27 @@ struct iommu_hwpt_arm_smmuv3 {
__aligned_le64 ste[2];
};
+/**
+ * struct iommu_hwpt_amd_guest - AMD IOMMU guest I/O page table data
+ * (IOMMU_HWPT_DATA_AMD_GUEST)
+ * @dte: Guest Device Table Entry (DTE)
+ */
+struct iommu_hwpt_amd_guest {
+ __aligned_u64 dte[4];
+};
+
/**
* enum iommu_hwpt_data_type - IOMMU HWPT Data Type
* @IOMMU_HWPT_DATA_NONE: no data
* @IOMMU_HWPT_DATA_VTD_S1: Intel VT-d stage-1 page table
* @IOMMU_HWPT_DATA_ARM_SMMUV3: ARM SMMUv3 Context Descriptor Table
+ * @IOMMU_HWPT_DATA_AMD_GUEST: AMD IOMMU guest page table
*/
enum iommu_hwpt_data_type {
IOMMU_HWPT_DATA_NONE = 0,
IOMMU_HWPT_DATA_VTD_S1 = 1,
IOMMU_HWPT_DATA_ARM_SMMUV3 = 2,
+ IOMMU_HWPT_DATA_AMD_GUEST = 3,
};
/**
--
2.34.1
* [PATCH v5 07/14] iommu/amd: Always enable GCR3TRPMode when supported.
2025-11-12 18:24 [PATCH v5 00/14] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (5 preceding siblings ...)
2025-11-12 18:24 ` [PATCH v5 06/14] iommufd: Introduce data struct for AMD nested domain allocation Suravee Suthikulpanit
@ 2025-11-12 18:24 ` Suravee Suthikulpanit
2025-11-13 19:19 ` Nicolin Chen
2025-11-12 18:25 ` [PATCH v5 08/14] iommu/amd: Add support for nest parent domain allocation Suravee Suthikulpanit
` (7 subsequent siblings)
14 siblings, 1 reply; 39+ messages in thread
From: Suravee Suthikulpanit @ 2025-11-12 18:24 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
The GCR3TRPMode feature allows the DTE[GCR3TRP] field to be configured
with GPA (instead of SPA). This simplifies the implementation, and is
a pre-requisite for nested translation support.
Therefore, always enable this feature if available.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu_types.h | 2 ++
drivers/iommu/amd/init.c | 8 ++++++++
2 files changed, 10 insertions(+)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index bdd7d5f3c4a6..d13a50fc209b 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -107,6 +107,7 @@
/* Extended Feature 2 Bits */
+#define FEATURE_GCR3TRPMODE BIT_ULL(3)
#define FEATURE_SNPAVICSUP GENMASK_ULL(7, 5)
#define FEATURE_SNPAVICSUP_GAM(x) \
(FIELD_GET(FEATURE_SNPAVICSUP, x) == 0x1)
@@ -185,6 +186,7 @@
#define CONTROL_EPH_EN 45
#define CONTROL_XT_EN 50
#define CONTROL_INTCAPXT_EN 51
+#define CONTROL_GCR3TRPMODE 58
#define CONTROL_IRTCACHEDIS 59
#define CONTROL_SNPAVIC_EN 61
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index f2991c11867c..a61111c2aabf 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1122,6 +1122,14 @@ static void iommu_enable_gt(struct amd_iommu *iommu)
return;
iommu_feature_enable(iommu, CONTROL_GT_EN);
+
+ /*
+ * This feature needs to be enabled prior to a call
+ * to iommu_snp_enable(). Since this function is called
+ * in early_enable_iommu(), it is safe to enable here.
+ */
+ if (check_feature2(FEATURE_GCR3TRPMODE))
+ iommu_feature_enable(iommu, CONTROL_GCR3TRPMODE);
}
/* sets a specific bit in the device table entry. */
--
2.34.1
* Re: [PATCH v5 07/14] iommu/amd: Always enable GCR3TRPMode when supported.
2025-11-12 18:24 ` [PATCH v5 07/14] iommu/amd: Always enable GCR3TRPMode when supported Suravee Suthikulpanit
@ 2025-11-13 19:19 ` Nicolin Chen
0 siblings, 0 replies; 39+ messages in thread
From: Nicolin Chen @ 2025-11-13 19:19 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez
On Wed, Nov 12, 2025 at 06:24:59PM +0000, Suravee Suthikulpanit wrote:
> The GCR3TRPMode feature allows the DTE[GCR3TRP] field to be configured
> with GPA (instead of SPA). This simplifies the implementation, and is
> a pre-requisite for nested translation support.
>
> Therefore, always enable this feature if available.
>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
* [PATCH v5 08/14] iommu/amd: Add support for nest parent domain allocation
2025-11-12 18:24 [PATCH v5 00/14] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (6 preceding siblings ...)
2025-11-12 18:24 ` [PATCH v5 07/14] iommu/amd: Always enable GCR3TRPMode when supported Suravee Suthikulpanit
@ 2025-11-12 18:25 ` Suravee Suthikulpanit
2025-11-12 18:25 ` [PATCH v5 09/14] iommu/amd: Introduce struct amd_iommu_viommu Suravee Suthikulpanit
` (6 subsequent siblings)
14 siblings, 0 replies; 39+ messages in thread
From: Suravee Suthikulpanit @ 2025-11-12 18:25 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
To support nested translation, the nest parent domain is allocated with the
IOMMU_HWPT_ALLOC_NEST_PARENT flag, and stores information about the v1 page
table used for stage 2 (i.e. GPA->SPA).
In addition, nest parent domains are only supported on AMD systems that
support the Guest CR3 Table (GCR3TRPMode) feature. This feature is required
in order to program DTE[GCR3 Table Root Pointer] with a GPA.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/iommu.c | 26 +++++++++++++++++++++++---
1 file changed, 23 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a36426f096d2..e03aa8d946b3 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2727,6 +2727,14 @@ static struct iommu_domain *amd_iommu_domain_alloc_paging_v2(struct device *dev,
return &domain->domain;
}
+static inline bool is_nest_parent_supported(u32 flags)
+{
+ /* Only allow nest parent when these features are supported */
+ return check_feature(FEATURE_GT) &&
+ check_feature(FEATURE_GIOSUP) &&
+ check_feature2(FEATURE_GCR3TRPMODE);
+}
+
static struct iommu_domain *
amd_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
const struct iommu_user_data *user_data)
@@ -2734,16 +2742,28 @@ amd_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
{
struct amd_iommu *iommu = get_amd_iommu_from_dev(dev);
const u32 supported_flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
- IOMMU_HWPT_ALLOC_PASID;
+ IOMMU_HWPT_ALLOC_PASID |
+ IOMMU_HWPT_ALLOC_NEST_PARENT;
if ((flags & ~supported_flags) || user_data)
return ERR_PTR(-EOPNOTSUPP);
switch (flags & supported_flags) {
case IOMMU_HWPT_ALLOC_DIRTY_TRACKING:
- /* Allocate domain with v1 page table for dirty tracking */
- if (!amd_iommu_hd_support(iommu))
+ case IOMMU_HWPT_ALLOC_NEST_PARENT:
+ case IOMMU_HWPT_ALLOC_DIRTY_TRACKING | IOMMU_HWPT_ALLOC_NEST_PARENT:
+ /*
+ * Allocate domain with v1 page table for dirty tracking
+ * and/or Nest parent.
+ */
+ if ((flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING) &&
+ !amd_iommu_hd_support(iommu))
+ break;
+
+ if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) &&
+ !is_nest_parent_supported(flags))
break;
+
return amd_iommu_domain_alloc_paging_v1(dev, flags);
case IOMMU_HWPT_ALLOC_PASID:
/* Allocate domain with v2 page table if IOMMU supports PASID. */
--
2.34.1
* [PATCH v5 09/14] iommu/amd: Introduce struct amd_iommu_viommu
2025-11-12 18:24 [PATCH v5 00/14] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (7 preceding siblings ...)
2025-11-12 18:25 ` [PATCH v5 08/14] iommu/amd: Add support for nest parent domain allocation Suravee Suthikulpanit
@ 2025-11-12 18:25 ` Suravee Suthikulpanit
2025-11-13 19:21 ` Nicolin Chen
2025-11-12 18:25 ` [PATCH v5 10/14] iommu/amd: Add support for nested domain allocation Suravee Suthikulpanit
` (5 subsequent siblings)
14 siblings, 1 reply; 39+ messages in thread
From: Suravee Suthikulpanit @ 2025-11-12 18:25 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
This structure stores a reference to the nest parent domain assigned during
the call to struct iommu_ops.viommu_init(). Information in the nest parent
is needed when setting up nested translation.
Note that the vIOMMU initialization will be introduced in a subsequent
commit.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu_types.h | 6 ++++++
drivers/iommu/amd/iommu.c | 2 ++
drivers/iommu/amd/iommufd.c | 16 ++++++++++++++++
drivers/iommu/amd/iommufd.h | 5 +++++
4 files changed, 29 insertions(+)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index d13a50fc209b..446be08c88c8 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -17,6 +17,7 @@
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/pci.h>
+#include <linux/iommufd.h>
#include <linux/irqreturn.h>
#include <linux/generic_pt/iommu.h>
@@ -490,6 +491,11 @@ struct pdom_iommu_info {
u32 refcnt; /* Count of attached dev/pasid per domain/IOMMU */
};
+struct amd_iommu_viommu {
+ struct iommufd_viommu core;
+ struct protection_domain *parent; /* nest parent domain for this viommu */
+};
+
/*
* This structure contains generic data for IOMMU protection domains
* independent of their use.
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index e03aa8d946b3..724b8723b836 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3078,6 +3078,8 @@ const struct iommu_ops amd_iommu_ops = {
.is_attach_deferred = amd_iommu_is_attach_deferred,
.def_domain_type = amd_iommu_def_domain_type,
.page_response = amd_iommu_page_response,
+ .get_viommu_size = amd_iommufd_get_viommu_size,
+ .viommu_init = amd_iommufd_viommu_init,
};
#ifdef CONFIG_IRQ_REMAP
diff --git a/drivers/iommu/amd/iommufd.c b/drivers/iommu/amd/iommufd.c
index 72eaaa923d04..eb6119bdcf12 100644
--- a/drivers/iommu/amd/iommufd.c
+++ b/drivers/iommu/amd/iommufd.c
@@ -29,3 +29,19 @@ void *amd_iommufd_hw_info(struct device *dev, u32 *length, u32 *type)
return hwinfo;
}
+
+size_t amd_iommufd_get_viommu_size(struct device *dev, enum iommu_viommu_type viommu_type)
+{
+ return VIOMMU_STRUCT_SIZE(struct amd_iommu_viommu, core);
+}
+
+int amd_iommufd_viommu_init(struct iommufd_viommu *viommu, struct iommu_domain *parent,
+ const struct iommu_user_data *user_data)
+{
+ struct protection_domain *pdom = to_pdomain(parent);
+ struct amd_iommu_viommu *aviommu = container_of(viommu, struct amd_iommu_viommu, core);
+
+ aviommu->parent = pdom;
+
+ return 0;
+}
diff --git a/drivers/iommu/amd/iommufd.h b/drivers/iommu/amd/iommufd.h
index f880be80a30d..f05aad495b5b 100644
--- a/drivers/iommu/amd/iommufd.h
+++ b/drivers/iommu/amd/iommufd.h
@@ -8,8 +8,13 @@
#if IS_ENABLED(CONFIG_AMD_IOMMU_IOMMUFD)
void *amd_iommufd_hw_info(struct device *dev, u32 *length, u32 *type);
+size_t amd_iommufd_get_viommu_size(struct device *dev, enum iommu_viommu_type viommu_type);
+int amd_iommufd_viommu_init(struct iommufd_viommu *viommu, struct iommu_domain *parent,
+ const struct iommu_user_data *user_data);
#else
#define amd_iommufd_hw_info NULL
+#define amd_iommufd_viommu_init NULL
+#define amd_iommufd_get_viommu_size NULL
#endif /* CONFIG_AMD_IOMMU_IOMMUFD */
#endif /* AMD_IOMMUFD_H */
--
2.34.1
* Re: [PATCH v5 09/14] iommu/amd: Introduce struct amd_iommu_viommu
2025-11-12 18:25 ` [PATCH v5 09/14] iommu/amd: Introduce struct amd_iommu_viommu Suravee Suthikulpanit
@ 2025-11-13 19:21 ` Nicolin Chen
0 siblings, 0 replies; 39+ messages in thread
From: Nicolin Chen @ 2025-11-13 19:21 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez
On Wed, Nov 12, 2025 at 06:25:01PM +0000, Suravee Suthikulpanit wrote:
> Which stores reference to nested parent domain assigned during the call to
> struct iommu_ops.viommu_init(). Information in the nest parent is needed
> when setting up the nested translation.
>
> Note that the viommu initialization will be introduced in subsequent
> commit.
>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
I think I have given my Reviewed-by tag in v4:
https://lore.kernel.org/linux-iommu/aPmUD3U764IPo%2Fet@Asurada-Nvidia/
Anyway,
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
* [PATCH v5 10/14] iommu/amd: Add support for nested domain allocation
2025-11-12 18:24 [PATCH v5 00/14] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (8 preceding siblings ...)
2025-11-12 18:25 ` [PATCH v5 09/14] iommu/amd: Introduce struct amd_iommu_viommu Suravee Suthikulpanit
@ 2025-11-12 18:25 ` Suravee Suthikulpanit
2025-11-12 18:25 ` [PATCH v5 11/14] iommu/amd: Introduce gDomID-to-hDomID Mapping and handle parent domain invalidation Suravee Suthikulpanit
` (4 subsequent siblings)
14 siblings, 0 replies; 39+ messages in thread
From: Suravee Suthikulpanit @ 2025-11-12 18:25 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
The nested domain is allocated with the IOMMU_DOMAIN_NESTED type to store
the stage-1 translation (i.e. GVA->GPA). This includes the GCR3 root pointer
table along with the guest page tables. The struct iommu_hwpt_amd_guest
contains this information, and is passed from user space as a parameter of
struct iommufd_viommu_ops.alloc_domain_nested().
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/Makefile | 2 +-
drivers/iommu/amd/amd_iommu.h | 4 +
drivers/iommu/amd/amd_iommu_types.h | 14 ++++
drivers/iommu/amd/nested.c | 110 ++++++++++++++++++++++++++++
4 files changed, 129 insertions(+), 1 deletion(-)
create mode 100644 drivers/iommu/amd/nested.c
diff --git a/drivers/iommu/amd/Makefile b/drivers/iommu/amd/Makefile
index 41f053b49dce..94b8ef2acb18 100644
--- a/drivers/iommu/amd/Makefile
+++ b/drivers/iommu/amd/Makefile
@@ -1,4 +1,4 @@
# SPDX-License-Identifier: GPL-2.0-only
obj-y += iommu.o init.o quirks.o ppr.o pasid.o
-obj-$(CONFIG_AMD_IOMMU_IOMMUFD) += iommufd.o
+obj-$(CONFIG_AMD_IOMMU_IOMMUFD) += iommufd.o nested.o
obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += debugfs.o
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 3ad8b5e65a82..57f9f4fb8a4b 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -199,4 +199,8 @@ amd_iommu_make_clear_dte(struct iommu_dev_data *dev_data, struct dev_table_entry
new->data128[1] = 0;
}
+/* NESTED */
+struct iommu_domain *
+amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
+ const struct iommu_user_data *user_data);
#endif /* AMD_IOMMU_H */
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 446be08c88c8..e46f346fd6c5 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -21,6 +21,8 @@
#include <linux/irqreturn.h>
#include <linux/generic_pt/iommu.h>
+#include <uapi/linux/iommufd.h>
+
/*
* Maximum number of IOMMUs supported
*/
@@ -348,6 +350,8 @@
#define DTE_FLAG_V BIT_ULL(0)
#define DTE_FLAG_TV BIT_ULL(1)
#define DTE_FLAG_HAD (3ULL << 7)
+#define DTE_MODE_MASK GENMASK_ULL(11, 9)
+#define DTE_HOST_TRP GENMASK_ULL(51, 12)
#define DTE_FLAG_GIOV BIT_ULL(54)
#define DTE_FLAG_GV BIT_ULL(55)
#define DTE_GLX GENMASK_ULL(57, 56)
@@ -496,6 +500,16 @@ struct amd_iommu_viommu {
struct protection_domain *parent; /* nest parent domain for this viommu */
};
+/*
+ * Nested domain is specifically used for nested translation
+ */
+struct nested_domain {
+ struct iommu_domain domain; /* generic domain handle used by iommu core code */
+ u16 gdom_id; /* domain ID from gDTE */
+ struct iommu_hwpt_amd_guest gdte; /* Guest vIOMMU DTE */
+ struct amd_iommu_viommu *viommu; /* AMD hw-viommu this nested domain belong to */
+};
+
/*
* This structure contains generic data for IOMMU protection domains
* independent of their use.
diff --git a/drivers/iommu/amd/nested.c b/drivers/iommu/amd/nested.c
new file mode 100644
index 000000000000..dd3e53dd16ea
--- /dev/null
+++ b/drivers/iommu/amd/nested.c
@@ -0,0 +1,110 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025 Advanced Micro Devices, Inc.
+ */
+
+#define dev_fmt(fmt) "AMD-Vi: " fmt
+
+#include <linux/iommu.h>
+#include <uapi/linux/iommufd.h>
+
+#include "amd_iommu.h"
+
+static const struct iommu_domain_ops nested_domain_ops;
+
+static inline struct nested_domain *to_ndomain(struct iommu_domain *dom)
+{
+ return container_of(dom, struct nested_domain, domain);
+}
+
+/*
+ * Validate guest DTE to make sure that configuration for host (v1)
+ * and guest (v2) page tables are valid when allocating nested domain.
+ */
+static int validate_gdte_nested(struct iommu_hwpt_amd_guest *gdte)
+{
+ u32 gpt_level = FIELD_GET(DTE_GPT_LEVEL_MASK, gdte->dte[2]);
+
+ /* Must be zero: Mode, Host-TPR */
+ if (FIELD_GET(DTE_MODE_MASK, gdte->dte[0]) != 0 ||
+ FIELD_GET(DTE_HOST_TRP, gdte->dte[0]) != 0)
+ return -EINVAL;
+
+ /* Must be non-zero: V, GIOV, GV, GCR3 TRP */
+ if (FIELD_GET(DTE_FLAG_V, gdte->dte[0]) == 0 ||
+ FIELD_GET(DTE_FLAG_GV, gdte->dte[0]) == 0 ||
+ (FIELD_GET(DTE_GCR3_14_12, gdte->dte[0]) == 0 &&
+ FIELD_GET(DTE_GCR3_30_15, gdte->dte[1]) == 0 &&
+ FIELD_GET(DTE_GCR3_51_31, gdte->dte[1]) == 0))
+ return -EINVAL;
+
+ /* Valid Guest Paging Mode values are 0 and 1 */
+ if (gpt_level != GUEST_PGTABLE_4_LEVEL &&
+ gpt_level != GUEST_PGTABLE_5_LEVEL)
+ return -EINVAL;
+
+ /* GLX = 3 is reserved */
+ if (FIELD_GET(DTE_GLX, gdte->dte[0]) == 3)
+ return -EINVAL;
+
+ /*
+ * We need to check host capability before setting
+ * the Guest Paging Mode
+ */
+ if (gpt_level == GUEST_PGTABLE_5_LEVEL &&
+ amd_iommu_gpt_level < PAGE_MODE_5_LEVEL)
+ return -EOPNOTSUPP;
+
+ return 0;
+}
+
+/*
+ * This function is assigned to struct iommufd_viommu_ops.alloc_domain_nested()
+ * during the call to struct iommu_ops.viommu_init().
+ */
+struct iommu_domain *
+amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
+ const struct iommu_user_data *user_data)
+{
+ int ret;
+ struct nested_domain *ndom;
+ struct amd_iommu_viommu *aviommu = container_of(viommu, struct amd_iommu_viommu, core);
+
+ if (user_data->type != IOMMU_HWPT_DATA_AMD_GUEST)
+ return ERR_PTR(-EOPNOTSUPP);
+
+ ndom = kzalloc(sizeof(*ndom), GFP_KERNEL);
+ if (!ndom)
+ return ERR_PTR(-ENOMEM);
+
+ ret = iommu_copy_struct_from_user(&ndom->gdte, user_data,
+ IOMMU_HWPT_DATA_AMD_GUEST,
+ dte);
+ if (ret)
+ goto out_err;
+
+ ret = validate_gdte_nested(&ndom->gdte);
+ if (ret)
+ goto out_err;
+
+ ndom->gdom_id = FIELD_GET(DTE_DOMID_MASK, ndom->gdte.dte[1]);
+ ndom->domain.ops = &nested_domain_ops;
+ ndom->domain.type = IOMMU_DOMAIN_NESTED;
+ ndom->viommu = aviommu;
+
+ return &ndom->domain;
+out_err:
+ kfree(ndom);
+ return ERR_PTR(ret);
+}
+
+static void nested_domain_free(struct iommu_domain *dom)
+{
+ struct nested_domain *ndom = to_ndomain(dom);
+
+ kfree(ndom);
+}
+
+static const struct iommu_domain_ops nested_domain_ops = {
+ .free = nested_domain_free,
+};
--
2.34.1
* [PATCH v5 11/14] iommu/amd: Introduce gDomID-to-hDomID Mapping and handle parent domain invalidation
2025-11-12 18:24 [PATCH v5 00/14] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (9 preceding siblings ...)
2025-11-12 18:25 ` [PATCH v5 10/14] iommu/amd: Add support for nested domain allocation Suravee Suthikulpanit
@ 2025-11-12 18:25 ` Suravee Suthikulpanit
2025-11-13 20:36 ` Nicolin Chen
2025-11-19 0:11 ` Jason Gunthorpe
2025-11-12 18:25 ` [PATCH v5 12/14] iommu/amd: Refactor persistent DTE bits programming into amd_iommu_make_clear_dte() Suravee Suthikulpanit
` (3 subsequent siblings)
14 siblings, 2 replies; 39+ messages in thread
From: Suravee Suthikulpanit @ 2025-11-12 18:25 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
Each nested domain is assigned a guest domain ID (gDomID), which the guest
OS programs into the guest Device Table Entry (gDTE). For each gDomID, the
driver assigns a corresponding host domain ID (hDomID), which is programmed
into the host Device Table Entry (hDTE).
The hDomID is allocated during amd_iommu_alloc_domain_nested() and freed
during nested_domain_free(). The gDomID-to-hDomID mapping info
(struct guest_domain_mapping_info) is stored in a per-viommu xarray
(struct amd_iommu_viommu.gdomid_array), which is indexed by gDomID.
Note also that a nest parent domain can be shared among multiple struct
iommufd_viommu instances. Therefore, when the hypervisor invalidates the
nest parent domain, the AMD IOMMU command INVALIDATE_IOMMU_PAGES must be
issued for each hDomID in the gdomid_array. This is handled by
iommu_flush_pages_v1_hdom_ids(), which iterates through
struct protection_domain.viommu_list.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu_types.h | 23 +++++++++
drivers/iommu/amd/iommu.c | 35 +++++++++++++
drivers/iommu/amd/iommufd.c | 34 ++++++++++++
drivers/iommu/amd/nested.c | 80 +++++++++++++++++++++++++++++
4 files changed, 172 insertions(+)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index e46f346fd6c5..734f6a753b3a 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -498,6 +498,22 @@ struct pdom_iommu_info {
struct amd_iommu_viommu {
struct iommufd_viommu core;
struct protection_domain *parent; /* nest parent domain for this viommu */
+ struct list_head pdom_list; /* For protection_domain->viommu_list */
+
+ /*
+ * Per-vIOMMU guest domain ID to host domain ID mapping.
+ * Indexed by guest domain ID.
+ */
+ struct xarray gdomid_array;
+};
+
+/*
+ * Contains guest domain ID mapping info,
+ * which is stored in the struct xarray gdomid_array.
+ */
+struct guest_domain_mapping_info {
+ refcount_t users;
+ u32 hdom_id; /* Host domain ID */
};
/*
@@ -506,6 +522,7 @@ struct amd_iommu_viommu {
struct nested_domain {
struct iommu_domain domain; /* generic domain handle used by iommu core code */
u16 gdom_id; /* domain ID from gDTE */
+ struct guest_domain_mapping_info *gdom_info;
struct iommu_hwpt_amd_guest gdte; /* Guest vIOMMU DTE */
struct amd_iommu_viommu *viommu; /* AMD hw-viommu this nested domain belong to */
};
@@ -530,6 +547,12 @@ struct protection_domain {
struct mmu_notifier mn; /* mmu notifier for the SVA domain */
struct list_head dev_data_list; /* List of pdom_dev_data */
+
+ /*
+ * Store reference to list of vIOMMUs, which use this protection domain.
+ * This will be used to look up host domain ID when flushing this domain.
+ */
+ struct list_head viommu_list;
};
PT_IOMMU_CHECK_DOMAIN(struct protection_domain, iommu, domain);
PT_IOMMU_CHECK_DOMAIN(struct protection_domain, amdv1.iommu, domain);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 724b8723b836..6a26e7a28141 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1513,6 +1513,29 @@ static void amd_iommu_flush_tlb_domid(struct amd_iommu *iommu, u32 dom_id)
iommu_completion_wait(iommu);
}
+static int iommu_flush_pages_v1_hdom_ids(struct protection_domain *pdom, u64 address, size_t size)
+{
+ int ret = 0;
+ struct amd_iommu_viommu *aviommu;
+
+ list_for_each_entry(aviommu, &pdom->viommu_list, pdom_list) {
+ unsigned long i;
+ struct guest_domain_mapping_info *gdom_info;
+ struct amd_iommu *iommu = container_of(aviommu->core.iommu_dev, struct amd_iommu, iommu);
+
+ xa_for_each(&aviommu->gdomid_array, i, gdom_info) {
+ struct iommu_cmd cmd;
+
+ pr_debug("%s: iommu=%#x, hdom_id=%#x\n", __func__,
+ iommu->devid, gdom_info->hdom_id);
+ build_inv_iommu_pages(&cmd, address, size, gdom_info->hdom_id,
+ IOMMU_NO_PASID, false);
+ ret |= iommu_queue_command(iommu, &cmd);
+ }
+ }
+ return ret;
+}
+
static void amd_iommu_flush_all(struct amd_iommu *iommu)
{
struct iommu_cmd cmd;
@@ -1661,6 +1684,17 @@ static int domain_flush_pages_v1(struct protection_domain *pdom,
ret |= iommu_queue_command(pdom_iommu_info->iommu, &cmd);
}
+ /*
+ * A domain w/ v1 table can be a nest parent, which can have
+ * multiple nested domains. Each nested domain has 1:1 mapping
+ * between gDomID and hDomID. Therefore, flush every hDomID
+ * associated to this nest parent domain.
+ *
+ * See drivers/iommu/amd/nested.c: amd_iommu_alloc_domain_nested()
+ */
+ if (!list_empty(&pdom->viommu_list))
+ ret |= iommu_flush_pages_v1_hdom_ids(pdom, address, size);
+
return ret;
}
@@ -2469,6 +2503,7 @@ static void protection_domain_init(struct protection_domain *domain)
spin_lock_init(&domain->lock);
INIT_LIST_HEAD(&domain->dev_list);
INIT_LIST_HEAD(&domain->dev_data_list);
+ INIT_LIST_HEAD(&domain->viommu_list);
xa_init(&domain->iommu_array);
}
diff --git a/drivers/iommu/amd/iommufd.c b/drivers/iommu/amd/iommufd.c
index eb6119bdcf12..bb53475f9171 100644
--- a/drivers/iommu/amd/iommufd.c
+++ b/drivers/iommu/amd/iommufd.c
@@ -9,6 +9,8 @@
#include "amd_iommu.h"
#include "amd_iommu_types.h"
+static const struct iommufd_viommu_ops amd_viommu_ops;
+
void *amd_iommufd_hw_info(struct device *dev, u32 *length, u32 *type)
{
struct iommu_hw_info_amd *hwinfo;
@@ -38,10 +40,42 @@ size_t amd_iommufd_get_viommu_size(struct device *dev, enum iommu_viommu_type vi
int amd_iommufd_viommu_init(struct iommufd_viommu *viommu, struct iommu_domain *parent,
const struct iommu_user_data *user_data)
{
+ unsigned long flags;
struct protection_domain *pdom = to_pdomain(parent);
struct amd_iommu_viommu *aviommu = container_of(viommu, struct amd_iommu_viommu, core);
+ xa_init(&aviommu->gdomid_array);
aviommu->parent = pdom;
+ viommu->ops = &amd_viommu_ops;
+
+ spin_lock_irqsave(&pdom->lock, flags);
+ list_add(&aviommu->pdom_list, &pdom->viommu_list);
+ spin_unlock_irqrestore(&pdom->lock, flags);
+
return 0;
}
+
+static void amd_iommufd_viommu_destroy(struct iommufd_viommu *viommu)
+{
+ unsigned long flags;
+ struct amd_iommu_viommu *entry, *next;
+ struct amd_iommu_viommu *aviommu = container_of(viommu, struct amd_iommu_viommu, core);
+ struct protection_domain *pdom = aviommu->parent;
+
+ spin_lock_irqsave(&pdom->lock, flags);
+ list_for_each_entry_safe(entry, next, &pdom->viommu_list, pdom_list) {
+ if (entry == aviommu)
+ list_del(&entry->pdom_list);
+ }
+ spin_unlock_irqrestore(&pdom->lock, flags);
+
+}
+
+/*
+ * See include/linux/iommufd.h
+ * struct iommufd_viommu_ops - vIOMMU specific operations
+ */
+static const struct iommufd_viommu_ops amd_viommu_ops = {
+ .destroy = amd_iommufd_viommu_destroy,
+};
diff --git a/drivers/iommu/amd/nested.c b/drivers/iommu/amd/nested.c
index dd3e53dd16ea..1bbcb16abecc 100644
--- a/drivers/iommu/amd/nested.c
+++ b/drivers/iommu/amd/nested.c
@@ -6,6 +6,7 @@
#define dev_fmt(fmt) "AMD-Vi: " fmt
#include <linux/iommu.h>
+#include <linux/refcount.h>
#include <uapi/linux/iommufd.h>
#include "amd_iommu.h"
@@ -68,6 +69,7 @@ amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
{
int ret;
struct nested_domain *ndom;
+ struct guest_domain_mapping_info *gdom_info, *curr;
struct amd_iommu_viommu *aviommu = container_of(viommu, struct amd_iommu_viommu, core);
if (user_data->type != IOMMU_HWPT_DATA_AMD_GUEST)
@@ -92,7 +94,60 @@ amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
ndom->domain.type = IOMMU_DOMAIN_NESTED;
ndom->viommu = aviommu;
+ gdom_info = kzalloc(sizeof(*gdom_info), GFP_KERNEL);
+ if (!gdom_info)
+ goto out_err;
+
+ /*
+ * Normally, when a guest has multiple pass-through devices,
+ * the IOMMU driver setup DTEs with the same stage-2 table and
+ * use the same host domain ID (hDomId). In case of nested translation,
+ * if the guest setup different stage-1 tables with same PASID,
+ * IOMMU would use the same TLB tag. This will results in TLB
+ * aliasing issue.
+ *
+ * The guest is assigning gDomIDs based on its own algorithm for managing
+ * cache tags of (DomID, PASID). Within a single viommu, the nest parent domain
+ * (w/ S2 table) is used by all DTEs. But we need to consistently map the gDomID
+ * to a single hDomID. This is done using an xarray in the vIOMMU to
+ * keep track of the gDomID mapping. When the S2 is changed, the INVALIDATE_IOMMU_PAGES
+ * command must be issued for each hDomID in the xarray.
+ */
+ curr = xa_cmpxchg(&aviommu->gdomid_array,
+ ndom->gdom_id, NULL, gdom_info, GFP_ATOMIC);
+ if (curr) {
+ if (xa_err(curr)) {
+ ret = -EINVAL;
+ goto out_err_gdom_info;
+ } else {
+ /* The gDomID already exist */
+ pr_debug("%s: Found gdom_id=%#x, hdom_id=%#x\n",
+ __func__, ndom->gdom_id, curr->hdom_id);
+ refcount_inc(&curr->users);
+ ndom->gdom_info = curr;
+ kfree(gdom_info);
+ return &ndom->domain;
+ }
+ }
+
+ /* The gDomID does not exist. We allocate new hdom_id */
+ gdom_info->hdom_id = amd_iommu_pdom_id_alloc();
+ if (gdom_info->hdom_id <= 0) {
+ xa_cmpxchg(&aviommu->gdomid_array,
+ ndom->gdom_id, gdom_info, NULL, GFP_ATOMIC);
+ ret = -ENOSPC;
+ goto out_err_gdom_info;
+ }
+
+ refcount_set(&gdom_info->users, 1);
+ ndom->gdom_info = gdom_info;
+ pr_debug("%s: Allocate gdom_id=%#x, hdom_id=%#x\n",
+ __func__, ndom->gdom_id, gdom_info->hdom_id);
+
return &ndom->domain;
+
+out_err_gdom_info:
+ kfree(gdom_info);
out_err:
kfree(ndom);
return ERR_PTR(ret);
@@ -100,8 +155,33 @@ amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
static void nested_domain_free(struct iommu_domain *dom)
{
+ struct guest_domain_mapping_info *curr;
struct nested_domain *ndom = to_ndomain(dom);
+ struct amd_iommu_viommu *aviommu = ndom->viommu;
+
+ if (!refcount_dec_and_test(&ndom->gdom_info->users))
+ return;
+ /*
+ * The refcount for the gdom_id to hdom_id mapping is zero.
+ * It is now safe to remove the mapping.
+ */
+ curr = xa_cmpxchg(&aviommu->gdomid_array, ndom->gdom_id,
+ ndom->gdom_info, NULL, GFP_ATOMIC);
+ if (curr) {
+ if (xa_err(curr)) {
+ pr_err("%s: Failed to free nested domain gdom_id=%#x\n",
+ __func__, ndom->gdom_id);
+ return;
+ }
+
+ /* success */
+ pr_debug("%s: Free gdom_id=%#x, hdom_id=%#x\n",
+ __func__, ndom->gdom_id, curr->hdom_id);
+ kfree(curr);
+ }
+
+ amd_iommu_pdom_id_free(ndom->gdom_info->hdom_id);
kfree(ndom);
}
--
2.34.1
* Re: [PATCH v5 11/14] iommu/amd: Introduce gDomID-to-hDomID Mapping and handle parent domain invalidation
2025-11-12 18:25 ` [PATCH v5 11/14] iommu/amd: Introduce gDomID-to-hDomID Mapping and handle parent domain invalidation Suravee Suthikulpanit
@ 2025-11-13 20:36 ` Nicolin Chen
2025-11-19 0:02 ` Jason Gunthorpe
2026-01-15 9:21 ` Suthikulpanit, Suravee
2025-11-19 0:11 ` Jason Gunthorpe
1 sibling, 2 replies; 39+ messages in thread
From: Nicolin Chen @ 2025-11-13 20:36 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez
On Wed, Nov 12, 2025 at 06:25:03PM +0000, Suravee Suthikulpanit wrote:
> @@ -38,10 +40,42 @@ size_t amd_iommufd_get_viommu_size(struct device *dev, enum iommu_viommu_type vi
> int amd_iommufd_viommu_init(struct iommufd_viommu *viommu, struct iommu_domain *parent,
> const struct iommu_user_data *user_data)
> {
> + unsigned long flags;
> struct protection_domain *pdom = to_pdomain(parent);
> struct amd_iommu_viommu *aviommu = container_of(viommu, struct amd_iommu_viommu, core);
>
> + xa_init(&aviommu->gdomid_array);
Perhaps init with XA_FLAGS_ALLOC1 since domid can't be 0?
> +static void amd_iommufd_viommu_destroy(struct iommufd_viommu *viommu)
> +{
> + unsigned long flags;
> + struct amd_iommu_viommu *entry, *next;
> + struct amd_iommu_viommu *aviommu = container_of(viommu, struct amd_iommu_viommu, core);
> + struct protection_domain *pdom = aviommu->parent;
> +
> + spin_lock_irqsave(&pdom->lock, flags);
> + list_for_each_entry_safe(entry, next, &pdom->viommu_list, pdom_list) {
> + if (entry == aviommu)
> + list_del(&entry->pdom_list);
> + }
Do we really need the loop? Why not simply do list_del()?
> + spin_unlock_irqrestore(&pdom->lock, flags);
> +
> +}
No need of the extra line at the end of the function.
> @@ -92,7 +94,60 @@ amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
> ndom->domain.type = IOMMU_DOMAIN_NESTED;
> ndom->viommu = aviommu;
>
> + gdom_info = kzalloc(sizeof(*gdom_info), GFP_KERNEL);
> + if (!gdom_info)
> + goto out_err;
Missing:
ret = -ENOMEM;
> +
> + /*
> + * Normally, when a guest has multiple pass-through devices,
> + * the IOMMU driver setup DTEs with the same stage-2 table and
> + * use the same host domain ID (hDomId). In case of nested translation,
> + * if the guest setup different stage-1 tables with same PASID,
> + * IOMMU would use the same TLB tag. This will results in TLB
> + * aliasing issue.
> + *
> + * The guest is assigning gDomIDs based on its own algorithm for managing
> + * cache tags of (DomID, PASID). Within a single viommu, the nest parent domain
> + * (w/ S2 table) is used by all DTEs. But we need to consistently map the gDomID
> + * to a single hDomID. This is done using an xarray in the vIOMMU to
> + * keep track of the gDomID mapping. When the S2 is changed, the INVALIDATE_IOMMU_PAGES
> + * command must be issued for each hDomID in the xarray.
> + */
> + curr = xa_cmpxchg(&aviommu->gdomid_array,
> + ndom->gdom_id, NULL, gdom_info, GFP_ATOMIC);
> + if (curr) {
> + if (xa_err(curr)) {
> + ret = -EINVAL;
> + goto out_err_gdom_info;
> + } else {
> + /* The gDomID already exist */
> + pr_debug("%s: Found gdom_id=%#x, hdom_id=%#x\n",
> + __func__, ndom->gdom_id, curr->hdom_id);
> + refcount_inc(&curr->users);
> + ndom->gdom_info = curr;
This looks racy..
When a gDomID is shared between two nested domains, a concurrent
nested_domain_free() could enter before refcount_inc(), and call
refcount_dec_and_test() or even free the curr and ndom.
Then, this refcount_inc() will blow up, or curr/ndom will UAF.
Actually, I don't see where amd_iommu_alloc_domain_nested() gets
used in this series.. I assume AMD will use the iommufd's vIOMMU
infrastructure directly which doesn't mutex across nested domain
allocation/free calls.
So, the entire thing here should hold xa_lock(), use xas_load()
for the existing curr and use xas_store() to store gdom_info if
!curr, and xa_unlock() after gdom_info is fully initialized.
> + kfree(gdom_info);
> + return &ndom->domain;
> + }
> + }
> +
> + /* The gDomID does not exist. We allocate new hdom_id */
> + gdom_info->hdom_id = amd_iommu_pdom_id_alloc();
> + if (gdom_info->hdom_id <= 0) {
> + xa_cmpxchg(&aviommu->gdomid_array,
> + ndom->gdom_id, gdom_info, NULL, GFP_ATOMIC);
> + ret = -ENOSPC;
> + goto out_err_gdom_info;
> + }
> +
> + refcount_set(&gdom_info->users, 1);
Similar risk here. gdom_info is stored to the xarray before this
line. A concurrent amd_iommu_alloc_domain_nested() could get the
stored gdom_info and blow up at refcount_inc().
Make sure the entire thing is locked and safe.
Nicolin
* Re: [PATCH v5 11/14] iommu/amd: Introduce gDomID-to-hDomID Mapping and handle parent domain invalidation
2025-11-13 20:36 ` Nicolin Chen
@ 2025-11-19 0:02 ` Jason Gunthorpe
2026-01-15 9:25 ` Suthikulpanit, Suravee
2026-01-15 9:21 ` Suthikulpanit, Suravee
1 sibling, 1 reply; 39+ messages in thread
From: Jason Gunthorpe @ 2025-11-19 0:02 UTC (permalink / raw)
To: Nicolin Chen
Cc: Suravee Suthikulpanit, linux-kernel, robin.murphy, will, joro,
kevin.tian, jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
On Thu, Nov 13, 2025 at 12:36:07PM -0800, Nicolin Chen wrote:
> > + curr = xa_cmpxchg(&aviommu->gdomid_array,
> > + ndom->gdom_id, NULL, gdom_info, GFP_ATOMIC);
> > + if (curr) {
> > + if (xa_err(curr)) {
> > + ret = -EINVAL;
> > + goto out_err_gdom_info;
> > + } else {
> > + /* The gDomID already exist */
> > + pr_debug("%s: Found gdom_id=%#x, hdom_id=%#x\n",
> > + __func__, ndom->gdom_id, curr->hdom_id);
> > + refcount_inc(&curr->users);
> > + ndom->gdom_info = curr;
>
> This looks racy..
Yes
> When a gDomID is shared between two nested domains, a concurrent
> nested_domain_free() could enter before refcount_inc(), and call
> refcount_dec_and_test() or even free the curr and ndom.
>
> Then, this refcount_inc() will blow up, or curr/ndom will UAF.
>
> Actually, I don't see where amd_iommu_alloc_domain_nested() gets
> used in this series.. I assume AMD will use the iommufd's vIOMMU
> infrastructure directly which doesn't mutex across nested domain
> allocation/free calls.
>
> So, the entire thing here should hold xa_lock(), use xas_load()
> for the existing curr and use xas_store() to store gdom_info if
> !curr, and xa_unlock() after gdom_info is fully initialized.
No need for xas functions.. You can use the __ functions..
A helper function like this will do the job:
static void *xa_load_or_alloc_locked(struct xarray *xa, unsigned long index, size_t sz)
{
	void *elm, *res;

	elm = xa_load(xa, index);
	if (elm)
		return elm;

	xa_unlock(xa);
	elm = kzalloc(sz, GFP_KERNEL);
	xa_lock(xa);
	if (!elm)
		return ERR_PTR(-ENOMEM);

	res = __xa_cmpxchg(xa, index, NULL, elm, GFP_KERNEL);
	if (xa_is_err(res))
		res = ERR_PTR(xa_err(res));
	if (res) {
		kfree(elm);
		return res;
	}
	return elm;
}

Call like

	xa_lock(&aviommu->gdomid_array);
	elm = xa_load_or_alloc_locked(..);
	if (IS_ERR(elm))
		..
	elm->refcount++;
	xa_unlock(&aviommu->gdomid_array);
Needs more bits if you want to use refcount_t
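The drop-the-lock-to-allocate shape of that helper can be modelled in plain userspace C. A hypothetical sketch: the mutex and the fixed array stand in for the xarray and its lock, and `load_or_alloc_locked` mirrors the enter-locked/return-locked contract.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t xa_lock = PTHREAD_MUTEX_INITIALIZER;
static void *slots[16];	/* toy stand-in for the xarray */

/* Called with xa_lock held; returns with xa_lock held. */
static void *load_or_alloc_locked(unsigned long index, size_t sz)
{
	void *elm;

	if (slots[index])
		return slots[index];	/* fast path: already present */

	pthread_mutex_unlock(&xa_lock);	/* never allocate under the lock */
	elm = calloc(1, sz);
	pthread_mutex_lock(&xa_lock);
	if (!elm)
		return NULL;

	/* Re-check: a concurrent caller may have installed an element
	 * while the lock was dropped. If so, use the winner's copy. */
	if (slots[index]) {
		free(elm);
		return slots[index];
	}
	slots[index] = elm;
	return elm;
}
```

The caller then does its refcount bump while still holding the lock, exactly as in the "Call like" snippet above, so the element cannot be freed between lookup and increment.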
Jason
* Re: [PATCH v5 11/14] iommu/amd: Introduce gDomID-to-hDomID Mapping and handle parent domain invalidation
2025-11-19 0:02 ` Jason Gunthorpe
@ 2026-01-15 9:25 ` Suthikulpanit, Suravee
0 siblings, 0 replies; 39+ messages in thread
From: Suthikulpanit, Suravee @ 2026-01-15 9:25 UTC (permalink / raw)
To: Jason Gunthorpe, Nicolin Chen
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez
On 11/19/2025 7:02 AM, Jason Gunthorpe wrote:
> On Thu, Nov 13, 2025 at 12:36:07PM -0800, Nicolin Chen wrote:
>>> + curr = xa_cmpxchg(&aviommu->gdomid_array,
>>> + ndom->gdom_id, NULL, gdom_info, GFP_ATOMIC);
>>> + if (curr) {
>>> + if (xa_err(curr)) {
>>> + ret = -EINVAL;
>>> + goto out_err_gdom_info;
>>> + } else {
>>> + /* The gDomID already exist */
>>> + pr_debug("%s: Found gdom_id=%#x, hdom_id=%#x\n",
>>> + __func__, ndom->gdom_id, curr->hdom_id);
>>> + refcount_inc(&curr->users);
>>> + ndom->gdom_info = curr;
>>
>> This looks racy..
>
> Yes
>
>> When a gDomID is shared between two nested domains, a concurrent
>> nested_domain_free() could enter before refcount_inc(), and call
>> refcount_dec_and_test() or even free the curr and ndom.
>>
>> Then, this refcount_inc() will blow up, or curr/ndom will UAF.
>>
>> Actually, I don't see where amd_iommu_alloc_domain_nested() gets
>> used in this series.. I assume AMD will use the iommufd's vIOMMU
>> infrastructure directly which doesn't mutex across nested domain
>> allocation/free calls.
>>
>> So, the entire thing here should hold xa_lock(), use xas_load()
>> for the existing curr and use xas_store() to store gdom_info if
>> !curr, and xa_unlock() after gdom_info is fully initialized.
>
> No need for xas functions.. You can use the __ functions..
>
> A helper function like this will do the job:
>
> static void *xa_load_or_alloc_locked(struct xarray *xa, unsigned long index, size_t sz)
> {
> 	void *elm, *res;
>
> 	elm = xa_load(xa, index);
> 	if (elm)
> 		return elm;
>
> 	xa_unlock(xa);
> 	elm = kzalloc(sz, GFP_KERNEL);
> 	xa_lock(xa);
> 	if (!elm)
> 		return ERR_PTR(-ENOMEM);
>
> 	res = __xa_cmpxchg(xa, index, NULL, elm, GFP_KERNEL);
> 	if (xa_is_err(res))
> 		res = ERR_PTR(xa_err(res));
>
> 	if (res) {
> 		kfree(elm);
> 		return res;
> 	}
>
> 	return elm;
> }
>
> Call like
>
> 	xa_lock(&aviommu->gdomid_array);
> 	elm = xa_load_or_alloc_locked(..);
> 	if (IS_ERR(elm))
> 		..
> 	elm->refcount++;
> 	xa_unlock(&aviommu->gdomid_array);
>
> Needs more bits if you want to use refcount_t
Thanks for the guide. I have reworked this and sent out v6.
Suravee
* Re: [PATCH v5 11/14] iommu/amd: Introduce gDomID-to-hDomID Mapping and handle parent domain invalidation
2025-11-13 20:36 ` Nicolin Chen
2025-11-19 0:02 ` Jason Gunthorpe
@ 2026-01-15 9:21 ` Suthikulpanit, Suravee
1 sibling, 0 replies; 39+ messages in thread
From: Suthikulpanit, Suravee @ 2026-01-15 9:21 UTC (permalink / raw)
To: Nicolin Chen
Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez
On 11/14/2025 3:36 AM, Nicolin Chen wrote:
>> @@ -92,7 +94,60 @@ amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
>> .....
>> + /*
>> + * Normally, when a guest has multiple pass-through devices,
>> + * the IOMMU driver setup DTEs with the same stage-2 table and
>> + * use the same host domain ID (hDomId). In case of nested translation,
>> + if the guest sets up different stage-1 tables with the same PASID,
>> + the IOMMU would use the same TLB tag. This results in a TLB
>> + aliasing issue.
>> + *
>> + * The guest is assigning gDomIDs based on its own algorithm for managing
>> + * cache tags of (DomID, PASID). Within a single viommu, the nest parent domain
>> + * (w/ S2 table) is used by all DTEs. But we need to consistently map the gDomID
>> + * to a single hDomID. This is done using an xarray in the vIOMMU to
>> + * keep track of the gDomID mapping. When the S2 is changed, the INVALIDATE_IOMMU_PAGES
>> + * command must be issued for each hDomID in the xarray.
>> + */
>> + curr = xa_cmpxchg(&aviommu->gdomid_array,
>> + ndom->gdom_id, NULL, gdom_info, GFP_ATOMIC);
>> + if (curr) {
>> + if (xa_err(curr)) {
>> + ret = -EINVAL;
>> + goto out_err_gdom_info;
>> + } else {
>> + /* The gDomID already exist */
>> + pr_debug("%s: Found gdom_id=%#x, hdom_id=%#x\n",
>> + __func__, ndom->gdom_id, curr->hdom_id);
>> + refcount_inc(&curr->users);
>> + ndom->gdom_info = curr;
> This looks racy..
>
> When a gDomID is shared between two nested domains, a concurrent
> nested_domain_free() could enter before refcount_inc(), and call
> refcount_dec_and_test() or even free the curr and ndom.
>
> Then, this refcount_inc() will blow up, or curr/ndom will UAF.
>
> Actually, I don't see where amd_iommu_alloc_domain_nested() gets
> used in this series.. I assume AMD will use the iommufd's vIOMMU
> infrastructure directly which doesn't mutex across nested domain
> allocation/free calls.
>
> So, the entire thing here should hold xa_lock(), use xas_load()
> for the existing curr and use xas_store() to store gdom_info if
> !curr, and xa_unlock() after gdom_info is fully initialized.
>
>> + kfree(gdom_info);
>> + return &ndom->domain;
>> + }
>> + }
>> +
>> + /* The gDomID does not exist. We allocate new hdom_id */
>> + gdom_info->hdom_id = amd_iommu_pdom_id_alloc();
>> + if (gdom_info->hdom_id <= 0) {
>> + xa_cmpxchg(&aviommu->gdomid_array,
>> + ndom->gdom_id, gdom_info, NULL, GFP_ATOMIC);
>> + ret = -ENOSPC;
>> + goto out_err_gdom_info;
>> + }
>> +
>> + refcount_set(&gdom_info->users, 1);
> Similar risk here. gdom_info is stored to the xarray before this
> line. A concurrent amd_iommu_alloc_domain_nested() could get the
> stored gdom_info and blow up at refcount_inc().
>
> Make sure the entire thing is locked and safe.
>
> Nicolin
Thanks for pointing this out. I am fixing this in V6 w/ Jason's
suggestion to use xa_lock/unlock w/ __xa_cmpxchg.
Thanks,
Suravee
* Re: [PATCH v5 11/14] iommu/amd: Introduce gDomID-to-hDomID Mapping and handle parent domain invalidation
2025-11-12 18:25 ` [PATCH v5 11/14] iommu/amd: Introduce gDomID-to-hDomID Mapping and handle parent domain invalidation Suravee Suthikulpanit
2025-11-13 20:36 ` Nicolin Chen
@ 2025-11-19 0:11 ` Jason Gunthorpe
2025-11-19 1:10 ` Nicolin Chen
1 sibling, 1 reply; 39+ messages in thread
From: Jason Gunthorpe @ 2025-11-19 0:11 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
On Wed, Nov 12, 2025 at 06:25:03PM +0000, Suravee Suthikulpanit wrote:
> Each nested domain is assigned guest domain ID (gDomID), which guest OS
> programs into guest Device Table Entry (gDTE). For each gDomID, the driver
> assigns a corresponding host domain ID (hDomID), which will be programmed
> into the host Device Table Entry (hDTE).
>
> The hDomID is allocated during amd_iommu_alloc_domain_nested(),
> and free during nested_domain_free(). The gDomID-to-hDomID mapping info
> (struct guest_domain_mapping_info) is stored in a per-viommu xarray
> (struct amd_iommu_viommu.gdomid_array), which is indexed by gDomID.
>
> Note also that parent domain can be shared among struct iommufd_viommu.
> Therefore, when hypervisor invalidates the nest parent domain, the AMD
> IOMMU command INVALIDATE_IOMMU_PAGES must be issued for each hDomID in
> the gdomid_array. This is handled by the iommu_flush_pages_v1_hdom_ids(),
> where it iterates through struct protection_domain.viommu_list.
>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
> drivers/iommu/amd/amd_iommu_types.h | 23 +++++++++
> drivers/iommu/amd/iommu.c | 35 +++++++++++++
> drivers/iommu/amd/iommufd.c | 34 ++++++++++++
> drivers/iommu/amd/nested.c | 80 +++++++++++++++++++++++++++++
> 4 files changed, 172 insertions(+)
I think this looks OK in general, just the locking needs fixing up.
> +static int iommu_flush_pages_v1_hdom_ids(struct protection_domain *pdom, u64 address, size_t size)
> +{
> + int ret = 0;
> + struct amd_iommu_viommu *aviommu;
> +
> + list_for_each_entry(aviommu, &pdom->viommu_list, pdom_list) {
> + unsigned long i;
> + struct guest_domain_mapping_info *gdom_info;
> + struct amd_iommu *iommu = container_of(aviommu->core.iommu_dev, struct amd_iommu, iommu);
> +
> + xa_for_each(&aviommu->gdomid_array, i, gdom_info) {
> + struct iommu_cmd cmd;
What is the locking for the xa here?
It looks missing too. Either hold the xa lock for this iteration, or
do something to hold the pdom lock when updating the xarray.
> + pr_debug("%s: iommu=%#x, hdom_id=%#x\n", __func__,
> + iommu->devid, gdom_info->hdom_id);
> + build_inv_iommu_pages(&cmd, address, size, gdom_info->hdom_id,
> + IOMMU_NO_PASID, false);
> + ret |= iommu_queue_command(iommu, &cmd);
> + }
> + }
> + return ret;
> +}
This is kind of painfully slow for invalidation but OK for now, we
don't really expect a lot of parent invalidation traffic..
I think after we get this landed:
https://lore.kernel.org/r/cover.1762588839.git.nicolinc@nvidia.com
We should take a serious run at trying to make it shared code and have
AMD use it. That would resolve the performance concern..
> +static void amd_iommufd_viommu_destroy(struct iommufd_viommu *viommu)
> +{
> + unsigned long flags;
> + struct amd_iommu_viommu *entry, *next;
> + struct amd_iommu_viommu *aviommu = container_of(viommu, struct amd_iommu_viommu, core);
> + struct protection_domain *pdom = aviommu->parent;
> +
> + spin_lock_irqsave(&pdom->lock, flags);
> + list_for_each_entry_safe(entry, next, &pdom->viommu_list, pdom_list) {
> + if (entry == aviommu)
> + list_del(&entry->pdom_list);
> + }
> + spin_unlock_irqrestore(&pdom->lock, flags);
Same remark as Nicolin
> static void nested_domain_free(struct iommu_domain *dom)
> {
> + struct guest_domain_mapping_info *curr;
> struct nested_domain *ndom = to_ndomain(dom);
> + struct amd_iommu_viommu *aviommu = ndom->viommu;
> +
> + if (!refcount_dec_and_test(&ndom->gdom_info->users))
> + return;
Same locking issues here, you have to hold the xa_lock while doing
this decrement through to the cmpxchg, otherwise it will go wrong.
> + /*
> + * The refcount for the gdom_id to hdom_id mapping is zero.
> + * It is now safe to remove the mapping.
> + */
> + curr = xa_cmpxchg(&aviommu->gdomid_array, ndom->gdom_id,
> + ndom->gdom_info, NULL, GFP_ATOMIC);
> + if (curr) {
> + if (xa_err(curr)) {
> + pr_err("%s: Failed to free nested domain gdom_id=%#x\n",
> + __func__, ndom->gdom_id);
Don't log.. This should be a WARN_ON, the cmpxchg cannot fail unless
the xa is corrupted.
> + return;
> + }
> +
> + /* success */
> + pr_debug("%s: Free gdom_id=%#x, hdom_id=%#x\n",
> + __func__, ndom->gdom_id, curr->hdom_id);
> + kfree(curr);
> + }
> +
> + amd_iommu_pdom_id_free(ndom->gdom_info->hdom_id);
?? This should be before the kfree(curr). The hdom_id remains
allocated and valid so long as the gdom_info is allocated.
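The ordering point — release the hdom_id while the struct that holds it is still alive, and only then free the struct — in a minimal userspace sketch. The names mirror the patch, but the bitmap allocator is a toy stand-in for amd_iommu_pdom_id_alloc()/amd_iommu_pdom_id_free(), not the kernel code.

```c
#include <assert.h>
#include <stdlib.h>

struct gdom_info {
	int hdom_id;
};

static unsigned long id_bitmap;	/* bit n set => id n allocated */

static int pdom_id_alloc(void)
{
	/* id 0 is reserved, as in the driver's ida_alloc_range(.., 1, ..) */
	for (int i = 1; i < 64; i++)
		if (!(id_bitmap & (1UL << i))) {
			id_bitmap |= 1UL << i;
			return i;
		}
	return -1;
}

static void pdom_id_free(int id)
{
	id_bitmap &= ~(1UL << id);
}

static void gdom_info_destroy(struct gdom_info *info)
{
	pdom_id_free(info->hdom_id);	/* read hdom_id while info is live */
	free(info);			/* only now release the struct */
}
```

Reversing the two lines in gdom_info_destroy() would read info->hdom_id after free() — the use-after-free Jason is pointing at.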
Jason
* Re: [PATCH v5 11/14] iommu/amd: Introduce gDomID-to-hDomID Mapping and handle parent domain invalidation
2025-11-19 0:11 ` Jason Gunthorpe
@ 2025-11-19 1:10 ` Nicolin Chen
0 siblings, 0 replies; 39+ messages in thread
From: Nicolin Chen @ 2025-11-19 1:10 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Suravee Suthikulpanit, linux-kernel, robin.murphy, will, joro,
kevin.tian, jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
On Tue, Nov 18, 2025 at 08:11:51PM -0400, Jason Gunthorpe wrote:
> On Wed, Nov 12, 2025 at 06:25:03PM +0000, Suravee Suthikulpanit wrote:
> > +static int iommu_flush_pages_v1_hdom_ids(struct protection_domain *pdom, u64 address, size_t size)
> > +{
> > + int ret = 0;
> > + struct amd_iommu_viommu *aviommu;
> > +
> > + list_for_each_entry(aviommu, &pdom->viommu_list, pdom_list) {
> > + unsigned long i;
> > + struct guest_domain_mapping_info *gdom_info;
> > + struct amd_iommu *iommu = container_of(aviommu->core.iommu_dev, struct amd_iommu, iommu);
> > +
> > + xa_for_each(&aviommu->gdomid_array, i, gdom_info) {
> This is kind of painfully slow for invalidation but OK for now, we
> don't really expect alot of parent invalidation traffic..
>
> I think after we get this landed:
>
> https://lore.kernel.org/r/cover.1762588839.git.nicolinc@nvidia.com
>
> We should take a serious run at trying to make it shared code and have
> AMD use it. That would resolve the performance concern..
Yea. I had the same feeling when looking at AMD's patch, and
realized that you did foresee all of this.
I've started work on the VMID sharing using the invs. Perhaps
I can add some patches beforehand to generalize those helpers.
Thanks
Nicolin
* [PATCH v5 12/14] iommu/amd: Refactor persistent DTE bits programming into amd_iommu_make_clear_dte()
2025-11-12 18:24 [PATCH v5 00/14] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (10 preceding siblings ...)
2025-11-12 18:25 ` [PATCH v5 11/14] iommu/amd: Introduce gDomID-to-hDomID Mapping and handle parent domain invalidation Suravee Suthikulpanit
@ 2025-11-12 18:25 ` Suravee Suthikulpanit
2025-11-13 20:42 ` Nicolin Chen
2025-11-12 18:25 ` [PATCH v5 13/14] iommu/amd: Refactor logic to program the host page table in DTE Suravee Suthikulpanit
` (2 subsequent siblings)
14 siblings, 1 reply; 39+ messages in thread
From: Suravee Suthikulpanit @ 2025-11-12 18:25 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
To help avoid duplicate logic when programming the DTE for nested translation.
Note that this commit changes behavior when the IOMMU driver switches
domains during attach and when attaching the blocking domain, where DTE bit
fields for interrupt pass-through (i.e. Lint0, Lint1, NMI, INIT, ExtInt)
and System Management messages could be affected. These DTE bits are
specified in the IVRS table for specific devices, and should be persistent.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 13 +++++++++++++
drivers/iommu/amd/iommu.c | 11 -----------
2 files changed, 13 insertions(+), 11 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 57f9f4fb8a4b..ebedb49c28cf 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -194,9 +194,22 @@ void amd_iommu_update_dte(struct amd_iommu *iommu,
static inline void
amd_iommu_make_clear_dte(struct iommu_dev_data *dev_data, struct dev_table_entry *new)
{
+ struct dev_table_entry *initial_dte;
+ struct amd_iommu *iommu = get_amd_iommu_from_dev(dev_data->dev);
+
/* All existing DTE must have V bit set */
new->data128[0] = DTE_FLAG_V;
new->data128[1] = 0;
+
+ /*
+ * Restore cached persistent DTE bits, which can be set by information
+ * in IVRS table. See set_dev_entry_from_acpi().
+ */
+ initial_dte = amd_iommu_get_ivhd_dte_flags(iommu->pci_seg->id, dev_data->devid);
+ if (initial_dte) {
+ new->data128[0] |= initial_dte->data128[0];
+ new->data128[1] |= initial_dte->data128[1];
+ }
}
/* NESTED */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 6a26e7a28141..24bab275e8c0 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2068,7 +2068,6 @@ static void set_dte_entry(struct amd_iommu *iommu,
{
u16 domid;
u32 old_domid;
- struct dev_table_entry *initial_dte;
struct dev_table_entry new = {};
struct protection_domain *domain = dev_data->domain;
struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
@@ -2126,16 +2125,6 @@ static void set_dte_entry(struct amd_iommu *iommu,
old_domid = READ_ONCE(dte->data[1]) & DTE_DOMID_MASK;
new.data[1] |= domid;
- /*
- * Restore cached persistent DTE bits, which can be set by information
- * in IVRS table. See set_dev_entry_from_acpi().
- */
- initial_dte = amd_iommu_get_ivhd_dte_flags(iommu->pci_seg->id, dev_data->devid);
- if (initial_dte) {
- new.data128[0] |= initial_dte->data128[0];
- new.data128[1] |= initial_dte->data128[1];
- }
-
set_dte_gcr3_table(iommu, dev_data, &new);
amd_iommu_update_dte(iommu, dev_data, &new);
--
2.34.1
* Re: [PATCH v5 12/14] iommu/amd: Refactor persistent DTE bits programming into amd_iommu_make_clear_dte()
2025-11-12 18:25 ` [PATCH v5 12/14] iommu/amd: Refactor persistent DTE bits programming into amd_iommu_make_clear_dte() Suravee Suthikulpanit
@ 2025-11-13 20:42 ` Nicolin Chen
0 siblings, 0 replies; 39+ messages in thread
From: Nicolin Chen @ 2025-11-13 20:42 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez
On Wed, Nov 12, 2025 at 06:25:04PM +0000, Suravee Suthikulpanit wrote:
> To help avoid duplicate logic when programming the DTE for nested translation.
>
> Note that this commit changes behavior of when the IOMMU driver is
> switching domain during attach and the blocking domain, where DTE bit
> fields for interrupt pass-through (i.e. Lint0, Lint1, NMI, INIT, ExtInt)
> and System management message could be affected. These DTE bits are
> specified in the IVRS table for specific devices, and should be persistent.
>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
I have reviewed this in v4...
https://lore.kernel.org/linux-iommu/aPmX1fKDtwr4seS0@Asurada-Nvidia/
Perhaps you should try "b4 am", which would have collected all the
tags for you.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
* [PATCH v5 13/14] iommu/amd: Refactor logic to program the host page table in DTE
2025-11-12 18:24 [PATCH v5 00/14] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (11 preceding siblings ...)
2025-11-12 18:25 ` [PATCH v5 12/14] iommu/amd: Refactor persistent DTE bits programming into amd_iommu_make_clear_dte() Suravee Suthikulpanit
@ 2025-11-12 18:25 ` Suravee Suthikulpanit
2025-11-13 21:19 ` Nicolin Chen
2025-11-19 0:18 ` Jason Gunthorpe
2025-11-12 18:25 ` [PATCH v5 14/14] iommu/amd: Add support for nested domain attach/detach Suravee Suthikulpanit
2025-11-13 21:52 ` [PATCH v5 00/14] iommu/amd: Introduce Nested Translation support Nicolin Chen
14 siblings, 2 replies; 39+ messages in thread
From: Suravee Suthikulpanit @ 2025-11-12 18:25 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
Introduce the amd_iommu_set_dte_v1() helper function to configure
IOMMU host (v1) page table into DTE.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 5 ++
drivers/iommu/amd/amd_iommu_types.h | 1 +
drivers/iommu/amd/iommu.c | 94 ++++++++++++++++-------------
3 files changed, 59 insertions(+), 41 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index ebedb49c28cf..df167ae9e4aa 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -187,6 +187,11 @@ void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
struct dev_table_entry *get_dev_table(struct amd_iommu *iommu);
struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid);
+void amd_iommu_set_dte_v1(struct iommu_dev_data *dev_data,
+ struct protection_domain *domain,
+ u16 domid, struct pt_iommu_amdv1_hw_info *pt_info,
+ struct dev_table_entry *new);
+
void amd_iommu_update_dte(struct amd_iommu *iommu,
struct iommu_dev_data *dev_data,
struct dev_table_entry *new);
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 734f6a753b3a..19a98c6490cb 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -352,6 +352,7 @@
#define DTE_FLAG_HAD (3ULL << 7)
#define DTE_MODE_MASK GENMASK_ULL(11, 9)
#define DTE_HOST_TRP GENMASK_ULL(51, 12)
+#define DTE_FLAG_PPR BIT_ULL(52)
#define DTE_FLAG_GIOV BIT_ULL(54)
#define DTE_FLAG_GV BIT_ULL(55)
#define DTE_GLX GENMASK_ULL(57, 56)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 24bab275e8c0..d392b6757f2a 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2030,36 +2030,56 @@ int amd_iommu_clear_gcr3(struct iommu_dev_data *dev_data, ioasid_t pasid)
* Note:
* The old value for GCR3 table and GPT have been cleared from caller.
*/
-static void set_dte_gcr3_table(struct amd_iommu *iommu,
- struct iommu_dev_data *dev_data,
- struct dev_table_entry *target)
+static void set_dte_gcr3_table(struct iommu_dev_data *dev_data,
+ struct dev_table_entry *new)
{
struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
- u64 gcr3;
+ u64 gcr3 = iommu_virt_to_phys(gcr3_info->gcr3_tbl);
- if (!gcr3_info->gcr3_tbl)
- return;
-
- pr_debug("%s: devid=%#x, glx=%#x, gcr3_tbl=%#llx\n",
- __func__, dev_data->devid, gcr3_info->glx,
- (unsigned long long)gcr3_info->gcr3_tbl);
-
- gcr3 = iommu_virt_to_phys(gcr3_info->gcr3_tbl);
+ new->data[0] |= DTE_FLAG_TV |
+ (dev_data->ppr ? DTE_FLAG_PPR : 0) |
+ (pdom_is_v2_pgtbl_mode(dev_data->domain) ? DTE_FLAG_GIOV : 0) |
+ DTE_FLAG_GV |
+ FIELD_PREP(DTE_GLX, gcr3_info->glx) |
+ FIELD_PREP(DTE_GCR3_14_12, gcr3 >> 12) |
+ DTE_FLAG_IR | DTE_FLAG_IW;
- target->data[0] |= DTE_FLAG_GV |
- FIELD_PREP(DTE_GLX, gcr3_info->glx) |
- FIELD_PREP(DTE_GCR3_14_12, gcr3 >> 12);
- if (pdom_is_v2_pgtbl_mode(dev_data->domain))
- target->data[0] |= DTE_FLAG_GIOV;
-
- target->data[1] |= FIELD_PREP(DTE_GCR3_30_15, gcr3 >> 15) |
- FIELD_PREP(DTE_GCR3_51_31, gcr3 >> 31);
+ new->data[1] |= FIELD_PREP(DTE_DOMID_MASK, dev_data->gcr3_info.domid) |
+ FIELD_PREP(DTE_GCR3_30_15, gcr3 >> 15) |
+ (dev_data->ats_enabled ? DTE_FLAG_IOTLB : 0) |
+ FIELD_PREP(DTE_GCR3_51_31, gcr3 >> 31);
/* Guest page table can only support 4 and 5 levels */
if (amd_iommu_gpt_level == PAGE_MODE_5_LEVEL)
- target->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_5_LEVEL);
+ new->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_5_LEVEL);
else
- target->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_4_LEVEL);
+ new->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_4_LEVEL);
+}
+
+void amd_iommu_set_dte_v1(struct iommu_dev_data *dev_data,
+ struct protection_domain *domain,
+ u16 domid, struct pt_iommu_amdv1_hw_info *pt_info,
+ struct dev_table_entry *new)
+{
+ /* Note Dirty tracking is used for v1 table only for now */
+ new->data[0] |= DTE_FLAG_TV |
+ FIELD_PREP(DTE_MODE_MASK, pt_info->mode) |
+ (domain->dirty_tracking ? DTE_FLAG_HAD : 0) |
+ FIELD_PREP(DTE_HOST_TRP, pt_info->host_pt_root >> 12) |
+ DTE_FLAG_IR | DTE_FLAG_IW;
+
+ new->data[1] |= FIELD_PREP(DTE_DOMID_MASK, domid) |
+ (dev_data->ats_enabled ? DTE_FLAG_IOTLB : 0);
+}
+
+static void set_dte_passthrough(struct iommu_dev_data *dev_data,
+ struct protection_domain *domain,
+ struct dev_table_entry *new)
+{
+ new->data[0] |= DTE_FLAG_TV | DTE_FLAG_IR | DTE_FLAG_IW;
+
+ new->data[1] |= FIELD_PREP(DTE_DOMID_MASK, domain->id) |
+ (dev_data->ats_enabled ? DTE_FLAG_IOTLB : 0);
}
static void set_dte_entry(struct amd_iommu *iommu,
@@ -2074,8 +2094,6 @@ static void set_dte_entry(struct amd_iommu *iommu,
struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
struct pt_iommu_amdv1_hw_info pt_info;
- amd_iommu_make_clear_dte(dev_data, &new);
-
if (gcr3_info && gcr3_info->gcr3_tbl)
domid = dev_data->gcr3_info.domid;
else {
@@ -2097,35 +2115,29 @@ static void set_dte_entry(struct amd_iommu *iommu,
&pt_info);
}
- new.data[0] |= __sme_set(pt_info.host_pt_root) |
- (pt_info.mode & DEV_ENTRY_MODE_MASK)
- << DEV_ENTRY_MODE_SHIFT;
+ pt_info.host_pt_root = __sme_set(pt_info.host_pt_root);
}
}
- new.data[0] |= DTE_FLAG_IR | DTE_FLAG_IW;
-
/*
* When SNP is enabled, we can only support TV=1 with non-zero domain ID.
* This is prevented by the SNP-enable and IOMMU_DOMAIN_IDENTITY check in
* do_iommu_domain_alloc().
*/
WARN_ON(amd_iommu_snp_en && (domid == 0));
- new.data[0] |= DTE_FLAG_TV;
-
- if (dev_data->ppr)
- new.data[0] |= 1ULL << DEV_ENTRY_PPR;
-
- if (domain->dirty_tracking)
- new.data[0] |= DTE_FLAG_HAD;
- if (dev_data->ats_enabled)
- new.data[1] |= DTE_FLAG_IOTLB;
+ amd_iommu_make_clear_dte(dev_data, &new);
old_domid = READ_ONCE(dte->data[1]) & DTE_DOMID_MASK;
- new.data[1] |= domid;
-
- set_dte_gcr3_table(iommu, dev_data, &new);
+ if (gcr3_info && gcr3_info->gcr3_tbl)
+ set_dte_gcr3_table(dev_data, &new);
+ else if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
+ set_dte_passthrough(dev_data, domain, &new);
+ else if (domain->domain.type & __IOMMU_DOMAIN_PAGING &&
+ domain->pd_mode == PD_MODE_V1)
+ amd_iommu_set_dte_v1(dev_data, domain, domid, &pt_info, &new);
+ else
+ WARN_ON(true);
amd_iommu_update_dte(iommu, dev_data, &new);
--
2.34.1
* Re: [PATCH v5 13/14] iommu/amd: Refactor logic to program the host page table in DTE
2025-11-12 18:25 ` [PATCH v5 13/14] iommu/amd: Refactor logic to program the host page table in DTE Suravee Suthikulpanit
@ 2025-11-13 21:19 ` Nicolin Chen
2025-11-13 21:29 ` Nicolin Chen
` (2 more replies)
2025-11-19 0:18 ` Jason Gunthorpe
1 sibling, 3 replies; 39+ messages in thread
From: Nicolin Chen @ 2025-11-13 21:19 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez
On Wed, Nov 12, 2025 at 06:25:05PM +0000, Suravee Suthikulpanit wrote:
> static void set_dte_entry(struct amd_iommu *iommu,
> @@ -2074,8 +2094,6 @@ static void set_dte_entry(struct amd_iommu *iommu,
> struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
> struct pt_iommu_amdv1_hw_info pt_info;
>
> - amd_iommu_make_clear_dte(dev_data, &new);
> -
> if (gcr3_info && gcr3_info->gcr3_tbl)
> domid = dev_data->gcr3_info.domid;
set_dte_gcr3_table() now always gets the domid from this link, so
it's unnecessary.
> else {
> @@ -2097,35 +2115,29 @@ static void set_dte_entry(struct amd_iommu *iommu,
> &pt_info);
> }
>
> - new.data[0] |= __sme_set(pt_info.host_pt_root) |
> - (pt_info.mode & DEV_ENTRY_MODE_MASK)
> - << DEV_ENTRY_MODE_SHIFT;
> + pt_info.host_pt_root = __sme_set(pt_info.host_pt_root);
> }
> }
And this __IOMMU_DOMAIN_PAGING path seems to be used by v1 only.
So, it could be squashed into amd_iommu_set_dte_v1(). This could
tidy set_dte_entry() further.
>
> - new.data[0] |= DTE_FLAG_IR | DTE_FLAG_IW;
> -
> /*
> * When SNP is enabled, we can only support TV=1 with non-zero domain ID.
> * This is prevented by the SNP-enable and IOMMU_DOMAIN_IDENTITY check in
> * do_iommu_domain_alloc().
> */
> WARN_ON(amd_iommu_snp_en && (domid == 0));
This, if it's very necessary, can go into the individual functions.
Though I am not sure it is accurate, as it seems to imply that
IOMMU_DOMAIN_IDENTITY has domid == 0?
But an IOMMU_DOMAIN_IDENTITY does:
identity_domain.id = amd_iommu_pdom_id_alloc();
doing:
ida_alloc_range(&pdom_ids, 1, MAX_DOMAIN_ID - 1, GFP_ATOMIC);
which means it never returns 0.
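That never-returns-0 property of a range allocation starting at 1 can be sketched with a toy allocator (hypothetical names, a linear scan rather than the kernel IDA):

```c
#include <assert.h>

#define MAX_DOMAIN_ID 64

static unsigned char used[MAX_DOMAIN_ID];

/*
 * Minimal model of ida_alloc_range(&pdom_ids, 1, MAX_DOMAIN_ID - 1, ..):
 * because the range starts at 1, a successful allocation can never be 0.
 */
static int pdom_id_alloc_range(int min, int max)
{
	for (int i = min; i <= max; i++)
		if (!used[i]) {
			used[i] = 1;
			return i;
		}
	return -1;	/* -ENOSPC in the real code */
}
```

Every id handed out from the (1, MAX_DOMAIN_ID - 1) range is strictly positive, which is why the identity domain's id can never be 0 here.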
Nicolin
2025-11-13 21:19 ` Nicolin Chen
@ 2025-11-13 21:29 ` Nicolin Chen
2025-11-19 0:21 ` Jason Gunthorpe
2025-11-19 0:20 ` Jason Gunthorpe
2026-01-15 9:24 ` Suthikulpanit, Suravee
2 siblings, 1 reply; 39+ messages in thread
From: Nicolin Chen @ 2025-11-13 21:29 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez
On Thu, Nov 13, 2025 at 01:19:16PM -0800, Nicolin Chen wrote:
> On Wed, Nov 12, 2025 at 06:25:05PM +0000, Suravee Suthikulpanit wrote:
> > else {
> > @@ -2097,35 +2115,29 @@ static void set_dte_entry(struct amd_iommu *iommu,
> > &pt_info);
> > }
> >
> > - new.data[0] |= __sme_set(pt_info.host_pt_root) |
> > - (pt_info.mode & DEV_ENTRY_MODE_MASK)
> > - << DEV_ENTRY_MODE_SHIFT;
> > + pt_info.host_pt_root = __sme_set(pt_info.host_pt_root);
> > }
> > }
>
> And this __IOMMU_DOMAIN_PAGING path seems to be used by v1 only.
> So, it could be squashed into amd_iommu_set_dte_v1(). This could
> tidy set_dte_entry() further.
Having looked at PATCH-14, I realized that amd_iommu_set_dte_v1()
is shared with the nesting pathway.
So perhaps:
else if (domain->domain.type & __IOMMU_DOMAIN_PAGING &&
domain->pd_mode == PD_MODE_V1) {
struct pt_iommu_amdv1_hw_info pt_info;
....; // <--move here
amd_iommu_set_dte_v1(dev_data, domain, domid, &pt_info, &new)
} else
Nicolin
^ permalink raw reply [flat|nested] 39+ messages in thread* Re: [PATCH v5 13/14] iommu/amd: Refactor logic to program the host page table in DTE
2025-11-13 21:29 ` Nicolin Chen
@ 2025-11-19 0:21 ` Jason Gunthorpe
0 siblings, 0 replies; 39+ messages in thread
From: Jason Gunthorpe @ 2025-11-19 0:21 UTC (permalink / raw)
To: Nicolin Chen
Cc: Suravee Suthikulpanit, linux-kernel, robin.murphy, will, joro,
kevin.tian, jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
On Thu, Nov 13, 2025 at 01:29:39PM -0800, Nicolin Chen wrote:
> On Thu, Nov 13, 2025 at 01:19:16PM -0800, Nicolin Chen wrote:
> > On Wed, Nov 12, 2025 at 06:25:05PM +0000, Suravee Suthikulpanit wrote:
> > > else {
> > > @@ -2097,35 +2115,29 @@ static void set_dte_entry(struct amd_iommu *iommu,
> > > &pt_info);
> > > }
> > >
> > > - new.data[0] |= __sme_set(pt_info.host_pt_root) |
> > > - (pt_info.mode & DEV_ENTRY_MODE_MASK)
> > > - << DEV_ENTRY_MODE_SHIFT;
> > > + pt_info.host_pt_root = __sme_set(pt_info.host_pt_root);
> > > }
> > > }
> >
> > And this __IOMMU_DOMAIN_PAGING path seems to be used by v1 only.
> > So, it could be squashed into amd_iommu_set_dte_v1(). This could
> > tidy set_dte_entry() further.
>
> Having looked at PATCH-14, I realized that amd_iommu_set_dte_v1()
> is shared with the nesting pathway.
>
> So perhaps:
> else if (domain->domain.type & __IOMMU_DOMAIN_PAGING &&
> domain->pd_mode == PD_MODE_V1) {
> struct pt_iommu_amdv1_hw_info pt_info;
>
> ....; // <--move here
> amd_iommu_set_dte_v1(dev_data, domain, domid, &pt_info, &new)
> } else
Even so, it should be called from the set functions, and the set
functions should sort it out. If another helper is needed to share
with nesting then fine, but you might also just have nesting call
set_dte_v1() to start and then modify in the gcr3 table..
Jason
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v5 13/14] iommu/amd: Refactor logic to program the host page table in DTE
2025-11-13 21:19 ` Nicolin Chen
2025-11-13 21:29 ` Nicolin Chen
@ 2025-11-19 0:20 ` Jason Gunthorpe
2026-01-15 9:24 ` Suthikulpanit, Suravee
2 siblings, 0 replies; 39+ messages in thread
From: Jason Gunthorpe @ 2025-11-19 0:20 UTC (permalink / raw)
To: Nicolin Chen
Cc: Suravee Suthikulpanit, linux-kernel, robin.murphy, will, joro,
kevin.tian, jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
On Thu, Nov 13, 2025 at 01:19:13PM -0800, Nicolin Chen wrote:
> > else {
> > @@ -2097,35 +2115,29 @@ static void set_dte_entry(struct amd_iommu *iommu,
> > &pt_info);
> > }
> >
> > - new.data[0] |= __sme_set(pt_info.host_pt_root) |
> > - (pt_info.mode & DEV_ENTRY_MODE_MASK)
> > - << DEV_ENTRY_MODE_SHIFT;
> > + pt_info.host_pt_root = __sme_set(pt_info.host_pt_root);
> > }
> > }
>
> And this __IOMMU_DOMAIN_PAGING path seems to be used by v1 only.
> So, it could be squashed into amd_iommu_set_dte_v1(). This could
> tidy set_dte_entry() further.
Oh yes, this definitely should be done; the whole pt info grab for v1
belongs inside the set v1 function.
> > /*
> > * When SNP is enabled, we can only support TV=1 with non-zero domain ID.
> > * This is prevented by the SNP-enable and IOMMU_DOMAIN_IDENTITY check in
> > * do_iommu_domain_alloc().
> > */
> > WARN_ON(amd_iommu_snp_en && (domid == 0));
>
> This, if it's very necessary, can go into individual functions.
>
> Though I am not sure if it is accurate as it seems to imply that
> IOMMU_DOMAIN_IDENTITY has domid == 0?
I never quite understood this comment myself either :\
> But an IOMMU_DOMAIN_IDENTITY does:
> identity_domain.id = amd_iommu_pdom_id_alloc();
> doing:
> ida_alloc_range(&pdom_ids, 1, MAX_DOMAIN_ID - 1, GFP_ATOMIC);
> which means it never returns 0.
Hmm!
Jason
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v5 13/14] iommu/amd: Refactor logic to program the host page table in DTE
2025-11-13 21:19 ` Nicolin Chen
2025-11-13 21:29 ` Nicolin Chen
2025-11-19 0:20 ` Jason Gunthorpe
@ 2026-01-15 9:24 ` Suthikulpanit, Suravee
2 siblings, 0 replies; 39+ messages in thread
From: Suthikulpanit, Suravee @ 2026-01-15 9:24 UTC (permalink / raw)
To: Nicolin Chen
Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez
On 11/14/2025 4:19 AM, Nicolin Chen wrote:
>> @@ -2097,35 +2115,29 @@ static void set_dte_entry(struct amd_iommu *iommu,
>>
>>.....
>>
>> - new.data[0] |= DTE_FLAG_IR | DTE_FLAG_IW;
>> -
>> /*
>> * When SNP is enabled, we can only support TV=1 with non-zero domain ID.
>> * This is prevented by the SNP-enable and IOMMU_DOMAIN_IDENTITY check in
>> * do_iommu_domain_alloc().
>> */
>> WARN_ON(amd_iommu_snp_en && (domid == 0));
> This, if it's very necessary, can go into individual functions.
>
> Though I am not sure if it is accurate as it seems to imply that
> IOMMU_DOMAIN_IDENTITY has domid == 0?
>
> But an IOMMU_DOMAIN_IDENTITY does:
> identity_domain.id = amd_iommu_pdom_id_alloc();
> doing:
> ida_alloc_range(&pdom_ids, 1, MAX_DOMAIN_ID - 1, GFP_ATOMIC);
> which means it never returns 0.
>
> Nicolin
After more careful review, I am removing this check for now since it's
no longer applicable with the more recent code.
Thanks,
Suravee
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v5 13/14] iommu/amd: Refactor logic to program the host page table in DTE
2025-11-12 18:25 ` [PATCH v5 13/14] iommu/amd: Refactor logic to program the host page table in DTE Suravee Suthikulpanit
2025-11-13 21:19 ` Nicolin Chen
@ 2025-11-19 0:18 ` Jason Gunthorpe
1 sibling, 0 replies; 39+ messages in thread
From: Jason Gunthorpe @ 2025-11-19 0:18 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
On Wed, Nov 12, 2025 at 06:25:05PM +0000, Suravee Suthikulpanit wrote:
> @@ -2097,35 +2115,29 @@ static void set_dte_entry(struct amd_iommu *iommu,
> old_domid = READ_ONCE(dte->data[1]) & DTE_DOMID_MASK;
> - new.data[1] |= domid;
> -
> - set_dte_gcr3_table(iommu, dev_data, &new);
> + if (gcr3_info && gcr3_info->gcr3_tbl)
> + set_dte_gcr3_table(dev_data, &new);
> + else if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
> + set_dte_passthrough(dev_data, domain, &new);
> + else if (domain->domain.type & __IOMMU_DOMAIN_PAGING &&
() around the & is kernel style I think
Otherwise this looks OK
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
At some point the set_dte_xx functions should be called from the
domain attach function that already knows what kind of domain it is
attaching - i.e. identity, v1, and v2 should all have different attach
functions.
Jason
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v5 14/14] iommu/amd: Add support for nested domain attach/detach
2025-11-12 18:24 [PATCH v5 00/14] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (12 preceding siblings ...)
2025-11-12 18:25 ` [PATCH v5 13/14] iommu/amd: Refactor logic to program the host page table in DTE Suravee Suthikulpanit
@ 2025-11-12 18:25 ` Suravee Suthikulpanit
2025-11-13 21:34 ` Nicolin Chen
2025-11-19 0:28 ` Jason Gunthorpe
2025-11-13 21:52 ` [PATCH v5 00/14] iommu/amd: Introduce Nested Translation support Nicolin Chen
14 siblings, 2 replies; 39+ messages in thread
From: Suravee Suthikulpanit @ 2025-11-12 18:25 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
Introduce set_dte_nested() to program guest translation settings in
the host DTE when attaching the nested domain to a device.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/nested.c | 69 ++++++++++++++++++++++++++++++++++++++
1 file changed, 69 insertions(+)
diff --git a/drivers/iommu/amd/nested.c b/drivers/iommu/amd/nested.c
index 1bbcb16abecc..eeb5d9b3a58f 100644
--- a/drivers/iommu/amd/nested.c
+++ b/drivers/iommu/amd/nested.c
@@ -153,6 +153,74 @@ amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
return ERR_PTR(ret);
}
+static void set_dte_nested(struct amd_iommu *iommu,
+ struct iommu_domain *dom,
+ struct iommu_dev_data *dev_data)
+{
+ struct protection_domain *parent;
+ struct dev_table_entry new = {0};
+ struct nested_domain *ndom = to_ndomain(dom);
+ struct iommu_hwpt_amd_guest *gdte = &ndom->gdte;
+ struct pt_iommu_amdv1_hw_info pt_info;
+
+ /*
+ * The nest parent domain is attached during the call to the
+ * struct iommu_ops.viommu_init(), which will be stored as part
+ * of the struct amd_iommu_viommu.parent.
+ */
+ if (WARN_ON(!ndom->viommu || !ndom->viommu->parent))
+ return;
+
+ parent = ndom->viommu->parent;
+ amd_iommu_make_clear_dte(dev_data, &new);
+
+ /* Retrieve the current pagetable info via the IOMMU PT API. */
+ pt_iommu_amdv1_hw_info(&parent->amdv1, &pt_info);
+
+ /*
+ * Use nested domain ID to program DTE.
+ * See amd_iommu_alloc_domain_nested().
+ */
+ amd_iommu_set_dte_v1(dev_data, parent, ndom->gdom_info->hdom_id, &pt_info, &new);
+
+ /* Guest PPR */
+ new.data[0] |= gdte->dte[0] & DTE_FLAG_PPR;
+
+ /* Guest translation stuff */
+ new.data[0] |= gdte->dte[0] & (DTE_GLX | DTE_FLAG_GV | DTE_FLAG_GIOV);
+
+ /* GCR3 table */
+ new.data[0] |= gdte->dte[0] & DTE_GCR3_14_12;
+ new.data[1] |= gdte->dte[1] & (DTE_GCR3_30_15 | DTE_GCR3_51_31);
+
+ /* Guest paging mode */
+ new.data[2] |= gdte->dte[2] & DTE_GPT_LEVEL_MASK;
+
+ amd_iommu_update_dte(iommu, dev_data, &new);
+}
+
+static int nested_attach_device(struct iommu_domain *dom, struct device *dev,
+ struct iommu_domain *old)
+{
+ struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
+ struct amd_iommu *iommu = get_amd_iommu_from_dev_data(dev_data);
+ int ret = 0;
+
+ if (WARN_ON(dom->type != IOMMU_DOMAIN_NESTED))
+ return -EINVAL;
+
+ mutex_lock(&dev_data->mutex);
+
+ /* Setup DTE for nested translation and
+ * update the device table
+ */
+ set_dte_nested(iommu, dom, dev_data);
+
+ mutex_unlock(&dev_data->mutex);
+
+ return ret;
+}
+
static void nested_domain_free(struct iommu_domain *dom)
{
struct guest_domain_mapping_info *curr;
@@ -186,5 +254,6 @@ static void nested_domain_free(struct iommu_domain *dom)
}
static const struct iommu_domain_ops nested_domain_ops = {
+ .attach_dev = nested_attach_device,
.free = nested_domain_free,
};
--
2.34.1
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH v5 14/14] iommu/amd: Add support for nested domain attach/detach
2025-11-12 18:25 ` [PATCH v5 14/14] iommu/amd: Add support for nested domain attach/detach Suravee Suthikulpanit
@ 2025-11-13 21:34 ` Nicolin Chen
2025-11-19 0:28 ` Jason Gunthorpe
1 sibling, 0 replies; 39+ messages in thread
From: Nicolin Chen @ 2025-11-13 21:34 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez
On Wed, Nov 12, 2025 at 06:25:06PM +0000, Suravee Suthikulpanit wrote:
> Introduce set_dte_nested() to program guest translation settings in
> the host DTE when attaching the nested domain to a device.
>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v5 14/14] iommu/amd: Add support for nested domain attach/detach
2025-11-12 18:25 ` [PATCH v5 14/14] iommu/amd: Add support for nested domain attach/detach Suravee Suthikulpanit
2025-11-13 21:34 ` Nicolin Chen
@ 2025-11-19 0:28 ` Jason Gunthorpe
1 sibling, 0 replies; 39+ messages in thread
From: Jason Gunthorpe @ 2025-11-19 0:28 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
On Wed, Nov 12, 2025 at 06:25:06PM +0000, Suravee Suthikulpanit wrote:
> Introduce set_dte_nested() to program guest translation settings in
> the host DTE when attaching the nested domain to a device.
>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
> drivers/iommu/amd/nested.c | 69 ++++++++++++++++++++++++++++++++++++++
> 1 file changed, 69 insertions(+)
>
> diff --git a/drivers/iommu/amd/nested.c b/drivers/iommu/amd/nested.c
> index 1bbcb16abecc..eeb5d9b3a58f 100644
> --- a/drivers/iommu/amd/nested.c
> +++ b/drivers/iommu/amd/nested.c
> @@ -153,6 +153,74 @@ amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
> return ERR_PTR(ret);
> }
>
> +static void set_dte_nested(struct amd_iommu *iommu,
> + struct iommu_domain *dom,
> + struct iommu_dev_data *dev_data)
> +{
> + struct protection_domain *parent;
> + struct dev_table_entry new = {0};
> + struct nested_domain *ndom = to_ndomain(dom);
> + struct iommu_hwpt_amd_guest *gdte = &ndom->gdte;
> + struct pt_iommu_amdv1_hw_info pt_info;
> +
> + /*
> + * The nest parent domain is attached during the call to the
> + * struct iommu_ops.viommu_init(), which will be stored as part
> + * of the struct amd_iommu_viommu.parent.
> + */
> + if (WARN_ON(!ndom->viommu || !ndom->viommu->parent))
> + return;
> +
> + parent = ndom->viommu->parent;
> + amd_iommu_make_clear_dte(dev_data, &new);
> +
> + /* Retrieve the current pagetable info via the IOMMU PT API. */
> + pt_iommu_amdv1_hw_info(&parent->amdv1, &pt_info);
> +
> + /*
> + * Use nested domain ID to program DTE.
> + * See amd_iommu_alloc_domain_nested().
> + */
> + amd_iommu_set_dte_v1(dev_data, parent, ndom->gdom_info->hdom_id, &pt_info, &new);
> +
> + /* Guest PPR */
> + new.data[0] |= gdte->dte[0] & DTE_FLAG_PPR;
> +
> + /* Guest translation stuff */
> + new.data[0] |= gdte->dte[0] & (DTE_GLX | DTE_FLAG_GV | DTE_FLAG_GIOV);
> +
> + /* GCR3 table */
> + new.data[0] |= gdte->dte[0] & DTE_GCR3_14_12;
> + new.data[1] |= gdte->dte[1] & (DTE_GCR3_30_15 | DTE_GCR3_51_31);
> +
> + /* Guest paging mode */
> + new.data[2] |= gdte->dte[2] & DTE_GPT_LEVEL_MASK;
> +
> + amd_iommu_update_dte(iommu, dev_data, &new);
The functions should be consistent: a "set" function should just set a
struct dev_table_entry. A set function should not call
amd_iommu_update_dte().
So either lift the amd_iommu_update_dte() (I prefer) or change the
function name?
> +}
> +
> +static int nested_attach_device(struct iommu_domain *dom, struct device *dev,
> + struct iommu_domain *old)
> +{
> + struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
> + struct amd_iommu *iommu = get_amd_iommu_from_dev_data(dev_data);
> + int ret = 0;
> +
> + if (WARN_ON(dom->type != IOMMU_DOMAIN_NESTED))
> + return -EINVAL;
This is not needed; the ops are for nesting, so they are only called
for nesting domain types.
> + mutex_lock(&dev_data->mutex);
> +
> + /* Setup DTE for nested translation and
> + * update the device table
> + */
> + set_dte_nested(iommu, dom, dev_data);
> +
> + mutex_unlock(&dev_data->mutex);
This needs to make sure there are no PASIDs enabled.
And similarly the PASID attach path needs to check that a v1 or
blocking domain is on the RID, not identity, not nesting.
But overall this looks OK and I think the series as a whole is looking
pretty good. If you fix these little things it can possibly make this
cycle?
Jason
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v5 00/14] iommu/amd: Introduce Nested Translation support
2025-11-12 18:24 [PATCH v5 00/14] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (13 preceding siblings ...)
2025-11-12 18:25 ` [PATCH v5 14/14] iommu/amd: Add support for nested domain attach/detach Suravee Suthikulpanit
@ 2025-11-13 21:52 ` Nicolin Chen
2025-11-17 17:54 ` Jason Gunthorpe
14 siblings, 1 reply; 39+ messages in thread
From: Nicolin Chen @ 2025-11-13 21:52 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez
On Wed, Nov 12, 2025 at 06:24:52PM +0000, Suravee Suthikulpanit wrote:
> Note: This series is rebased on top of:
> * Git repo: git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux.git
> Branch: next
> Commit: 91920a9d87f5 ("Merge branches 'arm/smmu/updates', 'arm/smmu/bindings',
> 'mediatek', 'nvidia/tegra', 'amd/amd-vi' and 'core'
> into next")
> * [PATCH v5] iommu/amd: Add support for hw_info for iommu capability query
> https://lore.kernel.org/linux-iommu/20250926141901.511313-1-suravee.suthikulpanit@amd.com/T/#u
Nit: this patch doesn't apply cleanly on 91920a9d87f5 :-/
> drivers/iommu/amd/Makefile | 2 +-
> drivers/iommu/amd/amd_iommu.h | 36 ++++
> drivers/iommu/amd/amd_iommu_types.h | 48 +++++-
> drivers/iommu/amd/init.c | 8 +
> drivers/iommu/amd/iommu.c | 221 +++++++++++++++---------
> drivers/iommu/amd/iommufd.c | 50 ++++++
> drivers/iommu/amd/iommufd.h | 5 +
> drivers/iommu/amd/nested.c | 259 ++++++++++++++++++++++++++++
> include/uapi/linux/iommufd.h | 11 ++
So, this seems to be a preparatory series for AMD vIOMMU, yet it
doesn't properly work since it's missing IOMMUFD_VIOMMU_TYPE_AMD
and the invalidation component (HW_QUEUE).
However, the series does declare IOMMU_HWPT_DATA_AMD_GUEST in the
uAPI header. I am afraid that might confuse user who might think
AMD now supports virtualization using the HWPT-based mode, like
Intel VT-d.
So, maybe we should either:
- leave a note at IOMMU_HWPT_DATA_AMD_GUEST to declare it is
incomplete yet, and remove later
- keep IOMMU_HWPT_DATA_AMD_GUEST in an AMD driver header, and
move to the uAPI header later
Jason?
Nicolin
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v5 00/14] iommu/amd: Introduce Nested Translation support
2025-11-13 21:52 ` [PATCH v5 00/14] iommu/amd: Introduce Nested Translation support Nicolin Chen
@ 2025-11-17 17:54 ` Jason Gunthorpe
2026-01-15 9:18 ` Suthikulpanit, Suravee
0 siblings, 1 reply; 39+ messages in thread
From: Jason Gunthorpe @ 2025-11-17 17:54 UTC (permalink / raw)
To: Nicolin Chen
Cc: Suravee Suthikulpanit, linux-kernel, robin.murphy, will, joro,
kevin.tian, jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
On Thu, Nov 13, 2025 at 01:52:02PM -0800, Nicolin Chen wrote:
> On Wed, Nov 12, 2025 at 06:24:52PM +0000, Suravee Suthikulpanit wrote:
> > Note: This series is rebased on top of:
> > * Git repo: git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux.git
> > Branch: next
> > Commit: 91920a9d87f5 ("Merge branches 'arm/smmu/updates', 'arm/smmu/bindings',
> > 'mediatek', 'nvidia/tegra', 'amd/amd-vi' and 'core'
> > into next")
> > * [PATCH v5] iommu/amd: Add support for hw_info for iommu capability query
> > https://lore.kernel.org/linux-iommu/20250926141901.511313-1-suravee.suthikulpanit@amd.com/T/#u
>
> Nit: this patch doesn't apply cleanly on 91920a9d87f5 :-/
>
> > drivers/iommu/amd/Makefile | 2 +-
> > drivers/iommu/amd/amd_iommu.h | 36 ++++
> > drivers/iommu/amd/amd_iommu_types.h | 48 +++++-
> > drivers/iommu/amd/init.c | 8 +
> > drivers/iommu/amd/iommu.c | 221 +++++++++++++++---------
> > drivers/iommu/amd/iommufd.c | 50 ++++++
> > drivers/iommu/amd/iommufd.h | 5 +
> > drivers/iommu/amd/nested.c | 259 ++++++++++++++++++++++++++++
> > include/uapi/linux/iommufd.h | 11 ++
>
> So, this seems to be a preparatory series for AMD vIOMMU, yet it
> doesn't properly work since it's missing IOMMUFD_VIOMMU_TYPE_AMD
> and the invalidation component (HW_QUEUE).
>
> However, the series does declare IOMMU_HWPT_DATA_AMD_GUEST in the
> uAPI header. I am afraid that might confuse user who might think
> AMD now supports virtualization using the HWPT-based mode, like
> Intel VT-d.
>
> So, maybe we should either:
> - leave a note at IOMMU_HWPT_DATA_AMD_GUEST to declare it is
> incomplete yet, and remove later
> - keep IOMMU_HWPT_DATA_AMD_GUEST in an AMD driver header, and
> move to the uAPI header later
>
> Jason?
Yeah, I like to see this incremental work, but Alex recently raised
that we should be a bit more careful about how userspace perceives
these partially complete things.
I don't think tricks with header files work well. I think what you'd
want to do is leave some critical system call disabled until all the
work is finished, so the VMM never has to see a half-working
implementation?
The patch to get the info would have been a nice choice for this purpose..
Jason
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v5 00/14] iommu/amd: Introduce Nested Translation support
2025-11-17 17:54 ` Jason Gunthorpe
@ 2026-01-15 9:18 ` Suthikulpanit, Suravee
0 siblings, 0 replies; 39+ messages in thread
From: Suthikulpanit, Suravee @ 2026-01-15 9:18 UTC (permalink / raw)
To: Jason Gunthorpe, Nicolin Chen
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez
On 11/18/2025 12:54 AM, Jason Gunthorpe wrote:
> On Thu, Nov 13, 2025 at 01:52:02PM -0800, Nicolin Chen wrote:
>> On Wed, Nov 12, 2025 at 06:24:52PM +0000, Suravee Suthikulpanit wrote:
>>> Note: This series is rebased on top of:
>>> * Git repo: git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux.git
>>> Branch: next
>>> Commit: 91920a9d87f5 ("Merge branches 'arm/smmu/updates', 'arm/smmu/bindings',
>>> 'mediatek', 'nvidia/tegra', 'amd/amd-vi' and 'core'
>>> into next")
>>> * [PATCH v5] iommu/amd: Add support for hw_info for iommu capability query
>>> https://lore.kernel.org/linux-iommu/20250926141901.511313-1-suravee.suthikulpanit@amd.com/T/#u
>>
>> Nit: this patch doesn't apply cleanly on 91920a9d87f5 :-/
>>
>>> drivers/iommu/amd/Makefile | 2 +-
>>> drivers/iommu/amd/amd_iommu.h | 36 ++++
>>> drivers/iommu/amd/amd_iommu_types.h | 48 +++++-
>>> drivers/iommu/amd/init.c | 8 +
>>> drivers/iommu/amd/iommu.c | 221 +++++++++++++++---------
>>> drivers/iommu/amd/iommufd.c | 50 ++++++
>>> drivers/iommu/amd/iommufd.h | 5 +
>>> drivers/iommu/amd/nested.c | 259 ++++++++++++++++++++++++++++
>>> include/uapi/linux/iommufd.h | 11 ++
>>
>> So, this seems to be a preparatory series for AMD vIOMMU, yet it
>> doesn't properly work since it's missing IOMMUFD_VIOMMU_TYPE_AMD
>> and the invalidation component (HW_QUEUE).
>>
>> However, the series does declare IOMMU_HWPT_DATA_AMD_GUEST in the
>> uAPI header. I am afraid that might confuse user who might think
>> AMD now supports virtualization using the HWPT-based mode, like
>> Intel VT-d.
>>
>> So, maybe we should either:
>> - leave a note at IOMMU_HWPT_DATA_AMD_GUEST to declare it is
>> incomplete yet, and remove later
>> - keep IOMMU_HWPT_DATA_AMD_GUEST in an AMD driver header, and
>> move to the uAPI header later
>>
>> Jason?
>
> Yeah, I like to see this incremental work, but Alex recently raised
> that we should be a bit more careful about how userspace perceives
> these partially complete things.
>
> I don't think tricks with header files work well, I think what you'd
> want to do is leave some critical system call disabled until all the
> work is finished so the VMM never has to see a half working
> implementation?
Yes, this is a preparatory series for nested translation w/ AMD vIOMMU
support. Currently, amd_iommu_alloc_domain_nested() has not been
hooked up to struct iommufd_viommu_ops.alloc_domain_nested. This
should prevent nested domain usage until it is fully enabled in a
subsequent series, which will introduce IOMMUFD vIOMMU support for
AMD.
I am working on preparing and sending out the next series.
Thanks,
Suravee
> The patch to get the info would have been a nice choice for this purpose..
>
> Jason
^ permalink raw reply [flat|nested] 39+ messages in thread