* [PATCH 0/8] iommu/amd: Introduce Nested Translation support
@ 2025-08-20 11:30 Suravee Suthikulpanit
2025-08-20 11:30 ` [PATCH 1/8] iommu/amd: Make amd_iommu_pdom_id_alloc() non-static Suravee Suthikulpanit
` (7 more replies)
0 siblings, 8 replies; 26+ messages in thread
From: Suravee Suthikulpanit @ 2025-08-20 11:30 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
Suravee Suthikulpanit
This series introduces support for AMD IOMMU nested page table translation
with the host (v1) and guest (v2) page tables. In this mode, the AMD IOMMU
driver configures the Device Table Entry (DTE) with the host page table
root pointer and the Guest CR3 (GCR3) root pointer, along with other related
parameters.
The host (stage-2) table is configured by allocating a domain with the
IOMMU_HWPT_ALLOC_NEST_PARENT flag. The guest (stage-1) table is configured by
amd_iommu_domain_alloc_nested() and amd_iommu_nested_attach_device() using
the GPA of the GCR3 root pointer and the GIOV, GLX, and GuestPagingMode
parameters supplied by the guest.
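For reference, a minimal user-space sketch of this flow through iommufd is
shown below. It is illustrative only: the helper name, error handling, and
how the guest DTE words are obtained are assumptions, and it relies on the
IOMMU_HWPT_DATA_AMD_V2 uAPI added in patch 5.

/*
 * Hedged sketch of the expected iommufd flow; not taken from this series.
 */
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>

static int alloc_nested_hwpt(int iommufd, __u32 dev_id, __u32 ioas_id,
			     const __u64 guest_dte[4])
{
	struct iommu_hwpt_amd_v2 data = {};
	struct iommu_hwpt_alloc parent = {
		.size = sizeof(parent),
		.flags = IOMMU_HWPT_ALLOC_NEST_PARENT,
		.dev_id = dev_id,
		.pt_id = ioas_id,	/* stage-2 mappings: GPA -> SPA (v1 table) */
	};
	struct iommu_hwpt_alloc nested = {
		.size = sizeof(nested),
		.dev_id = dev_id,
		.data_type = IOMMU_HWPT_DATA_AMD_V2,
		.data_len = sizeof(data),
		.data_uptr = (__u64)(uintptr_t)&data,
	};

	/* stage-1: guest DTE words carry the GCR3 TRP (GPA), GIOV, GLX, paging mode */
	memcpy(data.dte, guest_dte, sizeof(data.dte));

	if (ioctl(iommufd, IOMMU_HWPT_ALLOC, &parent))
		return -1;

	nested.pt_id = parent.out_hwpt_id;	/* nest the stage-1 HWPT under the parent */
	return ioctl(iommufd, IOMMU_HWPT_ALLOC, &nested);
}

The VMM would then attach the device to the returned nested HWPT and let the
driver program the host DTE from it (patch 8).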
Thanks,
Suravee
Suravee Suthikulpanit (8):
iommu/amd: Make amd_iommu_pdom_id_alloc() non-static
iommu/amd: Making device attach / detach helpers non-static
iommu/amd: Making amd_iommu_pdev_enable_cap_ats non-static
iommu/amd: Introduce struct gcr3_tbl_info.giov
iommufd: Introduce data struct for AMD nested domain allocation
iommu/amd: Add support for nest parent domain allocation
iommu/amd: Add support for nested domain allocation
iommu/amd: Add support for nested domain attach/detach
drivers/iommu/amd/Makefile | 2 +-
drivers/iommu/amd/amd_iommu.h | 9 ++
drivers/iommu/amd/amd_iommu_types.h | 12 +++
drivers/iommu/amd/init.c | 3 +
drivers/iommu/amd/iommu.c | 134 ++++++++++++++++++------
drivers/iommu/amd/nested.c | 153 ++++++++++++++++++++++++++++
include/uapi/linux/iommufd.h | 11 ++
7 files changed, 290 insertions(+), 34 deletions(-)
create mode 100644 drivers/iommu/amd/nested.c
--
2.34.1
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH 1/8] iommu/amd: Make amd_iommu_pdom_id_alloc() non-static
2025-08-20 11:30 [PATCH 0/8] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
@ 2025-08-20 11:30 ` Suravee Suthikulpanit
2025-09-02 13:03 ` Jason Gunthorpe
2025-08-20 11:30 ` [PATCH 2/8] iommu/amd: Making device attach / detach helpers non-static Suravee Suthikulpanit
` (6 subsequent siblings)
7 siblings, 1 reply; 26+ messages in thread
From: Suravee Suthikulpanit @ 2025-08-20 11:30 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
Suravee Suthikulpanit
To allow reuse in other files in subsequent patches.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 1 +
drivers/iommu/amd/iommu.c | 8 ++++----
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 9b4b589a54b5..6ea549816a1f 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -26,6 +26,7 @@ void amd_iommu_set_rlookup_table(struct amd_iommu *iommu, u16 devid);
void iommu_feature_enable(struct amd_iommu *iommu, u8 bit);
void *__init iommu_alloc_4k_pages(struct amd_iommu *iommu,
gfp_t gfp, size_t size);
+int amd_iommu_pdom_id_alloc(void);
#ifdef CONFIG_AMD_IOMMU_DEBUGFS
void amd_iommu_debugfs_setup(void);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index ebe1cb9b0807..a05b7f69ac74 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1816,7 +1816,7 @@ int amd_iommu_complete_ppr(struct device *dev, u32 pasid, int status, int tag)
*
****************************************************************************/
-static int pdom_id_alloc(void)
+int amd_iommu_pdom_id_alloc(void)
{
return ida_alloc_range(&pdom_ids, 1, MAX_DOMAIN_ID - 1, GFP_ATOMIC);
}
@@ -1904,7 +1904,7 @@ static int setup_gcr3_table(struct gcr3_tbl_info *gcr3_info,
return -EBUSY;
/* Allocate per device domain ID */
- domid = pdom_id_alloc();
+ domid = amd_iommu_pdom_id_alloc();
if (domid <= 0)
return -ENOSPC;
gcr3_info->domid = domid;
@@ -2487,7 +2487,7 @@ struct protection_domain *protection_domain_alloc(void)
if (!domain)
return NULL;
- domid = pdom_id_alloc();
+ domid = amd_iommu_pdom_id_alloc();
if (domid <= 0) {
kfree(domain);
return NULL;
@@ -2679,7 +2679,7 @@ void amd_iommu_init_identity_domain(void)
domain->ops = &identity_domain_ops;
domain->owner = &amd_iommu_ops;
- identity_domain.id = pdom_id_alloc();
+ identity_domain.id = amd_iommu_pdom_id_alloc();
protection_domain_init(&identity_domain);
}
--
2.34.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH 2/8] iommu/amd: Making device attach / detach helpers non-static
2025-08-20 11:30 [PATCH 0/8] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
2025-08-20 11:30 ` [PATCH 1/8] iommu/amd: Make amd_iommu_pdom_id_alloc() non-static Suravee Suthikulpanit
@ 2025-08-20 11:30 ` Suravee Suthikulpanit
2025-09-02 13:10 ` Jason Gunthorpe
2025-08-20 11:30 ` [PATCH 3/8] iommu/amd: Making amd_iommu_pdev_enable_cap_ats non-static Suravee Suthikulpanit
` (5 subsequent siblings)
7 siblings, 1 reply; 26+ messages in thread
From: Suravee Suthikulpanit @ 2025-08-20 11:30 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
Suravee Suthikulpanit
To allow reuse in subsequent patches.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 2 ++
drivers/iommu/amd/iommu.c | 11 +++++------
2 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 6ea549816a1f..5085aa353522 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -88,6 +88,8 @@ int amd_iommu_complete_ppr(struct device *dev, u32 pasid, int status, int tag);
* This function flushes all internal caches of
* the IOMMU used by this driver.
*/
+int __amd_iommu_attach_device(struct device *dev, struct protection_domain *domain);
+void amd_iommu_detach_device(struct device *dev);
void amd_iommu_flush_all_caches(struct amd_iommu *iommu);
void amd_iommu_update_and_flush_device_table(struct protection_domain *domain);
void amd_iommu_domain_flush_pages(struct protection_domain *domain,
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a05b7f69ac74..207fef84a4c5 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2262,8 +2262,7 @@ static void pdom_detach_iommu(struct amd_iommu *iommu,
* If a device is not yet associated with a domain, this function makes the
* device visible in the domain
*/
-static int attach_device(struct device *dev,
- struct protection_domain *domain)
+int __amd_iommu_attach_device(struct device *dev, struct protection_domain *domain)
{
struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
struct amd_iommu *iommu = get_amd_iommu_from_dev_data(dev_data);
@@ -2325,7 +2324,7 @@ static int attach_device(struct device *dev,
/*
* Removes a device from a protection domain (with devtable_lock held)
*/
-static void detach_device(struct device *dev)
+void amd_iommu_detach_device(struct device *dev)
{
struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
struct amd_iommu *iommu = get_amd_iommu_from_dev_data(dev_data);
@@ -2639,7 +2638,7 @@ static int blocked_domain_attach_device(struct iommu_domain *domain,
struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
if (dev_data->domain)
- detach_device(dev);
+ amd_iommu_detach_device(dev);
/* Clear DTE and flush the entry */
mutex_lock(&dev_data->mutex);
@@ -2717,9 +2716,9 @@ static int amd_iommu_attach_device(struct iommu_domain *dom,
return -EINVAL;
if (dev_data->domain)
- detach_device(dev);
+ amd_iommu_detach_device(dev);
- ret = attach_device(dev, domain);
+ ret = __amd_iommu_attach_device(dev, domain);
#ifdef CONFIG_IRQ_REMAP
if (AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir)) {
--
2.34.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH 3/8] iommu/amd: Making amd_iommu_pdev_enable_cap_ats non-static
2025-08-20 11:30 [PATCH 0/8] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
2025-08-20 11:30 ` [PATCH 1/8] iommu/amd: Make amd_iommu_pdom_id_alloc() non-static Suravee Suthikulpanit
2025-08-20 11:30 ` [PATCH 2/8] iommu/amd: Making device attach / detach helpers non-static Suravee Suthikulpanit
@ 2025-08-20 11:30 ` Suravee Suthikulpanit
2025-08-20 11:30 ` [PATCH 4/8] iommu/amd: Introduce struct gcr3_tbl_info.giov Suravee Suthikulpanit
` (4 subsequent siblings)
7 siblings, 0 replies; 26+ messages in thread
From: Suravee Suthikulpanit @ 2025-08-20 11:30 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
Suravee Suthikulpanit
To allow reuse in subsequent patches.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 1 +
drivers/iommu/amd/iommu.c | 6 +++---
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 5085aa353522..3ff380afb9f4 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -27,6 +27,7 @@ void iommu_feature_enable(struct amd_iommu *iommu, u8 bit);
void *__init iommu_alloc_4k_pages(struct amd_iommu *iommu,
gfp_t gfp, size_t size);
int amd_iommu_pdom_id_alloc(void);
+int amd_iommu_pdev_enable_cap_ats(struct pci_dev *pdev);
#ifdef CONFIG_AMD_IOMMU_DEBUGFS
void amd_iommu_debugfs_setup(void);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 207fef84a4c5..44f9a8990d87 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -531,7 +531,7 @@ static u32 pdev_get_caps(struct pci_dev *pdev)
return flags;
}
-static inline int pdev_enable_cap_ats(struct pci_dev *pdev)
+int amd_iommu_pdev_enable_cap_ats(struct pci_dev *pdev)
{
struct iommu_dev_data *dev_data = dev_iommu_priv_get(&pdev->dev);
int ret = -EINVAL;
@@ -629,7 +629,7 @@ static inline void pdev_disable_cap_pasid(struct pci_dev *pdev)
static void pdev_enable_caps(struct pci_dev *pdev)
{
pdev_enable_cap_pasid(pdev);
- pdev_enable_cap_ats(pdev);
+ amd_iommu_pdev_enable_cap_ats(pdev);
pdev_enable_cap_pri(pdev);
}
@@ -2303,7 +2303,7 @@ int __amd_iommu_attach_device(struct device *dev, struct protection_domain *doma
if (amd_iommu_iopf_add_device(iommu, dev_data))
pdev_disable_cap_pri(pdev);
} else if (pdev) {
- pdev_enable_cap_ats(pdev);
+ amd_iommu_pdev_enable_cap_ats(pdev);
}
/* Update data structures */
--
2.34.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH 4/8] iommu/amd: Introduce struct gcr3_tbl_info.giov
2025-08-20 11:30 [PATCH 0/8] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (2 preceding siblings ...)
2025-08-20 11:30 ` [PATCH 3/8] iommu/amd: Making amd_iommu_pdev_enable_cap_ats non-static Suravee Suthikulpanit
@ 2025-08-20 11:30 ` Suravee Suthikulpanit
2025-09-02 13:07 ` Jason Gunthorpe
2025-08-20 11:30 ` [PATCH 5/8] iommufd: Introduce data struct for AMD nested domain allocation Suravee Suthikulpanit
` (3 subsequent siblings)
7 siblings, 1 reply; 26+ messages in thread
From: Suravee Suthikulpanit @ 2025-08-20 11:30 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
Suravee Suthikulpanit
Introduce a flag to track DTE[GIOV] programming during IOMMU domain attach.
Add logic to determine whether GIOV is required and set the flag accordingly,
then check the flag before programming the DTE.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu_types.h | 1 +
drivers/iommu/amd/iommu.c | 18 ++++++++++++++----
2 files changed, 15 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index efdd0cbda1df..44c44943802c 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -555,6 +555,7 @@ struct gcr3_tbl_info {
int glx; /* Number of levels for GCR3 table */
u32 pasid_cnt; /* Track attached PASIDs */
u16 domid; /* Per device domain ID */
+ bool giov; /* Track DTE[GIOV] */
};
struct amd_io_pgtable {
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 44f9a8990d87..f463774e4b71 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1872,6 +1872,7 @@ static void free_gcr3_table(struct gcr3_tbl_info *gcr3_info)
iommu_free_pages(gcr3_info->gcr3_tbl);
gcr3_info->gcr3_tbl = NULL;
+ gcr3_info->giov = false;
}
/*
@@ -2027,8 +2028,8 @@ static void set_dte_gcr3_table(struct amd_iommu *iommu,
if (!gcr3_info->gcr3_tbl)
return;
- pr_debug("%s: devid=%#x, glx=%#x, gcr3_tbl=%#llx\n",
- __func__, dev_data->devid, gcr3_info->glx,
+ pr_debug("%s: devid=%#x, glx=%#x, giov=%#x, gcr3_tbl=%#llx\n",
+ __func__, dev_data->devid, gcr3_info->glx, gcr3_info->giov,
(unsigned long long)gcr3_info->gcr3_tbl);
gcr3 = iommu_virt_to_phys(gcr3_info->gcr3_tbl);
@@ -2036,7 +2037,7 @@ static void set_dte_gcr3_table(struct amd_iommu *iommu,
target->data[0] |= DTE_FLAG_GV |
FIELD_PREP(DTE_GLX, gcr3_info->glx) |
FIELD_PREP(DTE_GCR3_14_12, gcr3 >> 12);
- if (pdom_is_v2_pgtbl_mode(dev_data->domain))
+ if (gcr3_info->giov)
target->data[0] |= DTE_FLAG_GIOV;
target->data[1] |= FIELD_PREP(DTE_GCR3_30_15, gcr3 >> 15) |
@@ -2178,8 +2179,17 @@ static int init_gcr3_table(struct iommu_dev_data *dev_data,
return ret;
ret = update_gcr3(dev_data, 0, iommu_virt_to_phys(pdom->iop.pgd), true);
- if (ret)
+ if (!ret) {
+ /*
+ * GIOV is required for PD_MODE_V2 because we need
+ * to support the case where the end-point device
+ * does not have PASID in the TLP prefix when setting
+ * up to use the v2 table.
+ */
+ dev_data->gcr3_info.giov = true;
+ } else {
free_gcr3_table(&dev_data->gcr3_info);
+ }
return ret;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH 5/8] iommufd: Introduce data struct for AMD nested domain allocation
2025-08-20 11:30 [PATCH 0/8] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (3 preceding siblings ...)
2025-08-20 11:30 ` [PATCH 4/8] iommu/amd: Introduce struct gcr3_tbl_info.giov Suravee Suthikulpanit
@ 2025-08-20 11:30 ` Suravee Suthikulpanit
2025-09-02 13:09 ` Jason Gunthorpe
2025-08-20 11:30 ` [PATCH 6/8] iommu/amd: Add support for nest parent " Suravee Suthikulpanit
` (2 subsequent siblings)
7 siblings, 1 reply; 26+ messages in thread
From: Suravee Suthikulpanit @ 2025-08-20 11:30 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
Suravee Suthikulpanit
Introduce the IOMMU_HWPT_DATA_AMD_V2 data type for the AMD IOMMU v2 page
table, which is used for stage-1 in nested translation. The data structure
contains the information necessary for setting up AMD HW-vIOMMU support.
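As an illustration of how a VMM might use it, here is a minimal sketch (not
part of this series; the helper name is hypothetical and the per-word
contents are inferred from how patches 7 and 8 consume the structure):

#include <string.h>
#include <linux/iommufd.h>

/*
 * Pass the guest-visible DTE words through to the kernel. Per patches 7/8,
 * the driver is expected to consume roughly:
 *   dte[0] - GV, GIOV, GLX and GCR3 TRP[14:12]
 *   dte[1] - GCR3 TRP[30:15] and GCR3 TRP[51:31] (a GPA)
 *   dte[2] - guest paging mode (GPT level)
 */
static void amd_v2_data_from_guest_dte(struct iommu_hwpt_amd_v2 *data,
				       const __u64 guest_dte[4])
{
	memcpy(data->dte, guest_dte, sizeof(data->dte));
}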
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
include/uapi/linux/iommufd.h | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 0f7212f9e0ce..a12353488c83 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -455,16 +455,27 @@ struct iommu_hwpt_arm_smmuv3 {
__aligned_le64 ste[2];
};
+/**
+ * struct iommu_hwpt_amd_v2 - AMD IOMMU specific user-managed
+ * v2 I/O page table data
+ * @dte: Guest Device Table Entry (DTE)
+ */
+struct iommu_hwpt_amd_v2 {
+ __aligned_u64 dte[4];
+};
+
/**
* enum iommu_hwpt_data_type - IOMMU HWPT Data Type
* @IOMMU_HWPT_DATA_NONE: no data
* @IOMMU_HWPT_DATA_VTD_S1: Intel VT-d stage-1 page table
* @IOMMU_HWPT_DATA_ARM_SMMUV3: ARM SMMUv3 Context Descriptor Table
+ * @IOMMU_HWPT_DATA_AMD_V2: AMD IOMMUv2 page table
*/
enum iommu_hwpt_data_type {
IOMMU_HWPT_DATA_NONE = 0,
IOMMU_HWPT_DATA_VTD_S1 = 1,
IOMMU_HWPT_DATA_ARM_SMMUV3 = 2,
+ IOMMU_HWPT_DATA_AMD_V2 = 3,
};
/**
--
2.34.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH 6/8] iommu/amd: Add support for nest parent domain allocation
2025-08-20 11:30 [PATCH 0/8] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (4 preceding siblings ...)
2025-08-20 11:30 ` [PATCH 5/8] iommufd: Introduce data struct for AMD nested domain allocation Suravee Suthikulpanit
@ 2025-08-20 11:30 ` Suravee Suthikulpanit
2025-09-02 13:12 ` Jason Gunthorpe
2025-08-20 11:30 ` [PATCH 7/8] iommu/amd: Add support for nested " Suravee Suthikulpanit
2025-08-20 11:30 ` [PATCH 8/8] iommu/amd: Add support for nested domain attach/detach Suravee Suthikulpanit
7 siblings, 1 reply; 26+ messages in thread
From: Suravee Suthikulpanit @ 2025-08-20 11:30 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
Suravee Suthikulpanit
To support nested translation, the nest parent domain is allocated with the
IOMMU_HWPT_ALLOC_NEST_PARENT flag and stores the v1 page table information
for stage 2 (i.e. GPA->SPA).
Also, only allow nest parent domains on AMD systems that support the Guest
CR3 Table Root Pointer (GCR3TRPMode) feature. This feature is required in
order to program DTE[GCR3 Table Root Pointer] with a GPA.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu_types.h | 1 +
drivers/iommu/amd/iommu.c | 24 ++++++++++++++++++++++--
2 files changed, 23 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 44c44943802c..5343b99913e4 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -110,6 +110,7 @@
/* Extended Feature 2 Bits */
+#define FEATURE_GCR3TRPMODE BIT_ULL(3)
#define FEATURE_SNPAVICSUP GENMASK_ULL(7, 5)
#define FEATURE_SNPAVICSUP_GAM(x) \
(FIELD_GET(FEATURE_SNPAVICSUP, x) == 0x1)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index f463774e4b71..46682c8ba28d 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2599,20 +2599,39 @@ do_iommu_domain_alloc(struct device *dev, u32 flags,
return &domain->domain;
}
+static inline bool is_nest_parent_supported(u32 flags)
+{
+ /* Only allow nest parent when these features are supported */
+ if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) &&
+ (!check_feature(FEATURE_GT) ||
+ !check_feature(FEATURE_GIOSUP) ||
+ !check_feature2(FEATURE_GCR3TRPMODE)))
+ return false;
+
+ return true;
+}
+
static struct iommu_domain *
amd_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
const struct iommu_user_data *user_data)
{
+ struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
struct amd_iommu *iommu = get_amd_iommu_from_dev(dev);
const u32 supported_flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
- IOMMU_HWPT_ALLOC_PASID;
+ IOMMU_HWPT_ALLOC_PASID |
+ IOMMU_HWPT_ALLOC_NEST_PARENT;
- if ((flags & ~supported_flags) || user_data)
+ /* Check supported flags */
+ if ((flags & ~supported_flags) || user_data || !is_nest_parent_supported(flags))
return ERR_PTR(-EOPNOTSUPP);
+ pr_debug("%s: IOMMU devid=%#x, flags=%#x\n", __func__, dev_data->devid, flags);
+
switch (flags & supported_flags) {
case IOMMU_HWPT_ALLOC_DIRTY_TRACKING:
+ case IOMMU_HWPT_ALLOC_DIRTY_TRACKING | IOMMU_HWPT_ALLOC_NEST_PARENT:
+ case IOMMU_HWPT_ALLOC_NEST_PARENT:
/* Allocate domain with v1 page table for dirty tracking */
if (!amd_iommu_hd_support(iommu))
break;
@@ -2626,6 +2645,7 @@ amd_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
/* If nothing specific is required use the kernel commandline default */
return do_iommu_domain_alloc(dev, 0, amd_iommu_pgtable);
default:
+ pr_err("%s: Unhandled flag : 0x%x\n", __func__, flags);
break;
}
return ERR_PTR(-EOPNOTSUPP);
--
2.34.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH 7/8] iommu/amd: Add support for nested domain allocation
2025-08-20 11:30 [PATCH 0/8] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (5 preceding siblings ...)
2025-08-20 11:30 ` [PATCH 6/8] iommu/amd: Add support for nest parent " Suravee Suthikulpanit
@ 2025-08-20 11:30 ` Suravee Suthikulpanit
2025-08-22 5:39 ` kernel test robot
` (3 more replies)
2025-08-20 11:30 ` [PATCH 8/8] iommu/amd: Add support for nested domain attach/detach Suravee Suthikulpanit
7 siblings, 4 replies; 26+ messages in thread
From: Suravee Suthikulpanit @ 2025-08-20 11:30 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
Suravee Suthikulpanit
The child domain is allocated with the IOMMU_DOMAIN_NESTED type to store the
stage-1 translation (i.e. GVA->GPA). This includes the GCR3 root pointer
table along with the guest (v2) page tables. The struct iommu_hwpt_amd_v2
contains this information and is passed from user space as a parameter to
struct iommu_ops.domain_alloc_nested().
The parent domain is tracked using struct protection_domain.parent.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/Makefile | 2 +-
drivers/iommu/amd/amd_iommu.h | 5 ++
drivers/iommu/amd/amd_iommu_types.h | 5 ++
drivers/iommu/amd/iommu.c | 22 ++++----
drivers/iommu/amd/nested.c | 79 +++++++++++++++++++++++++++++
5 files changed, 103 insertions(+), 10 deletions(-)
create mode 100644 drivers/iommu/amd/nested.c
diff --git a/drivers/iommu/amd/Makefile b/drivers/iommu/amd/Makefile
index 59c04a67f398..9ccf9d61810c 100644
--- a/drivers/iommu/amd/Makefile
+++ b/drivers/iommu/amd/Makefile
@@ -1,3 +1,3 @@
# SPDX-License-Identifier: GPL-2.0-only
-obj-y += iommu.o init.o quirks.o io_pgtable.o io_pgtable_v2.o ppr.o pasid.o
+obj-y += iommu.o init.o quirks.o io_pgtable.o io_pgtable_v2.o ppr.o pasid.o nested.o
obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += debugfs.o
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 3ff380afb9f4..8e86d5b1d915 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -8,6 +8,7 @@
#define AMD_IOMMU_H
#include <linux/iommu.h>
+#include <uapi/linux/iommufd.h>
#include "amd_iommu_types.h"
@@ -190,4 +191,8 @@ void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
struct dev_table_entry *get_dev_table(struct amd_iommu *iommu);
struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid);
+/* NESTED */
+struct iommu_domain *
+amd_iommu_domain_alloc_nested(struct device *dev, struct iommu_domain *parent,
+ u32 flags, const struct iommu_user_data *user_data);
#endif /* AMD_IOMMU_H */
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 5343b99913e4..94f51a09b364 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -20,6 +20,8 @@
#include <linux/irqreturn.h>
#include <linux/io-pgtable.h>
+#include <uapi/linux/iommufd.h>
+
/*
* Maximum number of IOMMUs supported
*/
@@ -605,6 +607,9 @@ struct protection_domain {
struct mmu_notifier mn; /* mmu notifier for the SVA domain */
struct list_head dev_data_list; /* List of pdom_dev_data */
+
+ struct protection_domain *parent; /* Nested parent domain */
+ struct iommu_hwpt_amd_v2 guest_hwpt;
};
/*
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 46682c8ba28d..ea790a8997ee 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2616,6 +2616,7 @@ amd_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
const struct iommu_user_data *user_data)
{
+ struct iommu_domain *dom = ERR_PTR(-EOPNOTSUPP);
struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
struct amd_iommu *iommu = get_amd_iommu_from_dev(dev);
const u32 supported_flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
@@ -2626,29 +2627,31 @@ amd_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
if ((flags & ~supported_flags) || user_data || !is_nest_parent_supported(flags))
return ERR_PTR(-EOPNOTSUPP);
- pr_debug("%s: IOMMU devid=%#x, flags=%#x\n", __func__, dev_data->devid, flags);
+ pr_debug("%s: IOMMU devid=%#x, flags=%#x, supported_flags=%#x\n", __func__, dev_data->devid, flags, supported_flags);
switch (flags & supported_flags) {
case IOMMU_HWPT_ALLOC_DIRTY_TRACKING:
case IOMMU_HWPT_ALLOC_DIRTY_TRACKING | IOMMU_HWPT_ALLOC_NEST_PARENT:
case IOMMU_HWPT_ALLOC_NEST_PARENT:
/* Allocate domain with v1 page table for dirty tracking */
- if (!amd_iommu_hd_support(iommu))
- break;
- return do_iommu_domain_alloc(dev, flags, PD_MODE_V1);
+ if (amd_iommu_hd_support(iommu))
+ dom = do_iommu_domain_alloc(dev, flags, PD_MODE_V1);
+ break;
case IOMMU_HWPT_ALLOC_PASID:
/* Allocate domain with v2 page table if IOMMU supports PASID. */
- if (!amd_iommu_pasid_supported())
- break;
- return do_iommu_domain_alloc(dev, flags, PD_MODE_V2);
+ if (amd_iommu_pasid_supported())
+ dom = do_iommu_domain_alloc(dev, flags, PD_MODE_V2);
+ break;
case 0:
/* If nothing specific is required use the kernel commandline default */
- return do_iommu_domain_alloc(dev, 0, amd_iommu_pgtable);
+ dom = do_iommu_domain_alloc(dev, 0, amd_iommu_pgtable);
+ break;
default:
pr_err("%s: Unhandled flag : 0x%x\n", __func__, flags);
break;
}
- return ERR_PTR(-EOPNOTSUPP);
+
+ return dom;
}
void amd_iommu_domain_free(struct iommu_domain *dom)
@@ -3113,6 +3116,7 @@ const struct iommu_ops amd_iommu_ops = {
.release_domain = &release_domain,
.identity_domain = &identity_domain.domain,
.domain_alloc_paging_flags = amd_iommu_domain_alloc_paging_flags,
+ .domain_alloc_nested = amd_iommu_domain_alloc_nested,
.domain_alloc_sva = amd_iommu_domain_alloc_sva,
.probe_device = amd_iommu_probe_device,
.release_device = amd_iommu_release_device,
diff --git a/drivers/iommu/amd/nested.c b/drivers/iommu/amd/nested.c
new file mode 100644
index 000000000000..09f2a455af33
--- /dev/null
+++ b/drivers/iommu/amd/nested.c
@@ -0,0 +1,79 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023 Advanced Micro Devices, Inc.
+ */
+
+#define pr_fmt(fmt) "AMD-Vi: " fmt
+#define dev_fmt(fmt) pr_fmt(fmt)
+
+#include <linux/iommu.h>
+#include <uapi/linux/iommufd.h>
+
+#include "amd_iommu.h"
+#include "amd_iommu_types.h"
+
+const struct iommu_domain_ops nested_domain_ops = {
+ .free = amd_iommu_domain_free,
+};
+
+static int udata_to_iommu_hwpt_amd_v2(const struct iommu_user_data *user_data,
+ struct iommu_hwpt_amd_v2 *hwpt)
+{
+ if (!user_data)
+ return -EINVAL;
+
+ if (user_data->type != IOMMU_HWPT_DATA_AMD_V2)
+ return -EOPNOTSUPP;
+
+ return iommu_copy_struct_from_user(hwpt, user_data,
+ IOMMU_HWPT_DATA_AMD_V2,
+ dte);
+}
+
+struct iommu_domain *
+amd_iommu_domain_alloc_nested(struct device *dev, struct iommu_domain *parent,
+ u32 flags, const struct iommu_user_data *user_data)
+{
+ int ret;
+ struct iommu_hwpt_amd_v2 hwpt;
+ struct protection_domain *pdom;
+
+ if (parent->ops != amd_iommu_ops.default_domain_ops)
+ return ERR_PTR(-EINVAL);
+
+ ret = udata_to_iommu_hwpt_amd_v2(user_data, &hwpt);
+ if (ret)
+ return ERR_PTR(ret);
+
+ pdom = kzalloc(sizeof(*pdom), GFP_KERNEL);
+ if (IS_ERR(pdom))
+ return ERR_PTR(-ENOMEM);
+
+ pdom->id = amd_iommu_pdom_id_alloc();
+ if (!pdom->id)
+ goto out_err;
+
+ pr_debug("%s: Allocating nested domain with parent domid=%#x\n",
+ __func__, to_pdomain(parent)->id);
+
+ spin_lock_init(&pdom->lock);
+ INIT_LIST_HEAD(&pdom->dev_list);
+ INIT_LIST_HEAD(&pdom->dev_data_list);
+ xa_init(&pdom->iommu_array);
+
+ pdom->pd_mode = PD_MODE_V2;
+ pdom->iop.pgtbl.cfg.amd.nid = NUMA_NO_NODE;
+ pdom->parent = to_pdomain(parent);
+ pdom->domain.ops = &nested_domain_ops;
+ pdom->domain.type = IOMMU_DOMAIN_NESTED;
+ pdom->domain.geometry.aperture_start = 0;
+ pdom->domain.geometry.aperture_end = ((1ULL << PM_LEVEL_SHIFT(amd_iommu_gpt_level)) - 1);
+ pdom->domain.geometry.force_aperture = true;
+ pdom->domain.pgsize_bitmap = pdom->iop.pgtbl.cfg.pgsize_bitmap;
+ memcpy(&pdom->guest_hwpt, &hwpt, sizeof(struct iommu_hwpt_amd_v2));
+
+ return &pdom->domain;
+out_err:
+ kfree(pdom);
+ return ERR_PTR(-EINVAL);
+}
--
2.34.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH 8/8] iommu/amd: Add support for nested domain attach/detach
2025-08-20 11:30 [PATCH 0/8] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (6 preceding siblings ...)
2025-08-20 11:30 ` [PATCH 7/8] iommu/amd: Add support for nested " Suravee Suthikulpanit
@ 2025-08-20 11:30 ` Suravee Suthikulpanit
2025-08-22 20:20 ` Nicolin Chen
` (2 more replies)
7 siblings, 3 replies; 26+ messages in thread
From: Suravee Suthikulpanit @ 2025-08-20 11:30 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
Suravee Suthikulpanit
Program the guest translation settings in the host DTE when attaching the
nested domain to a device.
Also, enable the GCR3TRPMode feature when supported.
Note that nested translation is only supported with GCR3TRP mode. When it is
enabled, the AMD IOMMU driver programs the GCR3 Table Root Pointer field of
the device table entry with the GPA provided by the guest.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu_types.h | 5 ++
drivers/iommu/amd/init.c | 3 ++
drivers/iommu/amd/iommu.c | 49 +++++++++++++++---
drivers/iommu/amd/nested.c | 80 +++++++++++++++++++++++++++--
4 files changed, 127 insertions(+), 10 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 94f51a09b364..f8c392aadeb1 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -191,6 +191,7 @@
#define CONTROL_EPH_EN 45
#define CONTROL_XT_EN 50
#define CONTROL_INTCAPXT_EN 51
+#define CONTROL_GCR3TRPMODE 58
#define CONTROL_IRTCACHEDIS 59
#define CONTROL_SNPAVIC_EN 61
@@ -420,6 +421,8 @@
#define DTE_FLAG_V BIT_ULL(0)
#define DTE_FLAG_TV BIT_ULL(1)
#define DTE_FLAG_HAD (3ULL << 7)
+#define DTE_FLAG_PPR BIT_ULL(52)
+#define DTE_FLAG_GLX BIT_ULL(53)
#define DTE_FLAG_GIOV BIT_ULL(54)
#define DTE_FLAG_GV BIT_ULL(55)
#define DTE_GLX GENMASK_ULL(57, 56)
@@ -555,6 +558,7 @@ struct amd_irte_ops;
struct gcr3_tbl_info {
u64 *gcr3_tbl; /* Guest CR3 table */
+ u64 trp_gpa; /* Guest CR3 TRP GPA for nested domain */
int glx; /* Number of levels for GCR3 table */
u32 pasid_cnt; /* Track attached PASIDs */
u16 domid; /* Per device domain ID */
@@ -610,6 +614,7 @@ struct protection_domain {
struct protection_domain *parent; /* Nested parent domain */
struct iommu_hwpt_amd_v2 guest_hwpt;
+ u16 guest_paging_mode; /* Guest paging mode */
};
/*
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 8de689b2c5ed..b340afd6901f 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -971,6 +971,9 @@ static void iommu_enable_gt(struct amd_iommu *iommu)
return;
iommu_feature_enable(iommu, CONTROL_GT_EN);
+
+ if (check_feature2(FEATURE_GCR3TRPMODE))
+ iommu_feature_enable(iommu, CONTROL_GCR3TRPMODE);
}
/* sets a specific bit in the device table entry. */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index ea790a8997ee..935eaffb6814 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1856,6 +1856,18 @@ static void free_gcr3_tbl_level2(u64 *tbl)
}
}
+static inline bool amd_iommu_domain_is_nested(struct protection_domain *pdom)
+{
+ return (pdom && (pdom->domain.type == IOMMU_DOMAIN_NESTED));
+}
+
+static inline bool has_gcr3_table(struct gcr3_tbl_info *gcr3_info)
+{
+ if (!gcr3_info || (!gcr3_info->gcr3_tbl && !gcr3_info->trp_gpa))
+ return false;
+ return true;
+}
+
static void free_gcr3_table(struct gcr3_tbl_info *gcr3_info)
{
if (gcr3_info->glx == 2)
@@ -1901,7 +1913,7 @@ static int setup_gcr3_table(struct gcr3_tbl_info *gcr3_info,
if (levels > amd_iommu_max_glx_val)
return -EINVAL;
- if (gcr3_info->gcr3_tbl)
+ if (has_gcr3_table(gcr3_info))
return -EBUSY;
/* Allocate per device domain ID */
@@ -2023,17 +2035,31 @@ static void set_dte_gcr3_table(struct amd_iommu *iommu,
struct dev_table_entry *target)
{
struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
+ struct protection_domain *pdom = dev_data->domain;
u64 gcr3;
- if (!gcr3_info->gcr3_tbl)
+ if (!has_gcr3_table(gcr3_info))
return;
- pr_debug("%s: devid=%#x, glx=%#x, giov=%#x, gcr3_tbl=%#llx\n",
+ /* We need to check host capability before setting the mode. */
+ if ((pdom->guest_paging_mode == PAGE_MODE_5_LEVEL) &&
+ (amd_iommu_gpt_level < PAGE_MODE_5_LEVEL)) {
+ pr_err("Cannot support Guest paging mode=%#x (dom_id=%#x).\n",
+ pdom->guest_paging_mode, pdom->id);
+ return;
+ }
+
+ pr_debug("%s: devid=%#x, glx=%#x, giov=%#x, gcr3_tbl=%#llx, trp_gpa=%#llx, type=%#x\n",
__func__, dev_data->devid, gcr3_info->glx, gcr3_info->giov,
- (unsigned long long)gcr3_info->gcr3_tbl);
+ (unsigned long long)gcr3_info->gcr3_tbl, gcr3_info->trp_gpa,
+ pdom->domain.type);
gcr3 = iommu_virt_to_phys(gcr3_info->gcr3_tbl);
+ /* For nested domain, use GCR3 GPA provided */
+ if (gcr3_info->trp_gpa)
+ gcr3 = gcr3_info->trp_gpa;
+
target->data[0] |= DTE_FLAG_GV |
FIELD_PREP(DTE_GLX, gcr3_info->glx) |
FIELD_PREP(DTE_GCR3_14_12, gcr3 >> 12);
@@ -2044,7 +2070,7 @@ static void set_dte_gcr3_table(struct amd_iommu *iommu,
FIELD_PREP(DTE_GCR3_51_31, gcr3 >> 31);
/* Guest page table can only support 4 and 5 levels */
- if (amd_iommu_gpt_level == PAGE_MODE_5_LEVEL)
+ if (pdom->guest_paging_mode == PAGE_MODE_5_LEVEL)
target->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_5_LEVEL);
else
target->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_4_LEVEL);
@@ -2061,7 +2087,14 @@ static void set_dte_entry(struct amd_iommu *iommu,
struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
- if (gcr3_info && gcr3_info->gcr3_tbl)
+ /*
+ * For nested domain, use parent domain to setup v1 table
+ * information and domain id.
+ */
+ if (amd_iommu_domain_is_nested(domain))
+ domain = domain->parent;
+
+ if (has_gcr3_table(gcr3_info))
domid = dev_data->gcr3_info.domid;
else
domid = domain->id;
@@ -2293,7 +2326,8 @@ int __amd_iommu_attach_device(struct device *dev, struct protection_domain *doma
goto out;
/* Setup GCR3 table */
- if (pdom_is_sva_capable(domain)) {
+ if (!amd_iommu_domain_is_nested(domain) && pdom_is_sva_capable(domain)) {
+ pr_warn("%s: Allocating guest page table\n", __func__);
ret = init_gcr3_table(dev_data, domain);
if (ret) {
pdom_detach_iommu(iommu, domain);
@@ -2519,6 +2553,7 @@ static int pdom_setup_pgtable(struct protection_domain *domain,
fmt = AMD_IOMMU_V1;
break;
case PD_MODE_V2:
+ domain->guest_paging_mode = amd_iommu_gpt_level;
fmt = AMD_IOMMU_V2;
break;
case PD_MODE_NONE:
diff --git a/drivers/iommu/amd/nested.c b/drivers/iommu/amd/nested.c
index 09f2a455af33..c9bf44e6298d 100644
--- a/drivers/iommu/amd/nested.c
+++ b/drivers/iommu/amd/nested.c
@@ -12,9 +12,7 @@
#include "amd_iommu.h"
#include "amd_iommu_types.h"
-const struct iommu_domain_ops nested_domain_ops = {
- .free = amd_iommu_domain_free,
-};
+const struct iommu_domain_ops nested_domain_ops;
static int udata_to_iommu_hwpt_amd_v2(const struct iommu_user_data *user_data,
struct iommu_hwpt_amd_v2 *hwpt)
@@ -77,3 +75,79 @@ amd_iommu_domain_alloc_nested(struct device *dev, struct iommu_domain *parent,
kfree(pdom);
return ERR_PTR(-EINVAL);
}
+
+static inline u64 hwpt_to_gcr3_trp(u64 *dte)
+{
+ u64 gcr3;
+
+ gcr3 = (FIELD_GET(DTE_GCR3_14_12, dte[0]) << 12);
+ gcr3 |= (FIELD_GET(DTE_GCR3_30_15, dte[1]) << 15);
+ gcr3 |= (FIELD_GET(DTE_GCR3_51_31, dte[1]) << 31);
+ return gcr3;
+}
+
+static int nested_gcr3_update(struct protection_domain *pdom, struct device *dev)
+{
+ struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
+ struct iommu_hwpt_amd_v2 *hwpt = &pdom->guest_hwpt;
+ struct pci_dev *pdev = to_pci_dev(dev);
+
+ if (!pdev || !hwpt)
+ return -EINVAL;
+
+ /* Note: Currently only support GCR3TRPMode with nested translation */
+ if (!check_feature2(FEATURE_GCR3TRPMODE))
+ return -EOPNOTSUPP;
+
+ if (FIELD_GET(DTE_GPT_LEVEL_MASK, hwpt->dte[2]) == GUEST_PGTABLE_5_LEVEL)
+ pdom->guest_paging_mode = PAGE_MODE_5_LEVEL;
+ else
+ pdom->guest_paging_mode = PAGE_MODE_4_LEVEL;
+
+ dev_data->ppr = FIELD_GET(DTE_FLAG_PPR, hwpt->dte[0]);
+ dev_data->gcr3_info.glx = FIELD_GET(DTE_FLAG_GLX, hwpt->dte[0]);
+ dev_data->gcr3_info.giov = FIELD_GET(DTE_FLAG_GIOV, hwpt->dte[0]);
+ dev_data->gcr3_info.trp_gpa = hwpt_to_gcr3_trp(hwpt->dte);
+ /* Due to possible aliasing issue use nested domain ID */
+ dev_data->gcr3_info.domid = pdom->id;
+ pr_debug("%s: devid=%#x, domid=%#x, trp_gpa=%#llx, glx=%#x\n", __func__,
+ pci_dev_id(pdev),
+ dev_data->gcr3_info.domid,
+ dev_data->gcr3_info.trp_gpa,
+ dev_data->gcr3_info.glx);
+
+ return 0;
+}
+
+static int amd_iommu_nested_attach_device(struct iommu_domain *dom, struct device *dev)
+{
+ struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
+ struct protection_domain *pdom = to_pdomain(dom);
+ struct pci_dev *pdev;
+ int ret;
+
+ if (dev_data->domain == pdom)
+ return 0;
+
+ ret = nested_gcr3_update(pdom, dev);
+ if (ret)
+ return ret;
+
+ if (dev_data->domain)
+ amd_iommu_detach_device(dev);
+
+ ret = __amd_iommu_attach_device(dev, pdom);
+ if (ret)
+ return ret;
+
+ pdev = dev_is_pci(dev_data->dev) ? to_pci_dev(dev_data->dev) : NULL;
+ if (pdev)
+ amd_iommu_pdev_enable_cap_ats(pdev);
+
+ return ret;
+}
+
+const struct iommu_domain_ops nested_domain_ops = {
+ .attach_dev = amd_iommu_nested_attach_device,
+ .free = amd_iommu_domain_free,
+};
--
2.34.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH 7/8] iommu/amd: Add support for nested domain allocation
2025-08-20 11:30 ` [PATCH 7/8] iommu/amd: Add support for nested " Suravee Suthikulpanit
@ 2025-08-22 5:39 ` kernel test robot
2025-08-22 19:51 ` Nicolin Chen
` (2 subsequent siblings)
3 siblings, 0 replies; 26+ messages in thread
From: kernel test robot @ 2025-08-22 5:39 UTC (permalink / raw)
To: Suravee Suthikulpanit, jgg, nicolinc
Cc: oe-kbuild-all, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, Suravee Suthikulpanit
Hi Suravee,
kernel test robot noticed the following build warnings:
[auto build test WARNING on linus/master]
[also build test WARNING on v6.17-rc2 next-20250821]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Suravee-Suthikulpanit/iommu-amd-Make-amd_iommu_pdom_id_alloc-non-static/20250820-194937
base: linus/master
patch link: https://lore.kernel.org/r/20250820113009.5233-8-suravee.suthikulpanit%40amd.com
patch subject: [PATCH 7/8] iommu/amd: Add support for nested domain allocation
config: x86_64-randconfig-123-20250822 (https://download.01.org/0day-ci/archive/20250822/202508221308.4CwLNeZw-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14+deb12u1) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250822/202508221308.4CwLNeZw-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202508221308.4CwLNeZw-lkp@intel.com/
sparse warnings: (new ones prefixed by >>)
>> drivers/iommu/amd/nested.c:15:31: sparse: sparse: symbol 'nested_domain_ops' was not declared. Should it be static?
vim +/nested_domain_ops +15 drivers/iommu/amd/nested.c
14
> 15 const struct iommu_domain_ops nested_domain_ops = {
16 .free = amd_iommu_domain_free,
17 };
18
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 7/8] iommu/amd: Add support for nested domain allocation
2025-08-20 11:30 ` [PATCH 7/8] iommu/amd: Add support for nested " Suravee Suthikulpanit
2025-08-22 5:39 ` kernel test robot
@ 2025-08-22 19:51 ` Nicolin Chen
2025-08-22 21:16 ` Jason Gunthorpe
2025-09-01 7:09 ` Sairaj Kodilkar
2025-09-02 13:18 ` Jason Gunthorpe
3 siblings, 1 reply; 26+ messages in thread
From: Nicolin Chen @ 2025-08-22 19:51 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh
On Wed, Aug 20, 2025 at 11:30:08AM +0000, Suravee Suthikulpanit wrote:
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 46682c8ba28d..ea790a8997ee 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -2616,6 +2616,7 @@ amd_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
> const struct iommu_user_data *user_data)
>
> {
> + struct iommu_domain *dom = ERR_PTR(-EOPNOTSUPP);
> struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
> struct amd_iommu *iommu = get_amd_iommu_from_dev(dev);
> const u32 supported_flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
> @@ -2626,29 +2627,31 @@ amd_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
> if ((flags & ~supported_flags) || user_data || !is_nest_parent_supported(flags))
> return ERR_PTR(-EOPNOTSUPP);
>
> - pr_debug("%s: IOMMU devid=%#x, flags=%#x\n", __func__, dev_data->devid, flags);
> + pr_debug("%s: IOMMU devid=%#x, flags=%#x, supported_flags=%#x\n", __func__, dev_data->devid, flags, supported_flags);
>
> switch (flags & supported_flags) {
> case IOMMU_HWPT_ALLOC_DIRTY_TRACKING:
> case IOMMU_HWPT_ALLOC_DIRTY_TRACKING | IOMMU_HWPT_ALLOC_NEST_PARENT:
> case IOMMU_HWPT_ALLOC_NEST_PARENT:
> /* Allocate domain with v1 page table for dirty tracking */
> - if (!amd_iommu_hd_support(iommu))
> - break;
> - return do_iommu_domain_alloc(dev, flags, PD_MODE_V1);
> + if (amd_iommu_hd_support(iommu))
> + dom = do_iommu_domain_alloc(dev, flags, PD_MODE_V1);
> + break;
> case IOMMU_HWPT_ALLOC_PASID:
> /* Allocate domain with v2 page table if IOMMU supports PASID. */
> - if (!amd_iommu_pasid_supported())
> - break;
> - return do_iommu_domain_alloc(dev, flags, PD_MODE_V2);
> + if (amd_iommu_pasid_supported())
> + dom = do_iommu_domain_alloc(dev, flags, PD_MODE_V2);
> + break;
> case 0:
> /* If nothing specific is required use the kernel commandline default */
> - return do_iommu_domain_alloc(dev, 0, amd_iommu_pgtable);
> + dom = do_iommu_domain_alloc(dev, 0, amd_iommu_pgtable);
> + break;
> default:
> pr_err("%s: Unhandled flag : 0x%x\n", __func__, flags);
> break;
> }
> - return ERR_PTR(-EOPNOTSUPP);
> +
> + return dom;
These changes seem better suited to a preparatory patch.
> @@ -3113,6 +3116,7 @@ const struct iommu_ops amd_iommu_ops = {
> .release_domain = &release_domain,
> .identity_domain = &identity_domain.domain,
> .domain_alloc_paging_flags = amd_iommu_domain_alloc_paging_flags,
> + .domain_alloc_nested = amd_iommu_domain_alloc_nested,
> .domain_alloc_sva = amd_iommu_domain_alloc_sva,
> .probe_device = amd_iommu_probe_device,
> .release_device = amd_iommu_release_device,
This will be an HWPT-based nesting support, v.s. vIOMMU-based.
If AMD wants to enable its Command/Event Buffers, I think this
should follow the vIOMMU model instead.
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2023 Advanced Micro Devices, Inc.
> + */
> +
> +#define pr_fmt(fmt) "AMD-Vi: " fmt
> +#define dev_fmt(fmt) pr_fmt(fmt)
> +
> +#include <linux/iommu.h>
> +#include <uapi/linux/iommufd.h>
> +
> +#include "amd_iommu.h"
> +#include "amd_iommu_types.h"
It seems that you already included the uapi header in "amd_iommu.h".
> +static int udata_to_iommu_hwpt_amd_v2(const struct iommu_user_data *user_data,
> + struct iommu_hwpt_amd_v2 *hwpt)
> +{
> + if (!user_data)
> + return -EINVAL;
> +
> + if (user_data->type != IOMMU_HWPT_DATA_AMD_V2)
> + return -EOPNOTSUPP;
> +
iommu_copy_struct_from_user() internally checks these two already.
> + return iommu_copy_struct_from_user(hwpt, user_data,
> + IOMMU_HWPT_DATA_AMD_V2,
> + dte);
> +}
> +
> +struct iommu_domain *
> +amd_iommu_domain_alloc_nested(struct device *dev, struct iommu_domain *parent,
> + u32 flags, const struct iommu_user_data *user_data)
> +{
> + int ret;
> + struct iommu_hwpt_amd_v2 hwpt;
> + struct protection_domain *pdom;
> +
> + if (parent->ops != amd_iommu_ops.default_domain_ops)
> + return ERR_PTR(-EINVAL);
> +
> + ret = udata_to_iommu_hwpt_amd_v2(user_data, &hwpt);
> + if (ret)
> + return ERR_PTR(ret);
> +
> + pdom = kzalloc(sizeof(*pdom), GFP_KERNEL);
> + if (IS_ERR(pdom))
> + return ERR_PTR(-ENOMEM);
> +
> + pdom->id = amd_iommu_pdom_id_alloc();
> + if (!pdom->id)
> + goto out_err;
This seems incorrect. amd_iommu_pdom_id_alloc() is a wrapper of the
ida_alloc_range() that would return -ENOMEM or -ENOSPC on failure.
Also, -EINVAL in out_err isn't nice to replace either of them.
So, I think this should be:
if (pdom->id <= 0) {
ret = pdom->id;
goto out_err;
}
> +
> + pr_debug("%s: Allocating nested domain with parent domid=%#x\n",
> + __func__, to_pdomain(parent)->id);
> +
> + spin_lock_init(&pdom->lock);
> + INIT_LIST_HEAD(&pdom->dev_list);
> + INIT_LIST_HEAD(&pdom->dev_data_list);
> + xa_init(&pdom->iommu_array);
> +
> + pdom->pd_mode = PD_MODE_V2;
> + pdom->iop.pgtbl.cfg.amd.nid = NUMA_NO_NODE;
> + pdom->parent = to_pdomain(parent);
> + pdom->domain.ops = &nested_domain_ops;
> + pdom->domain.type = IOMMU_DOMAIN_NESTED;
> + pdom->domain.geometry.aperture_start = 0;
> + pdom->domain.geometry.aperture_end = ((1ULL << PM_LEVEL_SHIFT(amd_iommu_gpt_level)) - 1);
> + pdom->domain.geometry.force_aperture = true;
> + pdom->domain.pgsize_bitmap = pdom->iop.pgtbl.cfg.pgsize_bitmap;
> + memcpy(&pdom->guest_hwpt, &hwpt, sizeof(struct iommu_hwpt_amd_v2));
How about just hold a "struct dev_table_entry guest_dte" in the
pdom, instead of holding a uAPI structure?
Nicolin
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 8/8] iommu/amd: Add support for nested domain attach/detach
2025-08-20 11:30 ` [PATCH 8/8] iommu/amd: Add support for nested domain attach/detach Suravee Suthikulpanit
@ 2025-08-22 20:20 ` Nicolin Chen
2025-08-28 0:36 ` Suthikulpanit, Suravee
2025-09-02 13:25 ` Jason Gunthorpe
2025-09-03 8:27 ` Sairaj Kodilkar
2 siblings, 1 reply; 26+ messages in thread
From: Nicolin Chen @ 2025-08-22 20:20 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh
On Wed, Aug 20, 2025 at 11:30:09AM +0000, Suravee Suthikulpanit wrote:
> +static inline bool has_gcr3_table(struct gcr3_tbl_info *gcr3_info)
> +{
> + if (!gcr3_info || (!gcr3_info->gcr3_tbl && !gcr3_info->trp_gpa))
> + return false;
"gcr3_info" seems always pointing to "&dev_data->gcr3_info", which
can never be NULL.
> @@ -2061,7 +2087,14 @@ static void set_dte_entry(struct amd_iommu *iommu,
> struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
> struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
>
> - if (gcr3_info && gcr3_info->gcr3_tbl)
> + /*
> + * For nested domain, use parent domain to setup v1 table
> + * information and domain id.
> + */
> + if (amd_iommu_domain_is_nested(domain))
> + domain = domain->parent;
> +
> + if (has_gcr3_table(gcr3_info))
> domid = dev_data->gcr3_info.domid;
There is already a local variable "gcr3_info".
> +static int nested_gcr3_update(struct protection_domain *pdom, struct device *dev)
> +{
> + struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
> + struct iommu_hwpt_amd_v2 *hwpt = &pdom->guest_hwpt;
> + struct pci_dev *pdev = to_pci_dev(dev);
> +
> + if (!pdev || !hwpt)
> + return -EINVAL;
to_pci_dev is a container_of from the dev. !pdev indicates a !dev
that should never happen in the path of an attach_dev op. Or, did
you actually want to check if dev_is_pci(dev)?
Also, hwpt is "&pdom->guest_hwpt", which would never be NULL.
> +static int amd_iommu_nested_attach_device(struct iommu_domain *dom, struct device *dev)
> +{
> + struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
> + struct protection_domain *pdom = to_pdomain(dom);
> + struct pci_dev *pdev;
> + int ret;
> +
> + if (dev_data->domain == pdom)
> + return 0;
> +
> + ret = nested_gcr3_update(pdom, dev);
> + if (ret)
> + return ret;
> +
> + if (dev_data->domain)
> + amd_iommu_detach_device(dev);
> +
> + ret = __amd_iommu_attach_device(dev, pdom);
> + if (ret)
> + return ret;
> +
> + pdev = dev_is_pci(dev_data->dev) ? to_pci_dev(dev_data->dev) : NULL;
> + if (pdev)
> + amd_iommu_pdev_enable_cap_ats(pdev);
Is "dev_data->dev" expected to be "dev"?
Nicolin
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 7/8] iommu/amd: Add support for nested domain allocation
2025-08-22 19:51 ` Nicolin Chen
@ 2025-08-22 21:16 ` Jason Gunthorpe
2025-08-22 21:45 ` Nicolin Chen
0 siblings, 1 reply; 26+ messages in thread
From: Jason Gunthorpe @ 2025-08-22 21:16 UTC (permalink / raw)
To: Nicolin Chen
Cc: Suravee Suthikulpanit, linux-kernel, robin.murphy, will, joro,
kevin.tian, jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh
On Fri, Aug 22, 2025 at 12:51:02PM -0700, Nicolin Chen wrote:
> > @@ -3113,6 +3116,7 @@ const struct iommu_ops amd_iommu_ops = {
> > .release_domain = &release_domain,
> > .identity_domain = &identity_domain.domain,
> > .domain_alloc_paging_flags = amd_iommu_domain_alloc_paging_flags,
> > + .domain_alloc_nested = amd_iommu_domain_alloc_nested,
> > .domain_alloc_sva = amd_iommu_domain_alloc_sva,
> > .probe_device = amd_iommu_probe_device,
> > .release_device = amd_iommu_release_device,
>
> This will be an HWPT-based nesting support, v.s. vIOMMU-based.
>
> If AMD wants to enable its Command/Event Buffers, I think this
> should follow the vIOMMU model instead.
I've been expecting drivers to do both, like ARM.. Nested is the basic
infrastructure and then the viommu changes what domain id nested will
use similar to how ARM constrains the VMID based on the viommu.
Suravee is this how you see AMD evolving as well?
Jason
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 7/8] iommu/amd: Add support for nested domain allocation
2025-08-22 21:16 ` Jason Gunthorpe
@ 2025-08-22 21:45 ` Nicolin Chen
2025-08-25 13:38 ` Jason Gunthorpe
0 siblings, 1 reply; 26+ messages in thread
From: Nicolin Chen @ 2025-08-22 21:45 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Suravee Suthikulpanit, linux-kernel, robin.murphy, will, joro,
kevin.tian, jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh
On Fri, Aug 22, 2025 at 06:16:18PM -0300, Jason Gunthorpe wrote:
> On Fri, Aug 22, 2025 at 12:51:02PM -0700, Nicolin Chen wrote:
> > > @@ -3113,6 +3116,7 @@ const struct iommu_ops amd_iommu_ops = {
> > > .release_domain = &release_domain,
> > > .identity_domain = &identity_domain.domain,
> > > .domain_alloc_paging_flags = amd_iommu_domain_alloc_paging_flags,
> > > + .domain_alloc_nested = amd_iommu_domain_alloc_nested,
> > > .domain_alloc_sva = amd_iommu_domain_alloc_sva,
> > > .probe_device = amd_iommu_probe_device,
> > > .release_device = amd_iommu_release_device,
> >
> > This will be an HWPT-based nesting support, v.s. vIOMMU-based.
> >
> > If AMD wants to enable its Command/Event Buffers, I think this
> > should follow the vIOMMU model instead.
>
> I've been expecting drivers to do both, like ARM.. Nested is the basic
> infrastructure and then the viommu changes what domain id nested will
> use similar to how ARM constrains the VMID based on the viommu.
Hmm, we haven't implemented both in ARM but only the vIOMMU-based
one..
Yea, AMD could start with the HWPT-based one, but that would need
an invalidation op(?), which seems missing in this series. So, to
support direct invalidation via Command Buffers, vIOMMU should be
the only option, right?
Nicolin
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 7/8] iommu/amd: Add support for nested domain allocation
2025-08-22 21:45 ` Nicolin Chen
@ 2025-08-25 13:38 ` Jason Gunthorpe
0 siblings, 0 replies; 26+ messages in thread
From: Jason Gunthorpe @ 2025-08-25 13:38 UTC (permalink / raw)
To: Nicolin Chen
Cc: Suravee Suthikulpanit, linux-kernel, robin.murphy, will, joro,
kevin.tian, jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh
On Fri, Aug 22, 2025 at 02:45:58PM -0700, Nicolin Chen wrote:
> On Fri, Aug 22, 2025 at 06:16:18PM -0300, Jason Gunthorpe wrote:
> > On Fri, Aug 22, 2025 at 12:51:02PM -0700, Nicolin Chen wrote:
> > > > @@ -3113,6 +3116,7 @@ const struct iommu_ops amd_iommu_ops = {
> > > > .release_domain = &release_domain,
> > > > .identity_domain = &identity_domain.domain,
> > > > .domain_alloc_paging_flags = amd_iommu_domain_alloc_paging_flags,
> > > > + .domain_alloc_nested = amd_iommu_domain_alloc_nested,
> > > > .domain_alloc_sva = amd_iommu_domain_alloc_sva,
> > > > .probe_device = amd_iommu_probe_device,
> > > > .release_device = amd_iommu_release_device,
> > >
> > > This will be an HWPT-based nesting support, v.s. vIOMMU-based.
> > >
> > > If AMD wants to enable its Command/Event Buffers, I think this
> > > should follow the vIOMMU model instead.
> >
> > I've been expecting drivers to do both, like ARM.. Nested is the basic
> > infrastructure and then the viommu changes what domain id nested will
> > use similar to how ARM constrains the VMID based on the viommu.
>
> Hmm, we haven't implemented both in ARM but only the vIOMMU-based
> one..
Oh? I mis-remember we had kept the hwpt based non-ATS invalidation
around.. I guess that was Intel that ended up like that.. OK then
> Yea, AMD could start with the HWPT-based one, but that would need
> an invalidation op(?), which seems missing in this series. So, to
> support direct invalidation via Command Buffers, vIOMMU should be
> the only option, right?
Certainly if there is no invalidation op the series is incomplete.
Jason
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 8/8] iommu/amd: Add support for nested domain attach/detach
2025-08-22 20:20 ` Nicolin Chen
@ 2025-08-28 0:36 ` Suthikulpanit, Suravee
0 siblings, 0 replies; 26+ messages in thread
From: Suthikulpanit, Suravee @ 2025-08-28 0:36 UTC (permalink / raw)
To: Nicolin Chen
Cc: jgg, linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh
On 8/22/2025 3:20 PM, Nicolin Chen wrote:
> On Wed, Aug 20, 2025 at 11:30:09AM +0000, Suravee Suthikulpanit wrote:
>> +static inline bool has_gcr3_table(struct gcr3_tbl_info *gcr3_info)
>> +{
>> + if (!gcr3_info || (!gcr3_info->gcr3_tbl && !gcr3_info->trp_gpa))
>> + return false;
>
> "gcr3_info" seems always pointing to "&dev_data->gcr3_info", which
> can never be NULL.
right
>> @@ -2061,7 +2087,14 @@ static void set_dte_entry(struct amd_iommu *iommu,
>> struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
>> struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
>>
>> - if (gcr3_info && gcr3_info->gcr3_tbl)
>> + /*
>> + * For nested domain, use parent domain to setup v1 table
>> + * information and domain id.
>> + */
>> + if (amd_iommu_domain_is_nested(domain))
>> + domain = domain->parent;
>> +
>> + if (has_gcr3_table(gcr3_info))
>> domid = dev_data->gcr3_info.domid;
>
> There is already a local variable "gcr3_info".
right.
>> +static int nested_gcr3_update(struct protection_domain *pdom, struct device *dev)
>> +{
>> + struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
>> + struct iommu_hwpt_amd_v2 *hwpt = &pdom->guest_hwpt;
>> + struct pci_dev *pdev = to_pci_dev(dev);
>> +
>> + if (!pdev || !hwpt)
>> + return -EINVAL;
>
> to_pci_dev is a container_of from the dev. !pdev indicates a !dev
> that should never happen in the path of an attach_dev op. Or, did
> you actually want to check if dev_is_pci(dev)?
correct, I should have just checked for dev_is_pci(dev).
> Also, hwpt is "&pdom->guest_hwpt", which would never be NULL.
>
>> +static int amd_iommu_nested_attach_device(struct iommu_domain *dom, struct device *dev)
>> +{
>> + struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
>> + struct protection_domain *pdom = to_pdomain(dom);
>> + struct pci_dev *pdev;
>> + int ret;
>> +
>> + if (dev_data->domain == pdom)
>> + return 0;
>> +
>> + ret = nested_gcr3_update(pdom, dev);
>> + if (ret)
>> + return ret;
>> +
>> + if (dev_data->domain)
>> + amd_iommu_detach_device(dev);
>> +
>> + ret = __amd_iommu_attach_device(dev, pdom);
>> + if (ret)
>> + return ret;
>> +
>> + pdev = dev_is_pci(dev_data->dev) ? to_pci_dev(dev_data->dev) : NULL;
>> + if (pdev)
>> + amd_iommu_pdev_enable_cap_ats(pdev);
>
> Is "dev_data->dev" expected to be "dev"?
correct.
Thanks for the review. I'll clean up the logic in
amd_iommu_nested_attach_device() to return an error early for non-PCI devices.
Thanks,
Suravee
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 7/8] iommu/amd: Add support for nested domain allocation
2025-08-20 11:30 ` [PATCH 7/8] iommu/amd: Add support for nested " Suravee Suthikulpanit
2025-08-22 5:39 ` kernel test robot
2025-08-22 19:51 ` Nicolin Chen
@ 2025-09-01 7:09 ` Sairaj Kodilkar
2025-09-02 11:42 ` Jason Gunthorpe
2025-09-02 13:18 ` Jason Gunthorpe
3 siblings, 1 reply; 26+ messages in thread
From: Sairaj Kodilkar @ 2025-09-01 7:09 UTC (permalink / raw)
To: Suravee Suthikulpanit, jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh
On 8/20/2025 5:00 PM, Suravee Suthikulpanit wrote:
> The child domain is allocated with IOMMU_DOMAIN_NESTED type to store
> stage-1 translation (i.e. GVA->GPA). This includes the GCR3 root pointer
> table along with guest (v2) page tables. The struct iommu_hwpt_amd_v2
> contains this information, and is passed from user-space as a parameter
> of the struct iommu_ops.domain_alloc_nested().
>
> The parent domain is tracked using the struct protection_domain.parent.
>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
> drivers/iommu/amd/Makefile | 2 +-
> drivers/iommu/amd/amd_iommu.h | 5 ++
> drivers/iommu/amd/amd_iommu_types.h | 5 ++
> drivers/iommu/amd/iommu.c | 22 ++++----
> drivers/iommu/amd/nested.c | 79 +++++++++++++++++++++++++++++
> 5 files changed, 103 insertions(+), 10 deletions(-)
> create mode 100644 drivers/iommu/amd/nested.c
>
> diff --git a/drivers/iommu/amd/Makefile b/drivers/iommu/amd/Makefile
> index 59c04a67f398..9ccf9d61810c 100644
> --- a/drivers/iommu/amd/Makefile
> +++ b/drivers/iommu/amd/Makefile
> @@ -1,3 +1,3 @@
> # SPDX-License-Identifier: GPL-2.0-only
> -obj-y += iommu.o init.o quirks.o io_pgtable.o io_pgtable_v2.o ppr.o pasid.o
> +obj-y += iommu.o init.o quirks.o io_pgtable.o io_pgtable_v2.o ppr.o pasid.o nested.o
> obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += debugfs.o
> diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
> index 3ff380afb9f4..8e86d5b1d915 100644
> --- a/drivers/iommu/amd/amd_iommu.h
> +++ b/drivers/iommu/amd/amd_iommu.h
> @@ -8,6 +8,7 @@
> #define AMD_IOMMU_H
>
> #include <linux/iommu.h>
> +#include <uapi/linux/iommufd.h>
>
> #include "amd_iommu_types.h"
>
> @@ -190,4 +191,8 @@ void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
> struct dev_table_entry *get_dev_table(struct amd_iommu *iommu);
> struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid);
>
> +/* NESTED */
> +struct iommu_domain *
> +amd_iommu_domain_alloc_nested(struct device *dev, struct iommu_domain *parent,
> + u32 flags, const struct iommu_user_data *user_data);
> #endif /* AMD_IOMMU_H */
> diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
> index 5343b99913e4..94f51a09b364 100644
> --- a/drivers/iommu/amd/amd_iommu_types.h
> +++ b/drivers/iommu/amd/amd_iommu_types.h
> @@ -20,6 +20,8 @@
> #include <linux/irqreturn.h>
> #include <linux/io-pgtable.h>
>
> +#include <uapi/linux/iommufd.h>
> +
> /*
> * Maximum number of IOMMUs supported
> */
> @@ -605,6 +607,9 @@ struct protection_domain {
>
> struct mmu_notifier mn; /* mmu notifier for the SVA domain */
> struct list_head dev_data_list; /* List of pdom_dev_data */
> +
> + struct protection_domain *parent; /* Nested parent domain */
> + struct iommu_hwpt_amd_v2 guest_hwpt;
> };
>
> /*
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 46682c8ba28d..ea790a8997ee 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -2616,6 +2616,7 @@ amd_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
> const struct iommu_user_data *user_data)
>
> {
> + struct iommu_domain *dom = ERR_PTR(-EOPNOTSUPP);
> struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
> struct amd_iommu *iommu = get_amd_iommu_from_dev(dev);
> const u32 supported_flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
> @@ -2626,29 +2627,31 @@ amd_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
> if ((flags & ~supported_flags) || user_data || !is_nest_parent_supported(flags))
> return ERR_PTR(-EOPNOTSUPP);
>
> - pr_debug("%s: IOMMU devid=%#x, flags=%#x\n", __func__, dev_data->devid, flags);
> + pr_debug("%s: IOMMU devid=%#x, flags=%#x, supported_flags=%#x\n", __func__, dev_data->devid, flags, supported_flags);
>
> switch (flags & supported_flags) {
> case IOMMU_HWPT_ALLOC_DIRTY_TRACKING:
> case IOMMU_HWPT_ALLOC_DIRTY_TRACKING | IOMMU_HWPT_ALLOC_NEST_PARENT:
> case IOMMU_HWPT_ALLOC_NEST_PARENT:
> /* Allocate domain with v1 page table for dirty tracking */
> - if (!amd_iommu_hd_support(iommu))
> - break;
> - return do_iommu_domain_alloc(dev, flags, PD_MODE_V1);
> + if (amd_iommu_hd_support(iommu))
> + dom = do_iommu_domain_alloc(dev, flags, PD_MODE_V1);
> + break;
> case IOMMU_HWPT_ALLOC_PASID:
> /* Allocate domain with v2 page table if IOMMU supports PASID. */
> - if (!amd_iommu_pasid_supported())
> - break;
> - return do_iommu_domain_alloc(dev, flags, PD_MODE_V2);
> + if (amd_iommu_pasid_supported())
> + dom = do_iommu_domain_alloc(dev, flags, PD_MODE_V2);
> + break;
> case 0:
> /* If nothing specific is required use the kernel commandline default */
> - return do_iommu_domain_alloc(dev, 0, amd_iommu_pgtable);
> + dom = do_iommu_domain_alloc(dev, 0, amd_iommu_pgtable);
> + break;
> default:
> pr_err("%s: Unhandled flag : 0x%x\n", __func__, flags);
> break;
> }
> - return ERR_PTR(-EOPNOTSUPP);
> +
> + return dom;
> }
>
> void amd_iommu_domain_free(struct iommu_domain *dom)
> @@ -3113,6 +3116,7 @@ const struct iommu_ops amd_iommu_ops = {
> .release_domain = &release_domain,
> .identity_domain = &identity_domain.domain,
> .domain_alloc_paging_flags = amd_iommu_domain_alloc_paging_flags,
> + .domain_alloc_nested = amd_iommu_domain_alloc_nested,
> .domain_alloc_sva = amd_iommu_domain_alloc_sva,
> .probe_device = amd_iommu_probe_device,
> .release_device = amd_iommu_release_device,
> diff --git a/drivers/iommu/amd/nested.c b/drivers/iommu/amd/nested.c
> new file mode 100644
> index 000000000000..09f2a455af33
> --- /dev/null
> +++ b/drivers/iommu/amd/nested.c
> @@ -0,0 +1,79 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2023 Advanced Micro Devices, Inc.
> + */
> +
> +#define pr_fmt(fmt) "AMD-Vi: " fmt
> +#define dev_fmt(fmt) pr_fmt(fmt)
> +
> +#include <linux/iommu.h>
> +#include <uapi/linux/iommufd.h>
> +
> +#include "amd_iommu.h"
> +#include "amd_iommu_types.h"
> +
> +const struct iommu_domain_ops nested_domain_ops = {
> + .free = amd_iommu_domain_free,
> +};
> +
> +static int udata_to_iommu_hwpt_amd_v2(const struct iommu_user_data *user_data,
> + struct iommu_hwpt_amd_v2 *hwpt)
> +{
> + if (!user_data)
> + return -EINVAL;
> +
> + if (user_data->type != IOMMU_HWPT_DATA_AMD_V2)
> + return -EOPNOTSUPP;
> +
> + return iommu_copy_struct_from_user(hwpt, user_data,
> + IOMMU_HWPT_DATA_AMD_V2,
> + dte);
> +}
> +
> +struct iommu_domain *
> +amd_iommu_domain_alloc_nested(struct device *dev, struct iommu_domain *parent,
> + u32 flags, const struct iommu_user_data *user_data)
> +{
> + int ret;
> + struct iommu_hwpt_amd_v2 hwpt;
> + struct protection_domain *pdom;
> +
> + if (parent->ops != amd_iommu_ops.default_domain_ops)
> + return ERR_PTR(-EINVAL);
> +
> + ret = udata_to_iommu_hwpt_amd_v2(user_data, &hwpt);
> + if (ret)
> + return ERR_PTR(ret);
> +
> + pdom = kzalloc(sizeof(*pdom), GFP_KERNEL);
> + if (IS_ERR(pdom))
> + return ERR_PTR(-ENOMEM);
> +
> + pdom->id = amd_iommu_pdom_id_alloc();
> + if (!pdom->id)
> + goto out_err;
> +
> + pr_debug("%s: Allocating nested domain with parent domid=%#x\n",
> + __func__, to_pdomain(parent)->id);
> +
> + spin_lock_init(&pdom->lock);
> + INIT_LIST_HEAD(&pdom->dev_list);
> + INIT_LIST_HEAD(&pdom->dev_data_list);
> + xa_init(&pdom->iommu_array);
> +
> + pdom->pd_mode = PD_MODE_V2;
> + pdom->iop.pgtbl.cfg.amd.nid = NUMA_NO_NODE;
> + pdom->parent = to_pdomain(parent);
> + pdom->domain.ops = &nested_domain_ops;
> + pdom->domain.type = IOMMU_DOMAIN_NESTED;
> + pdom->domain.geometry.aperture_start = 0;
> + pdom->domain.geometry.aperture_end = ((1ULL << PM_LEVEL_SHIFT(amd_iommu_gpt_level)) - 1);
> + pdom->domain.geometry.force_aperture = true;
> + pdom->domain.pgsize_bitmap = pdom->iop.pgtbl.cfg.pgsize_bitmap;
Hi Suravee,
You are assigning an uninitialized pdom->iop.pgtbl.cfg.pgsize_bitmap to
pdom->domain.pgsize_bitmap. In the non-nested domain attach path,
pdom_setup_pgtable() takes care of initializing pgsize_bitmap.
Thanks
Sairaj
> + memcpy(&pdom->guest_hwpt, &hwpt, sizeof(struct iommu_hwpt_amd_v2));
> +
> + return &pdom->domain;
> +out_err:
> + kfree(pdom);
> + return ERR_PTR(-EINVAL);
> +}
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 7/8] iommu/amd: Add support for nested domain allocation
2025-09-01 7:09 ` Sairaj Kodilkar
@ 2025-09-02 11:42 ` Jason Gunthorpe
0 siblings, 0 replies; 26+ messages in thread
From: Jason Gunthorpe @ 2025-09-02 11:42 UTC (permalink / raw)
To: Sairaj Kodilkar
Cc: Suravee Suthikulpanit, nicolinc, linux-kernel, robin.murphy, will,
joro, kevin.tian, jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh
On Mon, Sep 01, 2025 at 12:39:56PM +0530, Sairaj Kodilkar wrote:
> > + pdom->pd_mode = PD_MODE_V2;
> > + pdom->parent = to_pdomain(parent);
> > + pdom->domain.ops = &nested_domain_ops;
> > + pdom->domain.type = IOMMU_DOMAIN_NESTED;
> > + pdom->iop.pgtbl.cfg.amd.nid = NUMA_NO_NODE;
> > + pdom->domain.geometry.aperture_start = 0;
> > + pdom->domain.geometry.aperture_end = ((1ULL << PM_LEVEL_SHIFT(amd_iommu_gpt_level)) - 1);
> > + pdom->domain.geometry.force_aperture = true;
> > + pdom->domain.pgsize_bitmap = pdom->iop.pgtbl.cfg.pgsize_bitmap;
>
> Hi Suravee,
> You are assigning an uninitialized pdom->iop.pgtbl.cfg.pgsize_bitmap to
> pdom->domain.pgsize_bitmap. In the non-nested domain attach path,
> pdom_setup_pgtable() takes care of initializing pgsize_bitmap.
nested domains should not have a map/unmap function in their ops so
none of this code should be present.
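Concretely, the nested allocation can then drop everything page-table
related; a sketch of the trimmed init, reusing the field names from the
posted patch:

	spin_lock_init(&pdom->lock);
	INIT_LIST_HEAD(&pdom->dev_list);
	INIT_LIST_HEAD(&pdom->dev_data_list);
	xa_init(&pdom->iommu_array);

	pdom->parent      = to_pdomain(parent);
	pdom->domain.ops  = &nested_domain_ops;
	pdom->domain.type = IOMMU_DOMAIN_NESTED;
	memcpy(&pdom->guest_hwpt, &hwpt, sizeof(hwpt));
	/*
	 * No io_pgtable cfg, no geometry, no pgsize_bitmap: a nested domain
	 * has no map/unmap ops, so the core never reads them.
	 */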
Jason
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/8] iommu/amd: Make amd_iommu_pdom_id_alloc() non-static
2025-08-20 11:30 ` [PATCH 1/8] iommu/amd: Make amd_iommu_pdom_id_alloc() non-static Suravee Suthikulpanit
@ 2025-09-02 13:03 ` Jason Gunthorpe
0 siblings, 0 replies; 26+ messages in thread
From: Jason Gunthorpe @ 2025-09-02 13:03 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh
On Wed, Aug 20, 2025 at 11:30:02AM +0000, Suravee Suthikulpanit wrote:
> To allow reuse in other files in subsequent patches.
>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
> drivers/iommu/amd/amd_iommu.h | 1 +
> drivers/iommu/amd/iommu.c | 8 ++++----
> 2 files changed, 5 insertions(+), 4 deletions(-)
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 4/8] iommu/amd: Introduce struct gcr3_tbl_info.giov
2025-08-20 11:30 ` [PATCH 4/8] iommu/amd: Introduce struct gcr3_tbl_info.giov Suravee Suthikulpanit
@ 2025-09-02 13:07 ` Jason Gunthorpe
0 siblings, 0 replies; 26+ messages in thread
From: Jason Gunthorpe @ 2025-09-02 13:07 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh
On Wed, Aug 20, 2025 at 11:30:05AM +0000, Suravee Suthikulpanit wrote:
> @@ -2036,7 +2037,7 @@ static void set_dte_gcr3_table(struct amd_iommu *iommu,
> target->data[0] |= DTE_FLAG_GV |
> FIELD_PREP(DTE_GLX, gcr3_info->glx) |
> FIELD_PREP(DTE_GCR3_14_12, gcr3 >> 12);
> - if (pdom_is_v2_pgtbl_mode(dev_data->domain))
> + if (gcr3_info->giov)
> target->data[0] |= DTE_FLAG_GIOV;
>
> target->data[1] |= FIELD_PREP(DTE_GCR3_30_15, gcr3 >> 15) |
Can we please fix this properly as I've asked many times now and given
sample code :\
Building the DTE for nested should be done with a more natural
connection to the provided vDTE fragment, not by trying to squeeze it
through this existing function by re-using gcr3_info.
Jason
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 5/8] iommufd: Introduce data struct for AMD nested domain allocation
2025-08-20 11:30 ` [PATCH 5/8] iommufd: Introduce data struct for AMD nested domain allocation Suravee Suthikulpanit
@ 2025-09-02 13:09 ` Jason Gunthorpe
0 siblings, 0 replies; 26+ messages in thread
From: Jason Gunthorpe @ 2025-09-02 13:09 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh
On Wed, Aug 20, 2025 at 11:30:06AM +0000, Suravee Suthikulpanit wrote:
> Introduce IOMMU_HWPT_DATA_AMD_V2 data type for AMD IOMMU v2 page table,
> which is used for stage-1 in nested translation. The data structure
> contains information necessary for setting up the AMD HW-vIOMMU support.
>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
> include/uapi/linux/iommufd.h | 11 +++++++++++
> 1 file changed, 11 insertions(+)
I wouldn't call this AMD_V2; the spec calls it "GCR3" and "Guest":
IOMMU_HWPT_DATA_AMD_GUEST
Otherwise
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 2/8] iommu/amd: Making device attach / detach helpers non-static
2025-08-20 11:30 ` [PATCH 2/8] iommu/amd: Making device attach / detach helpers non-static Suravee Suthikulpanit
@ 2025-09-02 13:10 ` Jason Gunthorpe
0 siblings, 0 replies; 26+ messages in thread
From: Jason Gunthorpe @ 2025-09-02 13:10 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh
On Wed, Aug 20, 2025 at 11:30:03AM +0000, Suravee Suthikulpanit wrote:
> To allow reuse in subsequent patches.
>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
> drivers/iommu/amd/amd_iommu.h | 2 ++
> drivers/iommu/amd/iommu.c | 11 +++++------
> 2 files changed, 7 insertions(+), 6 deletions(-)
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> -static int attach_device(struct device *dev,
> - struct protection_domain *domain)
> +int __amd_iommu_attach_device(struct device *dev, struct protection_domain *domain)
Though it would be nice to see a clearer name for this than __
Jason
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 6/8] iommu/amd: Add support for nest parent domain allocation
2025-08-20 11:30 ` [PATCH 6/8] iommu/amd: Add support for nest parent " Suravee Suthikulpanit
@ 2025-09-02 13:12 ` Jason Gunthorpe
0 siblings, 0 replies; 26+ messages in thread
From: Jason Gunthorpe @ 2025-09-02 13:12 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh
On Wed, Aug 20, 2025 at 11:30:07AM +0000, Suravee Suthikulpanit wrote:
> @@ -2626,6 +2645,7 @@ amd_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
> /* If nothing specific is required use the kernel commandline default */
> return do_iommu_domain_alloc(dev, 0, amd_iommu_pgtable);
> default:
> + pr_err("%s: Unhandled flag : 0x%x\n", __func__, flags);
No prints triggerable by userspace! Remove them all!
Jason
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 7/8] iommu/amd: Add support for nested domain allocation
2025-08-20 11:30 ` [PATCH 7/8] iommu/amd: Add support for nested " Suravee Suthikulpanit
` (2 preceding siblings ...)
2025-09-01 7:09 ` Sairaj Kodilkar
@ 2025-09-02 13:18 ` Jason Gunthorpe
3 siblings, 0 replies; 26+ messages in thread
From: Jason Gunthorpe @ 2025-09-02 13:18 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh
On Wed, Aug 20, 2025 at 11:30:08AM +0000, Suravee Suthikulpanit wrote:
> @@ -605,6 +607,9 @@ struct protection_domain {
>
> struct mmu_notifier mn; /* mmu notifier for the SVA domain */
> struct list_head dev_data_list; /* List of pdom_dev_data */
> +
> + struct protection_domain *parent; /* Nested parent domain */
> + struct iommu_hwpt_amd_v2 guest_hwpt;
guest_dte is a better name.
> @@ -2616,6 +2616,7 @@ amd_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
> const struct iommu_user_data *user_data)
>
> {
> + struct iommu_domain *dom = ERR_PTR(-EOPNOTSUPP);
> struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
> struct amd_iommu *iommu = get_amd_iommu_from_dev(dev);
> const u32 supported_flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
> @@ -2626,29 +2627,31 @@ amd_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
> if ((flags & ~supported_flags) || user_data || !is_nest_parent_supported(flags))
> return ERR_PTR(-EOPNOTSUPP);
>
> - pr_debug("%s: IOMMU devid=%#x, flags=%#x\n", __func__, dev_data->devid, flags);
> + pr_debug("%s: IOMMU devid=%#x, flags=%#x, supported_flags=%#x\n", __func__, dev_data->devid, flags, supported_flags);
>
> switch (flags & supported_flags) {
> case IOMMU_HWPT_ALLOC_DIRTY_TRACKING:
> case IOMMU_HWPT_ALLOC_DIRTY_TRACKING | IOMMU_HWPT_ALLOC_NEST_PARENT:
> case IOMMU_HWPT_ALLOC_NEST_PARENT:
> /* Allocate domain with v1 page table for dirty tracking */
> - if (!amd_iommu_hd_support(iommu))
> - break;
> - return do_iommu_domain_alloc(dev, flags, PD_MODE_V1);
> + if (amd_iommu_hd_support(iommu))
> + dom = do_iommu_domain_alloc(dev, flags, PD_MODE_V1);
> + break;
> case IOMMU_HWPT_ALLOC_PASID:
> /* Allocate domain with v2 page table if IOMMU supports PASID. */
> - if (!amd_iommu_pasid_supported())
> - break;
> - return do_iommu_domain_alloc(dev, flags, PD_MODE_V2);
> + if (amd_iommu_pasid_supported())
> + dom = do_iommu_domain_alloc(dev, flags, PD_MODE_V2);
> + break;
> case 0:
> /* If nothing specific is required use the kernel commandline default */
> - return do_iommu_domain_alloc(dev, 0, amd_iommu_pgtable);
> + dom = do_iommu_domain_alloc(dev, 0, amd_iommu_pgtable);
> + break;
> default:
> pr_err("%s: Unhandled flag : 0x%x\n", __func__, flags);
> break;
> }
> - return ERR_PTR(-EOPNOTSUPP);
> +
> + return dom;
Why is all this being done? Nothing touches dom on the return path
here.
> +static int udata_to_iommu_hwpt_amd_v2(const struct iommu_user_data *user_data,
> + struct iommu_hwpt_amd_v2 *hwpt)
> +{
> + if (!user_data)
> + return -EINVAL;
> +
> + if (user_data->type != IOMMU_HWPT_DATA_AMD_V2)
> + return -EOPNOTSUPP;
> +
> + return iommu_copy_struct_from_user(hwpt, user_data,
> + IOMMU_HWPT_DATA_AMD_V2,
> + dte);
> +}
Don't need this helper, iommu_copy_struct_from_user() does everything.
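In other words, the call site can invoke the core helper directly; a minimal
sketch (the type check and struct-size handling are already done inside the
helper):

	struct iommu_hwpt_amd_v2 hwpt;
	int ret;

	ret = iommu_copy_struct_from_user(&hwpt, user_data,
					  IOMMU_HWPT_DATA_AMD_V2, dte);
	if (ret)
		return ERR_PTR(ret);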
> +struct iommu_domain *
> +amd_iommu_domain_alloc_nested(struct device *dev, struct iommu_domain *parent,
> + u32 flags, const struct iommu_user_data *user_data)
> +{
> + int ret;
> + struct iommu_hwpt_amd_v2 hwpt;
> + struct protection_domain *pdom;
> +
> + if (parent->ops != amd_iommu_ops.default_domain_ops)
> + return ERR_PTR(-EINVAL);
This should check it was allocated as a parent domain too.
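A sketch of such a check; the nest_parent flag is hypothetical and would be
recorded when the paging domain is allocated with
IOMMU_HWPT_ALLOC_NEST_PARENT:

	struct protection_domain *parent_pdom = to_pdomain(parent);

	if (parent->ops != amd_iommu_ops.default_domain_ops ||
	    !parent_pdom->nest_parent)	/* hypothetical flag */
		return ERR_PTR(-EINVAL);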
> + ret = udata_to_iommu_hwpt_amd_v2(user_data, &hwpt);
> + if (ret)
> + return ERR_PTR(ret);
> +
> + pdom = kzalloc(sizeof(*pdom), GFP_KERNEL);
> + if (IS_ERR(pdom))
> + return ERR_PTR(-ENOMEM);
I'm not sure it makes sense to allocate a protection_domain here; this
doesn't really use much of the struct. Can you make it into its own
struct? It would be clearer and safer..
> + pdom->id = amd_iommu_pdom_id_alloc();
> + if (!pdom->id)
> + goto out_err;
> +
> + pr_debug("%s: Allocating nested domain with parent domid=%#x\n",
> + __func__, to_pdomain(parent)->id);
> +
> + spin_lock_init(&pdom->lock);
> + INIT_LIST_HEAD(&pdom->dev_list);
> + INIT_LIST_HEAD(&pdom->dev_data_list);
> + xa_init(&pdom->iommu_array);
> +
> + pdom->pd_mode = PD_MODE_V2;
Nothing should read pd_mode, please check it..
> + pdom->iop.pgtbl.cfg.amd.nid = NUMA_NO_NODE;
> + pdom->parent = to_pdomain(parent);
> + pdom->domain.ops = &nested_domain_ops;
> + pdom->domain.type = IOMMU_DOMAIN_NESTED;
> + pdom->domain.geometry.aperture_start = 0;
> + pdom->domain.geometry.aperture_end = ((1ULL << PM_LEVEL_SHIFT(amd_iommu_gpt_level)) - 1);
> + pdom->domain.geometry.force_aperture = true;
> + pdom->domain.pgsize_bitmap = pdom->iop.pgtbl.cfg.pgsize_bitmap;
And all of these are unnecessary, should never be read.
jason
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 8/8] iommu/amd: Add support for nested domain attach/detach
2025-08-20 11:30 ` [PATCH 8/8] iommu/amd: Add support for nested domain attach/detach Suravee Suthikulpanit
2025-08-22 20:20 ` Nicolin Chen
@ 2025-09-02 13:25 ` Jason Gunthorpe
2025-09-03 8:27 ` Sairaj Kodilkar
2 siblings, 0 replies; 26+ messages in thread
From: Jason Gunthorpe @ 2025-09-02 13:25 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh
On Wed, Aug 20, 2025 at 11:30:09AM +0000, Suravee Suthikulpanit wrote:
> @@ -2023,17 +2035,31 @@ static void set_dte_gcr3_table(struct amd_iommu *iommu,
> struct dev_table_entry *target)
> {
> struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
> + struct protection_domain *pdom = dev_data->domain;
> u64 gcr3;
>
> - if (!gcr3_info->gcr3_tbl)
> + if (!has_gcr3_table(gcr3_info))
> return;
>
> - pr_debug("%s: devid=%#x, glx=%#x, giov=%#x, gcr3_tbl=%#llx\n",
> + /* We need to check host capability before setting the mode. */
> + if ((pdom->guest_paging_mode == PAGE_MODE_5_LEVEL) &&
> + (amd_iommu_gpt_level < PAGE_MODE_5_LEVEL)) {
> + pr_err("Cannot support Guest paging mode=%#x (dom_id=%#x).\n",
> + pdom->guest_paging_mode, pdom->id);
This should be checked at allocation time.
And again please don't mess up this function with nested DTEs.
The vDTE should be validated during creation or fail creation. I see
this is missing validation, every single bit in the vDTE needs to be
checked to be 0 or supported by the kernel.
The logic should simply take the vDTE and merge it with the host DTE
as a simple bitwise operation.
This is why I keep saying to fix the flow here so this can be written
properly, and don't mess with the gcr3_info.
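A sketch of that shape; the AMD_VDTE_ALLOWED_* masks are placeholders for
whatever bits the kernel decides a guest may own:

/* At allocation: reject any vDTE bit the kernel does not allow. */
static int amd_iommu_validate_vdte(const u64 vdte[4])
{
	if ((vdte[0] & ~AMD_VDTE_ALLOWED_QW0) ||
	    (vdte[1] & ~AMD_VDTE_ALLOWED_QW1) ||
	    (vdte[2] & ~AMD_VDTE_ALLOWED_QW2) ||
	    vdte[3])
		return -EINVAL;
	return 0;
}

/* At DTE build time: merge the already-validated vDTE fragment into the
 * host-owned entry with plain bitwise ORs, no gcr3_info round trip. */
static void amd_iommu_merge_vdte(struct dev_table_entry *target,
				 const u64 vdte[4])
{
	target->data[0] |= vdte[0];
	target->data[1] |= vdte[1];
	target->data[2] |= vdte[2];
}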
> @@ -2293,7 +2326,8 @@ int __amd_iommu_attach_device(struct device *dev, struct protection_domain *doma
> goto out;
>
> /* Setup GCR3 table */
> - if (pdom_is_sva_capable(domain)) {
> + if (!amd_iommu_domain_is_nested(domain) && pdom_is_sva_capable(domain)) {
> + pr_warn("%s: Allocating guest page table\n", __func__);
??
> -const struct iommu_domain_ops nested_domain_ops = {
> - .free = amd_iommu_domain_free,
> -};
> +const struct iommu_domain_ops nested_domain_ops;
Put stuff where it belongs when first adding it..
> +static inline u64 hwpt_to_gcr3_trp(u64 *dte)
> +{
> + u64 gcr3;
> +
> + gcr3 = (FIELD_GET(DTE_GCR3_14_12, dte[0]) << 12);
> + gcr3 |= (FIELD_GET(DTE_GCR3_30_15, dte[1]) << 15);
> + gcr3 |= (FIELD_GET(DTE_GCR3_51_31, dte[1]) << 31);
> + return gcr3;
> +}
> +
> +static int nested_gcr3_update(struct protection_domain *pdom, struct device *dev)
> +{
> + struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
> + struct iommu_hwpt_amd_v2 *hwpt = &pdom->guest_hwpt;
> + struct pci_dev *pdev = to_pci_dev(dev);
> +
> + if (!pdev || !hwpt)
> + return -EINVAL;
> +
> + /* Note: Currently only support GCR3TRPMode with nested translation */
> + if (!check_feature2(FEATURE_GCR3TRPMODE))
> + return -EOPNOTSUPP;
> +
> + if (FIELD_GET(DTE_GPT_LEVEL_MASK, hwpt->dte[2]) == GUEST_PGTABLE_5_LEVEL)
> + pdom->guest_paging_mode = PAGE_MODE_5_LEVEL;
> + else
> + pdom->guest_paging_mode = PAGE_MODE_4_LEVEL;
> +
> + dev_data->ppr = FIELD_GET(DTE_FLAG_PPR, hwpt->dte[0]);
> + dev_data->gcr3_info.glx = FIELD_GET(DTE_FLAG_GLX, hwpt->dte[0]);
> + dev_data->gcr3_info.giov = FIELD_GET(DTE_FLAG_GIOV, hwpt->dte[0]);
> + dev_data->gcr3_info.trp_gpa = hwpt_to_gcr3_trp(hwpt->dte);
> + /* Due to possible aliasing issue use nested domain ID */
> + dev_data->gcr3_info.domid = pdom->id;
> + pr_debug("%s: devid=%#x, domid=%#x, trp_gpa=%#llx, glx=%#x\n", __func__,
> + pci_dev_id(pdev),
> + dev_data->gcr3_info.domid,
> + dev_data->gcr3_info.trp_gpa,
> + dev_data->gcr3_info.glx);
> +
> + return 0;
> +}
None of this logic is needed if the vDTE is treated bitwise.
> +static int amd_iommu_nested_attach_device(struct iommu_domain *dom, struct device *dev)
> +{
> + struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
> + struct protection_domain *pdom = to_pdomain(dom);
> + struct pci_dev *pdev;
> + int ret;
> +
> + if (dev_data->domain == pdom)
> + return 0;
> +
> + ret = nested_gcr3_update(pdom, dev);
> + if (ret)
> + return ret;
> +
> + if (dev_data->domain)
> + amd_iommu_detach_device(dev);
I'm strongly against not supporting hitless vDTE update - this is part
of the HW spec, the VMM should implement it, not create problematic
bugs to deal with down the road. Every time we let the VMM deviate from
the HW spec in undiscoverable ways it causes problems :(
Meaning you can't call detach_device, you have to support hitless
update of the DTE between different attachment types. You already did
the hard work of making update_dte256(), but the surrounding flows
still need fixing.
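A rough sketch of that flow, assuming update_dte256() atomically publishes a
fully built 256-bit entry as it does for the driver's other DTE updates;
build_nested_dte() is a made-up placeholder for wherever the merged entry
gets assembled:

	struct dev_table_entry new = {};

	/* Build the complete replacement entry for the new attachment:
	 * host v1 bits from the parent domain plus the guest vDTE bits. */
	build_nested_dte(iommu, dev_data, pdom, &new);	/* hypothetical */

	/* Switch the live DTE in one shot, no intermediate detach. */
	update_dte256(iommu, dev_data, &new);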
Jason
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 8/8] iommu/amd: Add support for nested domain attach/detach
2025-08-20 11:30 ` [PATCH 8/8] iommu/amd: Add support for nested domain attach/detach Suravee Suthikulpanit
2025-08-22 20:20 ` Nicolin Chen
2025-09-02 13:25 ` Jason Gunthorpe
@ 2025-09-03 8:27 ` Sairaj Kodilkar
2 siblings, 0 replies; 26+ messages in thread
From: Sairaj Kodilkar @ 2025-09-03 8:27 UTC (permalink / raw)
To: Suravee Suthikulpanit, jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh
On 8/20/2025 5:00 PM, Suravee Suthikulpanit wrote:
> Programs guest translation settings in the host DTE when attaching the
> nested domain to a device.
>
> Also, enable the GCR3TRPMode feature when supported.
>
> Note that nested translation is only supported with the GCR3TRP mode.
> When it is enabled, the AMD IOMMU driver programs the GCR3 Table Root
> Pointer field of the device table entry with the GPA provided by the guest.
>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
> drivers/iommu/amd/amd_iommu_types.h | 5 ++
> drivers/iommu/amd/init.c | 3 ++
> drivers/iommu/amd/iommu.c | 49 +++++++++++++++---
> drivers/iommu/amd/nested.c | 80 +++++++++++++++++++++++++++--
> 4 files changed, 127 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
> index 94f51a09b364..f8c392aadeb1 100644
> --- a/drivers/iommu/amd/amd_iommu_types.h
> +++ b/drivers/iommu/amd/amd_iommu_types.h
> @@ -191,6 +191,7 @@
> #define CONTROL_EPH_EN 45
> #define CONTROL_XT_EN 50
> #define CONTROL_INTCAPXT_EN 51
> +#define CONTROL_GCR3TRPMODE 58
> #define CONTROL_IRTCACHEDIS 59
> #define CONTROL_SNPAVIC_EN 61
>
> @@ -420,6 +421,8 @@
> #define DTE_FLAG_V BIT_ULL(0)
> #define DTE_FLAG_TV BIT_ULL(1)
> #define DTE_FLAG_HAD (3ULL << 7)
> +#define DTE_FLAG_PPR BIT_ULL(52)
> +#define DTE_FLAG_GLX BIT_ULL(53)
> #define DTE_FLAG_GIOV BIT_ULL(54)
> #define DTE_FLAG_GV BIT_ULL(55)
> #define DTE_GLX GENMASK_ULL(57, 56)
> @@ -555,6 +558,7 @@ struct amd_irte_ops;
>
> struct gcr3_tbl_info {
> u64 *gcr3_tbl; /* Guest CR3 table */
> + u64 trp_gpa; /* Guest CR3 TRP GPA for nested domain */
> int glx; /* Number of levels for GCR3 table */
> u32 pasid_cnt; /* Track attached PASIDs */
> u16 domid; /* Per device domain ID */
> @@ -610,6 +614,7 @@ struct protection_domain {
>
> struct protection_domain *parent; /* Nested parent domain */
> struct iommu_hwpt_amd_v2 guest_hwpt;
> + u16 guest_paging_mode; /* Guest paging mode */
> };
>
> /*
> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
> index 8de689b2c5ed..b340afd6901f 100644
> --- a/drivers/iommu/amd/init.c
> +++ b/drivers/iommu/amd/init.c
> @@ -971,6 +971,9 @@ static void iommu_enable_gt(struct amd_iommu *iommu)
> return;
>
> iommu_feature_enable(iommu, CONTROL_GT_EN);
> +
> + if (check_feature2(FEATURE_GCR3TRPMODE))
> + iommu_feature_enable(iommu, CONTROL_GCR3TRPMODE);
> }
>
> /* sets a specific bit in the device table entry. */
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index ea790a8997ee..935eaffb6814 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -1856,6 +1856,18 @@ static void free_gcr3_tbl_level2(u64 *tbl)
> }
> }
>
> +static inline bool amd_iommu_domain_is_nested(struct protection_domain *pdom)
> +{
> + return (pdom && (pdom->domain.type == IOMMU_DOMAIN_NESTED));
> +}
> +
> +static inline bool has_gcr3_table(struct gcr3_tbl_info *gcr3_info)
> +{
> + if (!gcr3_info || (!gcr3_info->gcr3_tbl && !gcr3_info->trp_gpa))
> + return false;
> + return true;
> +}
> +
> static void free_gcr3_table(struct gcr3_tbl_info *gcr3_info)
> {
> if (gcr3_info->glx == 2)
> @@ -1901,7 +1913,7 @@ static int setup_gcr3_table(struct gcr3_tbl_info *gcr3_info,
> if (levels > amd_iommu_max_glx_val)
> return -EINVAL;
>
> - if (gcr3_info->gcr3_tbl)
> + if (has_gcr3_table(gcr3_info))
> return -EBUSY;
>
> /* Allocate per device domain ID */
> @@ -2023,17 +2035,31 @@ static void set_dte_gcr3_table(struct amd_iommu *iommu,
> struct dev_table_entry *target)
> {
> struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
> + struct protection_domain *pdom = dev_data->domain;
> u64 gcr3;
>
> - if (!gcr3_info->gcr3_tbl)
> + if (!has_gcr3_table(gcr3_info))
> return;
>
> - pr_debug("%s: devid=%#x, glx=%#x, giov=%#x, gcr3_tbl=%#llx\n",
> + /* We need to check host capability before setting the mode. */
> + if ((pdom->guest_paging_mode == PAGE_MODE_5_LEVEL) &&
> + (amd_iommu_gpt_level < PAGE_MODE_5_LEVEL)) {
> + pr_err("Cannot support Guest paging mode=%#x (dom_id=%#x).\n",
> + pdom->guest_paging_mode, pdom->id);
> + return;
> + }
> +
> + pr_debug("%s: devid=%#x, glx=%#x, giov=%#x, gcr3_tbl=%#llx, trp_gpa=%#llx, type=%#x\n",
> __func__, dev_data->devid, gcr3_info->glx, gcr3_info->giov,
> - (unsigned long long)gcr3_info->gcr3_tbl);
> + (unsigned long long)gcr3_info->gcr3_tbl, gcr3_info->trp_gpa,
> + pdom->domain.type);
>
> gcr3 = iommu_virt_to_phys(gcr3_info->gcr3_tbl);
>
> + /* For nested domain, use GCR3 GPA provided */
> + if (gcr3_info->trp_gpa)
> + gcr3 = gcr3_info->trp_gpa;
> +
> target->data[0] |= DTE_FLAG_GV |
> FIELD_PREP(DTE_GLX, gcr3_info->glx) |
> FIELD_PREP(DTE_GCR3_14_12, gcr3 >> 12);
> @@ -2044,7 +2070,7 @@ static void set_dte_gcr3_table(struct amd_iommu *iommu,
> FIELD_PREP(DTE_GCR3_51_31, gcr3 >> 31);
>
> /* Guest page table can only support 4 and 5 levels */
> - if (amd_iommu_gpt_level == PAGE_MODE_5_LEVEL)
> + if (pdom->guest_paging_mode == PAGE_MODE_5_LEVEL)
> target->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_5_LEVEL);
> else
> target->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_4_LEVEL);
> @@ -2061,7 +2087,14 @@ static void set_dte_entry(struct amd_iommu *iommu,
> struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
> struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
>
> - if (gcr3_info && gcr3_info->gcr3_tbl)
> + /*
> + * For nested domain, use parent domain to setup v1 table
> + * information and domain id.
> + */
> + if (amd_iommu_domain_is_nested(domain))
> + domain = domain->parent;
> +
> + if (has_gcr3_table(gcr3_info))
> domid = dev_data->gcr3_info.domid;
> else
> domid = domain->id;
> @@ -2293,7 +2326,8 @@ int __amd_iommu_attach_device(struct device *dev, struct protection_domain *doma
> goto out;
>
> /* Setup GCR3 table */
> - if (pdom_is_sva_capable(domain)) {
> + if (!amd_iommu_domain_is_nested(domain) && pdom_is_sva_capable(domain)) {
> + pr_warn("%s: Allocating guest page table\n", __func__);
> ret = init_gcr3_table(dev_data, domain);
> if (ret) {
> pdom_detach_iommu(iommu, domain);
> @@ -2519,6 +2553,7 @@ static int pdom_setup_pgtable(struct protection_domain *domain,
> fmt = AMD_IOMMU_V1;
> break;
> case PD_MODE_V2:
> + domain->guest_paging_mode = amd_iommu_gpt_level;
> fmt = AMD_IOMMU_V2;
> break;
> case PD_MODE_NONE:
> diff --git a/drivers/iommu/amd/nested.c b/drivers/iommu/amd/nested.c
> index 09f2a455af33..c9bf44e6298d 100644
> --- a/drivers/iommu/amd/nested.c
> +++ b/drivers/iommu/amd/nested.c
> @@ -12,9 +12,7 @@
> #include "amd_iommu.h"
> #include "amd_iommu_types.h"
>
> -const struct iommu_domain_ops nested_domain_ops = {
> - .free = amd_iommu_domain_free,
> -};
> +const struct iommu_domain_ops nested_domain_ops;
>
> static int udata_to_iommu_hwpt_amd_v2(const struct iommu_user_data *user_data,
> struct iommu_hwpt_amd_v2 *hwpt)
> @@ -77,3 +75,79 @@ amd_iommu_domain_alloc_nested(struct device *dev, struct iommu_domain *parent,
> kfree(pdom);
> return ERR_PTR(-EINVAL);
> }
> +
> +static inline u64 hwpt_to_gcr3_trp(u64 *dte)
> +{
> + u64 gcr3;
> +
> + gcr3 = (FIELD_GET(DTE_GCR3_14_12, dte[0]) << 12);
> + gcr3 |= (FIELD_GET(DTE_GCR3_30_15, dte[1]) << 15);
> + gcr3 |= (FIELD_GET(DTE_GCR3_51_31, dte[1]) << 31);
> + return gcr3;
> +}
> +
> +static int nested_gcr3_update(struct protection_domain *pdom, struct device *dev)
> +{
> + struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
> + struct iommu_hwpt_amd_v2 *hwpt = &pdom->guest_hwpt;
> + struct pci_dev *pdev = to_pci_dev(dev);
> +
> + if (!pdev || !hwpt)
> + return -EINVAL;
> +
> + /* Note: Currently only support GCR3TRPMode with nested translation */
> + if (!check_feature2(FEATURE_GCR3TRPMODE))
> + return -EOPNOTSUPP;
> +
> + if (FIELD_GET(DTE_GPT_LEVEL_MASK, hwpt->dte[2]) == GUEST_PGTABLE_5_LEVEL)
> + pdom->guest_paging_mode = PAGE_MODE_5_LEVEL;
> + else
> + pdom->guest_paging_mode = PAGE_MODE_4_LEVEL;
> +
> + dev_data->ppr = FIELD_GET(DTE_FLAG_PPR, hwpt->dte[0]);
> + dev_data->gcr3_info.glx = FIELD_GET(DTE_FLAG_GLX, hwpt->dte[0]);
> + dev_data->gcr3_info.giov = FIELD_GET(DTE_FLAG_GIOV, hwpt->dte[0]);
> + dev_data->gcr3_info.trp_gpa = hwpt_to_gcr3_trp(hwpt->dte);
> + /* Due to possible aliasing issue use nested domain ID */
> + dev_data->gcr3_info.domid = pdom->id;
> + pr_debug("%s: devid=%#x, domid=%#x, trp_gpa=%#llx, glx=%#x\n", __func__,
> + pci_dev_id(pdev),
> + dev_data->gcr3_info.domid,
> + dev_data->gcr3_info.trp_gpa,
> + dev_data->gcr3_info.glx);
> +
> + return 0;
> +}
> +
> +static int amd_iommu_nested_attach_device(struct iommu_domain *dom, struct device *dev)
> +{
> + struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
> + struct protection_domain *pdom = to_pdomain(dom);
> + struct pci_dev *pdev;
> + int ret;
> +
> + if (dev_data->domain == pdom)
> + return 0;
> +
> + ret = nested_gcr3_update(pdom, dev);
> + if (ret)
> + return ret;
> +
> + if (dev_data->domain)
> + amd_iommu_detach_device(dev);
> +
> + ret = __amd_iommu_attach_device(dev, pdom);
> + if (ret)
> + return ret;
> +
> + pdev = dev_is_pci(dev_data->dev) ? to_pci_dev(dev_data->dev) : NULL;
> + if (pdev)
> + amd_iommu_pdev_enable_cap_ats(pdev);
> +
Hi Suravee,
You don't need to enable the ATS capability here; the function
__amd_iommu_attach_device() already takes care of it.
> + return ret;
> +}
> +
> +const struct iommu_domain_ops nested_domain_ops = {
> + .attach_dev = amd_iommu_nested_attach_device,
> + .free = amd_iommu_domain_free,
> +};
^ permalink raw reply [flat|nested] 26+ messages in thread
Thread overview: 26+ messages
2025-08-20 11:30 [PATCH 0/8] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
2025-08-20 11:30 ` [PATCH 1/8] iommu/amd: Make amd_iommu_pdom_id_alloc() non-static Suravee Suthikulpanit
2025-09-02 13:03 ` Jason Gunthorpe
2025-08-20 11:30 ` [PATCH 2/8] iommu/amd: Making device attach / detach helpers non-static Suravee Suthikulpanit
2025-09-02 13:10 ` Jason Gunthorpe
2025-08-20 11:30 ` [PATCH 3/8] iommu/amd: Making amd_iommu_pdev_enable_cap_ats non-static Suravee Suthikulpanit
2025-08-20 11:30 ` [PATCH 4/8] iommu/amd: Introduce struct gcr3_tbl_info.giov Suravee Suthikulpanit
2025-09-02 13:07 ` Jason Gunthorpe
2025-08-20 11:30 ` [PATCH 5/8] iommufd: Introduce data struct for AMD nested domain allocation Suravee Suthikulpanit
2025-09-02 13:09 ` Jason Gunthorpe
2025-08-20 11:30 ` [PATCH 6/8] iommu/amd: Add support for nest parent " Suravee Suthikulpanit
2025-09-02 13:12 ` Jason Gunthorpe
2025-08-20 11:30 ` [PATCH 7/8] iommu/amd: Add support for nested " Suravee Suthikulpanit
2025-08-22 5:39 ` kernel test robot
2025-08-22 19:51 ` Nicolin Chen
2025-08-22 21:16 ` Jason Gunthorpe
2025-08-22 21:45 ` Nicolin Chen
2025-08-25 13:38 ` Jason Gunthorpe
2025-09-01 7:09 ` Sairaj Kodilkar
2025-09-02 11:42 ` Jason Gunthorpe
2025-09-02 13:18 ` Jason Gunthorpe
2025-08-20 11:30 ` [PATCH 8/8] iommu/amd: Add support for nested domain attach/detach Suravee Suthikulpanit
2025-08-22 20:20 ` Nicolin Chen
2025-08-28 0:36 ` Suthikulpanit, Suravee
2025-09-02 13:25 ` Jason Gunthorpe
2025-09-03 8:27 ` Sairaj Kodilkar