* [PATCH v6 00/13] iommu/amd: Introduce Nested Translation support
From: Suravee Suthikulpanit @ 2026-01-15 6:08 UTC
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
This series introduces support for AMD IOMMU nested page table translation
with the host (v1) and guest (v2) page tables.
In this mode, the AMD IOMMU driver configures the Device Table Entry (DTE)
with the host page table root pointer, which is taken from a domain
allocated with the IOMMU_HWPT_ALLOC_NEST_PARENT flag.
The guest page tables and Guest CR3 (GCR3) tables are managed by the guest
OS, and are referenced through the guest DTE (gDTE) in guest memory. The VMM
is responsible for passing the gDTE information to the host IOMMU driver
using struct iommu_hwpt_amd_guest when allocating a domain of type
IOMMU_DOMAIN_NESTED. The AMD IOMMU driver then parses the gDTE and programs
it into the host DTE.
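As a rough illustration of the VMM side of this hand-off, the sketch below
copies a 256-bit gDTE into a structure shaped like struct
iommu_hwpt_amd_guest from patch 05. The type hwpt_amd_guest_sketch and the
helper fill_guest_dte() are hypothetical names used only for this example;
a real VMM would use the uAPI header and pass the data through the iommufd
HWPT allocation ioctl.

```c
#include <stdint.h>
#include <string.h>

/*
 * Local mirror of the uAPI layout added in patch 05: four 64-bit words
 * holding one guest Device Table Entry. Hypothetical name, for illustration.
 */
struct hwpt_amd_guest_sketch {
	uint64_t dte[4];
};

/*
 * Hypothetical helper: copy a gDTE that the VMM has already read from
 * guest memory into the structure handed to the host IOMMU driver.
 */
static void fill_guest_dte(struct hwpt_amd_guest_sketch *out,
			   const uint64_t gdte[4])
{
	memcpy(out->dte, gdte, sizeof(out->dte));
}
```

The host driver, not the VMM, decides which gDTE fields are honoured when
it programs the host DTE, so the copy is deliberately unfiltered here.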
In addition, this series introduces base code for IOMMUFD vIOMMU for AMD
IOMMU, and implements the vIOMMU-based nested domain allocation interface.
It also adds struct nested_domain to store nested domain information, and
the set_dte_nested() helper function to handle DTE programming for the
nested domain.
The series is separated into three parts:
* Patch 1 adds support for hw_info for AMD IOMMU.
* Patches 2-4 are preparatory patches.
* Patches 5-13 implement nested translation support.
Please note that this is a preparatory series for nested translation with
AMD vIOMMU support. Currently, amd_iommu_alloc_domain_nested() is not yet
hooked up to struct iommufd_viommu_ops.alloc_domain_nested.
This will be fully enabled in a subsequent series, which will introduce
support for IOMMUFD vIOMMU for AMD.
Changes from V5:
(https://lore.kernel.org/linux-iommu/20251112182506.7165-1-suravee.suthikulpanit@amd.com/)
* This series is rebased on top of
Git repo: git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux.git
Branch: next
Commit: c7fe9384c85d ("amd/iommu: Make protection domain ID functions non-static")
* Rebase on top of Linux v6.19-rc1, and adopt the new gen_pt framework.
* Remove patch "iommu/amd: Make amd_iommu_pdom_id_alloc() non-static"
since it has been accepted from another series.
* Remove patch "iommu/amd: Make amd_iommu_pdom_id_free() non-static"
since it has been accepted from another series.
* Patch 1:
- Rebase "[PATCH v5] iommu/amd: Add support for hw_info for iommu capability query"
(https://lore.kernel.org/linux-iommu/20250926141901.511313-1-suravee.suthikulpanit@amd.com/)
* Patch 10:
- Fix race condition using xa_lock()/xa_unlock() per Nicolin's comment
- Introduce gdom_info_load_or_alloc_locked() per Jason's suggestion
* Patch 12:
- Rework logic for set_dte_v1() / amd_iommu_set_dte_v1()
- Remove check for snp enabled and domid=0 since no longer applicable.
* Patch 13:
- Move amd_iommu_update_dte() to nested_attach_device().
- Add WARN_ON for PASID enabled.
Thanks,
Suravee
Suravee Suthikulpanit (13):
iommu/amd: Add support for hw_info for iommu capability query
iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK
iommu/amd: Make amd_iommu_make_clear_dte() non-static inline
iommu/amd: Introduce helper function amd_iommu_update_dte()
iommufd: Introduce data struct for AMD nested domain allocation
iommu/amd: Always enable GCR3TRPMode when supported.
iommu/amd: Add support for nest parent domain allocation
iommu/amd: Introduce struct amd_iommu_viommu
iommu/amd: Add support for nested domain allocation
iommu/amd: Introduce gDomID-to-hDomID Mapping and handle parent domain
invalidation
iommu/amd: Refactor persistent DTE bits programming into
amd_iommu_make_clear_dte()
iommu/amd: Refactor logic to program the host page table in DTE
iommu/amd: Add support for nested domain attach/detach
drivers/iommu/amd/Kconfig | 10 +
drivers/iommu/amd/Makefile | 1 +
drivers/iommu/amd/amd_iommu.h | 33 ++++
drivers/iommu/amd/amd_iommu_types.h | 48 ++++-
drivers/iommu/amd/init.c | 10 +-
drivers/iommu/amd/iommu.c | 264 +++++++++++++++----------
drivers/iommu/amd/iommufd.c | 78 ++++++++
drivers/iommu/amd/iommufd.h | 20 ++
drivers/iommu/amd/nested.c | 294 ++++++++++++++++++++++++++++
include/uapi/linux/iommufd.h | 39 ++++
10 files changed, 692 insertions(+), 105 deletions(-)
create mode 100644 drivers/iommu/amd/iommufd.c
create mode 100644 drivers/iommu/amd/iommufd.h
create mode 100644 drivers/iommu/amd/nested.c
--
2.34.1
* [PATCH v6 01/13] iommu/amd: Add support for hw_info for iommu capability query
From: Suravee Suthikulpanit @ 2026-01-15 6:08 UTC
AMD IOMMU Extended Feature (EFR) and Extended Feature 2 (EFR2) registers
specify features supported by each IOMMU hardware instance.
The IOMMU driver checks the feature-specific bits before enabling
each feature at run time.
For IOMMUFD, the hypervisor passes the raw values of amd_iommu_efr and
amd_iommu_efr2 to the VMM via the iommufd IOMMU_GET_HW_INFO ioctl.
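Since the values are raw, the VMM is expected to mask them before exposing
them to a guest. A minimal sketch of such a filter follows, using the
allow-list from the uAPI comment in this patch (GTSup, GASup, GioSup,
PPRSup, EPHSup, GATS, GLX, PASmax). The bit positions below are written
from memory of the driver's FEATURE_* definitions and are an assumption;
please verify them against the AMD IOMMU specification. The names EFR_*,
VMM_EFR_ALLOWED and vmm_filter_efr() are hypothetical.

```c
#include <stdint.h>

#define BIT_ULL(n)        (1ULL << (n))
#define GENMASK_ULL(h, l) (((~0ULL) << (l)) & (~0ULL >> (63 - (h))))

/* Assumed EFR field positions; check the spec before relying on them. */
#define EFR_PPRSUP	BIT_ULL(1)
#define EFR_GTSUP	BIT_ULL(4)
#define EFR_GASUP	BIT_ULL(7)
#define EFR_GATS	GENMASK_ULL(13, 12)
#define EFR_GLX		GENMASK_ULL(15, 14)
#define EFR_PASMAX	GENMASK_ULL(36, 32)
#define EFR_GIOSUP	BIT_ULL(48)
#define EFR_EPHSUP	BIT_ULL(50)

/* Fields the uAPI comment lists as currently allowed for the VM. */
#define VMM_EFR_ALLOWED	(EFR_GTSUP | EFR_GASUP | EFR_GIOSUP | EFR_PPRSUP | \
			 EFR_EPHSUP | EFR_GATS | EFR_GLX | EFR_PASMAX)

/* Keep only the allowed fields of the raw EFR value reported by the host. */
static uint64_t vmm_filter_efr(uint64_t raw_efr)
{
	return raw_efr & VMM_EFR_ALLOWED;
}
```

An allow-list (rather than a deny-list) means that feature bits added by
future hardware stay hidden from the guest until the VMM opts in.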
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/Kconfig | 10 ++++++++++
drivers/iommu/amd/Makefile | 1 +
drivers/iommu/amd/iommu.c | 2 ++
drivers/iommu/amd/iommufd.c | 31 +++++++++++++++++++++++++++++++
drivers/iommu/amd/iommufd.h | 15 +++++++++++++++
include/uapi/linux/iommufd.h | 28 ++++++++++++++++++++++++++++
6 files changed, 87 insertions(+)
create mode 100644 drivers/iommu/amd/iommufd.c
create mode 100644 drivers/iommu/amd/iommufd.h
diff --git a/drivers/iommu/amd/Kconfig b/drivers/iommu/amd/Kconfig
index f2acf471cb5d..588355ff7eb7 100644
--- a/drivers/iommu/amd/Kconfig
+++ b/drivers/iommu/amd/Kconfig
@@ -30,6 +30,16 @@ config AMD_IOMMU
your BIOS for an option to enable it or if you have an IVRS ACPI
table.
+config AMD_IOMMU_IOMMUFD
+ bool "Enable IOMMUFD features for AMD IOMMU (EXPERIMENTAL)"
+ depends on IOMMUFD
+ depends on AMD_IOMMU
+ help
+ Support for IOMMUFD features intended to support virtual machines
+ with accelerated virtual IOMMUs.
+
+ Say Y here if you are doing development and testing on this feature.
+
config AMD_IOMMU_DEBUGFS
bool "Enable AMD IOMMU internals in DebugFS"
depends on AMD_IOMMU && IOMMU_DEBUGFS
diff --git a/drivers/iommu/amd/Makefile b/drivers/iommu/amd/Makefile
index 5412a563c697..41f053b49dce 100644
--- a/drivers/iommu/amd/Makefile
+++ b/drivers/iommu/amd/Makefile
@@ -1,3 +1,4 @@
# SPDX-License-Identifier: GPL-2.0-only
obj-y += iommu.o init.o quirks.o ppr.o pasid.o
+obj-$(CONFIG_AMD_IOMMU_IOMMUFD) += iommufd.o
obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += debugfs.o
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 5d45795c367a..b6154c73e514 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -43,6 +43,7 @@
#include <linux/generic_pt/iommu.h>
#include "amd_iommu.h"
+#include "iommufd.h"
#include "../irq_remapping.h"
#include "../iommu-pages.h"
@@ -3079,6 +3080,7 @@ static bool amd_iommu_enforce_cache_coherency(struct iommu_domain *domain)
const struct iommu_ops amd_iommu_ops = {
.capable = amd_iommu_capable,
+ .hw_info = amd_iommufd_hw_info,
.blocked_domain = &blocked_domain,
.release_domain = &blocked_domain,
.identity_domain = &identity_domain.domain,
diff --git a/drivers/iommu/amd/iommufd.c b/drivers/iommu/amd/iommufd.c
new file mode 100644
index 000000000000..72eaaa923d04
--- /dev/null
+++ b/drivers/iommu/amd/iommufd.c
@@ -0,0 +1,31 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025 Advanced Micro Devices, Inc.
+ */
+
+#include <linux/iommu.h>
+
+#include "iommufd.h"
+#include "amd_iommu.h"
+#include "amd_iommu_types.h"
+
+void *amd_iommufd_hw_info(struct device *dev, u32 *length, u32 *type)
+{
+ struct iommu_hw_info_amd *hwinfo;
+
+ if (*type != IOMMU_HW_INFO_TYPE_DEFAULT &&
+ *type != IOMMU_HW_INFO_TYPE_AMD)
+ return ERR_PTR(-EOPNOTSUPP);
+
+ hwinfo = kzalloc(sizeof(*hwinfo), GFP_KERNEL);
+ if (!hwinfo)
+ return ERR_PTR(-ENOMEM);
+
+ *length = sizeof(*hwinfo);
+ *type = IOMMU_HW_INFO_TYPE_AMD;
+
+ hwinfo->efr = amd_iommu_efr;
+ hwinfo->efr2 = amd_iommu_efr2;
+
+ return hwinfo;
+}
diff --git a/drivers/iommu/amd/iommufd.h b/drivers/iommu/amd/iommufd.h
new file mode 100644
index 000000000000..f880be80a30d
--- /dev/null
+++ b/drivers/iommu/amd/iommufd.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2025 Advanced Micro Devices, Inc.
+ */
+
+#ifndef AMD_IOMMUFD_H
+#define AMD_IOMMUFD_H
+
+#if IS_ENABLED(CONFIG_AMD_IOMMU_IOMMUFD)
+void *amd_iommufd_hw_info(struct device *dev, u32 *length, u32 *type);
+#else
+#define amd_iommufd_hw_info NULL
+#endif /* CONFIG_AMD_IOMMU_IOMMUFD */
+
+#endif /* AMD_IOMMUFD_H */
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 2c41920b641d..3db37f6042a0 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -623,6 +623,32 @@ struct iommu_hw_info_tegra241_cmdqv {
__u8 __reserved;
};
+/**
+ * struct iommu_hw_info_amd - AMD IOMMU device info
+ *
+ * @efr : Value of AMD IOMMU Extended Feature Register (EFR)
+ * @efr2: Value of AMD IOMMU Extended Feature 2 Register (EFR2)
+ *
+ * Please see the description of these registers in the following sections of
+ * the AMD I/O Virtualization Technology (IOMMU) Specification.
+ * (https://docs.amd.com/v/u/en-US/48882_3.10_PUB)
+ *
+ * - MMIO Offset 0030h IOMMU Extended Feature Register
+ * - MMIO Offset 01A0h IOMMU Extended Feature 2 Register
+ *
+ * Note: The EFR and EFR2 are raw values reported by hardware.
+ * The VMM is responsible for determining the appropriate flags to expose to
+ * the VM, since certain features are not currently supported by the kernel
+ * for HW-vIOMMU.
+ *
+ * The current VMM-allowed list of feature flags is:
+ * - EFR[GTSup, GASup, GioSup, PPRSup, EPHSup, GATS, GLX, PASmax]
+ */
+struct iommu_hw_info_amd {
+ __aligned_u64 efr;
+ __aligned_u64 efr2;
+};
+
/**
* enum iommu_hw_info_type - IOMMU Hardware Info Types
* @IOMMU_HW_INFO_TYPE_NONE: Output by the drivers that do not report hardware
@@ -632,6 +658,7 @@ struct iommu_hw_info_tegra241_cmdqv {
* @IOMMU_HW_INFO_TYPE_ARM_SMMUV3: ARM SMMUv3 iommu info type
* @IOMMU_HW_INFO_TYPE_TEGRA241_CMDQV: NVIDIA Tegra241 CMDQV (extension for ARM
* SMMUv3) info type
+ * @IOMMU_HW_INFO_TYPE_AMD: AMD IOMMU info type
*/
enum iommu_hw_info_type {
IOMMU_HW_INFO_TYPE_NONE = 0,
@@ -639,6 +666,7 @@ enum iommu_hw_info_type {
IOMMU_HW_INFO_TYPE_INTEL_VTD = 1,
IOMMU_HW_INFO_TYPE_ARM_SMMUV3 = 2,
IOMMU_HW_INFO_TYPE_TEGRA241_CMDQV = 3,
+ IOMMU_HW_INFO_TYPE_AMD = 4,
};
/**
--
2.34.1
* [PATCH v6 02/13] iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK
From: Suravee Suthikulpanit @ 2026-01-15 6:08 UTC
Also change the define to use GENMASK_ULL instead.
There is no functional change.
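The equivalence can be checked outside the kernel with a userspace
stand-in for GENMASK_ULL(). The macro below is a local reimplementation
for illustration, not the kernel header, and dte_domid() is a hypothetical
helper that mimics how the mask is applied to DTE data[1].

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's GENMASK_ULL(h, l): bits l..h set. */
#define GENMASK_ULL(h, l) (((~0ULL) << (l)) & (~0ULL >> (63 - (h))))

#define OLD_DEV_DOMID_MASK	0xffffULL		/* value before the rename */
#define DTE_DOMID_MASK		GENMASK_ULL(15, 0)	/* value after the rename */

/* Extract the 16-bit domain ID from the second 64-bit word of a DTE. */
static uint16_t dte_domid(uint64_t data1)
{
	return (uint16_t)(data1 & DTE_DOMID_MASK);
}
```

Both masks cover bits 15:0, so callers using a plain AND or FIELD_GET()
see the same result before and after the patch.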
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu_types.h | 2 +-
drivers/iommu/amd/init.c | 2 +-
drivers/iommu/amd/iommu.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 320733e7d8b4..14801d734684 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -358,7 +358,7 @@
#define DTE_FLAG_IOTLB BIT_ULL(32)
#define DTE_FLAG_MASK (0x3ffULL << 32)
-#define DEV_DOMID_MASK 0xffffULL
+#define DTE_DOMID_MASK GENMASK_ULL(15, 0)
#define DTE_GCR3_14_12 GENMASK_ULL(60, 58)
#define DTE_GCR3_30_15 GENMASK_ULL(31, 16)
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 384c90b4f90a..cfbc9ff105c3 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1179,7 +1179,7 @@ static bool __reuse_device_table(struct amd_iommu *iommu)
for (devid = 0; devid <= pci_seg->last_bdf; devid++) {
old_dev_tbl_entry = &pci_seg->old_dev_tbl_cpy[devid];
dte_v = FIELD_GET(DTE_FLAG_V, old_dev_tbl_entry->data[0]);
- dom_id = FIELD_GET(DEV_DOMID_MASK, old_dev_tbl_entry->data[1]);
+ dom_id = FIELD_GET(DTE_DOMID_MASK, old_dev_tbl_entry->data[1]);
if (!dte_v || !dom_id)
continue;
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index b6154c73e514..2ab0f3c98425 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2116,7 +2116,7 @@ static void set_dte_entry(struct amd_iommu *iommu,
if (dev_data->ats_enabled)
new.data[1] |= DTE_FLAG_IOTLB;
- old_domid = READ_ONCE(dte->data[1]) & DEV_DOMID_MASK;
+ old_domid = READ_ONCE(dte->data[1]) & DTE_DOMID_MASK;
new.data[1] |= domid;
/*
--
2.34.1
* [PATCH v6 03/13] iommu/amd: Make amd_iommu_make_clear_dte() non-static inline
From: Suravee Suthikulpanit @ 2026-01-15 6:08 UTC
This will be reused in a new nested.c file for nested translation.
Also, remove unused function parameter ptr.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 8 ++++++++
drivers/iommu/amd/iommu.c | 13 ++-----------
2 files changed, 10 insertions(+), 11 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index b742ef1adb35..8eb5e9857079 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -190,4 +190,12 @@ void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
struct dev_table_entry *get_dev_table(struct amd_iommu *iommu);
struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid);
+static inline void
+amd_iommu_make_clear_dte(struct iommu_dev_data *dev_data, struct dev_table_entry *new)
+{
+ /* All existing DTE must have V bit set */
+ new->data128[0] = DTE_FLAG_V;
+ new->data128[1] = 0;
+}
+
#endif /* AMD_IOMMU_H */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 2ab0f3c98425..ded8d4ba86e3 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2011,14 +2011,6 @@ int amd_iommu_clear_gcr3(struct iommu_dev_data *dev_data, ioasid_t pasid)
return ret;
}
-static void make_clear_dte(struct iommu_dev_data *dev_data, struct dev_table_entry *ptr,
- struct dev_table_entry *new)
-{
- /* All existing DTE must have V bit set */
- new->data128[0] = DTE_FLAG_V;
- new->data128[1] = 0;
-}
-
/*
* Note:
* The old value for GCR3 table and GPT have been cleared from caller.
@@ -2068,7 +2060,7 @@ static void set_dte_entry(struct amd_iommu *iommu,
struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
struct pt_iommu_amdv1_hw_info pt_info;
- make_clear_dte(dev_data, dte, &new);
+ amd_iommu_make_clear_dte(dev_data, &new);
if (gcr3_info && gcr3_info->gcr3_tbl)
domid = dev_data->gcr3_info.domid;
@@ -2149,9 +2141,8 @@ static void set_dte_entry(struct amd_iommu *iommu,
static void clear_dte_entry(struct amd_iommu *iommu, struct iommu_dev_data *dev_data)
{
struct dev_table_entry new = {};
- struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
- make_clear_dte(dev_data, dte, &new);
+ amd_iommu_make_clear_dte(dev_data, &new);
update_dte256(iommu, dev_data, &new);
}
--
2.34.1
* [PATCH v6 04/13] iommu/amd: Introduce helper function amd_iommu_update_dte()
From: Suravee Suthikulpanit @ 2026-01-15 6:08 UTC
The helper combines the DTE update, clone_aliases, DTE flush and
completion-wait commands, to avoid code duplication when it is reused to
set up the DTE for nested translation.
Also, make amd_iommu_update_dte() non-static so that it can be reused in
the new nested.c file for nested translation.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 4 ++++
drivers/iommu/amd/iommu.c | 24 ++++++++++++++++++------
2 files changed, 22 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 8eb5e9857079..d97b9b6d76d3 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -190,6 +190,10 @@ void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
struct dev_table_entry *get_dev_table(struct amd_iommu *iommu);
struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid);
+void amd_iommu_update_dte(struct amd_iommu *iommu,
+ struct iommu_dev_data *dev_data,
+ struct dev_table_entry *new);
+
static inline void
amd_iommu_make_clear_dte(struct iommu_dev_data *dev_data, struct dev_table_entry *new)
{
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index ded8d4ba86e3..6fea4ac97e3d 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -76,6 +76,8 @@ static void set_dte_entry(struct amd_iommu *iommu,
struct iommu_dev_data *dev_data,
phys_addr_t top_paddr, unsigned int top_level);
+static int device_flush_dte(struct iommu_dev_data *dev_data);
+
static void amd_iommu_change_top(struct pt_iommu *iommu_table,
phys_addr_t top_paddr, unsigned int top_level);
@@ -86,6 +88,10 @@ static bool amd_iommu_enforce_cache_coherency(struct iommu_domain *domain);
static int amd_iommu_set_dirty_tracking(struct iommu_domain *domain,
bool enable);
+static void clone_aliases(struct amd_iommu *iommu, struct device *dev);
+
+static int iommu_completion_wait(struct amd_iommu *iommu);
+
/****************************************************************************
*
* Helper functions
@@ -203,6 +209,16 @@ static void update_dte256(struct amd_iommu *iommu, struct iommu_dev_data *dev_da
spin_unlock_irqrestore(&dev_data->dte_lock, flags);
}
+void amd_iommu_update_dte(struct amd_iommu *iommu,
+ struct iommu_dev_data *dev_data,
+ struct dev_table_entry *new)
+{
+ update_dte256(iommu, dev_data, new);
+ clone_aliases(iommu, dev_data->dev);
+ device_flush_dte(dev_data);
+ iommu_completion_wait(iommu);
+}
+
static void get_dte256(struct amd_iommu *iommu, struct iommu_dev_data *dev_data,
struct dev_table_entry *dte)
{
@@ -2123,7 +2139,7 @@ static void set_dte_entry(struct amd_iommu *iommu,
set_dte_gcr3_table(iommu, dev_data, &new);
- update_dte256(iommu, dev_data, &new);
+ amd_iommu_update_dte(iommu, dev_data, &new);
/*
* A kdump kernel might be replacing a domain ID that was copied from
@@ -2143,7 +2159,7 @@ static void clear_dte_entry(struct amd_iommu *iommu, struct iommu_dev_data *dev_
struct dev_table_entry new = {};
amd_iommu_make_clear_dte(dev_data, &new);
- update_dte256(iommu, dev_data, &new);
+ amd_iommu_update_dte(iommu, dev_data, &new);
}
/* Update and flush DTE for the given device */
@@ -2155,10 +2171,6 @@ static void dev_update_dte(struct iommu_dev_data *dev_data, bool set)
set_dte_entry(iommu, dev_data, 0, 0);
else
clear_dte_entry(iommu, dev_data);
-
- clone_aliases(iommu, dev_data->dev);
- device_flush_dte(dev_data);
- iommu_completion_wait(iommu);
}
/*
--
2.34.1
* [PATCH v6 05/13] iommufd: Introduce data struct for AMD nested domain allocation
From: Suravee Suthikulpanit @ 2026-01-15 6:08 UTC
Introduce IOMMU_HWPT_DATA_AMD_GUEST data type for IOMMU guest page table,
which is used for stage-1 in nested translation. The data structure
contains information necessary for setting up the AMD HW-vIOMMU support.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
include/uapi/linux/iommufd.h | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 3db37f6042a0..1dafbc552d37 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -465,16 +465,27 @@ struct iommu_hwpt_arm_smmuv3 {
__aligned_le64 ste[2];
};
+/**
+ * struct iommu_hwpt_amd_guest - AMD IOMMU guest I/O page table data
+ * (IOMMU_HWPT_DATA_AMD_GUEST)
+ * @dte: Guest Device Table Entry (DTE)
+ */
+struct iommu_hwpt_amd_guest {
+ __aligned_u64 dte[4];
+};
+
/**
* enum iommu_hwpt_data_type - IOMMU HWPT Data Type
* @IOMMU_HWPT_DATA_NONE: no data
* @IOMMU_HWPT_DATA_VTD_S1: Intel VT-d stage-1 page table
* @IOMMU_HWPT_DATA_ARM_SMMUV3: ARM SMMUv3 Context Descriptor Table
+ * @IOMMU_HWPT_DATA_AMD_GUEST: AMD IOMMU guest page table
*/
enum iommu_hwpt_data_type {
IOMMU_HWPT_DATA_NONE = 0,
IOMMU_HWPT_DATA_VTD_S1 = 1,
IOMMU_HWPT_DATA_ARM_SMMUV3 = 2,
+ IOMMU_HWPT_DATA_AMD_GUEST = 3,
};
/**
--
2.34.1
* [PATCH v6 06/13] iommu/amd: Always enable GCR3TRPMode when supported.
From: Suravee Suthikulpanit @ 2026-01-15 6:08 UTC
The GCR3TRPMode feature allows the DTE[GCR3TRP] field to be configured
with GPA (instead of SPA). This simplifies the implementation, and is
a pre-requisite for nested translation support.
Therefore, always enable this feature if available.
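The enable logic can be modelled in plain C as a pure function over the
EFR2 value and the control register value. The two constants are taken
from this patch; enable_gcr3trp_if_supported() is a hypothetical name, and
the real driver performs the update through iommu_feature_enable() on the
MMIO control register from within iommu_enable_gt().

```c
#include <stdint.h>

#define BIT_ULL(n) (1ULL << (n))

#define FEATURE_GCR3TRPMODE	BIT_ULL(3)	/* EFR2 feature bit (this patch) */
#define CONTROL_GCR3TRPMODE	58		/* control register bit number (this patch) */

/*
 * Model of the enable path: set the control bit only when EFR2 reports
 * the feature, and leave every other control bit untouched.
 */
static uint64_t enable_gcr3trp_if_supported(uint64_t efr2, uint64_t control)
{
	if (efr2 & FEATURE_GCR3TRPMODE)
		control |= BIT_ULL(CONTROL_GCR3TRPMODE);

	return control;
}
```

Note that the feature define is a bit mask while the control define is a
bit number, matching how the two are used in the diff below.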
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu_types.h | 2 ++
drivers/iommu/amd/init.c | 8 ++++++++
2 files changed, 10 insertions(+)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 14801d734684..d8753841cd1f 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -108,6 +108,7 @@
/* Extended Feature 2 Bits */
#define FEATURE_SEVSNPIO_SUP BIT_ULL(1)
+#define FEATURE_GCR3TRPMODE BIT_ULL(3)
#define FEATURE_SNPAVICSUP GENMASK_ULL(7, 5)
#define FEATURE_SNPAVICSUP_GAM(x) \
(FIELD_GET(FEATURE_SNPAVICSUP, x) == 0x1)
@@ -186,6 +187,7 @@
#define CONTROL_EPH_EN 45
#define CONTROL_XT_EN 50
#define CONTROL_INTCAPXT_EN 51
+#define CONTROL_GCR3TRPMODE 58
#define CONTROL_IRTCACHEDIS 59
#define CONTROL_SNPAVIC_EN 61
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index cfbc9ff105c3..b1c344ed7dbd 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1122,6 +1122,14 @@ static void iommu_enable_gt(struct amd_iommu *iommu)
return;
iommu_feature_enable(iommu, CONTROL_GT_EN);
+
+ /*
+ * This feature needs to be enabled prior to a call
+ * to iommu_snp_enable(). Since this function is called
+ * in early_enable_iommu(), it is safe to enable here.
+ */
+ if (check_feature2(FEATURE_GCR3TRPMODE))
+ iommu_feature_enable(iommu, CONTROL_GCR3TRPMODE);
}
/* sets a specific bit in the device table entry. */
--
2.34.1
* [PATCH v6 07/13] iommu/amd: Add support for nest parent domain allocation
From: Suravee Suthikulpanit @ 2026-01-15 6:08 UTC
To support nested translation, the nest parent domain is allocated with
IOMMU_HWPT_ALLOC_NEST_PARENT flag, and stores information of the v1 page
table for stage 2 (i.e. GPA->SPA).
Also, only support nest parent domains on AMD systems that support the
Guest CR3 Table (GCR3TRPMode) feature. This feature is required in
order to program DTE[GCR3 Table Root Pointer] with the GPA.
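The flag handling in the diff below can be read as a small decision
function. The sketch here is a standalone model, not the driver code: the
ALLOC_* constants are local stand-ins for the uAPI flags (their values are
assumed), struct hw_caps and pick_pgtable() are hypothetical, and treating
every flag combination not shown in the hunk as unsupported is also an
assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the IOMMU_HWPT_ALLOC_* flags; values are assumed. */
#define ALLOC_NEST_PARENT	(1u << 0)
#define ALLOC_DIRTY_TRACKING	(1u << 1)
#define ALLOC_PASID		(1u << 3)

enum pgtable_choice { PGTABLE_UNSUPPORTED, PGTABLE_V1, PGTABLE_OTHER };

/* Hypothetical summary of the hardware checks made by the driver. */
struct hw_caps {
	bool hd_support;		/* dirty tracking available */
	bool nest_parent_support;	/* GT, GIOSUP and GCR3TRPMode available */
};

static enum pgtable_choice pick_pgtable(uint32_t flags, struct hw_caps caps)
{
	const uint32_t supported = ALLOC_DIRTY_TRACKING | ALLOC_PASID |
				   ALLOC_NEST_PARENT;

	if (flags & ~supported)
		return PGTABLE_UNSUPPORTED;

	switch (flags) {
	case ALLOC_DIRTY_TRACKING:
	case ALLOC_NEST_PARENT:
	case ALLOC_DIRTY_TRACKING | ALLOC_NEST_PARENT:
		/* Each requested feature must be backed by hardware. */
		if ((flags & ALLOC_DIRTY_TRACKING) && !caps.hd_support)
			return PGTABLE_UNSUPPORTED;
		if ((flags & ALLOC_NEST_PARENT) && !caps.nest_parent_support)
			return PGTABLE_UNSUPPORTED;
		return PGTABLE_V1;
	case ALLOC_PASID:
	case 0:
		/* Handled by other branches of the driver, outside this sketch. */
		return PGTABLE_OTHER;
	default:
		return PGTABLE_UNSUPPORTED;
	}
}
```

The point of the shared case labels is that dirty tracking and nest parent
both require the v1 page table, so they may be requested together.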
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/iommu.c | 26 +++++++++++++++++++++++---
1 file changed, 23 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 6fea4ac97e3d..9b2fd852eb88 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2765,6 +2765,14 @@ static struct iommu_domain *amd_iommu_domain_alloc_paging_v2(struct device *dev,
return &domain->domain;
}
+static inline bool is_nest_parent_supported(u32 flags)
+{
+ /* Only allow nest parent when these features are supported */
+ return check_feature(FEATURE_GT) &&
+ check_feature(FEATURE_GIOSUP) &&
+ check_feature2(FEATURE_GCR3TRPMODE);
+}
+
static struct iommu_domain *
amd_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
const struct iommu_user_data *user_data)
@@ -2772,16 +2780,28 @@ amd_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
{
struct amd_iommu *iommu = get_amd_iommu_from_dev(dev);
const u32 supported_flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
- IOMMU_HWPT_ALLOC_PASID;
+ IOMMU_HWPT_ALLOC_PASID |
+ IOMMU_HWPT_ALLOC_NEST_PARENT;
if ((flags & ~supported_flags) || user_data)
return ERR_PTR(-EOPNOTSUPP);
switch (flags & supported_flags) {
case IOMMU_HWPT_ALLOC_DIRTY_TRACKING:
- /* Allocate domain with v1 page table for dirty tracking */
- if (!amd_iommu_hd_support(iommu))
+ case IOMMU_HWPT_ALLOC_NEST_PARENT:
+ case IOMMU_HWPT_ALLOC_DIRTY_TRACKING | IOMMU_HWPT_ALLOC_NEST_PARENT:
+ /*
+ * Allocate domain with v1 page table for dirty tracking
+ * and/or Nest parent.
+ */
+ if ((flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING) &&
+ !amd_iommu_hd_support(iommu))
+ break;
+
+ if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) &&
+ !is_nest_parent_supported(flags))
break;
+
return amd_iommu_domain_alloc_paging_v1(dev, flags);
case IOMMU_HWPT_ALLOC_PASID:
/* Allocate domain with v2 page table if IOMMU supports PASID. */
--
2.34.1
* [PATCH v6 08/13] iommu/amd: Introduce struct amd_iommu_viommu
From: Suravee Suthikulpanit @ 2026-01-15 6:08 UTC
Introduce struct amd_iommu_viommu, which stores a reference to the nest
parent domain assigned during the call to struct iommu_ops.viommu_init().
Information in the nest parent is needed when setting up the nested
translation.
Note that the rest of the viommu initialization will be introduced in a
subsequent commit.
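The embedding pattern described above can be sketched in plain userspace C. This is only an illustration, not the driver code: struct core_viommu, struct parent_domain, struct driver_viommu and driver_viommu_init() are hypothetical stand-ins for struct iommufd_viommu, struct protection_domain, struct amd_iommu_viommu and amd_iommufd_viommu_init(), and container_of() is redefined locally because the kernel macro is not available outside the kernel.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for the core object owned by the generic layer. */
struct core_viommu {
	int id;
};

/* Hypothetical stand-in for the nest parent domain. */
struct parent_domain {
	int id;
};

/* Driver-private wrapper: the core object is embedded, not pointed to. */
struct driver_viommu {
	struct core_viommu core;
	struct parent_domain *parent;	/* nest parent recorded at init time */
};

/* Recover the wrapper from the embedded core object and record the parent. */
int driver_viommu_init(struct core_viommu *core, struct parent_domain *parent)
{
	struct driver_viommu *dv = container_of(core, struct driver_viommu, core);

	dv->parent = parent;
	return 0;
}
```

Because the generic layer allocates the whole wrapper (sized through the get_viommu_size callback), the driver never allocates the object itself; it only fills in its private fields.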
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu_types.h | 6 ++++++
drivers/iommu/amd/iommu.c | 2 ++
drivers/iommu/amd/iommufd.c | 16 ++++++++++++++++
drivers/iommu/amd/iommufd.h | 5 +++++
4 files changed, 29 insertions(+)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index d8753841cd1f..d5b3393ab3a9 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -17,6 +17,7 @@
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/pci.h>
+#include <linux/iommufd.h>
#include <linux/irqreturn.h>
#include <linux/generic_pt/iommu.h>
@@ -495,6 +496,11 @@ struct pdom_iommu_info {
u32 refcnt; /* Count of attached dev/pasid per domain/IOMMU */
};
+struct amd_iommu_viommu {
+ struct iommufd_viommu core;
+ struct protection_domain *parent; /* nest parent domain for this viommu */
+};
+
/*
* This structure contains generic data for IOMMU protection domains
* independent of their use.
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 9b2fd852eb88..ebc96f1f564f 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3116,6 +3116,8 @@ const struct iommu_ops amd_iommu_ops = {
.is_attach_deferred = amd_iommu_is_attach_deferred,
.def_domain_type = amd_iommu_def_domain_type,
.page_response = amd_iommu_page_response,
+ .get_viommu_size = amd_iommufd_get_viommu_size,
+ .viommu_init = amd_iommufd_viommu_init,
};
#ifdef CONFIG_IRQ_REMAP
diff --git a/drivers/iommu/amd/iommufd.c b/drivers/iommu/amd/iommufd.c
index 72eaaa923d04..eb6119bdcf12 100644
--- a/drivers/iommu/amd/iommufd.c
+++ b/drivers/iommu/amd/iommufd.c
@@ -29,3 +29,19 @@ void *amd_iommufd_hw_info(struct device *dev, u32 *length, u32 *type)
return hwinfo;
}
+
+size_t amd_iommufd_get_viommu_size(struct device *dev, enum iommu_viommu_type viommu_type)
+{
+ return VIOMMU_STRUCT_SIZE(struct amd_iommu_viommu, core);
+}
+
+int amd_iommufd_viommu_init(struct iommufd_viommu *viommu, struct iommu_domain *parent,
+ const struct iommu_user_data *user_data)
+{
+ struct protection_domain *pdom = to_pdomain(parent);
+ struct amd_iommu_viommu *aviommu = container_of(viommu, struct amd_iommu_viommu, core);
+
+ aviommu->parent = pdom;
+
+ return 0;
+}
diff --git a/drivers/iommu/amd/iommufd.h b/drivers/iommu/amd/iommufd.h
index f880be80a30d..f05aad495b5b 100644
--- a/drivers/iommu/amd/iommufd.h
+++ b/drivers/iommu/amd/iommufd.h
@@ -8,8 +8,13 @@
#if IS_ENABLED(CONFIG_AMD_IOMMU_IOMMUFD)
void *amd_iommufd_hw_info(struct device *dev, u32 *length, u32 *type);
+size_t amd_iommufd_get_viommu_size(struct device *dev, enum iommu_viommu_type viommu_type);
+int amd_iommufd_viommu_init(struct iommufd_viommu *viommu, struct iommu_domain *parent,
+ const struct iommu_user_data *user_data);
#else
#define amd_iommufd_hw_info NULL
+#define amd_iommufd_viommu_init NULL
+#define amd_iommufd_get_viommu_size NULL
#endif /* CONFIG_AMD_IOMMU_IOMMUFD */
#endif /* AMD_IOMMUFD_H */
--
2.34.1
* [PATCH v6 09/13] iommu/amd: Add support for nested domain allocation
2026-01-15 6:08 [PATCH v6 00/13] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (7 preceding siblings ...)
2026-01-15 6:08 ` [PATCH v6 08/13] iommu/amd: Introduce struct amd_iommu_viommu Suravee Suthikulpanit
@ 2026-01-15 6:08 ` Suravee Suthikulpanit
2026-01-15 6:08 ` [PATCH v6 10/13] iommu/amd: Introduce gDomID-to-hDomID Mapping and handle parent domain invalidation Suravee Suthikulpanit
` (4 subsequent siblings)
13 siblings, 0 replies; 17+ messages in thread
From: Suravee Suthikulpanit @ 2026-01-15 6:08 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
The nested domain is allocated with IOMMU_DOMAIN_NESTED type to store
stage-1 translation (i.e. GVA->GPA). This includes the GCR3 root pointer
table along with guest page tables. The struct iommu_hwpt_amd_guest
contains this information, and is passed from user-space as a parameter
of the struct iommu_ops.domain_alloc_nested().
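As a rough illustration of the kind of sanity check applied to the guest-provided DTE, the sketch below validates only the first 64-bit word, using the field positions that appear in this patch (Mode at bits 11:9, Host TRP at bits 51:12, GLX at bits 57:56). The function name check_gdte_word0() and the GDTE_* macros are hypothetical; the real validate_gdte_nested() also checks the GCR3 pointer and the guest paging mode.

```c
#include <errno.h>
#include <stdint.h>

/* Field positions taken from the patch; macro names are illustrative only. */
#define GDTE_MODE_SHIFT		9
#define GDTE_MODE_MASK		(UINT64_C(0x7) << GDTE_MODE_SHIFT)	/* bits 11:9 */
#define GDTE_HOST_TRP_MASK	(UINT64_C(0xFFFFFFFFFF) << 12)		/* bits 51:12 */
#define GDTE_GLX_SHIFT		56
#define GDTE_GLX_MASK		(UINT64_C(0x3) << GDTE_GLX_SHIFT)	/* bits 57:56 */

/*
 * The guest must not supply host page table fields (Mode, Host TRP),
 * and GLX = 3 is reserved.
 */
int check_gdte_word0(uint64_t dte0)
{
	if (dte0 & (GDTE_MODE_MASK | GDTE_HOST_TRP_MASK))
		return -EINVAL;

	if (((dte0 & GDTE_GLX_MASK) >> GDTE_GLX_SHIFT) == 3)
		return -EINVAL;

	return 0;
}
```

Rejecting a non-zero Mode or Host TRP keeps the guest from influencing the stage-2 (host) translation, which is owned by the nest parent domain.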
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/Makefile | 2 +-
drivers/iommu/amd/amd_iommu.h | 4 +
drivers/iommu/amd/amd_iommu_types.h | 14 ++++
drivers/iommu/amd/nested.c | 110 ++++++++++++++++++++++++++++
4 files changed, 129 insertions(+), 1 deletion(-)
create mode 100644 drivers/iommu/amd/nested.c
diff --git a/drivers/iommu/amd/Makefile b/drivers/iommu/amd/Makefile
index 41f053b49dce..94b8ef2acb18 100644
--- a/drivers/iommu/amd/Makefile
+++ b/drivers/iommu/amd/Makefile
@@ -1,4 +1,4 @@
# SPDX-License-Identifier: GPL-2.0-only
obj-y += iommu.o init.o quirks.o ppr.o pasid.o
-obj-$(CONFIG_AMD_IOMMU_IOMMUFD) += iommufd.o
+obj-$(CONFIG_AMD_IOMMU_IOMMUFD) += iommufd.o nested.o
obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += debugfs.o
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index d97b9b6d76d3..aa29afe96e90 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -202,4 +202,8 @@ amd_iommu_make_clear_dte(struct iommu_dev_data *dev_data, struct dev_table_entry
new->data128[1] = 0;
}
+/* NESTED */
+struct iommu_domain *
+amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
+ const struct iommu_user_data *user_data);
#endif /* AMD_IOMMU_H */
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index d5b3393ab3a9..487ee6123de5 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -21,6 +21,8 @@
#include <linux/irqreturn.h>
#include <linux/generic_pt/iommu.h>
+#include <uapi/linux/iommufd.h>
+
/*
* Maximum number of IOMMUs supported
*/
@@ -353,6 +355,8 @@
#define DTE_FLAG_V BIT_ULL(0)
#define DTE_FLAG_TV BIT_ULL(1)
#define DTE_FLAG_HAD (3ULL << 7)
+#define DTE_MODE_MASK GENMASK_ULL(11, 9)
+#define DTE_HOST_TRP GENMASK_ULL(51, 12)
#define DTE_FLAG_GIOV BIT_ULL(54)
#define DTE_FLAG_GV BIT_ULL(55)
#define DTE_GLX GENMASK_ULL(57, 56)
@@ -501,6 +505,16 @@ struct amd_iommu_viommu {
struct protection_domain *parent; /* nest parent domain for this viommu */
};
+/*
+ * Nested domain is specifically used for nested translation
+ */
+struct nested_domain {
+ struct iommu_domain domain; /* generic domain handle used by iommu core code */
+ u16 gdom_id; /* domain ID from gDTE */
+ struct iommu_hwpt_amd_guest gdte; /* Guest vIOMMU DTE */
+ struct amd_iommu_viommu *viommu; /* AMD hw-viommu this nested domain belong to */
+};
+
/*
* This structure contains generic data for IOMMU protection domains
* independent of their use.
diff --git a/drivers/iommu/amd/nested.c b/drivers/iommu/amd/nested.c
new file mode 100644
index 000000000000..a8c0bb4dd733
--- /dev/null
+++ b/drivers/iommu/amd/nested.c
@@ -0,0 +1,110 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025 Advanced Micro Devices, Inc.
+ */
+
+#define dev_fmt(fmt) "AMD-Vi: " fmt
+
+#include <linux/iommu.h>
+#include <uapi/linux/iommufd.h>
+
+#include "amd_iommu.h"
+
+static const struct iommu_domain_ops nested_domain_ops;
+
+static inline struct nested_domain *to_ndomain(struct iommu_domain *dom)
+{
+ return container_of(dom, struct nested_domain, domain);
+}
+
+/*
+ * Validate guest DTE to make sure that configuration for host (v1)
+ * and guest (v2) page tables are valid when allocating nested domain.
+ */
+static int validate_gdte_nested(struct iommu_hwpt_amd_guest *gdte)
+{
+ u32 gpt_level = FIELD_GET(DTE_GPT_LEVEL_MASK, gdte->dte[2]);
+
+ /* Must be zero: Mode, Host-TRP */
+ if (FIELD_GET(DTE_MODE_MASK, gdte->dte[0]) != 0 ||
+ FIELD_GET(DTE_HOST_TRP, gdte->dte[0]) != 0)
+ return -EINVAL;
+
+ /* GCR3 TRP must be non-zero if V, GV is set */
+ if (FIELD_GET(DTE_FLAG_V, gdte->dte[0]) == 1 &&
+ FIELD_GET(DTE_FLAG_GV, gdte->dte[0]) == 1 &&
+ FIELD_GET(DTE_GCR3_14_12, gdte->dte[0]) == 0 &&
+ FIELD_GET(DTE_GCR3_30_15, gdte->dte[1]) == 0 &&
+ FIELD_GET(DTE_GCR3_51_31, gdte->dte[1]) == 0)
+ return -EINVAL;
+
+ /* Valid Guest Paging Mode values are 0 and 1 */
+ if (gpt_level != GUEST_PGTABLE_4_LEVEL &&
+ gpt_level != GUEST_PGTABLE_5_LEVEL)
+ return -EINVAL;
+
+ /* GLX = 3 is reserved */
+ if (FIELD_GET(DTE_GLX, gdte->dte[0]) == 3)
+ return -EINVAL;
+
+ /*
+ * We need to check host capability before setting
+ * the Guest Paging Mode
+ */
+ if (gpt_level == GUEST_PGTABLE_5_LEVEL &&
+ amd_iommu_gpt_level < PAGE_MODE_5_LEVEL)
+ return -EOPNOTSUPP;
+
+ return 0;
+}
+
+/*
+ * This function is assigned to struct iommufd_viommu_ops.alloc_domain_nested()
+ * during the call to struct iommu_ops.viommu_init().
+ */
+struct iommu_domain *
+amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
+ const struct iommu_user_data *user_data)
+{
+ int ret;
+ struct nested_domain *ndom;
+ struct amd_iommu_viommu *aviommu = container_of(viommu, struct amd_iommu_viommu, core);
+
+ if (user_data->type != IOMMU_HWPT_DATA_AMD_GUEST)
+ return ERR_PTR(-EOPNOTSUPP);
+
+ ndom = kzalloc(sizeof(*ndom), GFP_KERNEL);
+ if (!ndom)
+ return ERR_PTR(-ENOMEM);
+
+ ret = iommu_copy_struct_from_user(&ndom->gdte, user_data,
+ IOMMU_HWPT_DATA_AMD_GUEST,
+ dte);
+ if (ret)
+ goto out_err;
+
+ ret = validate_gdte_nested(&ndom->gdte);
+ if (ret)
+ goto out_err;
+
+ ndom->gdom_id = FIELD_GET(DTE_DOMID_MASK, ndom->gdte.dte[1]);
+ ndom->domain.ops = &nested_domain_ops;
+ ndom->domain.type = IOMMU_DOMAIN_NESTED;
+ ndom->viommu = aviommu;
+
+ return &ndom->domain;
+out_err:
+ kfree(ndom);
+ return ERR_PTR(ret);
+}
+
+static void nested_domain_free(struct iommu_domain *dom)
+{
+ struct nested_domain *ndom = to_ndomain(dom);
+
+ kfree(ndom);
+}
+
+static const struct iommu_domain_ops nested_domain_ops = {
+ .free = nested_domain_free,
+};
--
2.34.1
* [PATCH v6 10/13] iommu/amd: Introduce gDomID-to-hDomID Mapping and handle parent domain invalidation
2026-01-15 6:08 [PATCH v6 00/13] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (8 preceding siblings ...)
2026-01-15 6:08 ` [PATCH v6 09/13] iommu/amd: Add support for nested domain allocation Suravee Suthikulpanit
@ 2026-01-15 6:08 ` Suravee Suthikulpanit
2026-01-19 17:13 ` Jason Gunthorpe
2026-01-15 6:08 ` [PATCH v6 11/13] iommu/amd: Refactor persistent DTE bits programming into amd_iommu_make_clear_dte() Suravee Suthikulpanit
` (3 subsequent siblings)
13 siblings, 1 reply; 17+ messages in thread
From: Suravee Suthikulpanit @ 2026-01-15 6:08 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
Each nested domain is assigned a guest domain ID (gDomID), which the guest
OS programs into the guest Device Table Entry (gDTE). For each gDomID, the
driver assigns a corresponding host domain ID (hDomID), which will be
programmed into the host Device Table Entry (hDTE).
The hDomID is allocated during amd_iommu_alloc_domain_nested(),
and freed during nested_domain_free(). The gDomID-to-hDomID mapping info
(struct guest_domain_mapping_info) is stored in a per-viommu xarray
(struct amd_iommu_viommu.gdomid_array), which is indexed by gDomID.
Note also that a parent domain can be shared among multiple struct
iommufd_viommu instances. Therefore, when the hypervisor invalidates the
nest parent domain, the AMD IOMMU command INVALIDATE_IOMMU_PAGES must be
issued for each hDomID in every gdomid_array. This is handled by
iommu_flush_pages_v1_hdom_ids(), which iterates through
struct protection_domain.viommu_list.
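The reference-counted mapping described above can be sketched in userspace C as follows. This is a simplified model under stated assumptions: a fixed-size array replaces the xarray, a counter replaces the host domain ID allocator, there is no locking, and the names gdom_map, gdom_map_get() and gdom_map_put() are hypothetical.

```c
#include <stdint.h>

#define MAX_GDOM 16	/* hypothetical table size for the sketch */

struct gdom_map_entry {
	unsigned int users;	/* nested domains sharing this gDomID */
	uint32_t hdom_id;	/* host domain ID; valid while users > 0 */
};

struct gdom_map {
	struct gdom_map_entry slot[MAX_GDOM];	/* indexed by gDomID */
	uint32_t next_hdom_id;			/* trivial stand-in for the hDomID allocator */
};

/* Return the hDomID for gdom_id, allocating one on first use; 0 on failure. */
uint32_t gdom_map_get(struct gdom_map *m, uint16_t gdom_id)
{
	struct gdom_map_entry *e;

	if (gdom_id >= MAX_GDOM)
		return 0;

	e = &m->slot[gdom_id];
	if (e->users == 0)
		e->hdom_id = ++m->next_hdom_id;	/* first user allocates */
	e->users++;
	return e->hdom_id;
}

/* Drop one reference; returns 1 if this was the last user and the hDomID was released. */
int gdom_map_put(struct gdom_map *m, uint16_t gdom_id)
{
	struct gdom_map_entry *e;

	if (gdom_id >= MAX_GDOM || m->slot[gdom_id].users == 0)
		return 0;

	e = &m->slot[gdom_id];
	if (--e->users)
		return 0;

	e->hdom_id = 0;
	return 1;
}
```

Two nested domains created with the same gDomID therefore share one hDomID, and a flush of the nest parent only needs to walk the populated slots.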
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu_types.h | 23 ++++++
drivers/iommu/amd/iommu.c | 38 ++++++++++
drivers/iommu/amd/iommufd.c | 31 ++++++++
drivers/iommu/amd/nested.c | 111 ++++++++++++++++++++++++++++
4 files changed, 203 insertions(+)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 487ee6123de5..4a98ac7dca0f 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -503,6 +503,22 @@ struct pdom_iommu_info {
struct amd_iommu_viommu {
struct iommufd_viommu core;
struct protection_domain *parent; /* nest parent domain for this viommu */
+ struct list_head pdom_list; /* For protection_domain->viommu_list */
+
+ /*
+ * Per-vIOMMU guest domain ID to host domain ID mapping.
+ * Indexed by guest domain ID.
+ */
+ struct xarray gdomid_array;
+};
+
+/*
+ * Contains guest domain ID mapping info,
+ * which is stored in the struct xarray gdomid_array.
+ */
+struct guest_domain_mapping_info {
+ refcount_t users;
+ u32 hdom_id; /* Host domain ID */
};
/*
@@ -511,6 +527,7 @@ struct amd_iommu_viommu {
struct nested_domain {
struct iommu_domain domain; /* generic domain handle used by iommu core code */
u16 gdom_id; /* domain ID from gDTE */
+ struct guest_domain_mapping_info *gdom_info;
struct iommu_hwpt_amd_guest gdte; /* Guest vIOMMU DTE */
struct amd_iommu_viommu *viommu; /* AMD hw-viommu this nested domain belong to */
};
@@ -535,6 +552,12 @@ struct protection_domain {
struct mmu_notifier mn; /* mmu notifier for the SVA domain */
struct list_head dev_data_list; /* List of pdom_dev_data */
+
+ /*
+ * Store reference to list of vIOMMUs, which use this protection domain.
+ * This will be used to look up host domain ID when flushing this domain.
+ */
+ struct list_head viommu_list;
};
PT_IOMMU_CHECK_DOMAIN(struct protection_domain, iommu, domain);
PT_IOMMU_CHECK_DOMAIN(struct protection_domain, amdv1.iommu, domain);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index ebc96f1f564f..e33076b99aac 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1539,6 +1539,32 @@ static void amd_iommu_flush_tlb_domid(struct amd_iommu *iommu, u32 dom_id)
iommu_completion_wait(iommu);
}
+static int iommu_flush_pages_v1_hdom_ids(struct protection_domain *pdom, u64 address, size_t size)
+{
+ int ret = 0;
+ struct amd_iommu_viommu *aviommu;
+
+ list_for_each_entry(aviommu, &pdom->viommu_list, pdom_list) {
+ unsigned long i;
+ struct guest_domain_mapping_info *gdom_info;
+ struct amd_iommu *iommu = container_of(aviommu->core.iommu_dev,
+ struct amd_iommu, iommu);
+
+ xa_lock(&aviommu->gdomid_array);
+ xa_for_each(&aviommu->gdomid_array, i, gdom_info) {
+ struct iommu_cmd cmd;
+
+ pr_debug("%s: iommu=%#x, hdom_id=%#x\n", __func__,
+ iommu->devid, gdom_info->hdom_id);
+ build_inv_iommu_pages(&cmd, address, size, gdom_info->hdom_id,
+ IOMMU_NO_PASID, false);
+ ret |= iommu_queue_command(iommu, &cmd);
+ }
+ xa_unlock(&aviommu->gdomid_array);
+ }
+ return ret;
+}
+
static void amd_iommu_flush_all(struct amd_iommu *iommu)
{
struct iommu_cmd cmd;
@@ -1687,6 +1713,17 @@ static int domain_flush_pages_v1(struct protection_domain *pdom,
ret |= iommu_queue_command(pdom_iommu_info->iommu, &cmd);
}
+ /*
+ * A domain w/ v1 table can be a nest parent, which can have
+ * multiple nested domains. Each nested domain has 1:1 mapping
+ * between gDomID and hDomID. Therefore, flush every hDomID
+ * associated to this nest parent domain.
+ *
+ * See drivers/iommu/amd/nested.c: amd_iommu_alloc_domain_nested()
+ */
+ if (!list_empty(&pdom->viommu_list))
+ ret |= iommu_flush_pages_v1_hdom_ids(pdom, address, size);
+
return ret;
}
@@ -2504,6 +2541,7 @@ static void protection_domain_init(struct protection_domain *domain)
spin_lock_init(&domain->lock);
INIT_LIST_HEAD(&domain->dev_list);
INIT_LIST_HEAD(&domain->dev_data_list);
+ INIT_LIST_HEAD(&domain->viommu_list);
xa_init(&domain->iommu_array);
}
diff --git a/drivers/iommu/amd/iommufd.c b/drivers/iommu/amd/iommufd.c
index eb6119bdcf12..2e50633d9c72 100644
--- a/drivers/iommu/amd/iommufd.c
+++ b/drivers/iommu/amd/iommufd.c
@@ -9,6 +9,8 @@
#include "amd_iommu.h"
#include "amd_iommu_types.h"
+static const struct iommufd_viommu_ops amd_viommu_ops;
+
void *amd_iommufd_hw_info(struct device *dev, u32 *length, u32 *type)
{
struct iommu_hw_info_amd *hwinfo;
@@ -38,10 +40,39 @@ size_t amd_iommufd_get_viommu_size(struct device *dev, enum iommu_viommu_type vi
int amd_iommufd_viommu_init(struct iommufd_viommu *viommu, struct iommu_domain *parent,
const struct iommu_user_data *user_data)
{
+ unsigned long flags;
struct protection_domain *pdom = to_pdomain(parent);
struct amd_iommu_viommu *aviommu = container_of(viommu, struct amd_iommu_viommu, core);
+ xa_init_flags(&aviommu->gdomid_array, XA_FLAGS_ALLOC1);
aviommu->parent = pdom;
+ viommu->ops = &amd_viommu_ops;
+
+ spin_lock_irqsave(&pdom->lock, flags);
+ list_add(&aviommu->pdom_list, &pdom->viommu_list);
+ spin_unlock_irqrestore(&pdom->lock, flags);
+
return 0;
}
+
+static void amd_iommufd_viommu_destroy(struct iommufd_viommu *viommu)
+{
+ unsigned long flags;
+ struct amd_iommu *iommu = container_of(viommu->iommu_dev, struct amd_iommu, iommu);
+ struct amd_iommu_viommu *aviommu = container_of(viommu, struct amd_iommu_viommu, core);
+ struct protection_domain *pdom = aviommu->parent;
+
+ spin_lock_irqsave(&pdom->lock, flags);
+ list_del(&aviommu->pdom_list);
+ spin_unlock_irqrestore(&pdom->lock, flags);
+ xa_destroy(&aviommu->gdomid_array);
+}
+
+/*
+ * See include/linux/iommufd.h
+ * struct iommufd_viommu_ops - vIOMMU specific operations
+ */
+static const struct iommufd_viommu_ops amd_viommu_ops = {
+ .destroy = amd_iommufd_viommu_destroy,
+};
diff --git a/drivers/iommu/amd/nested.c b/drivers/iommu/amd/nested.c
index a8c0bb4dd733..8154a773eed8 100644
--- a/drivers/iommu/amd/nested.c
+++ b/drivers/iommu/amd/nested.c
@@ -6,6 +6,7 @@
#define dev_fmt(fmt) "AMD-Vi: " fmt
#include <linux/iommu.h>
+#include <linux/refcount.h>
#include <uapi/linux/iommufd.h>
#include "amd_iommu.h"
@@ -58,6 +59,33 @@ static int validate_gdte_nested(struct iommu_hwpt_amd_guest *gdte)
return 0;
}
+static void *gdom_info_load_or_alloc_locked(struct xarray *xa, unsigned long index)
+{
+ struct guest_domain_mapping_info *elm, *res;
+
+ elm = xa_load(xa, index);
+ if (elm)
+ return elm;
+
+ xa_unlock(xa);
+ elm = kzalloc(sizeof(struct guest_domain_mapping_info), GFP_KERNEL);
+ xa_lock(xa);
+ if (!elm)
+ return ERR_PTR(-ENOMEM);
+
+ res = __xa_cmpxchg(xa, index, NULL, elm, GFP_KERNEL);
+ if (xa_is_err(res))
+ res = ERR_PTR(xa_err(res));
+
+ if (res) {
+ kfree(elm);
+ return res;
+ }
+
+ refcount_set(&elm->users, 0);
+ return elm;
+}
+
/*
* This function is assigned to struct iommufd_viommu_ops.alloc_domain_nested()
* during the call to struct iommu_ops.viommu_init().
@@ -68,6 +96,7 @@ amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
{
int ret;
struct nested_domain *ndom;
+ struct guest_domain_mapping_info *gdom_info;
struct amd_iommu_viommu *aviommu = container_of(viommu, struct amd_iommu_viommu, core);
if (user_data->type != IOMMU_HWPT_DATA_AMD_GUEST)
@@ -92,7 +121,63 @@ amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
ndom->domain.type = IOMMU_DOMAIN_NESTED;
ndom->viommu = aviommu;
+ /*
+ * Normally, when a guest has multiple pass-through devices,
+ * the IOMMU driver sets up DTEs with the same stage-2 table and
+ * uses the same host domain ID (hDomId). In case of nested translation,
+ * if the guest sets up different stage-1 tables with the same PASID,
+ * the IOMMU would use the same TLB tag. This would result in a TLB
+ * aliasing issue.
+ *
+ * The guest is assigning gDomIDs based on its own algorithm for managing
+ * cache tags of (DomID, PASID). Within a single viommu, the nest parent domain
+ * (w/ S2 table) is used by all DTEs. But we need to consistently map the gDomID
+ * to a single hDomID. This is done using an xarray in the vIOMMU to
+ * keep track of the gDomID mapping. When the S2 is changed, the INVALIDATE_IOMMU_PAGES
+ * command must be issued for each hDomID in the xarray.
+ */
+ xa_lock(&aviommu->gdomid_array);
+
+ gdom_info = gdom_info_load_or_alloc_locked(&aviommu->gdomid_array, ndom->gdom_id);
+ if (IS_ERR(gdom_info)) {
+ xa_unlock(&aviommu->gdomid_array);
+ ret = PTR_ERR(gdom_info);
+ goto out_err;
+ }
+
+ /* Check if gDomID exists */
+ if (refcount_inc_not_zero(&gdom_info->users)) {
+ ndom->gdom_info = gdom_info;
+ xa_unlock(&aviommu->gdomid_array);
+
+ pr_debug("%s: Found gdom_id=%#x, hdom_id=%#x\n",
+ __func__, ndom->gdom_id, gdom_info->hdom_id);
+
+ return &ndom->domain;
+ }
+
+ /* The gDomID does not exist. We allocate new hdom_id */
+ gdom_info->hdom_id = amd_iommu_pdom_id_alloc();
+ if (gdom_info->hdom_id <= 0) {
+ __xa_cmpxchg(&aviommu->gdomid_array,
+ ndom->gdom_id, gdom_info, NULL, GFP_ATOMIC);
+ xa_unlock(&aviommu->gdomid_array);
+ ret = -ENOSPC;
+ goto out_err_gdom_info;
+ }
+
+ ndom->gdom_info = gdom_info;
+ refcount_set(&gdom_info->users, 1);
+
+ xa_unlock(&aviommu->gdomid_array);
+
+ pr_debug("%s: Allocate gdom_id=%#x, hdom_id=%#x\n",
+ __func__, ndom->gdom_id, gdom_info->hdom_id);
+
return &ndom->domain;
+
+out_err_gdom_info:
+ kfree(gdom_info);
out_err:
kfree(ndom);
return ERR_PTR(ret);
@@ -100,8 +185,34 @@ amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
static void nested_domain_free(struct iommu_domain *dom)
{
+ struct guest_domain_mapping_info *curr;
struct nested_domain *ndom = to_ndomain(dom);
+ struct amd_iommu_viommu *aviommu = ndom->viommu;
+
+ xa_lock(&aviommu->gdomid_array);
+
+ if (!refcount_dec_and_test(&ndom->gdom_info->users)) {
+ xa_unlock(&aviommu->gdomid_array);
+ return;
+ }
+
+ /*
+ * The refcount for the gdom_id to hdom_id mapping is zero.
+ * It is now safe to remove the mapping.
+ */
+ curr = __xa_cmpxchg(&aviommu->gdomid_array, ndom->gdom_id,
+ ndom->gdom_info, NULL, GFP_ATOMIC);
+
+ xa_unlock(&aviommu->gdomid_array);
+ if (WARN_ON(!curr || xa_err(curr)))
+ return;
+
+ /* success */
+ pr_debug("%s: Free gdom_id=%#x, hdom_id=%#x\n",
+ __func__, ndom->gdom_id, curr->hdom_id);
+ amd_iommu_pdom_id_free(ndom->gdom_info->hdom_id);
+ kfree(curr);
kfree(ndom);
}
--
2.34.1
* [PATCH v6 11/13] iommu/amd: Refactor persistent DTE bits programming into amd_iommu_make_clear_dte()
2026-01-15 6:08 [PATCH v6 00/13] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (9 preceding siblings ...)
2026-01-15 6:08 ` [PATCH v6 10/13] iommu/amd: Introduce gDomID-to-hDomID Mapping and handle parent domain invalidation Suravee Suthikulpanit
@ 2026-01-15 6:08 ` Suravee Suthikulpanit
2026-01-15 6:08 ` [PATCH v6 12/13] iommu/amd: Refactor logic to program the host page table in DTE Suravee Suthikulpanit
` (2 subsequent siblings)
13 siblings, 0 replies; 17+ messages in thread
From: Suravee Suthikulpanit @ 2026-01-15 6:08 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
Move the programming of persistent DTE bits into amd_iommu_make_clear_dte()
to help avoid duplicate logic when programming the DTE for nested
translation.
Note that this commit changes behavior when the IOMMU driver switches
domains during attach and when it attaches the blocking domain, where the
DTE bit fields for interrupt pass-through (i.e. Lint0, Lint1, NMI, INIT,
ExtInt) and System Management Message could be affected. These DTE bits are
specified in the IVRS table for specific devices, and should be persistent.
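The idea of "clear, then restore the persistent bits" can be sketched as below. This is a userspace illustration only: struct dte_sketch and make_clear_dte_sketch() are hypothetical names, the DTE is modelled as four 64-bit words (256 bits, matching the two data128 halves in the patch), and the cached entry stands in for whatever amd_iommu_get_ivhd_dte_flags() returns.

```c
#include <stddef.h>
#include <stdint.h>

#define DTE_WORDS	4		/* a DTE is 256 bits, i.e. four 64-bit words */
#define DTE_FLAG_V	UINT64_C(1)	/* bit 0, as in the patch */

struct dte_sketch {
	uint64_t data[DTE_WORDS];
};

/*
 * Build a "clear" DTE: only V is set, then the persistent bits cached
 * from firmware tables (if any) are OR-ed back in.
 */
void make_clear_dte_sketch(struct dte_sketch *new, const struct dte_sketch *initial)
{
	size_t i;

	new->data[0] = DTE_FLAG_V;
	for (i = 1; i < DTE_WORDS; i++)
		new->data[i] = 0;

	if (!initial)
		return;

	for (i = 0; i < DTE_WORDS; i++)
		new->data[i] |= initial->data[i];
}
```

Doing the restore inside the "clear" helper means every caller that starts from a clear DTE gets the persistent bits without repeating the lookup.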
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 13 +++++++++++++
drivers/iommu/amd/iommu.c | 11 -----------
2 files changed, 13 insertions(+), 11 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index aa29afe96e90..00fc9c6073de 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -197,9 +197,22 @@ void amd_iommu_update_dte(struct amd_iommu *iommu,
static inline void
amd_iommu_make_clear_dte(struct iommu_dev_data *dev_data, struct dev_table_entry *new)
{
+ struct dev_table_entry *initial_dte;
+ struct amd_iommu *iommu = get_amd_iommu_from_dev(dev_data->dev);
+
/* All existing DTE must have V bit set */
new->data128[0] = DTE_FLAG_V;
new->data128[1] = 0;
+
+ /*
+ * Restore cached persistent DTE bits, which can be set by information
+ * in IVRS table. See set_dev_entry_from_acpi().
+ */
+ initial_dte = amd_iommu_get_ivhd_dte_flags(iommu->pci_seg->id, dev_data->devid);
+ if (initial_dte) {
+ new->data128[0] |= initial_dte->data128[0];
+ new->data128[1] |= initial_dte->data128[1];
+ }
}
/* NESTED */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index e33076b99aac..dafd34465fc0 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2106,7 +2106,6 @@ static void set_dte_entry(struct amd_iommu *iommu,
{
u16 domid;
u32 old_domid;
- struct dev_table_entry *initial_dte;
struct dev_table_entry new = {};
struct protection_domain *domain = dev_data->domain;
struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
@@ -2164,16 +2163,6 @@ static void set_dte_entry(struct amd_iommu *iommu,
old_domid = READ_ONCE(dte->data[1]) & DTE_DOMID_MASK;
new.data[1] |= domid;
- /*
- * Restore cached persistent DTE bits, which can be set by information
- * in IVRS table. See set_dev_entry_from_acpi().
- */
- initial_dte = amd_iommu_get_ivhd_dte_flags(iommu->pci_seg->id, dev_data->devid);
- if (initial_dte) {
- new.data128[0] |= initial_dte->data128[0];
- new.data128[1] |= initial_dte->data128[1];
- }
-
set_dte_gcr3_table(iommu, dev_data, &new);
amd_iommu_update_dte(iommu, dev_data, &new);
--
2.34.1
* [PATCH v6 12/13] iommu/amd: Refactor logic to program the host page table in DTE
2026-01-15 6:08 [PATCH v6 00/13] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (10 preceding siblings ...)
2026-01-15 6:08 ` [PATCH v6 11/13] iommu/amd: Refactor persistent DTE bits programming into amd_iommu_make_clear_dte() Suravee Suthikulpanit
@ 2026-01-15 6:08 ` Suravee Suthikulpanit
2026-01-15 6:08 ` [PATCH v6 13/13] iommu/amd: Add support for nested domain attach/detach Suravee Suthikulpanit
2026-01-18 9:56 ` [PATCH v6 00/13] iommu/amd: Introduce Nested Translation support Jörg Rödel
13 siblings, 0 replies; 17+ messages in thread
From: Suravee Suthikulpanit @ 2026-01-15 6:08 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
Introduce the amd_iommu_set_dte_v1() helper function to program the
IOMMU host (v1) page table into the DTE. This will be used later
when attaching a nested domain.
Also, remove the obsolete warning for the case where SNP is enabled and the
domain ID is zero, since this check is no longer applicable.
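To make the field packing concrete, the sketch below composes the host page table fields of the first DTE word from a root physical address and a paging mode. It is a simplified userspace illustration: dte_v1_word0() and the SK_* macros are hypothetical, only the TV, Mode (bits 11:9) and Host TRP (bits 51:12) fields from this series are modelled, and the memory-encryption, dirty-tracking and IR/IW bits handled by the real helper are omitted.

```c
#include <stdint.h>

/* Field positions as defined in this series; macro names are illustrative. */
#define SK_DTE_FLAG_TV		(UINT64_C(1) << 1)
#define SK_DTE_MODE_SHIFT	9
#define SK_DTE_MODE_FIELD	UINT64_C(0x7)		/* 3-bit field, bits 11:9 */
#define SK_DTE_TRP_SHIFT	12
#define SK_DTE_TRP_FIELD	UINT64_C(0xFFFFFFFFFF)	/* 40-bit field, bits 51:12 */

/* Pack TV, the paging mode and the page table root into DTE word 0. */
uint64_t dte_v1_word0(uint64_t root_paddr, unsigned int mode)
{
	uint64_t trp = (root_paddr >> 12) & SK_DTE_TRP_FIELD;

	return SK_DTE_FLAG_TV |
	       (((uint64_t)mode & SK_DTE_MODE_FIELD) << SK_DTE_MODE_SHIFT) |
	       (trp << SK_DTE_TRP_SHIFT);
}
```

Shifting the root address right by 12 before placing it in the field mirrors the FIELD_PREP(DTE_HOST_TRP, host_pt_root >> 12) expression in the patch: the field holds a 4 KiB-aligned page frame number, so the low 12 address bits never reach the DTE.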
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 4 +
drivers/iommu/amd/amd_iommu_types.h | 1 +
drivers/iommu/amd/iommu.c | 150 ++++++++++++++--------------
3 files changed, 82 insertions(+), 73 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 00fc9c6073de..02f10922f70b 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -190,6 +190,10 @@ void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
struct dev_table_entry *get_dev_table(struct amd_iommu *iommu);
struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid);
+void amd_iommu_set_dte_v1(struct iommu_dev_data *dev_data,
+ struct protection_domain *domain, u16 domid,
+ struct pt_iommu_amdv1_hw_info *pt_info,
+ struct dev_table_entry *new);
void amd_iommu_update_dte(struct amd_iommu *iommu,
struct iommu_dev_data *dev_data,
struct dev_table_entry *new);
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 4a98ac7dca0f..cfcbad6c28ff 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -357,6 +357,7 @@
#define DTE_FLAG_HAD (3ULL << 7)
#define DTE_MODE_MASK GENMASK_ULL(11, 9)
#define DTE_HOST_TRP GENMASK_ULL(51, 12)
+#define DTE_FLAG_PPR BIT_ULL(52)
#define DTE_FLAG_GIOV BIT_ULL(54)
#define DTE_FLAG_GV BIT_ULL(55)
#define DTE_GLX GENMASK_ULL(57, 56)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index dafd34465fc0..17b0f48f1721 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2068,102 +2068,106 @@ int amd_iommu_clear_gcr3(struct iommu_dev_data *dev_data, ioasid_t pasid)
* Note:
* The old value for GCR3 table and GPT have been cleared from caller.
*/
-static void set_dte_gcr3_table(struct amd_iommu *iommu,
- struct iommu_dev_data *dev_data,
- struct dev_table_entry *target)
+static void set_dte_gcr3_table(struct iommu_dev_data *dev_data,
+ struct dev_table_entry *new)
{
struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
- u64 gcr3;
+ u64 gcr3 = iommu_virt_to_phys(gcr3_info->gcr3_tbl);
- if (!gcr3_info->gcr3_tbl)
- return;
-
- pr_debug("%s: devid=%#x, glx=%#x, gcr3_tbl=%#llx\n",
- __func__, dev_data->devid, gcr3_info->glx,
- (unsigned long long)gcr3_info->gcr3_tbl);
-
- gcr3 = iommu_virt_to_phys(gcr3_info->gcr3_tbl);
+ new->data[0] |= DTE_FLAG_TV |
+ (dev_data->ppr ? DTE_FLAG_PPR : 0) |
+ (pdom_is_v2_pgtbl_mode(dev_data->domain) ? DTE_FLAG_GIOV : 0) |
+ DTE_FLAG_GV |
+ FIELD_PREP(DTE_GLX, gcr3_info->glx) |
+ FIELD_PREP(DTE_GCR3_14_12, gcr3 >> 12) |
+ DTE_FLAG_IR | DTE_FLAG_IW;
- target->data[0] |= DTE_FLAG_GV |
- FIELD_PREP(DTE_GLX, gcr3_info->glx) |
- FIELD_PREP(DTE_GCR3_14_12, gcr3 >> 12);
- if (pdom_is_v2_pgtbl_mode(dev_data->domain))
- target->data[0] |= DTE_FLAG_GIOV;
-
- target->data[1] |= FIELD_PREP(DTE_GCR3_30_15, gcr3 >> 15) |
- FIELD_PREP(DTE_GCR3_51_31, gcr3 >> 31);
+ new->data[1] |= FIELD_PREP(DTE_DOMID_MASK, dev_data->gcr3_info.domid) |
+ FIELD_PREP(DTE_GCR3_30_15, gcr3 >> 15) |
+ (dev_data->ats_enabled ? DTE_FLAG_IOTLB : 0) |
+ FIELD_PREP(DTE_GCR3_51_31, gcr3 >> 31);
/* Guest page table can only support 4 and 5 levels */
if (amd_iommu_gpt_level == PAGE_MODE_5_LEVEL)
- target->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_5_LEVEL);
+ new->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_5_LEVEL);
else
- target->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_4_LEVEL);
+ new->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_4_LEVEL);
+}
+
+void amd_iommu_set_dte_v1(struct iommu_dev_data *dev_data,
+ struct protection_domain *domain, u16 domid,
+ struct pt_iommu_amdv1_hw_info *pt_info,
+ struct dev_table_entry *new)
+{
+ u64 host_pt_root = __sme_set(pt_info->host_pt_root);
+
+ /* Note Dirty tracking is used for v1 table only for now */
+ new->data[0] |= DTE_FLAG_TV |
+ FIELD_PREP(DTE_MODE_MASK, pt_info->mode) |
+ (domain->dirty_tracking ? DTE_FLAG_HAD : 0) |
+ FIELD_PREP(DTE_HOST_TRP, host_pt_root >> 12) |
+ DTE_FLAG_IR | DTE_FLAG_IW;
+
+ new->data[1] |= FIELD_PREP(DTE_DOMID_MASK, domid) |
+ (dev_data->ats_enabled ? DTE_FLAG_IOTLB : 0);
+}
+
+static void set_dte_v1(struct iommu_dev_data *dev_data,
+ struct protection_domain *domain, u16 domid,
+ phys_addr_t top_paddr, unsigned int top_level,
+ struct dev_table_entry *new)
+{
+ struct pt_iommu_amdv1_hw_info pt_info;
+
+ /*
+ * When updating the IO pagetable, the new top and level
+ * are provided as parameters. For other operations i.e.
+ * device attach, retrieve the current pagetable info
+ * via the IOMMU PT API.
+ */
+ if (top_paddr) {
+ pt_info.host_pt_root = top_paddr;
+ pt_info.mode = top_level + 1;
+ } else {
+ WARN_ON(top_paddr || top_level);
+ pt_iommu_amdv1_hw_info(&domain->amdv1, &pt_info);
+ }
+
+ amd_iommu_set_dte_v1(dev_data, domain, domid, &pt_info, new);
+}
+
+static void set_dte_passthrough(struct iommu_dev_data *dev_data,
+ struct protection_domain *domain,
+ struct dev_table_entry *new)
+{
+ new->data[0] |= DTE_FLAG_TV | DTE_FLAG_IR | DTE_FLAG_IW;
+
+ new->data[1] |= FIELD_PREP(DTE_DOMID_MASK, domain->id) |
+ (dev_data->ats_enabled ? DTE_FLAG_IOTLB : 0);
}
static void set_dte_entry(struct amd_iommu *iommu,
struct iommu_dev_data *dev_data,
phys_addr_t top_paddr, unsigned int top_level)
{
- u16 domid;
u32 old_domid;
struct dev_table_entry new = {};
struct protection_domain *domain = dev_data->domain;
struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
- struct pt_iommu_amdv1_hw_info pt_info;
amd_iommu_make_clear_dte(dev_data, &new);
- if (gcr3_info && gcr3_info->gcr3_tbl)
- domid = dev_data->gcr3_info.domid;
- else {
- domid = domain->id;
-
- if (domain->domain.type & __IOMMU_DOMAIN_PAGING) {
- /*
- * When updating the IO pagetable, the new top and level
- * are provided as parameters. For other operations i.e.
- * device attach, retrieve the current pagetable info
- * via the IOMMU PT API.
- */
- if (top_paddr) {
- pt_info.host_pt_root = top_paddr;
- pt_info.mode = top_level + 1;
- } else {
- WARN_ON(top_paddr || top_level);
- pt_iommu_amdv1_hw_info(&domain->amdv1,
- &pt_info);
- }
-
- new.data[0] |= __sme_set(pt_info.host_pt_root) |
- (pt_info.mode & DEV_ENTRY_MODE_MASK)
- << DEV_ENTRY_MODE_SHIFT;
- }
- }
-
- new.data[0] |= DTE_FLAG_IR | DTE_FLAG_IW;
-
- /*
- * When SNP is enabled, we can only support TV=1 with non-zero domain ID.
- * This is prevented by the SNP-enable and IOMMU_DOMAIN_IDENTITY check in
- * do_iommu_domain_alloc().
- */
- WARN_ON(amd_iommu_snp_en && (domid == 0));
- new.data[0] |= DTE_FLAG_TV;
-
- if (dev_data->ppr)
- new.data[0] |= 1ULL << DEV_ENTRY_PPR;
-
- if (domain->dirty_tracking)
- new.data[0] |= DTE_FLAG_HAD;
-
- if (dev_data->ats_enabled)
- new.data[1] |= DTE_FLAG_IOTLB;
-
old_domid = READ_ONCE(dte->data[1]) & DTE_DOMID_MASK;
- new.data[1] |= domid;
-
- set_dte_gcr3_table(iommu, dev_data, &new);
+ if (gcr3_info->gcr3_tbl)
+ set_dte_gcr3_table(dev_data, &new);
+ else if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
+ set_dte_passthrough(dev_data, domain, &new);
+ else if ((domain->domain.type & __IOMMU_DOMAIN_PAGING) &&
+ domain->pd_mode == PD_MODE_V1)
+ set_dte_v1(dev_data, domain, domain->id, top_paddr, top_level, &new);
+ else
+ WARN_ON(true);
amd_iommu_update_dte(iommu, dev_data, &new);
--
2.34.1
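Note on the data[1] expressions in this patch: in C the conditional operator binds more loosely than `|`, so it matters whether the whole `cond ? FLAG : 0` term is parenthesized or only its condition. The following is a minimal userspace sketch of the difference, not driver code. The macro values and the function names (SKETCH_FLAG_IOTLB, SKETCH_DOMID, data1_loose, data1_tight) are hypothetical and do not reflect the real DTE layout.

```c
#include <stdint.h>

/* Illustrative values only; NOT the real AMD IOMMU DTE bit layout. */
#define SKETCH_FLAG_IOTLB (1ULL << 32)
#define SKETCH_DOMID(id)  ((uint64_t)(id) & 0xffffULL)

/*
 * Only the condition is parenthesized, so the compiler parses this as
 * (domid | ats_enabled) ? IOTLB : 0. The domain ID is dropped from the
 * result and IOTLB is set whenever domid is non-zero.
 */
uint64_t data1_loose(uint16_t domid, int ats_enabled)
{
	return SKETCH_DOMID(domid) | (ats_enabled) ? SKETCH_FLAG_IOTLB : 0;
}

/*
 * The whole conditional is parenthesized: domid | (ats ? IOTLB : 0).
 * The domain ID is kept and IOTLB follows ats_enabled only.
 */
uint64_t data1_tight(uint16_t domid, int ats_enabled)
{
	return SKETCH_DOMID(domid) | (ats_enabled ? SKETCH_FLAG_IOTLB : 0);
}
```

With domid 5 and ATS disabled, data1_loose() returns only the IOTLB bit, while data1_tight() returns 5. Some compilers warn about the first form.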
* [PATCH v6 13/13] iommu/amd: Add support for nested domain attach/detach
2026-01-15 6:08 [PATCH v6 00/13] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (11 preceding siblings ...)
2026-01-15 6:08 ` [PATCH v6 12/13] iommu/amd: Refactor logic to program the host page table in DTE Suravee Suthikulpanit
@ 2026-01-15 6:08 ` Suravee Suthikulpanit
2026-01-19 17:15 ` Jason Gunthorpe
2026-01-18 9:56 ` [PATCH v6 00/13] iommu/amd: Introduce Nested Translation support Jörg Rödel
13 siblings, 1 reply; 17+ messages in thread
From: Suravee Suthikulpanit @ 2026-01-15 6:08 UTC (permalink / raw)
To: jgg, nicolinc
Cc: linux-kernel, robin.murphy, will, joro, kevin.tian, jsnitsel,
vasant.hegde, iommu, santosh.shukla, sairaj.arunkodilkar,
jon.grimm, prashanthpra, wvw, wnliu, gptran, kpsingh,
joao.m.martins, alejandro.j.jimenez, Suravee Suthikulpanit
Introduce set_dte_nested() to program guest translation settings in
the host DTE when attaching a nested domain to a device.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/nested.c | 73 ++++++++++++++++++++++++++++++++++++++
1 file changed, 73 insertions(+)
diff --git a/drivers/iommu/amd/nested.c b/drivers/iommu/amd/nested.c
index 8154a773eed8..66cc36133c8b 100644
--- a/drivers/iommu/amd/nested.c
+++ b/drivers/iommu/amd/nested.c
@@ -183,6 +183,78 @@ amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
return ERR_PTR(ret);
}
+static void set_dte_nested(struct amd_iommu *iommu, struct iommu_domain *dom,
+ struct iommu_dev_data *dev_data, struct dev_table_entry *new)
+{
+ struct protection_domain *parent;
+ struct nested_domain *ndom = to_ndomain(dom);
+ struct iommu_hwpt_amd_guest *gdte = &ndom->gdte;
+ struct pt_iommu_amdv1_hw_info pt_info;
+
+ /*
+ * The nest parent domain is attached during the call to the
+ * struct iommu_ops.viommu_init(), which will be stored as part
+ * of the struct amd_iommu_viommu.parent.
+ */
+ if (WARN_ON(!ndom->viommu || !ndom->viommu->parent))
+ return;
+
+ parent = ndom->viommu->parent;
+ amd_iommu_make_clear_dte(dev_data, new);
+
+ /* Retrieve the current pagetable info via the IOMMU PT API. */
+ pt_iommu_amdv1_hw_info(&parent->amdv1, &pt_info);
+
+ /*
+ * Use domain ID from nested domain to program DTE.
+ * See amd_iommu_alloc_domain_nested().
+ */
+ amd_iommu_set_dte_v1(dev_data, parent, ndom->gdom_info->hdom_id,
+ &pt_info, new);
+
+ /* GV is required for nested page table */
+ new->data[0] |= DTE_FLAG_GV;
+
+ /* Guest PPR */
+ new->data[0] |= gdte->dte[0] & DTE_FLAG_PPR;
+
+ /* Guest translation stuff */
+ new->data[0] |= gdte->dte[0] & (DTE_GLX | DTE_FLAG_GIOV);
+
+ /* GCR3 table */
+ new->data[0] |= gdte->dte[0] & DTE_GCR3_14_12;
+ new->data[1] |= gdte->dte[1] & (DTE_GCR3_30_15 | DTE_GCR3_51_31);
+
+ /* Guest paging mode */
+ new->data[2] |= gdte->dte[2] & DTE_GPT_LEVEL_MASK;
+}
+
+static int nested_attach_device(struct iommu_domain *dom, struct device *dev,
+ struct iommu_domain *old)
+{
+ struct dev_table_entry new = {0};
+ struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
+ struct amd_iommu *iommu = get_amd_iommu_from_dev_data(dev_data);
+ int ret = 0;
+
+ /*
+ * PASID must not be enabled on this attach path.
+ */
+ if (WARN_ON(dev_data->pasid_enabled))
+ return -EINVAL;
+
+ mutex_lock(&dev_data->mutex);
+
+ set_dte_nested(iommu, dom, dev_data, &new);
+
+ amd_iommu_update_dte(iommu, dev_data, &new);
+
+ mutex_unlock(&dev_data->mutex);
+
+ return ret;
+}
+
static void nested_domain_free(struct iommu_domain *dom)
{
struct guest_domain_mapping_info *curr;
@@ -217,5 +289,6 @@ static void nested_domain_free(struct iommu_domain *dom)
}
static const struct iommu_domain_ops nested_domain_ops = {
+ .attach_dev = nested_attach_device,
.free = nested_domain_free,
};
--
2.34.1
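For readers following set_dte_nested(): the host DTE is first built from the nest parent (v1 table, hDomID), and then selected guest-controlled fields are copied from the gDTE by masking. The following is a minimal userspace sketch of that masking step for data[0] only, not the driver code. The SK_* names, struct sk_dte and the bit positions are assumptions made for illustration and should be checked against the AMD IOMMU specification and the driver headers.

```c
#include <stdint.h>

#define SK_BIT(n)      (1ULL << (n))
/* Contiguous mask covering bits h..l inclusive (0 <= l <= h <= 63). */
#define SK_MASK(h, l)  ((~0ULL >> (63 - (h))) & (~0ULL << (l)))

/* Assumed bit positions, for illustration only. */
#define SK_FLAG_PPR    SK_BIT(52)
#define SK_FLAG_GIOV   SK_BIT(54)
#define SK_FLAG_GV     SK_BIT(55)
#define SK_GLX         SK_MASK(57, 56)
#define SK_GCR3_14_12  SK_MASK(60, 58)

struct sk_dte {
	uint64_t data[4];
};

/*
 * Merge guest-owned bits of data[0] into a host DTE that already holds
 * the host-owned fields. GV is forced on by the host; every other guest
 * bit is taken only through an explicit mask, so a guest cannot set
 * host-owned bits through its gDTE.
 */
void sk_merge_guest_bits(const struct sk_dte *gdte, struct sk_dte *host)
{
	const uint64_t guest_mask = SK_FLAG_PPR | SK_GLX | SK_FLAG_GIOV |
				    SK_GCR3_14_12;

	host->data[0] |= SK_FLAG_GV;
	host->data[0] |= gdte->data[0] & guest_mask;
}
```

The point of the pattern is that the allow-list of guest fields lives in one mask, which keeps the set of bits a guest can influence easy to audit.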
* Re: [PATCH v6 00/13] iommu/amd: Introduce Nested Translation support
2026-01-15 6:08 [PATCH v6 00/13] iommu/amd: Introduce Nested Translation support Suravee Suthikulpanit
` (12 preceding siblings ...)
2026-01-15 6:08 ` [PATCH v6 13/13] iommu/amd: Add support for nested domain attach/detach Suravee Suthikulpanit
@ 2026-01-18 9:56 ` Jörg Rödel
13 siblings, 0 replies; 17+ messages in thread
From: Jörg Rödel @ 2026-01-18 9:56 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: jgg, nicolinc, linux-kernel, robin.murphy, will, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
On Thu, Jan 15, 2026 at 06:08:01AM +0000, Suravee Suthikulpanit wrote:
> Suravee Suthikulpanit (13):
> iommu/amd: Add support for hw_info for iommu capability query
> iommu/amd: Rename DEV_DOMID_MASK to DTE_DOMID_MASK
> iommu/amd: Make amd_iommu_make_clear_dte() non-static inline
> iommu/amd: Introduce helper function amd_iommu_update_dte()
> iommufd: Introduce data struct for AMD nested domain allocation
> iommu/amd: Always enable GCR3TRPMode when supported.
> iommu/amd: Add support for nest parent domain allocation
> iommu/amd: Introduce struct amd_iommu_viommu
> iommu/amd: Add support for nested domain allocation
> iommu/amd: Introduce gDomID-to-hDomID Mapping and handle parent domain
> invalidation
> iommu/amd: Refactor persistent DTE bits programming into
> amd_iommu_make_clear_dte()
> iommu/amd: Refactor logic to program the host page table in DTE
> iommu/amd: Add support for nested domain attach/detach
Applied, thanks Suravee.
* Re: [PATCH v6 10/13] iommu/amd: Introduce gDomID-to-hDomID Mapping and handle parent domain invalidation
2026-01-15 6:08 ` [PATCH v6 10/13] iommu/amd: Introduce gDomID-to-hDomID Mapping and handle parent domain invalidation Suravee Suthikulpanit
@ 2026-01-19 17:13 ` Jason Gunthorpe
0 siblings, 0 replies; 17+ messages in thread
From: Jason Gunthorpe @ 2026-01-19 17:13 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
On Thu, Jan 15, 2026 at 06:08:11AM +0000, Suravee Suthikulpanit wrote:
> +static int iommu_flush_pages_v1_hdom_ids(struct protection_domain *pdom, u64 address, size_t size)
> +{
> + int ret = 0;
> + struct amd_iommu_viommu *aviommu;
> +
> + list_for_each_entry(aviommu, &pdom->viommu_list, pdom_list) {
> + unsigned long i;
You should have some lockdep assertions here for this list iteration.
> +static void *gdom_info_load_or_alloc_locked(struct xarray *xa, unsigned long index)
> +{
> + struct guest_domain_mapping_info *elm, *res;
> +
> + elm = xa_load(xa, index);
> + if (elm)
> + return elm;
> +
> + xa_unlock(xa);
> + elm = kzalloc(sizeof(struct guest_domain_mapping_info), GFP_KERNEL);
> + xa_lock(xa);
> + if (!elm)
> + return ERR_PTR(-ENOMEM);
> +
> + res = __xa_cmpxchg(xa, index, NULL, elm, GFP_KERNEL);
> + if (xa_is_err(res))
> + res = ERR_PTR(xa_err(res));
> +
> + if (res) {
> + kfree(elm);
> + return res;
> + }
> +
> + refcount_set(&elm->users, 0);
> + return elm;
> +}
> +
> /*
> * This function is assigned to struct iommufd_viommu_ops.alloc_domain_nested()
> * during the call to struct iommu_ops.viommu_init().
> @@ -68,6 +96,7 @@ amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
> {
> int ret;
> struct nested_domain *ndom;
> + struct guest_domain_mapping_info *gdom_info;
> struct amd_iommu_viommu *aviommu = container_of(viommu, struct amd_iommu_viommu, core);
>
> if (user_data->type != IOMMU_HWPT_DATA_AMD_GUEST)
> @@ -92,7 +121,63 @@ amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
> ndom->domain.type = IOMMU_DOMAIN_NESTED;
> ndom->viommu = aviommu;
>
> + /*
> + * Normally, when a guest has multiple pass-through devices,
> + * the IOMMU driver sets up DTEs with the same stage-2 table and
> + * uses the same host domain ID (hDomID). In case of nested translation,
> + * if the guest sets up different stage-1 tables with the same PASID,
> + * the IOMMU would use the same TLB tag. This results in a TLB
> + * aliasing issue.
> + *
> + * The guest is assigning gDomIDs based on its own algorithm for managing
> + * cache tags of (DomID, PASID). Within a single viommu, the nest parent domain
> + * (w/ S2 table) is used by all DTEs. But we need to consistently map the gDomID
> + * to a single hDomID. This is done using an xarray in the vIOMMU to
> + * keep track of the gDomID mapping. When the S2 is changed, the INVALIDATE_IOMMU_PAGES
> + * command must be issued for each hDomID in the xarray.
> + */
> + xa_lock(&aviommu->gdomid_array);
> +
> + gdom_info = gdom_info_load_or_alloc_locked(&aviommu->gdomid_array, ndom->gdom_id);
> + if (IS_ERR(gdom_info)) {
> + xa_unlock(&aviommu->gdomid_array);
> + ret = PTR_ERR(gdom_info);
> + goto out_err;
> + }
> +
> + /* Check if gDomID exists */
> + if (refcount_inc_not_zero(&gdom_info->users)) {
> + ndom->gdom_info = gdom_info;
> + xa_unlock(&aviommu->gdomid_array);
This is pretty tortured, the alloc flow inside
gdom_info_load_or_alloc_locked() should do the
amd_iommu_pdom_id_alloc() and set the refcount to 1 before installing
it in the xarray, then you don't need any of this here.
> + /* The gDomID does not exist. We allocate new hdom_id */
> + gdom_info->hdom_id = amd_iommu_pdom_id_alloc();
Then this allocation wouldn't have to be ATOMIC.
But it looks like it works the way it is, so no rush.
Jason
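Illustration of the flow suggested in this reply (allocate the host domain ID and set the reference count to 1 before the entry becomes visible, so the caller needs no separate "does it already exist" step). This is a minimal single-threaded userspace model, not the driver code and not what was applied. The sk_* names, the fixed-size table standing in for the xarray, and sk_hdom_id_alloc() standing in for amd_iommu_pdom_id_alloc() are all hypothetical. A kernel version would additionally need the xarray lock and would have to handle losing a race when installing the entry.

```c
#include <stdlib.h>

#define SK_SLOTS 16

struct sk_gdom_info {
	unsigned int users;    /* stand-in for refcount_t */
	unsigned int hdom_id;  /* host domain ID backing this gDomID */
};

struct sk_table {
	struct sk_gdom_info *slot[SK_SLOTS];  /* stand-in for the xarray */
	unsigned int next_hdom_id;
};

/* Hypothetical stand-in for the host domain ID allocator. */
static unsigned int sk_hdom_id_alloc(struct sk_table *t)
{
	return ++t->next_hdom_id;
}

/*
 * Look up the mapping for gdom_id and take a reference, or create it.
 * A new entry is fully initialized (hdom_id assigned, users == 1)
 * before it is stored in the table, so an entry with users == 0 is
 * never observable.
 */
struct sk_gdom_info *sk_gdom_get(struct sk_table *t, unsigned int gdom_id)
{
	struct sk_gdom_info *elm;

	if (gdom_id >= SK_SLOTS)
		return NULL;

	elm = t->slot[gdom_id];
	if (elm) {
		elm->users++;
		return elm;
	}

	elm = calloc(1, sizeof(*elm));
	if (!elm)
		return NULL;

	elm->hdom_id = sk_hdom_id_alloc(t);
	elm->users = 1;
	t->slot[gdom_id] = elm;  /* publish only after setup is complete */
	return elm;
}

/* Drop a reference; the last put removes and frees the entry. */
void sk_gdom_put(struct sk_table *t, unsigned int gdom_id)
{
	struct sk_gdom_info *elm;

	if (gdom_id >= SK_SLOTS || !t->slot[gdom_id])
		return;

	elm = t->slot[gdom_id];
	if (--elm->users == 0) {
		t->slot[gdom_id] = NULL;
		free(elm);
	}
}
```

In this model two nested domains created with the same gDomID share one entry and therefore one hDomID, which is the property the gDomID-to-hDomID mapping is meant to provide.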
* Re: [PATCH v6 13/13] iommu/amd: Add support for nested domain attach/detach
2026-01-15 6:08 ` [PATCH v6 13/13] iommu/amd: Add support for nested domain attach/detach Suravee Suthikulpanit
@ 2026-01-19 17:15 ` Jason Gunthorpe
0 siblings, 0 replies; 17+ messages in thread
From: Jason Gunthorpe @ 2026-01-19 17:15 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: nicolinc, linux-kernel, robin.murphy, will, joro, kevin.tian,
jsnitsel, vasant.hegde, iommu, santosh.shukla,
sairaj.arunkodilkar, jon.grimm, prashanthpra, wvw, wnliu, gptran,
kpsingh, joao.m.martins, alejandro.j.jimenez
On Thu, Jan 15, 2026 at 06:08:14AM +0000, Suravee Suthikulpanit wrote:
> +static int nested_attach_device(struct iommu_domain *dom, struct device *dev,
> + struct iommu_domain *old)
> +{
> + struct dev_table_entry new = {0};
> + struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
> + struct amd_iommu *iommu = get_amd_iommu_from_dev_data(dev_data);
> + int ret = 0;
> +
> + /*
> + * PASID must not be enabled on this attach path.
> + */
> + if (WARN_ON(dev_data->pasid_enabled))
> + return -EINVAL;
Well, that's one way, but a rather big hammer as we do want to support
assigning PASID capable functions to VMs.
Do you have it on your list to fix this up properly?
Jason