* [PATCH v6 1/9] iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported
2024-10-16 5:17 [PATCH v6 0/9] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
@ 2024-10-16 5:17 ` Suravee Suthikulpanit
2024-10-16 5:17 ` [PATCH v6 2/9] asm/rwonce: Introduce [READ|WRITE]_ONCE() support for __int128 Suravee Suthikulpanit
` (8 subsequent siblings)
9 siblings, 0 replies; 21+ messages in thread
From: Suravee Suthikulpanit @ 2024-10-16 5:17 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, jgg, kevin.tian, jon.grimm,
santosh.shukla, pandoh, kumaranand, Suravee Suthikulpanit
According to the AMD IOMMU spec, IOMMU hardware reads the entire DTE
in a single 256-bit transaction. It is recommended to update the DTE
using a 128-bit operation followed by an INVALIDATE_DEVTAB_ENTRY command
when IV=1b or V=1b before the change.
According to the AMD BIOS and Kernel Developer's Guide (BKDG), dating back
to the Family 10h processors [1], which introduced the first AMD IOMMU,
AMD processors have always had CPUID Fn0000_0001_ECX[CMPXCHG16B]=1.
Therefore, it is safe to assume cmpxchg128 is available on all AMD
processors with an IOMMU.
In addition, the CMPXCHG16B feature has already been checked separately
before enabling the GA, XT, and GAM modes. Consolidate the detection logic,
and fail the IOMMU initialization if the feature is not supported.
[1] https://www.amd.com/content/dam/amd/en/documents/archived-tech-docs/programmer-references/31116.pdf
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/init.c | 23 +++++++++--------------
1 file changed, 9 insertions(+), 14 deletions(-)
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 43131c3a2172..a1a0bd398c14 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1764,13 +1764,8 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h,
else
iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET;
- /*
- * Note: GA (128-bit IRTE) mode requires cmpxchg16b supports.
- * GAM also requires GA mode. Therefore, we need to
- * check cmpxchg16b support before enabling it.
- */
- if (!boot_cpu_has(X86_FEATURE_CX16) ||
- ((h->efr_attr & (0x1 << IOMMU_FEAT_GASUP_SHIFT)) == 0))
+ /* GAM requires GA mode. */
+ if ((h->efr_attr & (0x1 << IOMMU_FEAT_GASUP_SHIFT)) == 0)
amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY;
break;
case 0x11:
@@ -1780,13 +1775,8 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h,
else
iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET;
- /*
- * Note: GA (128-bit IRTE) mode requires cmpxchg16b supports.
- * XT, GAM also requires GA mode. Therefore, we need to
- * check cmpxchg16b support before enabling them.
- */
- if (!boot_cpu_has(X86_FEATURE_CX16) ||
- ((h->efr_reg & (0x1 << IOMMU_EFR_GASUP_SHIFT)) == 0)) {
+ /* XT and GAM require GA mode. */
+ if ((h->efr_reg & (0x1 << IOMMU_EFR_GASUP_SHIFT)) == 0) {
amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY;
break;
}
@@ -3051,6 +3041,11 @@ static int __init early_amd_iommu_init(void)
return -EINVAL;
}
+ if (!boot_cpu_has(X86_FEATURE_CX16)) {
+ pr_err("Failed to initialize. The CMPXCHG16B feature is required.\n");
+ return -EINVAL;
+ }
+
/*
* Validate checksum here so we don't need to do it when
* we actually parse the table
--
2.34.1
* [PATCH v6 2/9] asm/rwonce: Introduce [READ|WRITE]_ONCE() support for __int128
From: Suravee Suthikulpanit @ 2024-10-16 5:17 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, jgg, kevin.tian, jon.grimm,
santosh.shukla, pandoh, kumaranand, Uros Bizjak,
Suravee Suthikulpanit
From: Uros Bizjak <ubizjak@gmail.com>
Currently, [READ|WRITE]_ONCE() do not support variables of type __int128.
Define "__dword_type" as __int128 when the compiler supports it, and as
"long long" otherwise.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
include/asm-generic/rwonce.h | 2 +-
include/linux/compiler_types.h | 8 +++++++-
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/include/asm-generic/rwonce.h b/include/asm-generic/rwonce.h
index 8d0a6280e982..8bf942ad5ef3 100644
--- a/include/asm-generic/rwonce.h
+++ b/include/asm-generic/rwonce.h
@@ -33,7 +33,7 @@
* (e.g. a virtual address) and a strong prevailing wind.
*/
#define compiletime_assert_rwonce_type(t) \
- compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
+ compiletime_assert(__native_word(t) || sizeof(t) == sizeof(__dword_type), \
"Unsupported access size for {READ,WRITE}_ONCE().")
/*
diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 1a957ea2f4fe..54b56ae25db7 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -469,6 +469,12 @@ struct ftrace_likely_data {
unsigned type: (unsigned type)0, \
signed type: (signed type)0
+#ifdef __SIZEOF_INT128__
+#define __dword_type __int128
+#else
+#define __dword_type long long
+#endif
+
#define __unqual_scalar_typeof(x) typeof( \
_Generic((x), \
char: (char)0, \
@@ -476,7 +482,7 @@ struct ftrace_likely_data {
__scalar_type_to_expr_cases(short), \
__scalar_type_to_expr_cases(int), \
__scalar_type_to_expr_cases(long), \
- __scalar_type_to_expr_cases(long long), \
+ __scalar_type_to_expr_cases(__dword_type), \
default: (x)))
/* Is this type a native word size -- useful for atomic operations */
--
2.34.1
* Re: [PATCH v6 2/9] asm/rwonce: Introduce [READ|WRITE]_ONCE() support for __int128
From: Jason Gunthorpe @ 2024-10-16 13:08 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, kevin.tian,
jon.grimm, santosh.shukla, pandoh, kumaranand, Uros Bizjak
On Wed, Oct 16, 2024 at 05:17:49AM +0000, Suravee Suthikulpanit wrote:
> From: Uros Bizjak <ubizjak@gmail.com>
>
> Currently, [READ|WRITE]_ONCE() do not support variables of type __int128.
> Define "__dword_type" as __int128 when the compiler supports it, and as
> "long long" otherwise.
>
> Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
> include/asm-generic/rwonce.h | 2 +-
> include/linux/compiler_types.h | 8 +++++++-
> 2 files changed, 8 insertions(+), 2 deletions(-)
I guess it makes sense that the "strong prevailing wind" would apply
to 2x u64 reads as well as 2x u32 reads. Though use with caution..
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
* [PATCH v6 3/9] iommu/amd: Introduce helper function to update 256-bit DTE
From: Suravee Suthikulpanit @ 2024-10-16 5:17 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, jgg, kevin.tian, jon.grimm,
santosh.shukla, pandoh, kumaranand, Suravee Suthikulpanit
The current implementation does not follow the 128-bit write requirement
for updating the DTE as specified in the AMD I/O Virtualization Technology
(IOMMU) Specification.
Therefore, modify struct dev_table_entry to contain a union with a u128
data array, and introduce a helper function update_dte256() that updates
the 256-bit DTE using two 128-bit cmpxchg operations. It takes the
DTE[V, GV] bits into account when programming the DTE, to ensure the
proper ordering of DTE programming and flushing.
In addition, introduce a per-DTE spinlock, dev_data.dte_lock, to
serialize updates of the DTE and prevent cmpxchg128 failures.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu_types.h | 10 ++-
drivers/iommu/amd/iommu.c | 112 ++++++++++++++++++++++++++++
2 files changed, 121 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 601fb4ee6900..f537b264f118 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -425,9 +425,13 @@
#define DTE_GCR3_SHIFT_C 43
#define DTE_GPT_LEVEL_SHIFT 54
+#define DTE_GPT_LEVEL_MASK GENMASK_ULL(55, 54)
#define GCR3_VALID 0x01ULL
+/* DTE[128:179] | DTE[184:191] */
+#define DTE_DATA2_INTR_MASK ~GENMASK_ULL(55, 52)
+
#define IOMMU_PAGE_MASK (((1ULL << 52) - 1) & ~0xfffULL)
#define IOMMU_PTE_PRESENT(pte) ((pte) & IOMMU_PTE_PR)
#define IOMMU_PTE_DIRTY(pte) ((pte) & IOMMU_PTE_HD)
@@ -832,6 +836,7 @@ struct devid_map {
struct iommu_dev_data {
/*Protect against attach/detach races */
spinlock_t lock;
+ spinlock_t dte_lock; /* DTE lock for 256-bit access */
struct list_head list; /* For domain->dev_list */
struct llist_node dev_data_list; /* For global dev_data_list */
@@ -882,7 +887,10 @@ extern struct amd_iommu *amd_iommus[MAX_IOMMUS];
* Structure defining one entry in the device table
*/
struct dev_table_entry {
- u64 data[4];
+ union {
+ u64 data[4];
+ u128 data128[2];
+ };
};
/*
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 8364cd6fa47d..ab0d3f46871e 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -77,12 +77,114 @@ static void detach_device(struct device *dev);
static void set_dte_entry(struct amd_iommu *iommu,
struct iommu_dev_data *dev_data);
+static void iommu_flush_dte_sync(struct amd_iommu *iommu, u16 devid);
+
/****************************************************************************
*
* Helper functions
*
****************************************************************************/
+static void write_dte_upper128(struct dev_table_entry *ptr, struct dev_table_entry *new)
+{
+ struct dev_table_entry old = {};
+
+ old.data128[1] = READ_ONCE(ptr->data128[1]);
+ do {
+ /* Need to preserve DTE_DATA2_INTR_MASK */
+ new->data[2] &= ~DTE_DATA2_INTR_MASK;
+ new->data[2] |= old.data[2] & DTE_DATA2_INTR_MASK;
+
+ /* Note: try_cmpxchg inherently updates &old.data128[1] on failure */
+ } while (!try_cmpxchg128(&ptr->data128[1], &old.data128[1], new->data128[1]));
+}
+
+static void write_dte_lower128(struct dev_table_entry *ptr, struct dev_table_entry *new)
+{
+ struct dev_table_entry old = {};
+
+ old.data128[0] = READ_ONCE(ptr->data128[0]);
+ do {
+ /* Note: try_cmpxchg inherently updates &old.data128[0] on failure */
+ } while (!try_cmpxchg128(&ptr->data128[0], &old.data128[0], new->data128[0]));
+}
+
+/*
+ * Note:
+ * IOMMU reads the entire Device Table entry in a single 256-bit transaction
+ * but the driver programs the DTE using two 128-bit cmpxchg operations.
+ * So, the driver needs to ensure the following:
+ * - DTE[V|GV] bit is being written last when setting.
+ * - DTE[V|GV] bit is being written first when clearing.
+ *
+ * This function is used only by code which updates the DMA translation part of the DTE.
+ * So, only consider control bits related to DMA when updating the entry.
+ */
+static void update_dte256(struct amd_iommu *iommu, struct iommu_dev_data *dev_data,
+ struct dev_table_entry *new)
+{
+ struct dev_table_entry *dev_table = get_dev_table(iommu);
+ struct dev_table_entry *ptr = &dev_table[dev_data->devid];
+
+ spin_lock(&dev_data->dte_lock);
+
+ if (!(ptr->data[0] & DTE_FLAG_V)) {
+ /* Existing DTE is not valid. */
+ write_dte_upper128(ptr, new);
+ write_dte_lower128(ptr, new);
+ iommu_flush_dte_sync(iommu, dev_data->devid);
+ } else if (!(new->data[0] & DTE_FLAG_V)) {
+ /* Existing DTE is valid. New DTE is not valid. */
+ write_dte_lower128(ptr, new);
+ write_dte_upper128(ptr, new);
+ iommu_flush_dte_sync(iommu, dev_data->devid);
+ } else if (!FIELD_GET(DTE_FLAG_GV, ptr->data[0])) {
+ /*
+ * Both DTEs are valid.
+ * Existing DTE has no guest page table.
+ */
+ write_dte_upper128(ptr, new);
+ write_dte_lower128(ptr, new);
+ iommu_flush_dte_sync(iommu, dev_data->devid);
+ } else if (!FIELD_GET(DTE_FLAG_GV, new->data[0])) {
+ /*
+ * Both DTEs are valid.
+ * Existing DTE has guest page table,
+ * new DTE has no guest page table,
+ */
+ write_dte_lower128(ptr, new);
+ write_dte_upper128(ptr, new);
+ iommu_flush_dte_sync(iommu, dev_data->devid);
+ } else if (FIELD_GET(DTE_GPT_LEVEL_MASK, ptr->data[2]) !=
+ FIELD_GET(DTE_GPT_LEVEL_MASK, new->data[2])) {
+ /*
+ * Both DTEs are valid and have guest page table,
+ * but have different number of levels. So, we need
+ * to update both upper and lower 128-bit values, which
+ * requires disabling and flushing.
+ */
+ struct dev_table_entry clear = {};
+
+ /* First disable DTE */
+ write_dte_lower128(ptr, &clear);
+ iommu_flush_dte_sync(iommu, dev_data->devid);
+
+ /* Then update DTE */
+ write_dte_upper128(ptr, new);
+ write_dte_lower128(ptr, new);
+ iommu_flush_dte_sync(iommu, dev_data->devid);
+ } else {
+ /*
+ * Both DTEs are valid and have guest page table,
+ * and same number of levels. We just need to only
+ * update the lower 128-bit. So no need to disable DTE.
+ */
+ write_dte_lower128(ptr, new);
+ }
+
+ spin_unlock(&dev_data->dte_lock);
+}
+
static inline bool pdom_is_v2_pgtbl_mode(struct protection_domain *pdom)
{
return (pdom && (pdom->pd_mode == PD_MODE_V2));
@@ -203,6 +305,7 @@ static struct iommu_dev_data *alloc_dev_data(struct amd_iommu *iommu, u16 devid)
return NULL;
spin_lock_init(&dev_data->lock);
+ spin_lock_init(&dev_data->dte_lock);
dev_data->devid = devid;
ratelimit_default_init(&dev_data->rs);
@@ -1272,6 +1375,15 @@ static int iommu_flush_dte(struct amd_iommu *iommu, u16 devid)
return iommu_queue_command(iommu, &cmd);
}
+static void iommu_flush_dte_sync(struct amd_iommu *iommu, u16 devid)
+{
+ int ret;
+
+ ret = iommu_flush_dte(iommu, devid);
+ if (!ret)
+ iommu_completion_wait(iommu);
+}
+
static void amd_iommu_flush_dte_all(struct amd_iommu *iommu)
{
u32 devid;
--
2.34.1
* [PATCH v6 4/9] iommu/amd: Introduce per-device DTE cache to store persistent bits
From: Suravee Suthikulpanit @ 2024-10-16 5:17 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, jgg, kevin.tian, jon.grimm,
santosh.shukla, pandoh, kumaranand, Suravee Suthikulpanit
Currently, the IOMMU driver initializes each Device Table Entry (DTE) when
it parses the ACPI IVRS table during one-time initialization, and the value
is programmed directly into the table, where it remains until the next
system reset. This makes DTE programming difficult, since the driver needs
to ensure that persistent DTE bits are not overwritten at runtime.
Introduce a per-device DTE cache to store the persistent DTE bits.
Note also that amd_iommu_apply_erratum_63() is not updated, since it will
be removed in a subsequent patch.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 2 ++
drivers/iommu/amd/amd_iommu_types.h | 21 +++++++++++----------
drivers/iommu/amd/init.c | 26 +++++++++++++++++++-------
drivers/iommu/amd/iommu.c | 2 +-
4 files changed, 33 insertions(+), 18 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 6386fa4556d9..96c3bfc234f8 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -177,3 +177,5 @@ void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
struct dev_table_entry *get_dev_table(struct amd_iommu *iommu);
#endif
+
+struct iommu_dev_data *find_dev_data(struct amd_iommu *iommu, u16 devid);
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index f537b264f118..3f53d3bc79cb 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -830,6 +830,16 @@ struct devid_map {
/* Device may request super-user privileges */
#define AMD_IOMMU_DEVICE_FLAG_PRIV_SUP 0x10
+/*
+ * Structure defining one entry in the device table
+ */
+struct dev_table_entry {
+ union {
+ u64 data[4];
+ u128 data128[2];
+ };
+};
+
/*
* This struct contains device specific data for the IOMMU
*/
@@ -858,6 +868,7 @@ struct iommu_dev_data {
bool defer_attach;
struct ratelimit_state rs; /* Ratelimit IOPF messages */
+ struct dev_table_entry dte_cache;
};
/* Map HPET and IOAPIC ids to the devid used by the IOMMU */
@@ -883,16 +894,6 @@ extern struct list_head amd_iommu_list;
*/
extern struct amd_iommu *amd_iommus[MAX_IOMMUS];
-/*
- * Structure defining one entry in the device table
- */
-struct dev_table_entry {
- union {
- u64 data[4];
- u128 data128[2];
- };
-};
-
/*
* One entry for unity mappings parsed out of the ACPI table.
*/
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index a1a0bd398c14..552a13f7668c 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -992,6 +992,18 @@ static void iommu_enable_gt(struct amd_iommu *iommu)
iommu_feature_enable(iommu, CONTROL_GT_EN);
}
+static void set_dte_cache_bit(struct amd_iommu *iommu, u16 devid, u8 bit)
+{
+ int i = (bit >> 6) & 0x03;
+ int _bit = bit & 0x3f;
+ struct iommu_dev_data *dev_data = find_dev_data(iommu, devid);
+
+ if (!dev_data)
+ return;
+
+ dev_data->dte_cache.data[i] |= (1UL << _bit);
+}
+
/* sets a specific bit in the device table entry. */
static void __set_dev_entry_bit(struct dev_table_entry *dev_table,
u16 devid, u8 bit)
@@ -1159,19 +1171,19 @@ static void __init set_dev_entry_from_acpi(struct amd_iommu *iommu,
u16 devid, u32 flags, u32 ext_flags)
{
if (flags & ACPI_DEVFLAG_INITPASS)
- set_dev_entry_bit(iommu, devid, DEV_ENTRY_INIT_PASS);
+ set_dte_cache_bit(iommu, devid, DEV_ENTRY_INIT_PASS);
if (flags & ACPI_DEVFLAG_EXTINT)
- set_dev_entry_bit(iommu, devid, DEV_ENTRY_EINT_PASS);
+ set_dte_cache_bit(iommu, devid, DEV_ENTRY_EINT_PASS);
if (flags & ACPI_DEVFLAG_NMI)
- set_dev_entry_bit(iommu, devid, DEV_ENTRY_NMI_PASS);
+ set_dte_cache_bit(iommu, devid, DEV_ENTRY_NMI_PASS);
if (flags & ACPI_DEVFLAG_SYSMGT1)
- set_dev_entry_bit(iommu, devid, DEV_ENTRY_SYSMGT1);
+ set_dte_cache_bit(iommu, devid, DEV_ENTRY_SYSMGT1);
if (flags & ACPI_DEVFLAG_SYSMGT2)
- set_dev_entry_bit(iommu, devid, DEV_ENTRY_SYSMGT2);
+ set_dte_cache_bit(iommu, devid, DEV_ENTRY_SYSMGT2);
if (flags & ACPI_DEVFLAG_LINT0)
- set_dev_entry_bit(iommu, devid, DEV_ENTRY_LINT0_PASS);
+ set_dte_cache_bit(iommu, devid, DEV_ENTRY_LINT0_PASS);
if (flags & ACPI_DEVFLAG_LINT1)
- set_dev_entry_bit(iommu, devid, DEV_ENTRY_LINT1_PASS);
+ set_dte_cache_bit(iommu, devid, DEV_ENTRY_LINT1_PASS);
amd_iommu_apply_erratum_63(iommu, devid);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index ab0d3f46871e..28516d89168a 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -393,7 +393,7 @@ static void setup_aliases(struct amd_iommu *iommu, struct device *dev)
clone_aliases(iommu, dev);
}
-static struct iommu_dev_data *find_dev_data(struct amd_iommu *iommu, u16 devid)
+struct iommu_dev_data *find_dev_data(struct amd_iommu *iommu, u16 devid)
{
struct iommu_dev_data *dev_data;
--
2.34.1
* Re: [PATCH v6 4/9] iommu/amd: Introduce per-device DTE cache to store persistent bits
From: Jason Gunthorpe @ 2024-10-16 13:21 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, kevin.tian,
jon.grimm, santosh.shukla, pandoh, kumaranand
On Wed, Oct 16, 2024 at 05:17:51AM +0000, Suravee Suthikulpanit wrote:
> diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
> index f537b264f118..3f53d3bc79cb 100644
> --- a/drivers/iommu/amd/amd_iommu_types.h
> +++ b/drivers/iommu/amd/amd_iommu_types.h
> @@ -830,6 +830,16 @@ struct devid_map {
> /* Device may request super-user privileges */
> #define AMD_IOMMU_DEVICE_FLAG_PRIV_SUP 0x10
>
> +/*
> + * Structure defining one entry in the device table
> + */
> +struct dev_table_entry {
> + union {
> + u64 data[4];
> + u128 data128[2];
> + };
> +};
It would be appropriate to put this hunk into the prior patch so your
series does not move code it just added..
> /*
> * This struct contains device specific data for the IOMMU
> */
> @@ -858,6 +868,7 @@ struct iommu_dev_data {
> bool defer_attach;
>
> struct ratelimit_state rs; /* Ratelimit IOPF messages */
> + struct dev_table_entry dte_cache;
> };
I would call this initial_dte or something, it isn't a cache, it is
the recoding of the ACPI data into a DTE format.
/* Stores INIT/EINT, NMI, SYSMGTx, LINTx values from ACPI */
struct dev_table_entry initial_dte;
> @@ -1159,19 +1171,19 @@ static void __init set_dev_entry_from_acpi(struct amd_iommu *iommu,
> u16 devid, u32 flags, u32 ext_flags)
> {
> if (flags & ACPI_DEVFLAG_INITPASS)
> - set_dev_entry_bit(iommu, devid, DEV_ENTRY_INIT_PASS);
> + set_dte_cache_bit(iommu, devid, DEV_ENTRY_INIT_PASS);
> if (flags & ACPI_DEVFLAG_EXTINT)
> - set_dev_entry_bit(iommu, devid, DEV_ENTRY_EINT_PASS);
> + set_dte_cache_bit(iommu, devid, DEV_ENTRY_EINT_PASS);
> if (flags & ACPI_DEVFLAG_NMI)
> - set_dev_entry_bit(iommu, devid, DEV_ENTRY_NMI_PASS);
> + set_dte_cache_bit(iommu, devid, DEV_ENTRY_NMI_PASS);
> if (flags & ACPI_DEVFLAG_SYSMGT1)
> - set_dev_entry_bit(iommu, devid, DEV_ENTRY_SYSMGT1);
> + set_dte_cache_bit(iommu, devid, DEV_ENTRY_SYSMGT1);
> if (flags & ACPI_DEVFLAG_SYSMGT2)
> - set_dev_entry_bit(iommu, devid, DEV_ENTRY_SYSMGT2);
> + set_dte_cache_bit(iommu, devid, DEV_ENTRY_SYSMGT2);
> if (flags & ACPI_DEVFLAG_LINT0)
> - set_dev_entry_bit(iommu, devid, DEV_ENTRY_LINT0_PASS);
> + set_dte_cache_bit(iommu, devid, DEV_ENTRY_LINT0_PASS);
> if (flags & ACPI_DEVFLAG_LINT1)
> - set_dev_entry_bit(iommu, devid, DEV_ENTRY_LINT1_PASS);
> + set_dte_cache_bit(iommu, devid, DEV_ENTRY_LINT1_PASS);
Doesn't this break the driver at this patch? Nothing reads from the
dte_cache at this point in the series so the real DTE never got these
bits?
Maybe just add a temporary line here to copy the entire dte_cache to
the real dte? The next patch fixes it I think?
But I like the idea, I think this is a much more understandable
direction to go.
Jason
* [PATCH v6 5/9] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers
From: Suravee Suthikulpanit @ 2024-10-16 5:17 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, jgg, kevin.tian, jon.grimm,
santosh.shukla, pandoh, kumaranand, Suravee Suthikulpanit
The set_dte_entry() function is used to program several DTE fields (e.g.
stage-1 table, stage-2 table, domain ID, etc.), which are difficult to
keep track of in the current implementation.
Therefore, separate out the logic for clearing the DTE into a new helper
make_clear_dte(), and move setting of the GCR3 Table Root Pointer, GIOV,
GV, GLX, and GuestPagingMode into another new function, set_dte_gcr3_table().
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/iommu.c | 126 ++++++++++++++++++++++----------------
1 file changed, 73 insertions(+), 53 deletions(-)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 28516d89168a..1e61201baf92 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1954,90 +1954,110 @@ int amd_iommu_clear_gcr3(struct iommu_dev_data *dev_data, ioasid_t pasid)
return ret;
}
+static void make_clear_dte(struct iommu_dev_data *dev_data, struct dev_table_entry *ptr,
+ struct dev_table_entry *new)
+{
+ /* All existing DTE must have V bit set */
+ new->data128[0] = DTE_FLAG_V;
+ new->data128[1] = 0;
+}
+
+/*
+ * Note:
+ * The old value for GCR3 table and GPT have been cleared from caller.
+ */
+static void set_dte_gcr3_table(struct amd_iommu *iommu,
+ struct iommu_dev_data *dev_data,
+ struct dev_table_entry *target)
+{
+ struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
+ u64 tmp, gcr3;
+
+ if (!gcr3_info->gcr3_tbl)
+ return;
+
+ pr_debug("%s: devid=%#x, glx=%#x, gcr3_tbl=%#llx\n",
+ __func__, dev_data->devid, gcr3_info->glx,
+ (unsigned long long)gcr3_info->gcr3_tbl);
+
+ tmp = gcr3_info->glx;
+ target->data[0] |= (tmp & DTE_GLX_MASK) << DTE_GLX_SHIFT;
+ if (pdom_is_v2_pgtbl_mode(dev_data->domain))
+ target->data[0] |= DTE_FLAG_GIOV;
+ target->data[0] |= DTE_FLAG_GV;
+
+
+ gcr3 = iommu_virt_to_phys(gcr3_info->gcr3_tbl);
+
+ /* Encode GCR3 table into DTE */
+ tmp = DTE_GCR3_VAL_A(gcr3) << DTE_GCR3_SHIFT_A;
+ target->data[0] |= tmp;
+ tmp = DTE_GCR3_VAL_B(gcr3) << DTE_GCR3_SHIFT_B;
+ tmp |= DTE_GCR3_VAL_C(gcr3) << DTE_GCR3_SHIFT_C;
+ target->data[1] |= tmp;
+
+ /* Guest page table can only support 4 and 5 levels */
+ if (amd_iommu_gpt_level == PAGE_MODE_5_LEVEL)
+ target->data[2] |= ((u64)GUEST_PGTABLE_5_LEVEL << DTE_GPT_LEVEL_SHIFT);
+}
+
static void set_dte_entry(struct amd_iommu *iommu,
struct iommu_dev_data *dev_data)
{
- u64 pte_root = 0;
- u64 flags = 0;
- u32 old_domid;
- u16 devid = dev_data->devid;
u16 domid;
+ u32 old_domid;
+ struct dev_table_entry new = {};
struct protection_domain *domain = dev_data->domain;
- struct dev_table_entry *dev_table = get_dev_table(iommu);
struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
+ struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
if (gcr3_info && gcr3_info->gcr3_tbl)
domid = dev_data->gcr3_info.domid;
else
domid = domain->id;
+ make_clear_dte(dev_data, dte, &new);
+
if (domain->iop.mode != PAGE_MODE_NONE)
- pte_root = iommu_virt_to_phys(domain->iop.root);
+ new.data[0] = iommu_virt_to_phys(domain->iop.root);
- pte_root |= (domain->iop.mode & DEV_ENTRY_MODE_MASK)
+ new.data[0] |= (domain->iop.mode & DEV_ENTRY_MODE_MASK)
<< DEV_ENTRY_MODE_SHIFT;
- pte_root |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_V;
+ new.data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_V;
/*
* When SNP is enabled, Only set TV bit when IOMMU
* page translation is in use.
*/
if (!amd_iommu_snp_en || (domid != 0))
- pte_root |= DTE_FLAG_TV;
-
- flags = dev_table[devid].data[1];
-
- if (dev_data->ats_enabled)
- flags |= DTE_FLAG_IOTLB;
+ new.data[0] |= DTE_FLAG_TV;
if (dev_data->ppr)
- pte_root |= 1ULL << DEV_ENTRY_PPR;
+ new.data[0] |= 1ULL << DEV_ENTRY_PPR;
if (domain->dirty_tracking)
- pte_root |= DTE_FLAG_HAD;
-
- if (gcr3_info && gcr3_info->gcr3_tbl) {
- u64 gcr3 = iommu_virt_to_phys(gcr3_info->gcr3_tbl);
- u64 glx = gcr3_info->glx;
- u64 tmp;
-
- pte_root |= DTE_FLAG_GV;
- pte_root |= (glx & DTE_GLX_MASK) << DTE_GLX_SHIFT;
-
- /* First mask out possible old values for GCR3 table */
- tmp = DTE_GCR3_VAL_B(~0ULL) << DTE_GCR3_SHIFT_B;
- flags &= ~tmp;
+ new.data[0] |= DTE_FLAG_HAD;
- tmp = DTE_GCR3_VAL_C(~0ULL) << DTE_GCR3_SHIFT_C;
- flags &= ~tmp;
-
- /* Encode GCR3 table into DTE */
- tmp = DTE_GCR3_VAL_A(gcr3) << DTE_GCR3_SHIFT_A;
- pte_root |= tmp;
-
- tmp = DTE_GCR3_VAL_B(gcr3) << DTE_GCR3_SHIFT_B;
- flags |= tmp;
+ if (dev_data->ats_enabled)
+ new.data[1] |= DTE_FLAG_IOTLB;
+ else
+ new.data[1] &= ~DTE_FLAG_IOTLB;
- tmp = DTE_GCR3_VAL_C(gcr3) << DTE_GCR3_SHIFT_C;
- flags |= tmp;
+ old_domid = READ_ONCE(dte->data[1]) & DEV_DOMID_MASK;
+ new.data[1] &= ~DEV_DOMID_MASK;
+ new.data[1] |= domid;
- if (amd_iommu_gpt_level == PAGE_MODE_5_LEVEL) {
- dev_table[devid].data[2] |=
- ((u64)GUEST_PGTABLE_5_LEVEL << DTE_GPT_LEVEL_SHIFT);
- }
-
- /* GIOV is supported with V2 page table mode only */
- if (pdom_is_v2_pgtbl_mode(domain))
- pte_root |= DTE_FLAG_GIOV;
- }
+ /*
+ * Restore cached persistent DTE bits, which can be set by information
+ * in IVRS table. See set_dev_entry_from_acpi().
+ */
+ new.data128[0] |= dev_data->dte_cache.data128[0];
+ new.data128[1] |= dev_data->dte_cache.data128[1];
- flags &= ~DEV_DOMID_MASK;
- flags |= domid;
+ set_dte_gcr3_table(iommu, dev_data, &new);
- old_domid = dev_table[devid].data[1] & DEV_DOMID_MASK;
- dev_table[devid].data[1] = flags;
- dev_table[devid].data[0] = pte_root;
+ update_dte256(iommu, dev_data, &new);
/*
* A kdump kernel might be replacing a domain ID that was copied from
--
2.34.1
* Re: [PATCH v6 5/9] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers
From: Jason Gunthorpe @ 2024-10-16 13:52 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, kevin.tian,
jon.grimm, santosh.shukla, pandoh, kumaranand
On Wed, Oct 16, 2024 at 05:17:52AM +0000, Suravee Suthikulpanit wrote:
> +static void set_dte_gcr3_table(struct amd_iommu *iommu,
> + struct iommu_dev_data *dev_data,
> + struct dev_table_entry *target)
> +{
> + struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
> + u64 tmp, gcr3;
> +
> + if (!gcr3_info->gcr3_tbl)
> + return;
> +
> + pr_debug("%s: devid=%#x, glx=%#x, gcr3_tbl=%#llx\n",
> + __func__, dev_data->devid, gcr3_info->glx,
> + (unsigned long long)gcr3_info->gcr3_tbl);
> +
> + tmp = gcr3_info->glx;
> + target->data[0] |= (tmp & DTE_GLX_MASK) << DTE_GLX_SHIFT;
> + if (pdom_is_v2_pgtbl_mode(dev_data->domain))
> + target->data[0] |= DTE_FLAG_GIOV;
> + target->data[0] |= DTE_FLAG_GV;
> +
> +
> + gcr3 = iommu_virt_to_phys(gcr3_info->gcr3_tbl);
> +
> + /* Encode GCR3 table into DTE */
> + tmp = DTE_GCR3_VAL_A(gcr3) << DTE_GCR3_SHIFT_A;
> + target->data[0] |= tmp;
> + tmp = DTE_GCR3_VAL_B(gcr3) << DTE_GCR3_SHIFT_B;
> + tmp |= DTE_GCR3_VAL_C(gcr3) << DTE_GCR3_SHIFT_C;
> + target->data[1] |= tmp;
> +
> + /* Guest page table can only support 4 and 5 levels */
> + if (amd_iommu_gpt_level == PAGE_MODE_5_LEVEL)
> + target->data[2] |= ((u64)GUEST_PGTABLE_5_LEVEL << DTE_GPT_LEVEL_SHIFT);
> +}
This looks OK but suggest to use the new macros and things, it is much
more readable:
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 53e129835b2668..fbae0803bceaa0 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -409,8 +409,7 @@
#define DTE_FLAG_HAD (3ULL << 7)
#define DTE_FLAG_GIOV BIT_ULL(54)
#define DTE_FLAG_GV BIT_ULL(55)
-#define DTE_GLX_SHIFT (56)
-#define DTE_GLX_MASK (3)
+#define DTE_GLX GENMASK_ULL(57, 56)
#define DTE_FLAG_IR BIT_ULL(61)
#define DTE_FLAG_IW BIT_ULL(62)
@@ -418,15 +417,10 @@
#define DTE_FLAG_MASK (0x3ffULL << 32)
#define DEV_DOMID_MASK 0xffffULL
-#define DTE_GCR3_VAL_A(x) (((x) >> 12) & 0x00007ULL)
-#define DTE_GCR3_VAL_B(x) (((x) >> 15) & 0x0ffffULL)
-#define DTE_GCR3_VAL_C(x) (((x) >> 31) & 0x1fffffULL)
+#define DTE_GCR3_14_12 GENMASK_ULL(57, 56)
+#define DTE_GCR3_30_15 GENMASK_ULL(31, 16)
+#define DTE_GCR3_51_31 GENMASK_ULL(63, 43)
-#define DTE_GCR3_SHIFT_A 58
-#define DTE_GCR3_SHIFT_B 16
-#define DTE_GCR3_SHIFT_C 43
-
-#define DTE_GPT_LEVEL_SHIFT 54
#define DTE_GPT_LEVEL_MASK GENMASK_ULL(55, 54)
#define GCR3_VALID 0x01ULL
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index caea101df7b93d..b0d2174583dbc9 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2012,7 +2012,7 @@ static void set_dte_gcr3_table(struct amd_iommu *iommu,
struct dev_table_entry *target)
{
struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
- u64 tmp, gcr3;
+ u64 gcr3;
if (!gcr3_info->gcr3_tbl)
return;
@@ -2021,25 +2021,24 @@ static void set_dte_gcr3_table(struct amd_iommu *iommu,
__func__, dev_data->devid, gcr3_info->glx,
(unsigned long long)gcr3_info->gcr3_tbl);
- tmp = gcr3_info->glx;
- target->data[0] |= (tmp & DTE_GLX_MASK) << DTE_GLX_SHIFT;
- if (pdom_is_v2_pgtbl_mode(dev_data->domain))
- target->data[0] |= DTE_FLAG_GIOV;
- target->data[0] |= DTE_FLAG_GV;
-
-
gcr3 = iommu_virt_to_phys(gcr3_info->gcr3_tbl);
- /* Encode GCR3 table into DTE */
- tmp = DTE_GCR3_VAL_A(gcr3) << DTE_GCR3_SHIFT_A;
- target->data[0] |= tmp;
- tmp = DTE_GCR3_VAL_B(gcr3) << DTE_GCR3_SHIFT_B;
- tmp |= DTE_GCR3_VAL_C(gcr3) << DTE_GCR3_SHIFT_C;
- target->data[1] |= tmp;
+ target->data[0] |= DTE_FLAG_GV |
+ FIELD_PREP(DTE_GLX, gcr3_info->glx) |
+ FIELD_PREP(DTE_GCR3_14_12, gcr3 >> 12);
+ if (pdom_is_v2_pgtbl_mode(dev_data->domain))
+ target->data[0] |= DTE_FLAG_GIOV;
+
+ target->data[1] |= FIELD_PREP(DTE_GCR3_30_15, gcr3 >> 15) |
+ FIELD_PREP(DTE_GCR3_51_31, gcr3 >> 31);
/* Guest page table can only support 4 and 5 levels */
if (amd_iommu_gpt_level == PAGE_MODE_5_LEVEL)
- target->data[2] |= ((u64)GUEST_PGTABLE_5_LEVEL << DTE_GPT_LEVEL_SHIFT);
+ target->data[2] |=
+ FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_5_LEVEL);
+ else
+ target->data[2] |=
+ FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_4_LEVEL);
}
static void set_dte_entry(struct amd_iommu *iommu,
* Re: [PATCH v6 5/9] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers
2024-10-16 5:17 ` [PATCH v6 5/9] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers Suravee Suthikulpanit
2024-10-16 13:52 ` Jason Gunthorpe
@ 2024-10-16 14:07 ` Jason Gunthorpe
2024-10-16 14:12 ` Jason Gunthorpe
2 siblings, 0 replies; 21+ messages in thread
From: Jason Gunthorpe @ 2024-10-16 14:07 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, kevin.tian,
jon.grimm, santosh.shukla, pandoh, kumaranand
On Wed, Oct 16, 2024 at 05:17:52AM +0000, Suravee Suthikulpanit wrote:
> + if (dev_data->ats_enabled)
> + new.data[1] |= DTE_FLAG_IOTLB;
> + else
> + new.data[1] &= ~DTE_FLAG_IOTLB;
No need to clear, it is already 0
> - tmp = DTE_GCR3_VAL_C(gcr3) << DTE_GCR3_SHIFT_C;
> - flags |= tmp;
> + old_domid = READ_ONCE(dte->data[1]) & DEV_DOMID_MASK;
> + new.data[1] &= ~DEV_DOMID_MASK;
> + new.data[1] |= domid;
Also no need to clear, it is already zero.
Jason
* Re: [PATCH v6 5/9] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers
2024-10-16 5:17 ` [PATCH v6 5/9] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers Suravee Suthikulpanit
2024-10-16 13:52 ` Jason Gunthorpe
2024-10-16 14:07 ` Jason Gunthorpe
@ 2024-10-16 14:12 ` Jason Gunthorpe
2 siblings, 0 replies; 21+ messages in thread
From: Jason Gunthorpe @ 2024-10-16 14:12 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, kevin.tian,
jon.grimm, santosh.shukla, pandoh, kumaranand
On Wed, Oct 16, 2024 at 05:17:52AM +0000, Suravee Suthikulpanit wrote:
> /*
> * When SNP is enabled, Only set TV bit when IOMMU
> * page translation is in use.
> */
> if (!amd_iommu_snp_en || (domid != 0))
> + new.data[0] |= DTE_FLAG_TV;
This one still doesn't seem quite right..
Since the blocking domain path now uses make_clear_dte(), the only
time we'd get here is for IDENTITY,
Except SNP does not support identity:
if (amd_iommu_snp_en && (type == IOMMU_DOMAIN_IDENTITY))
return ERR_PTR(-EINVAL);
So this is impossible code.
/* SNP is not allowed to use identity */
WARN_ON(amd_iommu_snp_en && domid == 0)
??
I guess the original introduction got the rationale wrong; it should
have been "Use TV=0 instead of TV=1/IR=0/IW=0 for BLOCKED/cleared
domains because SNP does not support TV=1/Mode=0 at all. IDENTITY is
already disabled."
Jason
* [PATCH v6 6/9] iommu/amd: Introduce helper function get_dte256()
2024-10-16 5:17 [PATCH v6 0/9] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
` (4 preceding siblings ...)
2024-10-16 5:17 ` [PATCH v6 5/9] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers Suravee Suthikulpanit
@ 2024-10-16 5:17 ` Suravee Suthikulpanit
2024-10-16 5:17 ` [PATCH v6 7/9] iommu/amd: Move erratum 63 logic to write_dte_lower128() Suravee Suthikulpanit
` (3 subsequent siblings)
9 siblings, 0 replies; 21+ messages in thread
From: Suravee Suthikulpanit @ 2024-10-16 5:17 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, jgg, kevin.tian, jon.grimm,
santosh.shukla, pandoh, kumaranand, Suravee Suthikulpanit
Introduce get_dte256() and use it in clone_alias() along with
update_dte256(). Also use get_dte256() in dump_dte_entry().
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/iommu.c | 59 +++++++++++++++++++++++++++++++--------
1 file changed, 48 insertions(+), 11 deletions(-)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 1e61201baf92..c03e2d9d2990 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -185,6 +185,20 @@ static void update_dte256(struct amd_iommu *iommu, struct iommu_dev_data *dev_da
spin_unlock(&dev_data->dte_lock);
}
+static void get_dte256(struct amd_iommu *iommu, struct iommu_dev_data *dev_data,
+ struct dev_table_entry *dte)
+{
+ struct dev_table_entry *ptr;
+ struct dev_table_entry *dev_table = get_dev_table(iommu);
+
+ ptr = &dev_table[dev_data->devid];
+
+ spin_lock(&dev_data->dte_lock);
+ dte->data128[0] = READ_ONCE(ptr->data128[0]);
+ dte->data128[1] = READ_ONCE(ptr->data128[1]);
+ spin_unlock(&dev_data->dte_lock);
+}
+
static inline bool pdom_is_v2_pgtbl_mode(struct protection_domain *pdom)
{
return (pdom && (pdom->pd_mode == PD_MODE_V2));
@@ -333,9 +347,11 @@ static struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid
static int clone_alias(struct pci_dev *pdev, u16 alias, void *data)
{
+ struct dev_table_entry new;
struct amd_iommu *iommu;
- struct dev_table_entry *dev_table;
+ struct iommu_dev_data *dev_data, *alias_data;
u16 devid = pci_dev_id(pdev);
+ int ret = 0;
if (devid == alias)
return 0;
@@ -344,13 +360,27 @@ static int clone_alias(struct pci_dev *pdev, u16 alias, void *data)
if (!iommu)
return 0;
- amd_iommu_set_rlookup_table(iommu, alias);
- dev_table = get_dev_table(iommu);
- memcpy(dev_table[alias].data,
- dev_table[devid].data,
- sizeof(dev_table[alias].data));
+ /* Copy the data from pdev */
+ dev_data = dev_iommu_priv_get(&pdev->dev);
+ if (!dev_data) {
+ pr_err("%s : Failed to get dev_data for 0x%x\n", __func__, devid);
+ ret = -EINVAL;
+ goto out;
+ }
+ get_dte256(iommu, dev_data, &new);
- return 0;
+ /* Setup alias */
+ alias_data = find_dev_data(iommu, alias);
+ if (!alias_data) {
+ pr_err("%s : Failed to get alias dev_data for 0x%x\n", __func__, alias);
+ ret = -EINVAL;
+ goto out;
+ }
+ update_dte256(iommu, alias_data, &new);
+
+ amd_iommu_set_rlookup_table(iommu, alias);
+out:
+ return ret;
}
static void clone_aliases(struct amd_iommu *iommu, struct device *dev)
@@ -623,6 +653,12 @@ static int iommu_init_device(struct amd_iommu *iommu, struct device *dev)
return -ENOMEM;
dev_data->dev = dev;
+
+ /*
+ * The dev_iommu_priv_set() needs to be called before setup_aliases().
+ * Otherwise, a subsequent call to dev_iommu_priv_get() will fail.
+ */
+ dev_iommu_priv_set(dev, dev_data);
setup_aliases(iommu, dev);
/*
@@ -636,8 +672,6 @@ static int iommu_init_device(struct amd_iommu *iommu, struct device *dev)
dev_data->flags = pdev_get_caps(to_pci_dev(dev));
}
- dev_iommu_priv_set(dev, dev_data);
-
return 0;
}
@@ -684,10 +718,13 @@ static void amd_iommu_uninit_device(struct device *dev)
static void dump_dte_entry(struct amd_iommu *iommu, u16 devid)
{
int i;
- struct dev_table_entry *dev_table = get_dev_table(iommu);
+ struct dev_table_entry dte;
+ struct iommu_dev_data *dev_data = find_dev_data(iommu, devid);
+
+ get_dte256(iommu, dev_data, &dte);
for (i = 0; i < 4; ++i)
- pr_err("DTE[%d]: %016llx\n", i, dev_table[devid].data[i]);
+ pr_err("DTE[%d]: %016llx\n", i, dte.data[i]);
}
static void dump_command(unsigned long phys_addr)
--
2.34.1
* [PATCH v6 7/9] iommu/amd: Move erratum 63 logic to write_dte_lower128()
2024-10-16 5:17 [PATCH v6 0/9] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
` (5 preceding siblings ...)
2024-10-16 5:17 ` [PATCH v6 6/9] iommu/amd: Introduce helper function get_dte256() Suravee Suthikulpanit
@ 2024-10-16 5:17 ` Suravee Suthikulpanit
2024-10-16 13:30 ` Jason Gunthorpe
2024-10-16 5:17 ` [PATCH v6 8/9] iommu/amd: Modify clear_dte_entry() to avoid in-place update Suravee Suthikulpanit
` (2 subsequent siblings)
9 siblings, 1 reply; 21+ messages in thread
From: Suravee Suthikulpanit @ 2024-10-16 5:17 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, jgg, kevin.tian, jon.grimm,
santosh.shukla, pandoh, kumaranand, Suravee Suthikulpanit
Move the erratum 63 logic into write_dte_lower128() to simplify DTE
programming, and remove amd_iommu_apply_erratum_63() and its helper
functions since they are no longer used.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 1 -
drivers/iommu/amd/amd_iommu_types.h | 2 ++
drivers/iommu/amd/init.c | 36 -----------------------------
drivers/iommu/amd/iommu.c | 6 +++--
4 files changed, 6 insertions(+), 39 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 96c3bfc234f8..1467bfc34fdf 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -16,7 +16,6 @@ irqreturn_t amd_iommu_int_thread_evtlog(int irq, void *data);
irqreturn_t amd_iommu_int_thread_pprlog(int irq, void *data);
irqreturn_t amd_iommu_int_thread_galog(int irq, void *data);
irqreturn_t amd_iommu_int_handler(int irq, void *data);
-void amd_iommu_apply_erratum_63(struct amd_iommu *iommu, u16 devid);
void amd_iommu_restart_log(struct amd_iommu *iommu, const char *evt_type,
u8 cntrl_intr, u8 cntrl_log,
u32 status_run_mask, u32 status_overflow_mask);
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 3f53d3bc79cb..53e129835b26 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -220,6 +220,8 @@
#define DEV_ENTRY_EX 0x67
#define DEV_ENTRY_SYSMGT1 0x68
#define DEV_ENTRY_SYSMGT2 0x69
+#define DTE_DATA1_SYSMGT_MASK GENMASK_ULL(41, 40)
+
#define DEV_ENTRY_IRQ_TBL_EN 0x80
#define DEV_ENTRY_INIT_PASS 0xb8
#define DEV_ENTRY_EINT_PASS 0xb9
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 552a13f7668c..31f10a071abd 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1014,29 +1014,6 @@ static void __set_dev_entry_bit(struct dev_table_entry *dev_table,
dev_table[devid].data[i] |= (1UL << _bit);
}
-static void set_dev_entry_bit(struct amd_iommu *iommu, u16 devid, u8 bit)
-{
- struct dev_table_entry *dev_table = get_dev_table(iommu);
-
- return __set_dev_entry_bit(dev_table, devid, bit);
-}
-
-static int __get_dev_entry_bit(struct dev_table_entry *dev_table,
- u16 devid, u8 bit)
-{
- int i = (bit >> 6) & 0x03;
- int _bit = bit & 0x3f;
-
- return (dev_table[devid].data[i] & (1UL << _bit)) >> _bit;
-}
-
-static int get_dev_entry_bit(struct amd_iommu *iommu, u16 devid, u8 bit)
-{
- struct dev_table_entry *dev_table = get_dev_table(iommu);
-
- return __get_dev_entry_bit(dev_table, devid, bit);
-}
-
static bool __copy_device_table(struct amd_iommu *iommu)
{
u64 int_ctl, int_tab_len, entry = 0;
@@ -1152,17 +1129,6 @@ static bool copy_device_table(void)
return true;
}
-void amd_iommu_apply_erratum_63(struct amd_iommu *iommu, u16 devid)
-{
- int sysmgt;
-
- sysmgt = get_dev_entry_bit(iommu, devid, DEV_ENTRY_SYSMGT1) |
- (get_dev_entry_bit(iommu, devid, DEV_ENTRY_SYSMGT2) << 1);
-
- if (sysmgt == 0x01)
- set_dev_entry_bit(iommu, devid, DEV_ENTRY_IW);
-}
-
/*
* This function takes the device specific flags read from the ACPI
* table and sets up the device table entry with that information
@@ -1185,8 +1151,6 @@ static void __init set_dev_entry_from_acpi(struct amd_iommu *iommu,
if (flags & ACPI_DEVFLAG_LINT1)
set_dte_cache_bit(iommu, devid, DEV_ENTRY_LINT1_PASS);
- amd_iommu_apply_erratum_63(iommu, devid);
-
amd_iommu_set_rlookup_table(iommu, devid);
}
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index c03e2d9d2990..a8c0a57003a8 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -105,6 +105,10 @@ static void write_dte_lower128(struct dev_table_entry *ptr, struct dev_table_ent
old.data128[0] = READ_ONCE(ptr->data128[0]);
do {
+ /* Apply erratum 63 */
+ if (FIELD_GET(DTE_DATA1_SYSMGT_MASK, new->data[1]) == 0x1)
+ new->data[0] |= DTE_FLAG_IW;
+
/* Note: try_cmpxchg inherently update &old.data128[0] on failure */
} while (!try_cmpxchg128(&ptr->data128[0], &old.data128[0], new->data128[0]));
}
@@ -2117,8 +2121,6 @@ static void clear_dte_entry(struct amd_iommu *iommu, u16 devid)
dev_table[devid].data[0] |= DTE_FLAG_TV;
dev_table[devid].data[1] &= DTE_FLAG_MASK;
-
- amd_iommu_apply_erratum_63(iommu, devid);
}
/* Update and flush DTE for the given device */
--
2.34.1
* Re: [PATCH v6 7/9] iommu/amd: Move erratum 63 logic to write_dte_lower128()
2024-10-16 5:17 ` [PATCH v6 7/9] iommu/amd: Move erratum 63 logic to write_dte_lower128() Suravee Suthikulpanit
@ 2024-10-16 13:30 ` Jason Gunthorpe
2024-10-31 8:53 ` Suthikulpanit, Suravee
2024-10-31 8:53 ` Suthikulpanit, Suravee
0 siblings, 2 replies; 21+ messages in thread
From: Jason Gunthorpe @ 2024-10-16 13:30 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, kevin.tian,
jon.grimm, santosh.shukla, pandoh, kumaranand
On Wed, Oct 16, 2024 at 05:17:54AM +0000, Suravee Suthikulpanit wrote:
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index c03e2d9d2990..a8c0a57003a8 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -105,6 +105,10 @@ static void write_dte_lower128(struct dev_table_entry *ptr, struct dev_table_ent
>
> old.data128[0] = READ_ONCE(ptr->data128[0]);
> do {
> + /* Apply erratum 63 */
> + if (FIELD_GET(DTE_DATA1_SYSMGT_MASK, new->data[1]) == 0x1)
> + new->data[0] |= DTE_FLAG_IW;
> +
Why not put it in set_dte_entry() ?
Jason
* Re: [PATCH v6 7/9] iommu/amd: Move erratum 63 logic to write_dte_lower128()
2024-10-16 13:30 ` Jason Gunthorpe
@ 2024-10-31 8:53 ` Suthikulpanit, Suravee
2024-10-31 8:53 ` Suthikulpanit, Suravee
1 sibling, 0 replies; 21+ messages in thread
From: Suthikulpanit, Suravee @ 2024-10-31 8:53 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, kevin.tian,
jon.grimm, santosh.shukla, pandoh, kumaranand
On 10/16/2024 8:30 PM, Jason Gunthorpe wrote:
> On Wed, Oct 16, 2024 at 05:17:54AM +0000, Suravee Suthikulpanit wrote:
>
>> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
>> index c03e2d9d2990..a8c0a57003a8 100644
>> --- a/drivers/iommu/amd/iommu.c
>> +++ b/drivers/iommu/amd/iommu.c
>> @@ -105,6 +105,10 @@ static void write_dte_lower128(struct dev_table_entry *ptr, struct dev_table_ent
>>
>> old.data128[0] = READ_ONCE(ptr->data128[0]);
>> do {
>> + /* Apply erratum 63 */
>> + if (FIELD_GET(DTE_DATA1_SYSMGT_MASK, new->data[1]) == 0x1)
>> + new->data[0] |= DTE_FLAG_IW;
>> +
>
> Why not put it in set_dte_entry() ?
>
> Jason
I have reworked this part in v7 and moved it elsewhere. I will send
out v7 for review next.
Suravee
* Re: [PATCH v6 7/9] iommu/amd: Move erratum 63 logic to write_dte_lower128()
2024-10-16 13:30 ` Jason Gunthorpe
2024-10-31 8:53 ` Suthikulpanit, Suravee
@ 2024-10-31 8:53 ` Suthikulpanit, Suravee
1 sibling, 0 replies; 21+ messages in thread
From: Suthikulpanit, Suravee @ 2024-10-31 8:53 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, kevin.tian,
jon.grimm, santosh.shukla, pandoh, kumaranand
On 10/16/2024 8:30 PM, Jason Gunthorpe wrote:
> On Wed, Oct 16, 2024 at 05:17:54AM +0000, Suravee Suthikulpanit wrote:
>
>> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
>> index c03e2d9d2990..a8c0a57003a8 100644
>> --- a/drivers/iommu/amd/iommu.c
>> +++ b/drivers/iommu/amd/iommu.c
>> @@ -105,6 +105,10 @@ static void write_dte_lower128(struct dev_table_entry *ptr, struct dev_table_ent
>>
>> old.data128[0] = READ_ONCE(ptr->data128[0]);
>> do {
>> + /* Apply erratum 63 */
>> + if (FIELD_GET(DTE_DATA1_SYSMGT_MASK, new->data[1]) == 0x1)
>> + new->data[0] |= DTE_FLAG_IW;
>> +
>
> Why not put it in set_dte_entry() ?
Ok. Good point.
Suravee
* [PATCH v6 8/9] iommu/amd: Modify clear_dte_entry() to avoid in-place update
2024-10-16 5:17 [PATCH v6 0/9] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
` (6 preceding siblings ...)
2024-10-16 5:17 ` [PATCH v6 7/9] iommu/amd: Move erratum 63 logic to write_dte_lower128() Suravee Suthikulpanit
@ 2024-10-16 5:17 ` Suravee Suthikulpanit
2024-10-16 5:17 ` [PATCH v6 9/9] iommu/amd: Lock DTE before updating the entry with WRITE_ONCE() Suravee Suthikulpanit
2024-10-16 14:22 ` [PATCH v6 0/9] iommu/amd: Use 128-bit cmpxchg operation to update DTE Jason Gunthorpe
9 siblings, 0 replies; 21+ messages in thread
From: Suravee Suthikulpanit @ 2024-10-16 5:17 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, jgg, kevin.tian, jon.grimm,
santosh.shukla, pandoh, kumaranand, Suravee Suthikulpanit
Modify clear_dte_entry() to avoid an in-place update by reusing
make_clear_dte() and update_dte256().
Also, there is no need to set the TV bit on a non-SNP system when
clearing the DTE for the blocked domain.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/iommu.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a8c0a57003a8..9ef6ddae3b66 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2110,17 +2110,13 @@ static void set_dte_entry(struct amd_iommu *iommu,
}
}
-static void clear_dte_entry(struct amd_iommu *iommu, u16 devid)
+static void clear_dte_entry(struct amd_iommu *iommu, struct iommu_dev_data *dev_data)
{
- struct dev_table_entry *dev_table = get_dev_table(iommu);
-
- /* remove entry from the device table seen by the hardware */
- dev_table[devid].data[0] = DTE_FLAG_V;
-
- if (!amd_iommu_snp_en)
- dev_table[devid].data[0] |= DTE_FLAG_TV;
+ struct dev_table_entry new = {};
+ struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
- dev_table[devid].data[1] &= DTE_FLAG_MASK;
+ make_clear_dte(dev_data, dte, &new);
+ update_dte256(iommu, dev_data, &new);
}
/* Update and flush DTE for the given device */
@@ -2131,7 +2127,7 @@ static void dev_update_dte(struct iommu_dev_data *dev_data, bool set)
if (set)
set_dte_entry(iommu, dev_data);
else
- clear_dte_entry(iommu, dev_data->devid);
+ clear_dte_entry(iommu, dev_data);
clone_aliases(iommu, dev_data->dev);
device_flush_dte(dev_data);
--
2.34.1
* [PATCH v6 9/9] iommu/amd: Lock DTE before updating the entry with WRITE_ONCE()
2024-10-16 5:17 [PATCH v6 0/9] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
` (7 preceding siblings ...)
2024-10-16 5:17 ` [PATCH v6 8/9] iommu/amd: Modify clear_dte_entry() to avoid in-place update Suravee Suthikulpanit
@ 2024-10-16 5:17 ` Suravee Suthikulpanit
2024-10-16 14:22 ` [PATCH v6 0/9] iommu/amd: Use 128-bit cmpxchg operation to update DTE Jason Gunthorpe
9 siblings, 0 replies; 21+ messages in thread
From: Suravee Suthikulpanit @ 2024-10-16 5:17 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, jgg, kevin.tian, jon.grimm,
santosh.shukla, pandoh, kumaranand, Suravee Suthikulpanit
When updating only within a 64-bit tuple of a DTE, just lock the DTE and
use WRITE_ONCE() because it is writing to memory read back by HW.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 1 +
drivers/iommu/amd/iommu.c | 43 +++++++++++++++++++----------------
2 files changed, 25 insertions(+), 19 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 1467bfc34fdf..23b9e92cc33b 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -178,3 +178,4 @@ struct dev_table_entry *get_dev_table(struct amd_iommu *iommu);
#endif
struct iommu_dev_data *find_dev_data(struct amd_iommu *iommu, u16 devid);
+struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 9ef6ddae3b66..caea101df7b9 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -331,7 +331,7 @@ static struct iommu_dev_data *alloc_dev_data(struct amd_iommu *iommu, u16 devid)
return dev_data;
}
-static struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid)
+struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid)
{
struct iommu_dev_data *dev_data;
struct llist_node *node;
@@ -2787,12 +2787,12 @@ static int amd_iommu_set_dirty_tracking(struct iommu_domain *domain,
bool enable)
{
struct protection_domain *pdomain = to_pdomain(domain);
- struct dev_table_entry *dev_table;
+ struct dev_table_entry *dte;
struct iommu_dev_data *dev_data;
bool domain_flush = false;
struct amd_iommu *iommu;
unsigned long flags;
- u64 pte_root;
+ u64 new;
spin_lock_irqsave(&pdomain->lock, flags);
if (!(pdomain->dirty_tracking ^ enable)) {
@@ -2801,16 +2801,15 @@ static int amd_iommu_set_dirty_tracking(struct iommu_domain *domain,
}
list_for_each_entry(dev_data, &pdomain->dev_list, list) {
+ spin_lock(&dev_data->dte_lock);
iommu = get_amd_iommu_from_dev_data(dev_data);
-
- dev_table = get_dev_table(iommu);
- pte_root = dev_table[dev_data->devid].data[0];
-
- pte_root = (enable ? pte_root | DTE_FLAG_HAD :
- pte_root & ~DTE_FLAG_HAD);
+ dte = &get_dev_table(iommu)[dev_data->devid];
+ new = READ_ONCE(dte->data[0]);
+ new = (enable ? new | DTE_FLAG_HAD : new & ~DTE_FLAG_HAD);
+ WRITE_ONCE(dte->data[0], new);
+ spin_unlock(&dev_data->dte_lock);
/* Flush device DTE */
- dev_table[dev_data->devid].data[0] = pte_root;
device_flush_dte(dev_data);
domain_flush = true;
}
@@ -3075,17 +3074,23 @@ static void iommu_flush_irt_and_complete(struct amd_iommu *iommu, u16 devid)
static void set_dte_irq_entry(struct amd_iommu *iommu, u16 devid,
struct irq_remap_table *table)
{
- u64 dte;
- struct dev_table_entry *dev_table = get_dev_table(iommu);
+ u64 new;
+ struct dev_table_entry *dte = &get_dev_table(iommu)[devid];
+ struct iommu_dev_data *dev_data = search_dev_data(iommu, devid);
+
+ if (dev_data)
+ spin_lock(&dev_data->dte_lock);
- dte = dev_table[devid].data[2];
- dte &= ~DTE_IRQ_PHYS_ADDR_MASK;
- dte |= iommu_virt_to_phys(table->table);
- dte |= DTE_IRQ_REMAP_INTCTL;
- dte |= DTE_INTTABLEN;
- dte |= DTE_IRQ_REMAP_ENABLE;
+ new = READ_ONCE(dte->data[2]);
+ new &= ~DTE_IRQ_PHYS_ADDR_MASK;
+ new |= iommu_virt_to_phys(table->table);
+ new |= DTE_IRQ_REMAP_INTCTL;
+ new |= DTE_INTTABLEN;
+ new |= DTE_IRQ_REMAP_ENABLE;
+ WRITE_ONCE(dte->data[2], new);
- dev_table[devid].data[2] = dte;
+ if (dev_data)
+ spin_unlock(&dev_data->dte_lock);
}
static struct irq_remap_table *get_irq_table(struct amd_iommu *iommu, u16 devid)
--
2.34.1
* Re: [PATCH v6 0/9] iommu/amd: Use 128-bit cmpxchg operation to update DTE
2024-10-16 5:17 [PATCH v6 0/9] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
` (8 preceding siblings ...)
2024-10-16 5:17 ` [PATCH v6 9/9] iommu/amd: Lock DTE before updating the entry with WRITE_ONCE() Suravee Suthikulpanit
@ 2024-10-16 14:22 ` Jason Gunthorpe
2024-10-31 9:15 ` Suthikulpanit, Suravee
9 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2024-10-16 14:22 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, kevin.tian,
jon.grimm, santosh.shukla, pandoh, kumaranand
On Wed, Oct 16, 2024 at 05:17:47AM +0000, Suravee Suthikulpanit wrote:
> This series modifies current implementation to use 128-bit cmpxchg to
> update DTE when needed as specified in the AMD I/O Virtualization
> Techonology (IOMMU) Specification.
>
> Please note that I have verified with the hardware designer, and they have
> confirmed that the IOMMU hardware has always been implemented with 256-bit
> read. The next revision of the IOMMU spec will be updated to correctly
> describe this part. Therefore, I have updated the implementation to avoid
> unnecessary flushing.
>
> Changes in v6:
>
> * Patch 2, 4, 7: Newly add
>
> * Patch 3, 5, 6, 7, 9: Add READ_ONCE() per Uros.
>
> * Patch 3:
> - Modify write_dte_[higher|lower]128() to avoid copying old DTE in the loop.
>
> * Patch 5:
> - Use dev_data->dte_cache to restore persistent DTE bits in set_dte_entry().
> - Simplify make_clear_dte():
> - Remove bit preservation logic.
> - Remove non-SNP check for setting TV since it should not be needed.
>
> * Patch 6:
> - Use find_dev_data(..., alias) since the dev_data might not have been allocated.
> - Move dev_iommu_priv_set() to before setup_aliases().
I wanted to see how far this was to being split up neatly like ARM is,
I came up with this, which seems pretty good to me. This would
probably be the next step to get to, then you'd lift the individual
set functions higher up the call chain into their respective attach
functions.
static void set_dte_identity(struct amd_iommu *iommu,
struct iommu_dev_data *dev_data,
struct dev_table_entry *target)
{
/*
* SNP does not support TV=1/Mode=1 in any case, and can't do IDENTITY
*/
if (WARN_ON(amd_iommu_snp_en))
return;
/* mode is zero */
target->data[0] |= DTE_FLAG_TV | DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_V;
if (dev_data->ats_enabled)
target->data[1] |= DTE_FLAG_IOTLB;
/* ppr is not allowed for identity */
target->data128[0] |= dev_data->dte_cache.data128[0];
target->data128[1] |= dev_data->dte_cache.data128[1];
}
static void set_dte_gcr3_table(struct amd_iommu *iommu,
struct iommu_dev_data *dev_data,
struct dev_table_entry *target)
{
struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
u64 gcr3;
if (!gcr3_info->gcr3_tbl)
return;
pr_debug("%s: devid=%#x, glx=%#x, gcr3_tbl=%#llx\n",
__func__, dev_data->devid, gcr3_info->glx,
(unsigned long long)gcr3_info->gcr3_tbl);
gcr3 = iommu_virt_to_phys(gcr3_info->gcr3_tbl);
target->data[0] |= DTE_FLAG_GV | DTE_FLAG_TV | DTE_FLAG_IR |
DTE_FLAG_IW | DTE_FLAG_V |
FIELD_PREP(DTE_GLX, gcr3_info->glx) |
FIELD_PREP(DTE_GCR3_14_12, gcr3 >> 12);
if (pdom_is_v2_pgtbl_mode(dev_data->domain))
target->data[0] |= DTE_FLAG_GIOV;
target->data[1] |= FIELD_PREP(DTE_GCR3_30_15, gcr3 >> 15) |
FIELD_PREP(DTE_GCR3_51_31, gcr3 >> 31);
/* Guest page table can only support 4 and 5 levels */
target->data[2] |= FIELD_PREP(
DTE_GPT_LEVEL_MASK, (amd_iommu_gpt_level == PAGE_MODE_5_LEVEL ?
GUEST_PGTABLE_5_LEVEL :
GUEST_PGTABLE_4_LEVEL));
target->data[1] |= dev_data->gcr3_info.domid;
if (dev_data->ppr)
target->data[0] |= 1ULL << DEV_ENTRY_PPR;
if (dev_data->ats_enabled)
target->data[1] |= DTE_FLAG_IOTLB;
target->data128[0] |= dev_data->dte_cache.data128[0];
target->data128[1] |= dev_data->dte_cache.data128[1];
}
static void set_dte_paging(struct amd_iommu *iommu,
struct iommu_dev_data *dev_data,
struct dev_table_entry *target)
{
struct protection_domain *domain = dev_data->domain;
target->data[0] |= DTE_FLAG_TV | DTE_FLAG_IR | DTE_FLAG_IW |
iommu_virt_to_phys(domain->iop.root) |
((domain->iop.mode & DEV_ENTRY_MODE_MASK)
<< DEV_ENTRY_MODE_SHIFT) |
DTE_FLAG_V;
if (dev_data->ppr)
target->data[0] |= 1ULL << DEV_ENTRY_PPR;
if (domain->dirty_tracking)
target->data[0] |= DTE_FLAG_HAD;
target->data[1] |= domain->id;
if (dev_data->ats_enabled)
target->data[1] |= DTE_FLAG_IOTLB;
target->data128[0] |= dev_data->dte_cache.data128[0];
target->data128[1] |= dev_data->dte_cache.data128[1];
}
static void set_dte_entry(struct amd_iommu *iommu,
struct iommu_dev_data *dev_data)
{
u32 old_domid;
struct dev_table_entry new = {};
struct protection_domain *domain = dev_data->domain;
struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
make_clear_dte(dev_data, dte, &new);
if (gcr3_info && gcr3_info->gcr3_tbl)
set_dte_gcr3_table(iommu, dev_data, &new);
else if (domain->iop.mode == PAGE_MODE_NONE)
set_dte_identity(iommu, dev_data, &new);
else
set_dte_paging(iommu, dev_data, &new);
old_domid = READ_ONCE(dte->data[1]) & DEV_DOMID_MASK;
update_dte256(iommu, dev_data, &new);
/*
* A kdump kernel might be replacing a domain ID that was copied from
* the previous kernel--if so, it needs to flush the translation cache
* entries for the old domain ID that is being overwritten
*/
if (old_domid) {
amd_iommu_flush_tlb_domid(iommu, old_domid);
}
}
* Re: [PATCH v6 0/9] iommu/amd: Use 128-bit cmpxchg operation to update DTE
2024-10-16 14:22 ` [PATCH v6 0/9] iommu/amd: Use 128-bit cmpxchg operation to update DTE Jason Gunthorpe
@ 2024-10-31 9:15 ` Suthikulpanit, Suravee
2024-10-31 11:33 ` Jason Gunthorpe
0 siblings, 1 reply; 21+ messages in thread
From: Suthikulpanit, Suravee @ 2024-10-31 9:15 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, kevin.tian,
jon.grimm, santosh.shukla, pandoh, kumaranand
On 10/16/2024 9:22 PM, Jason Gunthorpe wrote:
>
> ....
>
> I wanted to see how far this was from being split up neatly like ARM
> is. I came up with this, which seems pretty good to me. This would
> probably be the next step to get to; then you'd lift the individual
> set functions higher up the call chain into their respective attach
> functions.
I like this idea and will look into adopting this code when I submit the
nested translation stuff (right after this series), since it will affect
set_dte_entry().
> .....
>
> static void set_dte_entry(struct amd_iommu *iommu,
> 			  struct iommu_dev_data *dev_data)
> {
> 	u32 old_domid;
> 	struct dev_table_entry new = {};
> 	struct protection_domain *domain = dev_data->domain;
> 	struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
> 	struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
>
> 	make_clear_dte(dev_data, dte, &new);
>
> 	if (gcr3_info && gcr3_info->gcr3_tbl)
> 		set_dte_gcr3_table(iommu, dev_data, &new);
> 	else if (domain->iop.mode == PAGE_MODE_NONE)
> 		set_dte_identity(iommu, dev_data, &new);
> 	else
> 		set_dte_paging(iommu, dev_data, &new);
This will need to be changed once we add nested translation support
because we need to call both set_dte_paging() and set_dte_gcr3().
Thanks,
Suravee
* Re: [PATCH v6 0/9] iommu/amd: Use 128-bit cmpxchg operation to update DTE
2024-10-31 9:15 ` Suthikulpanit, Suravee
@ 2024-10-31 11:33 ` Jason Gunthorpe
0 siblings, 0 replies; 21+ messages in thread
From: Jason Gunthorpe @ 2024-10-31 11:33 UTC (permalink / raw)
To: Suthikulpanit, Suravee
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, kevin.tian,
jon.grimm, santosh.shukla, pandoh, kumaranand
On Thu, Oct 31, 2024 at 04:15:02PM +0700, Suthikulpanit, Suravee wrote:
> On 10/16/2024 9:22 PM, Jason Gunthorpe wrote:
> >
> > ....
> >
> > I wanted to see how far this was from being split up neatly like ARM
> > is. I came up with this, which seems pretty good to me. This would
> > probably be the next step to get to; then you'd lift the individual
> > set functions higher up the call chain into their respective attach
> > functions.
>
> I like this idea and will look into adopting this code when I submit the
> nested translation stuff (right after this series), since it will affect
> set_dte_entry().
Yes, I definitely want to see this kind of code structure before
nested translation.
> > static void set_dte_entry(struct amd_iommu *iommu,
> > 			  struct iommu_dev_data *dev_data)
> > {
> > 	u32 old_domid;
> > 	struct dev_table_entry new = {};
> > 	struct protection_domain *domain = dev_data->domain;
> > 	struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
> > 	struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
> >
> > 	make_clear_dte(dev_data, dte, &new);
> >
> > 	if (gcr3_info && gcr3_info->gcr3_tbl)
> > 		set_dte_gcr3_table(iommu, dev_data, &new);
> > 	else if (domain->iop.mode == PAGE_MODE_NONE)
> > 		set_dte_identity(iommu, dev_data, &new);
> > 	else
> > 		set_dte_paging(iommu, dev_data, &new);
>
> This will need to be changed once we add nested translation support
> because we need to call both set_dte_paging() and set_dte_gcr3().
The idea would be to remove set_dte_entry() because the attach
functions just call their specific set_dte_xx() directly, like how arm
is structured.
That will make everything much clearer.
Then the nested attach function would call some set_dte_nested() and
it would use set_dte_paging() internally.
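The split described above can be sketched in miniature. Everything here is hypothetical -- the bit values and function bodies are placeholder markers, not real AMD IOMMU field encodings -- and only the structure mirrors the proposal: each attach path calls its specific set_dte_*() helper directly, and a set_dte_nested() composes set_dte_paging() with the GCR3 setup step.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder DTE; the flags below are illustrative markers only. */
struct dte_sketch {
	uint64_t data[4];
};

#define SKETCH_PAGING	(1ULL << 0)	/* "host page table programmed" */
#define SKETCH_GCR3	(1ULL << 1)	/* "guest CR3 table programmed" */

static void set_dte_paging(struct dte_sketch *new)
{
	new->data[0] |= SKETCH_PAGING;
}

static void set_dte_gcr3(struct dte_sketch *new)
{
	new->data[1] |= SKETCH_GCR3;
}

/* A paging-domain attach calls its own helper directly,
 * instead of routing through a monolithic set_dte_entry(). */
static void attach_paging_domain(struct dte_sketch *new)
{
	set_dte_paging(new);
}

/* ...and a nested attach composes both stages, reusing
 * set_dte_paging() internally as suggested. */
static void set_dte_nested(struct dte_sketch *new)
{
	set_dte_paging(new);	/* stage 2: host translation */
	set_dte_gcr3(new);	/* stage 1: guest GCR3 table */
}
```
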
Getting to this level is necessary to get the hitless replace, which
is important.
I hope this series lands this cycle. Next cycle you should try
to get to hitless replace on the domain path, including this stuff;
then adding the nested domain should be straightforward!
Jason