* [PATCH v5 0/6] iommu/amd: Use 128-bit cmpxchg operation to update DTE
@ 2024-10-07 4:13 Suravee Suthikulpanit
2024-10-07 4:13 ` [PATCH v5 1/6] iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported Suravee Suthikulpanit
` (5 more replies)
0 siblings, 6 replies; 17+ messages in thread
From: Suravee Suthikulpanit @ 2024-10-07 4:13 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, jgg, kevin.tian, jon.grimm,
santosh.shukla, pandoh, kumaranand, Suravee Suthikulpanit
This series modifies current implementation to use 128-bit cmpxchg to
update DTE when needed as specified in the AMD I/O Virtualization
Techonology (IOMMU) Specification.
Please note that I have verified with the hardware designer, and they have
confirmed that the IOMMU hardware has always been implemented with 256-bit
read. The next revision of the IOMMU spec will be updated to correctly
describe this part. Therefore, I have updated the implementation to avoid
unnecessary flushing.
Changes in v5:
* Rebased on top of v6.12-rcX
* Patch 2: fixup logic in update_dte256()
* Patch 3: Introduce make_clear_dte()
* Patch 5: use make_clear_dte()
v4: https://lore.kernel.org/lkml/20240916171805.324292-1-suravee.suthikulpanit@amd.com/
v3: https://lore.kernel.org/lkml/20240906121308.5013-1-suravee.suthikulpanit@amd.com/
v2: https://lore.kernel.org/lkml/20240829180726.5022-1-suravee.suthikulpanit@amd.com/
v1: https://lore.kernel.org/lkml/20240819161839.4657-1-suravee.suthikulpanit@amd.com/
Thanks,
Suravee
Suravee Suthikulpanit (6):
iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported
iommu/amd: Introduce helper function to update 256-bit DTE
iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers
iommu/amd: Introduce helper function get_dte256()
iommu/amd: Modify clear_dte_entry() to avoid in-place update
iommu/amd: Lock DTE before updating the entry with WRITE_ONCE()
drivers/iommu/amd/amd_iommu_types.h | 15 +-
drivers/iommu/amd/init.c | 23 +-
drivers/iommu/amd/iommu.c | 355 ++++++++++++++++++++--------
3 files changed, 285 insertions(+), 108 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v5 1/6] iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported
2024-10-07 4:13 [PATCH v5 0/6] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
@ 2024-10-07 4:13 ` Suravee Suthikulpanit
2024-10-07 4:13 ` [PATCH v5 2/6] iommu/amd: Introduce helper function to update 256-bit DTE Suravee Suthikulpanit
` (4 subsequent siblings)
5 siblings, 0 replies; 17+ messages in thread
From: Suravee Suthikulpanit @ 2024-10-07 4:13 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, jgg, kevin.tian, jon.grimm,
santosh.shukla, pandoh, kumaranand, Suravee Suthikulpanit
According to the AMD IOMMU spec, IOMMU hardware reads the entire DTE
in a single 256-bit transaction. It is recommended to update DTE using
128-bit operation followed by an INVALIDATE_DEVTAB_ENTYRY command when
the IV=1b or V=1b before the change.
According to the AMD BIOS and Kernel Developer's Guide (BDKG) dated back
to family 10h Processor [1], which is the first introduction of AMD IOMMU,
AMD processor always has CPUID Fn0000_0001_ECX[CMPXCHG16B]=1.
Therefore, it is safe to assume cmpxchg128 is available with all AMD
processor w/ IOMMU.
In addition, the CMPXCHG16B feature has already been checked separately
before enabling the GA, XT, and GAM modes. Consolidate the detection logic,
and fail the IOMMU initialization if the feature is not supported.
[1] https://www.amd.com/content/dam/amd/en/documents/archived-tech-docs/programmer-references/31116.pdf
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/init.c | 23 +++++++++--------------
1 file changed, 9 insertions(+), 14 deletions(-)
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 43131c3a2172..a1a0bd398c14 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1764,13 +1764,8 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h,
else
iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET;
- /*
- * Note: GA (128-bit IRTE) mode requires cmpxchg16b supports.
- * GAM also requires GA mode. Therefore, we need to
- * check cmpxchg16b support before enabling it.
- */
- if (!boot_cpu_has(X86_FEATURE_CX16) ||
- ((h->efr_attr & (0x1 << IOMMU_FEAT_GASUP_SHIFT)) == 0))
+ /* GAM requires GA mode. */
+ if ((h->efr_attr & (0x1 << IOMMU_FEAT_GASUP_SHIFT)) == 0)
amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY;
break;
case 0x11:
@@ -1780,13 +1775,8 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h,
else
iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET;
- /*
- * Note: GA (128-bit IRTE) mode requires cmpxchg16b supports.
- * XT, GAM also requires GA mode. Therefore, we need to
- * check cmpxchg16b support before enabling them.
- */
- if (!boot_cpu_has(X86_FEATURE_CX16) ||
- ((h->efr_reg & (0x1 << IOMMU_EFR_GASUP_SHIFT)) == 0)) {
+ /* XT and GAM require GA mode. */
+ if ((h->efr_reg & (0x1 << IOMMU_EFR_GASUP_SHIFT)) == 0) {
amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY;
break;
}
@@ -3051,6 +3041,11 @@ static int __init early_amd_iommu_init(void)
return -EINVAL;
}
+ if (!boot_cpu_has(X86_FEATURE_CX16)) {
+ pr_err("Failed to initialize. The CMPXCHG16B feature is required.\n");
+ return -EINVAL;
+ }
+
/*
* Validate checksum here so we don't need to do it when
* we actually parse the table
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v5 2/6] iommu/amd: Introduce helper function to update 256-bit DTE
2024-10-07 4:13 [PATCH v5 0/6] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
2024-10-07 4:13 ` [PATCH v5 1/6] iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported Suravee Suthikulpanit
@ 2024-10-07 4:13 ` Suravee Suthikulpanit
2024-10-07 14:06 ` Jason Gunthorpe
2024-10-07 14:42 ` Uros Bizjak
2024-10-07 4:13 ` [PATCH v5 3/6] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers Suravee Suthikulpanit
` (3 subsequent siblings)
5 siblings, 2 replies; 17+ messages in thread
From: Suravee Suthikulpanit @ 2024-10-07 4:13 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, jgg, kevin.tian, jon.grimm,
santosh.shukla, pandoh, kumaranand, Suravee Suthikulpanit
The current implementation does not follow 128-bit write requirement
to update DTE as specified in the AMD I/O Virtualization Techonology
(IOMMU) Specification.
Therefore, modify the struct dev_table_entry to contain union of u128 data
array, and introduce a helper functions update_dte256() to update DTE using
two 128-bit cmpxchg operations to update 256-bit DTE with the modified
structure, and take into account the DTE[V, GV] bits when programming
the DTE to ensure proper order of DTE programming and flushing.
In addition, introduce a per-DTE spin_lock struct dev_data.dte_lock to
provide synchronization when updating the DTE to prevent cmpxchg128
failure.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu_types.h | 13 +++-
drivers/iommu/amd/iommu.c | 114 ++++++++++++++++++++++++++++
2 files changed, 126 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 601fb4ee6900..91f802be7898 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -425,9 +425,16 @@
#define DTE_GCR3_SHIFT_C 43
#define DTE_GPT_LEVEL_SHIFT 54
+#define DTE_GPT_LEVEL_MASK GENMASK_ULL(55, 54)
#define GCR3_VALID 0x01ULL
+/* DTE[128:179] | DTE[184:191] */
+#define DTE_DATA2_INTR_MASK ~GENMASK_ULL(55, 52)
+
+/* DTE[180:181] */
+#define DTE_DATA2_RESV_MASK GENMASK_ULL(53, 52)
+
#define IOMMU_PAGE_MASK (((1ULL << 52) - 1) & ~0xfffULL)
#define IOMMU_PTE_PRESENT(pte) ((pte) & IOMMU_PTE_PR)
#define IOMMU_PTE_DIRTY(pte) ((pte) & IOMMU_PTE_HD)
@@ -832,6 +839,7 @@ struct devid_map {
struct iommu_dev_data {
/*Protect against attach/detach races */
spinlock_t lock;
+ spinlock_t dte_lock; /* DTE lock for 256-bit access */
struct list_head list; /* For domain->dev_list */
struct llist_node dev_data_list; /* For global dev_data_list */
@@ -882,7 +890,10 @@ extern struct amd_iommu *amd_iommus[MAX_IOMMUS];
* Structure defining one entry in the device table
*/
struct dev_table_entry {
- u64 data[4];
+ union {
+ u64 data[4];
+ u128 data128[2];
+ };
};
/*
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 8364cd6fa47d..deb19af48a3e 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -77,12 +77,116 @@ static void detach_device(struct device *dev);
static void set_dte_entry(struct amd_iommu *iommu,
struct iommu_dev_data *dev_data);
+static void iommu_flush_dte_sync(struct amd_iommu *iommu, u16 devid);
+
/****************************************************************************
*
* Helper functions
*
****************************************************************************/
+static void write_dte_upper128(struct dev_table_entry *ptr, struct dev_table_entry *new)
+{
+ struct dev_table_entry old = {};
+
+ do {
+ old.data128[1] = ptr->data128[1];
+ new->data[2] &= ~DTE_DATA2_INTR_MASK;
+ new->data[2] |= old.data[2] & (DTE_DATA2_INTR_MASK | DTE_DATA2_RESV_MASK);
+ } while (!try_cmpxchg128(&ptr->data128[1], &old.data128[1], new->data128[1]));
+}
+
+static void write_dte_lower128(struct dev_table_entry *ptr, struct dev_table_entry *new)
+{
+ struct dev_table_entry old = {};
+
+ /*
+ * Need to preserve DTE[96:106], which can be set by information in IVRS table.
+ * See set_dev_entry_from_acpi().
+ */
+ new->data[1] |= ptr->data[1] & DTE_FLAG_MASK;
+
+ do {
+ old.data128[0] = ptr->data128[0];
+ } while (!try_cmpxchg128(&ptr->data128[0], &old.data128[0], new->data128[0]));
+}
+
+/*
+ * Note:
+ * IOMMU reads the entire Device Table entry in a single 256-bit transaction
+ * but the driver is programming DTE using 2 128-bit cmpxchg. So, the driver
+ * need to ensure the following:
+ * - DTE[V|GV] bit is being written last when setting.
+ * - DTE[V|GV] bit is being written first when clearing.
+ *
+ * This function is used only by code, which updates DMA translation part of the DTE.
+ * So, only consider control bits related to DMA when updating the entry.
+ */
+static void update_dte256(struct amd_iommu *iommu, struct iommu_dev_data *dev_data,
+ struct dev_table_entry *new)
+{
+ struct dev_table_entry *dev_table = get_dev_table(iommu);
+ struct dev_table_entry *ptr = &dev_table[dev_data->devid];
+
+ spin_lock(&dev_data->dte_lock);
+
+ if (!(ptr->data[0] & DTE_FLAG_V)) {
+ /* Existing DTE is not valid. */
+ write_dte_upper128(ptr, new);
+ write_dte_lower128(ptr, new);
+ iommu_flush_dte_sync(iommu, dev_data->devid);
+ } else if (!(new->data[0] & DTE_FLAG_V)) {
+ /* Existing DTE is valid. New DTE is not valid. */
+ write_dte_lower128(ptr, new);
+ write_dte_upper128(ptr, new);
+ iommu_flush_dte_sync(iommu, dev_data->devid);
+ } else if (!FIELD_GET(DTE_FLAG_GV, ptr->data[0])) {
+ /*
+ * Both DTEs are valid.
+ * Existing DTE has no guest page table.
+ */
+ write_dte_upper128(ptr, new);
+ write_dte_lower128(ptr, new);
+ iommu_flush_dte_sync(iommu, dev_data->devid);
+ } else if (!FIELD_GET(DTE_FLAG_GV, new->data[0])) {
+ /*
+ * Both DTEs are valid.
+ * Existing DTE has guest page table,
+ * new DTE has no guest page table,
+ */
+ write_dte_lower128(ptr, new);
+ write_dte_upper128(ptr, new);
+ iommu_flush_dte_sync(iommu, dev_data->devid);
+ } else if (FIELD_GET(DTE_GPT_LEVEL_MASK, ptr->data[2]) !=
+ FIELD_GET(DTE_GPT_LEVEL_MASK, new->data[2])) {
+ /*
+ * Both DTEs are valid and have guest page table,
+ * but have different number of levels. So, we need
+ * to upadte both upper and lower 128-bit value, which
+ * require disabling and flushing.
+ */
+ struct dev_table_entry clear = {};
+
+ /* First disable DTE */
+ write_dte_lower128(ptr, &clear);
+ iommu_flush_dte_sync(iommu, dev_data->devid);
+
+ /* Then update DTE */
+ write_dte_upper128(ptr, new);
+ write_dte_lower128(ptr, new);
+ iommu_flush_dte_sync(iommu, dev_data->devid);
+ } else {
+ /*
+ * Both DTEs are valid and have guest page table,
+ * and same number of levels. We just need to only
+ * update the lower 128-bit. So no need to disable DTE.
+ */
+ write_dte_lower128(ptr, new);
+ }
+
+ spin_unlock(&dev_data->dte_lock);
+}
+
static inline bool pdom_is_v2_pgtbl_mode(struct protection_domain *pdom)
{
return (pdom && (pdom->pd_mode == PD_MODE_V2));
@@ -203,6 +307,7 @@ static struct iommu_dev_data *alloc_dev_data(struct amd_iommu *iommu, u16 devid)
return NULL;
spin_lock_init(&dev_data->lock);
+ spin_lock_init(&dev_data->dte_lock);
dev_data->devid = devid;
ratelimit_default_init(&dev_data->rs);
@@ -1272,6 +1377,15 @@ static int iommu_flush_dte(struct amd_iommu *iommu, u16 devid)
return iommu_queue_command(iommu, &cmd);
}
+static void iommu_flush_dte_sync(struct amd_iommu *iommu, u16 devid)
+{
+ int ret;
+
+ ret = iommu_flush_dte(iommu, devid);
+ if (!ret)
+ iommu_completion_wait(iommu);
+}
+
static void amd_iommu_flush_dte_all(struct amd_iommu *iommu)
{
u32 devid;
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v5 3/6] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers
2024-10-07 4:13 [PATCH v5 0/6] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
2024-10-07 4:13 ` [PATCH v5 1/6] iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported Suravee Suthikulpanit
2024-10-07 4:13 ` [PATCH v5 2/6] iommu/amd: Introduce helper function to update 256-bit DTE Suravee Suthikulpanit
@ 2024-10-07 4:13 ` Suravee Suthikulpanit
2024-10-07 14:05 ` Jason Gunthorpe
2024-10-07 14:17 ` Jason Gunthorpe
2024-10-07 4:13 ` [PATCH v5 4/6] iommu/amd: Introduce helper function get_dte256() Suravee Suthikulpanit
` (2 subsequent siblings)
5 siblings, 2 replies; 17+ messages in thread
From: Suravee Suthikulpanit @ 2024-10-07 4:13 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, jgg, kevin.tian, jon.grimm,
santosh.shukla, pandoh, kumaranand, Suravee Suthikulpanit
Also, the set_dte_entry() is used to program several DTE fields (e.g.
stage1 table, stage2 table, domain id, and etc.), which is difficult
to keep track with current implementation.
Therefore, separate logic for clearing DTE (i.e. make_clear_dte) and
another function for setting up the GCR3 Table Root Pointer, GIOV, GV,
GLX, and GuestPagingMode into another function set_dte_gcr3_table().
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu_types.h | 2 +
drivers/iommu/amd/iommu.c | 132 ++++++++++++++++------------
2 files changed, 80 insertions(+), 54 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 91f802be7898..74aca19725cc 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -427,6 +427,8 @@
#define DTE_GPT_LEVEL_SHIFT 54
#define DTE_GPT_LEVEL_MASK GENMASK_ULL(55, 54)
+#define DTE_SYSMGT_MASK GENMASK_ULL(41, 40)
+
#define GCR3_VALID 0x01ULL
/* DTE[128:179] | DTE[184:191] */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index deb19af48a3e..6fa2f5bb5156 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1956,90 +1956,114 @@ int amd_iommu_clear_gcr3(struct iommu_dev_data *dev_data, ioasid_t pasid)
return ret;
}
+static void make_clear_dte(struct amd_iommu *iommu, struct dev_table_entry *dte,
+ struct dev_table_entry *new)
+{
+ new->data[0] = DTE_FLAG_V;
+
+ /* Apply erratum 63 */
+ if (FIELD_GET(DTE_SYSMGT_MASK, dte->data[1]) == 0x01)
+ new->data[0] |= BIT_ULL(DEV_ENTRY_IW);
+
+ if (!amd_iommu_snp_en)
+ new->data[0] |= DTE_FLAG_TV;
+
+ /* Need to preserve interrupt remapping information in DTE[128:255] */
+ new->data128[1] = dte->data128[1];
+
+ /* Mask out old values for GuestPagingMode */
+ new->data[2] &= ~(0x3ULL << DTE_GPT_LEVEL_SHIFT);
+}
+
+/*
+ * Note:
+ * The old value for GCR3 table and GPT have been cleared from caller.
+ */
+static void set_dte_gcr3_table(struct amd_iommu *iommu,
+ struct iommu_dev_data *dev_data,
+ struct dev_table_entry *target)
+{
+ struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
+ u64 tmp, gcr3;
+
+ if (!gcr3_info->gcr3_tbl)
+ return;
+
+ pr_debug("%s: devid=%#x, glx=%#x, gcr3_tbl=%#llx\n",
+ __func__, dev_data->devid, gcr3_info->glx,
+ (unsigned long long)gcr3_info->gcr3_tbl);
+
+ tmp = gcr3_info->glx;
+ target->data[0] |= (tmp & DTE_GLX_MASK) << DTE_GLX_SHIFT;
+ if (pdom_is_v2_pgtbl_mode(dev_data->domain))
+ target->data[0] |= DTE_FLAG_GIOV;
+ target->data[0] |= DTE_FLAG_GV;
+
+
+ gcr3 = iommu_virt_to_phys(gcr3_info->gcr3_tbl);
+
+ /* Encode GCR3 table into DTE */
+ tmp = DTE_GCR3_VAL_A(gcr3) << DTE_GCR3_SHIFT_A;
+ target->data[0] |= tmp;
+ tmp = DTE_GCR3_VAL_B(gcr3) << DTE_GCR3_SHIFT_B;
+ tmp |= DTE_GCR3_VAL_C(gcr3) << DTE_GCR3_SHIFT_C;
+ target->data[1] |= tmp;
+
+ /* Guest page table can only support 4 and 5 levels */
+ if (amd_iommu_gpt_level == PAGE_MODE_5_LEVEL)
+ target->data[2] |= ((u64)GUEST_PGTABLE_5_LEVEL << DTE_GPT_LEVEL_SHIFT);
+}
+
static void set_dte_entry(struct amd_iommu *iommu,
struct iommu_dev_data *dev_data)
{
- u64 pte_root = 0;
- u64 flags = 0;
- u32 old_domid;
- u16 devid = dev_data->devid;
u16 domid;
+ u32 old_domid;
+ struct dev_table_entry new = {};
struct protection_domain *domain = dev_data->domain;
- struct dev_table_entry *dev_table = get_dev_table(iommu);
struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
+ struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
if (gcr3_info && gcr3_info->gcr3_tbl)
domid = dev_data->gcr3_info.domid;
else
domid = domain->id;
+ make_clear_dte(iommu, dte, &new);
+
if (domain->iop.mode != PAGE_MODE_NONE)
- pte_root = iommu_virt_to_phys(domain->iop.root);
+ new.data[0] = iommu_virt_to_phys(domain->iop.root);
- pte_root |= (domain->iop.mode & DEV_ENTRY_MODE_MASK)
+ new.data[0] |= (domain->iop.mode & DEV_ENTRY_MODE_MASK)
<< DEV_ENTRY_MODE_SHIFT;
- pte_root |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_V;
+ new.data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_V;
/*
* When SNP is enabled, Only set TV bit when IOMMU
* page translation is in use.
*/
if (!amd_iommu_snp_en || (domid != 0))
- pte_root |= DTE_FLAG_TV;
-
- flags = dev_table[devid].data[1];
-
- if (dev_data->ats_enabled)
- flags |= DTE_FLAG_IOTLB;
+ new.data[0] |= DTE_FLAG_TV;
if (dev_data->ppr)
- pte_root |= 1ULL << DEV_ENTRY_PPR;
+ new.data[0] |= 1ULL << DEV_ENTRY_PPR;
if (domain->dirty_tracking)
- pte_root |= DTE_FLAG_HAD;
-
- if (gcr3_info && gcr3_info->gcr3_tbl) {
- u64 gcr3 = iommu_virt_to_phys(gcr3_info->gcr3_tbl);
- u64 glx = gcr3_info->glx;
- u64 tmp;
+ new.data[0] |= DTE_FLAG_HAD;
- pte_root |= DTE_FLAG_GV;
- pte_root |= (glx & DTE_GLX_MASK) << DTE_GLX_SHIFT;
-
- /* First mask out possible old values for GCR3 table */
- tmp = DTE_GCR3_VAL_B(~0ULL) << DTE_GCR3_SHIFT_B;
- flags &= ~tmp;
-
- tmp = DTE_GCR3_VAL_C(~0ULL) << DTE_GCR3_SHIFT_C;
- flags &= ~tmp;
-
- /* Encode GCR3 table into DTE */
- tmp = DTE_GCR3_VAL_A(gcr3) << DTE_GCR3_SHIFT_A;
- pte_root |= tmp;
-
- tmp = DTE_GCR3_VAL_B(gcr3) << DTE_GCR3_SHIFT_B;
- flags |= tmp;
-
- tmp = DTE_GCR3_VAL_C(gcr3) << DTE_GCR3_SHIFT_C;
- flags |= tmp;
-
- if (amd_iommu_gpt_level == PAGE_MODE_5_LEVEL) {
- dev_table[devid].data[2] |=
- ((u64)GUEST_PGTABLE_5_LEVEL << DTE_GPT_LEVEL_SHIFT);
- }
+ if (dev_data->ats_enabled)
+ new.data[1] |= DTE_FLAG_IOTLB;
+ else
+ new.data[1] &= ~DTE_FLAG_IOTLB;
- /* GIOV is supported with V2 page table mode only */
- if (pdom_is_v2_pgtbl_mode(domain))
- pte_root |= DTE_FLAG_GIOV;
- }
+ old_domid = dte->data[1] & DEV_DOMID_MASK;
+ new.data[1] &= ~DEV_DOMID_MASK;
+ new.data[1] |= domid;
- flags &= ~DEV_DOMID_MASK;
- flags |= domid;
+ set_dte_gcr3_table(iommu, dev_data, &new);
- old_domid = dev_table[devid].data[1] & DEV_DOMID_MASK;
- dev_table[devid].data[1] = flags;
- dev_table[devid].data[0] = pte_root;
+ update_dte256(iommu, dev_data, &new);
/*
* A kdump kernel might be replacing a domain ID that was copied from
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v5 4/6] iommu/amd: Introduce helper function get_dte256()
2024-10-07 4:13 [PATCH v5 0/6] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
` (2 preceding siblings ...)
2024-10-07 4:13 ` [PATCH v5 3/6] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers Suravee Suthikulpanit
@ 2024-10-07 4:13 ` Suravee Suthikulpanit
2024-10-07 4:13 ` [PATCH v5 5/6] iommu/amd: Modify clear_dte_entry() to avoid in-place update Suravee Suthikulpanit
2024-10-07 4:13 ` [PATCH v5 6/6] iommu/amd: Lock DTE before updating the entry with WRITE_ONCE() Suravee Suthikulpanit
5 siblings, 0 replies; 17+ messages in thread
From: Suravee Suthikulpanit @ 2024-10-07 4:13 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, jgg, kevin.tian, jon.grimm,
santosh.shukla, pandoh, kumaranand, Suravee Suthikulpanit
And use it in clone_alias() along with update_dte256().
Also use get_dte256() in dump_dte_entry().
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/iommu.c | 49 ++++++++++++++++++++++++++++++++-------
1 file changed, 40 insertions(+), 9 deletions(-)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 6fa2f5bb5156..87a7c9dd86d9 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -187,6 +187,20 @@ static void update_dte256(struct amd_iommu *iommu, struct iommu_dev_data *dev_da
spin_unlock(&dev_data->dte_lock);
}
+static void get_dte256(struct amd_iommu *iommu, struct iommu_dev_data *dev_data,
+ struct dev_table_entry *dte)
+{
+ struct dev_table_entry *ptr;
+ struct dev_table_entry *dev_table = get_dev_table(iommu);
+
+ ptr = &dev_table[dev_data->devid];
+
+ spin_lock(&dev_data->dte_lock);
+ dte->data128[0] = ptr->data128[0];
+ dte->data128[1] = ptr->data128[1];
+ spin_unlock(&dev_data->dte_lock);
+}
+
static inline bool pdom_is_v2_pgtbl_mode(struct protection_domain *pdom)
{
return (pdom && (pdom->pd_mode == PD_MODE_V2));
@@ -335,9 +349,11 @@ static struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid
static int clone_alias(struct pci_dev *pdev, u16 alias, void *data)
{
+ struct dev_table_entry new;
struct amd_iommu *iommu;
- struct dev_table_entry *dev_table;
+ struct iommu_dev_data *dev_data, *alias_data;
u16 devid = pci_dev_id(pdev);
+ int ret = 0;
if (devid == alias)
return 0;
@@ -346,13 +362,25 @@ static int clone_alias(struct pci_dev *pdev, u16 alias, void *data)
if (!iommu)
return 0;
- amd_iommu_set_rlookup_table(iommu, alias);
- dev_table = get_dev_table(iommu);
- memcpy(dev_table[alias].data,
- dev_table[devid].data,
- sizeof(dev_table[alias].data));
+ /* Copy the data from pdev */
+ dev_data = dev_iommu_priv_get(&pdev->dev);
+ if (!dev_data) {
+ ret = -EINVAL;
+ goto out;
+ }
+ get_dte256(iommu, dev_data, &new);
- return 0;
+ /* Setup alias */
+ alias_data = search_dev_data(iommu, alias);
+ if (!alias_data) {
+ ret = -EINVAL;
+ goto out;
+ }
+ update_dte256(iommu, alias_data, &new);
+
+ amd_iommu_set_rlookup_table(iommu, alias);
+out:
+ return ret;
}
static void clone_aliases(struct amd_iommu *iommu, struct device *dev)
@@ -686,10 +714,13 @@ static void amd_iommu_uninit_device(struct device *dev)
static void dump_dte_entry(struct amd_iommu *iommu, u16 devid)
{
int i;
- struct dev_table_entry *dev_table = get_dev_table(iommu);
+ struct dev_table_entry dte;
+ struct iommu_dev_data *dev_data = find_dev_data(iommu, devid);
+
+ get_dte256(iommu, dev_data, &dte);
for (i = 0; i < 4; ++i)
- pr_err("DTE[%d]: %016llx\n", i, dev_table[devid].data[i]);
+ pr_err("DTE[%d]: %016llx\n", i, dte.data[i]);
}
static void dump_command(unsigned long phys_addr)
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v5 5/6] iommu/amd: Modify clear_dte_entry() to avoid in-place update
2024-10-07 4:13 [PATCH v5 0/6] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
` (3 preceding siblings ...)
2024-10-07 4:13 ` [PATCH v5 4/6] iommu/amd: Introduce helper function get_dte256() Suravee Suthikulpanit
@ 2024-10-07 4:13 ` Suravee Suthikulpanit
2024-10-07 14:10 ` Jason Gunthorpe
2024-10-07 4:13 ` [PATCH v5 6/6] iommu/amd: Lock DTE before updating the entry with WRITE_ONCE() Suravee Suthikulpanit
5 siblings, 1 reply; 17+ messages in thread
From: Suravee Suthikulpanit @ 2024-10-07 4:13 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, jgg, kevin.tian, jon.grimm,
santosh.shukla, pandoh, kumaranand, Suravee Suthikulpanit
Lock DTE and copy value to a temporary storage before update using
cmpxchg128.
Also, refactor the function to simplify logic for applying erratum 63.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/iommu.c | 18 ++++++------------
1 file changed, 6 insertions(+), 12 deletions(-)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 87a7c9dd86d9..dd7b02b00be6 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2106,19 +2106,13 @@ static void set_dte_entry(struct amd_iommu *iommu,
}
}
-static void clear_dte_entry(struct amd_iommu *iommu, u16 devid)
+static void clear_dte_entry(struct amd_iommu *iommu, struct iommu_dev_data *dev_data)
{
- struct dev_table_entry *dev_table = get_dev_table(iommu);
-
- /* remove entry from the device table seen by the hardware */
- dev_table[devid].data[0] = DTE_FLAG_V;
-
- if (!amd_iommu_snp_en)
- dev_table[devid].data[0] |= DTE_FLAG_TV;
-
- dev_table[devid].data[1] &= DTE_FLAG_MASK;
+ struct dev_table_entry new = {};
+ struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
- amd_iommu_apply_erratum_63(iommu, devid);
+ make_clear_dte(iommu, dte, &new);
+ update_dte256(iommu, dev_data, &new);
}
/* Update and flush DTE for the given device */
@@ -2129,7 +2123,7 @@ static void dev_update_dte(struct iommu_dev_data *dev_data, bool set)
if (set)
set_dte_entry(iommu, dev_data);
else
- clear_dte_entry(iommu, dev_data->devid);
+ clear_dte_entry(iommu, dev_data);
clone_aliases(iommu, dev_data->dev);
device_flush_dte(dev_data);
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v5 6/6] iommu/amd: Lock DTE before updating the entry with WRITE_ONCE()
2024-10-07 4:13 [PATCH v5 0/6] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
` (4 preceding siblings ...)
2024-10-07 4:13 ` [PATCH v5 5/6] iommu/amd: Modify clear_dte_entry() to avoid in-place update Suravee Suthikulpanit
@ 2024-10-07 4:13 ` Suravee Suthikulpanit
5 siblings, 0 replies; 17+ messages in thread
From: Suravee Suthikulpanit @ 2024-10-07 4:13 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, jgg, kevin.tian, jon.grimm,
santosh.shukla, pandoh, kumaranand, Suravee Suthikulpanit
When updating only within a 64-bit tuple of a DTE, just lock the DTE and
use WRITE_ONCE() because it is writing to memory read back by HW.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/iommu.c | 42 ++++++++++++++++++++++-----------------
1 file changed, 24 insertions(+), 18 deletions(-)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index dd7b02b00be6..18a60b4345f1 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2783,12 +2783,12 @@ static int amd_iommu_set_dirty_tracking(struct iommu_domain *domain,
bool enable)
{
struct protection_domain *pdomain = to_pdomain(domain);
- struct dev_table_entry *dev_table;
+ struct dev_table_entry *dte;
struct iommu_dev_data *dev_data;
bool domain_flush = false;
struct amd_iommu *iommu;
unsigned long flags;
- u64 pte_root;
+ u64 new;
spin_lock_irqsave(&pdomain->lock, flags);
if (!(pdomain->dirty_tracking ^ enable)) {
@@ -2797,16 +2797,15 @@ static int amd_iommu_set_dirty_tracking(struct iommu_domain *domain,
}
list_for_each_entry(dev_data, &pdomain->dev_list, list) {
+ spin_lock(&dev_data->dte_lock);
iommu = get_amd_iommu_from_dev_data(dev_data);
-
- dev_table = get_dev_table(iommu);
- pte_root = dev_table[dev_data->devid].data[0];
-
- pte_root = (enable ? pte_root | DTE_FLAG_HAD :
- pte_root & ~DTE_FLAG_HAD);
+ dte = &get_dev_table(iommu)[dev_data->devid];
+ new = dte->data[0];
+ new = (enable ? new | DTE_FLAG_HAD : new & ~DTE_FLAG_HAD);
+ WRITE_ONCE(dte->data[0], new);
+ spin_unlock(&dev_data->dte_lock);
/* Flush device DTE */
- dev_table[dev_data->devid].data[0] = pte_root;
device_flush_dte(dev_data);
domain_flush = true;
}
@@ -3071,17 +3070,24 @@ static void iommu_flush_irt_and_complete(struct amd_iommu *iommu, u16 devid)
static void set_dte_irq_entry(struct amd_iommu *iommu, u16 devid,
struct irq_remap_table *table)
{
- u64 dte;
- struct dev_table_entry *dev_table = get_dev_table(iommu);
+ u64 new;
+ struct dev_table_entry *dte = &get_dev_table(iommu)[devid];
+ struct iommu_dev_data *dev_data = search_dev_data(iommu, devid);
+
+ if (dev_data)
+ spin_lock(&dev_data->dte_lock);
+
+ new = dte->data[2];
+ new &= ~DTE_IRQ_PHYS_ADDR_MASK;
+ new |= iommu_virt_to_phys(table->table);
+ new |= DTE_IRQ_REMAP_INTCTL;
+ new |= DTE_INTTABLEN;
+ new |= DTE_IRQ_REMAP_ENABLE;
- dte = dev_table[devid].data[2];
- dte &= ~DTE_IRQ_PHYS_ADDR_MASK;
- dte |= iommu_virt_to_phys(table->table);
- dte |= DTE_IRQ_REMAP_INTCTL;
- dte |= DTE_INTTABLEN;
- dte |= DTE_IRQ_REMAP_ENABLE;
+ WRITE_ONCE(dte->data[2], new);
- dev_table[devid].data[2] = dte;
+ if (dev_data)
+ spin_unlock(&dev_data->dte_lock);
}
static struct irq_remap_table *get_irq_table(struct amd_iommu *iommu, u16 devid)
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v5 3/6] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers
2024-10-07 4:13 ` [PATCH v5 3/6] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers Suravee Suthikulpanit
@ 2024-10-07 14:05 ` Jason Gunthorpe
2024-10-16 5:12 ` Suthikulpanit, Suravee
2024-10-07 14:17 ` Jason Gunthorpe
1 sibling, 1 reply; 17+ messages in thread
From: Jason Gunthorpe @ 2024-10-07 14:05 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, kevin.tian,
jon.grimm, santosh.shukla, pandoh, kumaranand
On Mon, Oct 07, 2024 at 04:13:50AM +0000, Suravee Suthikulpanit wrote:
> +static void make_clear_dte(struct amd_iommu *iommu, struct dev_table_entry *dte,
> + struct dev_table_entry *new)
> +{
> + new->data[0] = DTE_FLAG_V;
> +
> + /* Apply erratum 63 */
> + if (FIELD_GET(DTE_SYSMGT_MASK, dte->data[1]) == 0x01)
> + new->data[0] |= BIT_ULL(DEV_ENTRY_IW);
> +
> + if (!amd_iommu_snp_en)
> + new->data[0] |= DTE_FLAG_TV;
It would be nice to have a comment here..
clear_dte() must create a blocking configuration as several callers
depend on that.
Why is blocking with TV=1,Mode=0,IW=0,IR=0 used sometimes but
sometimes TV=0 is used instead?
> + /* Need to preserve interrupt remapping information in DTE[128:255] */
> + new->data128[1] = dte->data128[1];
It doesn't need to preserve.. write_dte_upper128() does the
preservation automatically under the right lock. Any bits in
DTE_DATA2_INTR_MASK should be 0 for the input DTE because they will be
ignored by the masking:
+ new->data[2] &= ~DTE_DATA2_INTR_MASK;
+ new->data[2] |= old.data[2] & (DTE_DATA2_INTR_MASK | DTE_DATA2_RESV_MASK);
Also this shouldn't preserve the top Guest related 64 bit for a 'clear
dte' either.
So, I think this can just be
new->data128[1] = 0;
?
Jason
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v5 2/6] iommu/amd: Introduce helper function to update 256-bit DTE
2024-10-07 4:13 ` [PATCH v5 2/6] iommu/amd: Introduce helper function to update 256-bit DTE Suravee Suthikulpanit
@ 2024-10-07 14:06 ` Jason Gunthorpe
2024-10-11 5:25 ` Suthikulpanit, Suravee
2024-10-07 14:42 ` Uros Bizjak
1 sibling, 1 reply; 17+ messages in thread
From: Jason Gunthorpe @ 2024-10-07 14:06 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, kevin.tian,
jon.grimm, santosh.shukla, pandoh, kumaranand
On Mon, Oct 07, 2024 at 04:13:49AM +0000, Suravee Suthikulpanit wrote:
> +static void write_dte_upper128(struct dev_table_entry *ptr, struct dev_table_entry *new)
> +{
> + struct dev_table_entry old = {};
> +
> + do {
> + old.data128[1] = ptr->data128[1];
> + new->data[2] &= ~DTE_DATA2_INTR_MASK;
> + new->data[2] |= old.data[2] & (DTE_DATA2_INTR_MASK | DTE_DATA2_RESV_MASK);
Why preserve the reserved bits? Shouldn't they be reserved by forced
to 0? Should have a comment explaining this
> +static void iommu_flush_dte_sync(struct amd_iommu *iommu, u16 devid)
> +{
You might consider re-ordering to avoid the function forward
declaration.
Looks fine otherwise
Jason
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v5 5/6] iommu/amd: Modify clear_dte_entry() to avoid in-place update
2024-10-07 4:13 ` [PATCH v5 5/6] iommu/amd: Modify clear_dte_entry() to avoid in-place update Suravee Suthikulpanit
@ 2024-10-07 14:10 ` Jason Gunthorpe
0 siblings, 0 replies; 17+ messages in thread
From: Jason Gunthorpe @ 2024-10-07 14:10 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, kevin.tian,
jon.grimm, santosh.shukla, pandoh, kumaranand
On Mon, Oct 07, 2024 at 04:13:52AM +0000, Suravee Suthikulpanit wrote:
> Lock DTE and copy value to a temporary storage before update using
> cmpxchg128.
>
> Also, refactor the function to simplify logic for applying erratum 63.
>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
> drivers/iommu/amd/iommu.c | 18 ++++++------------
> 1 file changed, 6 insertions(+), 12 deletions(-)
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v5 3/6] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers
2024-10-07 4:13 ` [PATCH v5 3/6] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers Suravee Suthikulpanit
2024-10-07 14:05 ` Jason Gunthorpe
@ 2024-10-07 14:17 ` Jason Gunthorpe
2024-10-16 5:15 ` Suthikulpanit, Suravee
1 sibling, 1 reply; 17+ messages in thread
From: Jason Gunthorpe @ 2024-10-07 14:17 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, kevin.tian,
jon.grimm, santosh.shukla, pandoh, kumaranand
On Mon, Oct 07, 2024 at 04:13:50AM +0000, Suravee Suthikulpanit wrote:
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index deb19af48a3e..6fa2f5bb5156 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -1956,90 +1956,114 @@ int amd_iommu_clear_gcr3(struct iommu_dev_data *dev_data, ioasid_t pasid)
> return ret;
> }
>
> +static void make_clear_dte(struct amd_iommu *iommu, struct dev_table_entry *dte,
> + struct dev_table_entry *new)
> +{
> + new->data[0] = DTE_FLAG_V;
> +
> + /* Apply erratum 63 */
> + if (FIELD_GET(DTE_SYSMGT_MASK, dte->data[1]) == 0x01)
> + new->data[0] |= BIT_ULL(DEV_ENTRY_IW);
Doesn't this need to be
/* Preserve set_dev_entry_from_acpi(), including erratum 64 */
new->data[1] |= dte->data[1] & DTE_SYSMGT_MASK;
if (FIELD_GET(DTE_SYSMGT_MASK, dte->data[1]) == 0x01)
new->data[0] |= BIT_ULL(DEV_ENTRY_IW);
And this has a significant security issue, we can't set IW here
because clear_dte must generate a blocked DTE, so TV=1,Mode=0,IW=1 is
not an OK setting!!
Jason
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v5 2/6] iommu/amd: Introduce helper function to update 256-bit DTE
2024-10-07 4:13 ` [PATCH v5 2/6] iommu/amd: Introduce helper function to update 256-bit DTE Suravee Suthikulpanit
2024-10-07 14:06 ` Jason Gunthorpe
@ 2024-10-07 14:42 ` Uros Bizjak
2024-10-11 10:22 ` Suthikulpanit, Suravee
1 sibling, 1 reply; 17+ messages in thread
From: Uros Bizjak @ 2024-10-07 14:42 UTC (permalink / raw)
To: Suravee Suthikulpanit, linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, jgg, kevin.tian, jon.grimm,
santosh.shukla, pandoh, kumaranand
[-- Attachment #1: Type: text/plain, Size: 2012 bytes --]
On 7. 10. 24 06:13, Suravee Suthikulpanit wrote:
> +
> /****************************************************************************
> *
> * Helper functions
> *
> ****************************************************************************/
>
> +static void write_dte_upper128(struct dev_table_entry *ptr, struct dev_table_entry *new)
> +{
> + struct dev_table_entry old = {};
> +
> + do {
> + old.data128[1] = ptr->data128[1];
> + new->data[2] &= ~DTE_DATA2_INTR_MASK;
> + new->data[2] |= old.data[2] & (DTE_DATA2_INTR_MASK | DTE_DATA2_RESV_MASK);
> + } while (!try_cmpxchg128(&ptr->data128[1], &old.data128[1], new->data128[1]));
Please note that try_cmpxchg inherently updates &old.data128[1] above on
failure. There is no need to update value again in the loop.
Please also note that the value from ptr->data128[1] should be read
using READ_ONCE() to prevent compiler from merging, refetching or
reordering the read. Currently, there is no READ_ONCE() implemented for
__int128, so something like the attached patch should be used.
Based on the above, the loop should be rewritten as:
old.data128[1] = READ_ONCE(ptr->data128[1]);
do {
new->data[2] &= ~DTE_DATA2_INTR_MASK;
new->data[2] |= old.data[2] & (DTE_DATA2_INTR_MASK | DTE_DATA2_RESV_MASK);
} while (!try_cmpxchg128(&ptr->data128[1], &old.data128[1],
new->data128[1]));
> +}
> +
> +static void write_dte_lower128(struct dev_table_entry *ptr, struct dev_table_entry *new)
> +{
> + struct dev_table_entry old = {};
> +
> + /*
> + * Need to preserve DTE[96:106], which can be set by information in IVRS table.
> + * See set_dev_entry_from_acpi().
> + */
> + new->data[1] |= ptr->data[1] & DTE_FLAG_MASK;
> +
> + do {
> + old.data128[0] = ptr->data128[0];
> + } while (!try_cmpxchg128(&ptr->data128[0], &old.data128[0], new->data128[0]));
And this one as:
old.data128[0] = READ_ONCE(ptr->data128[0]);
do {
} while (!try_cmpxchg128(&ptr->data128[0], &old.data128[0],
new->data128[0]));
Best regards,
Uros.
[-- Attachment #2: p.diff.txt --]
[-- Type: text/plain, Size: 1411 bytes --]
diff --git a/include/asm-generic/rwonce.h b/include/asm-generic/rwonce.h
index 8d0a6280e982..8bf942ad5ef3 100644
--- a/include/asm-generic/rwonce.h
+++ b/include/asm-generic/rwonce.h
@@ -33,7 +33,7 @@
* (e.g. a virtual address) and a strong prevailing wind.
*/
#define compiletime_assert_rwonce_type(t) \
- compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
+ compiletime_assert(__native_word(t) || sizeof(t) == sizeof(__dword_type), \
"Unsupported access size for {READ,WRITE}_ONCE().")
/*
diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 94b8fedfb077..8615e91f48fd 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -469,6 +469,12 @@ struct ftrace_likely_data {
unsigned type: (unsigned type)0, \
signed type: (signed type)0
+#ifdef __SIZEOF_INT128__
+#define __dword_type __int128
+#else
+#define __dword_type long long
+#endif
+
#define __unqual_scalar_typeof(x) typeof( \
_Generic((x), \
char: (char)0, \
@@ -476,7 +482,7 @@ struct ftrace_likely_data {
__scalar_type_to_expr_cases(short), \
__scalar_type_to_expr_cases(int), \
__scalar_type_to_expr_cases(long), \
- __scalar_type_to_expr_cases(long long), \
+ __scalar_type_to_expr_cases(__dword_type), \
default: (x)))
/* Is this type a native word size -- useful for atomic operations */
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v5 2/6] iommu/amd: Introduce helper function to update 256-bit DTE
2024-10-07 14:06 ` Jason Gunthorpe
@ 2024-10-11 5:25 ` Suthikulpanit, Suravee
0 siblings, 0 replies; 17+ messages in thread
From: Suthikulpanit, Suravee @ 2024-10-11 5:25 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, kevin.tian,
jon.grimm, santosh.shukla, pandoh, kumaranand
On 10/7/2024 9:06 PM, Jason Gunthorpe wrote:
> On Mon, Oct 07, 2024 at 04:13:49AM +0000, Suravee Suthikulpanit wrote:
>> +static void write_dte_upper128(struct dev_table_entry *ptr, struct dev_table_entry *new)
>> +{
>> + struct dev_table_entry old = {};
>> +
>> + do {
>> + old.data128[1] = ptr->data128[1];
>> + new->data[2] &= ~DTE_DATA2_INTR_MASK;
>> + new->data[2] |= old.data[2] & (DTE_DATA2_INTR_MASK | DTE_DATA2_RESV_MASK);
>
> Why preserve the reserved bits? Shouldn't they be reserved by forced
> to 0? Should have a comment explaining this
You are correct.
>> +static void iommu_flush_dte_sync(struct amd_iommu *iommu, u16 devid)
>> +{
>
> You might consider re-ordering to avoid the function forward
> declaration.
This will require moving a lot of other functions as well. We will
consider this in overall clean up later.
Thanks,
Suravee
> Looks fine otherwise
>
> Jason
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v5 2/6] iommu/amd: Introduce helper function to update 256-bit DTE
2024-10-07 14:42 ` Uros Bizjak
@ 2024-10-11 10:22 ` Suthikulpanit, Suravee
2024-10-11 11:05 ` Uros Bizjak
0 siblings, 1 reply; 17+ messages in thread
From: Suthikulpanit, Suravee @ 2024-10-11 10:22 UTC (permalink / raw)
To: Uros Bizjak, linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, jgg, kevin.tian, jon.grimm,
santosh.shukla, pandoh, kumaranand
On 10/7/2024 9:42 PM, Uros Bizjak wrote:
>
>
> On 7. 10. 24 06:13, Suravee Suthikulpanit wrote:
>
>> +
>>
>> /****************************************************************************
>> *
>> * Helper functions
>> *
>>
>> ****************************************************************************/
>> +static void write_dte_upper128(struct dev_table_entry *ptr, struct
>> dev_table_entry *new)
>> +{
>> + struct dev_table_entry old = {};
>> +
>> + do {
>> + old.data128[1] = ptr->data128[1];
>> + new->data[2] &= ~DTE_DATA2_INTR_MASK;
>> + new->data[2] |= old.data[2] & (DTE_DATA2_INTR_MASK |
>> DTE_DATA2_RESV_MASK);
>> + } while (!try_cmpxchg128(&ptr->data128[1], &old.data128[1],
>> new->data128[1]));
>
> Please note that try_cmpxchg inherently updates &old.data128[1] above on
> failure. There is no need to update value again in the loop.
>
> Please also note that the value from ptr->data128[1] should be read
> using READ_ONCE() to prevent compiler from merging, refetching or
> reordering the read. Currently, there is no READ_ONCE() implemented for
> __int128, so something like the attached patch should be used.
Thanks for pointing this out. I will introduce the attached patch
separately in this series on your behalf as author/sign-off, and review
the current code to properly use the READ_ONCE().
Thanks,
Suravee
> Based on the above, the loop should be rewritten as:
>
> old.data128[1] = READ_ONCE(ptr->data128[1]);
> do {
> new->data[2] &= ~DTE_DATA2_INTR_MASK;
> new->data[2] |= old.data[2] & (DTE_DATA2_INTR_MASK |
> DTE_DATA2_RESV_MASK);
> } while (!try_cmpxchg128(&ptr->data128[1], &old.data128[1],
> new->data128[1]));
>
>> +}
>> +
>> +static void write_dte_lower128(struct dev_table_entry *ptr, struct
>> dev_table_entry *new)
>> +{
>> + struct dev_table_entry old = {};
>> +
>> + /*
>> + * Need to preserve DTE[96:106], which can be set by information
>> in IVRS table.
>> + * See set_dev_entry_from_acpi().
>> + */
>> + new->data[1] |= ptr->data[1] & DTE_FLAG_MASK;
>> +
>> + do {
>> + old.data128[0] = ptr->data128[0];
>> + } while (!try_cmpxchg128(&ptr->data128[0], &old.data128[0],
>> new->data128[0]));
>
> And this one as:
>
> old.data128[0] = READ_ONCE(ptr->data128[0]);
> do {
> } while (!try_cmpxchg128(&ptr->data128[0], &old.data128[0],
> new->data128[0]));
>
> Best regards,
> Uros.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v5 2/6] iommu/amd: Introduce helper function to update 256-bit DTE
2024-10-11 10:22 ` Suthikulpanit, Suravee
@ 2024-10-11 11:05 ` Uros Bizjak
0 siblings, 0 replies; 17+ messages in thread
From: Uros Bizjak @ 2024-10-11 11:05 UTC (permalink / raw)
To: Suthikulpanit, Suravee
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, jgg,
kevin.tian, jon.grimm, santosh.shukla, pandoh, kumaranand
On Fri, Oct 11, 2024 at 12:22 PM Suthikulpanit, Suravee
<suravee.suthikulpanit@amd.com> wrote:
>
> On 10/7/2024 9:42 PM, Uros Bizjak wrote:
> >
> >
> > On 7. 10. 24 06:13, Suravee Suthikulpanit wrote:
> >
> >> +
> >>
> >> /****************************************************************************
> >> *
> >> * Helper functions
> >> *
> >>
> >> ****************************************************************************/
> >> +static void write_dte_upper128(struct dev_table_entry *ptr, struct
> >> dev_table_entry *new)
> >> +{
> >> + struct dev_table_entry old = {};
> >> +
> >> + do {
> >> + old.data128[1] = ptr->data128[1];
> >> + new->data[2] &= ~DTE_DATA2_INTR_MASK;
> >> + new->data[2] |= old.data[2] & (DTE_DATA2_INTR_MASK |
> >> DTE_DATA2_RESV_MASK);
> >> + } while (!try_cmpxchg128(&ptr->data128[1], &old.data128[1],
> >> new->data128[1]));
> >
> > Please note that try_cmpxchg inherently updates &old.data128[1] above on
> > failure. There is no need to update value again in the loop.
> >
> > Please also note that the value from ptr->data128[1] should be read
> > using READ_ONCE() to prevent compiler from merging, refetching or
> > reordering the read. Currently, there is no READ_ONCE() implemented for
> > __int128, so something like the attached patch should be used.
>
> Thanks for pointing this out. I will introduce the attached patch
> separately in this series on your behalf as author/sign-off, and review
> the current code to properly use the READ_ONCE().
FTR, for the mentioned patch:
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
for Co-authored-by: tag.
Thanks,
Uros.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v5 3/6] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers
2024-10-07 14:05 ` Jason Gunthorpe
@ 2024-10-16 5:12 ` Suthikulpanit, Suravee
0 siblings, 0 replies; 17+ messages in thread
From: Suthikulpanit, Suravee @ 2024-10-16 5:12 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, kevin.tian,
jon.grimm, santosh.shukla, pandoh, kumaranand
On 10/7/2024 9:05 PM, Jason Gunthorpe wrote:
> On Mon, Oct 07, 2024 at 04:13:50AM +0000, Suravee Suthikulpanit wrote:
>> +static void make_clear_dte(struct amd_iommu *iommu, struct dev_table_entry *dte,
>> + struct dev_table_entry *new)
>> +{
>> + new->data[0] = DTE_FLAG_V;
>> +
>> + /* Apply erratum 63 */
>> + if (FIELD_GET(DTE_SYSMGT_MASK, dte->data[1]) == 0x01)
>> + new->data[0] |= BIT_ULL(DEV_ENTRY_IW);
>> +
>> + if (!amd_iommu_snp_en)
>> + new->data[0] |= DTE_FLAG_TV;
>
> It would be nice to have a comment here..
I am moving this check. See description below...
> clear_dte() must create a blocking configuration as several callers
> depend on that.
Right, I missed that. I'll rework this function.
> Why is blocking with TV=1,Mode=0,IW=0,IR=0 used sometimes but
> sometimes TV=0 is used instead?
Originally, when DTE[Mode]=0, the TV bit is set.
Then, the commit b9f0043e1ea6 "iommu/amd: Set translation valid bit only
when IO page tables are in use" clears the TV ONLY when running on
SNP-enabled system. We didn't clear the bit for all cases since there
was a concern whether it would cause regression on older platforms.
However, I am considering clearing the TV flag for all cases to simplify
the logic since it should not violate the spec.
>> + /* Need to preserve interrupt remapping information in DTE[128:255] */
>> + new->data128[1] = dte->data128[1];
>
> It doesn't need to preserve.. write_dte_upper128() does the
> preservation automatically under the right lock. Any bits in
> DTE_DATA2_INTR_MASK should be 0 for the input DTE because they will be
> ignored by the masking:
>
> + new->data[2] &= ~DTE_DATA2_INTR_MASK;
> + new->data[2] |= old.data[2] & (DTE_DATA2_INTR_MASK | DTE_DATA2_RESV_MASK);
>
> Also this shouldn't preserve the top Guest related 64 bit for a 'clear
> dte' either.
>
> So, I think this can just be
>
> new->data128[1] = 0;
>
> ?
Good point. I'll clean this up.
Thanks,
Suravee
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v5 3/6] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers
2024-10-07 14:17 ` Jason Gunthorpe
@ 2024-10-16 5:15 ` Suthikulpanit, Suravee
0 siblings, 0 replies; 17+ messages in thread
From: Suthikulpanit, Suravee @ 2024-10-16 5:15 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, kevin.tian,
jon.grimm, santosh.shukla, pandoh, kumaranand
On 10/7/2024 9:17 PM, Jason Gunthorpe wrote:
> On Mon, Oct 07, 2024 at 04:13:50AM +0000, Suravee Suthikulpanit wrote:
>> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
>> index deb19af48a3e..6fa2f5bb5156 100644
>> --- a/drivers/iommu/amd/iommu.c
>> +++ b/drivers/iommu/amd/iommu.c
>> @@ -1956,90 +1956,114 @@ int amd_iommu_clear_gcr3(struct iommu_dev_data *dev_data, ioasid_t pasid)
>> return ret;
>> }
>>
>> +static void make_clear_dte(struct amd_iommu *iommu, struct dev_table_entry *dte,
>> + struct dev_table_entry *new)
>> +{
>> + new->data[0] = DTE_FLAG_V;
>> +
>> + /* Apply erratum 63 */
>> + if (FIELD_GET(DTE_SYSMGT_MASK, dte->data[1]) == 0x01)
>> + new->data[0] |= BIT_ULL(DEV_ENTRY_IW);
>
> Doesn't this need to be
>
> /* Preserve set_dev_entry_from_acpi(), including erratum 64 */
> new->data[1] |= dte->data[1] & DTE_SYSMGT_MASK;
> if (FIELD_GET(DTE_SYSMGT_MASK, dte->data[1]) == 0x01)
> new->data[0] |= BIT_ULL(DEV_ENTRY_IW);
>
> And this has a significant security issue, we can't set IW here
> because clear_dte must generate a blocked DTE, so TV=1,Mode=0,IW=1 is
> not an OK setting!!
I am reworking this part. I am going to introduce a variable in struct
dev_data to store persistent value (during driver initialization) so
that we can simply recreate the DTE from scratch w/o having to depend on
existing configuration in the table. And it should address your concern
regarding the IW bit during blocked DTE.
Suravee
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2024-10-16 5:15 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-07 4:13 [PATCH v5 0/6] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
2024-10-07 4:13 ` [PATCH v5 1/6] iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported Suravee Suthikulpanit
2024-10-07 4:13 ` [PATCH v5 2/6] iommu/amd: Introduce helper function to update 256-bit DTE Suravee Suthikulpanit
2024-10-07 14:06 ` Jason Gunthorpe
2024-10-11 5:25 ` Suthikulpanit, Suravee
2024-10-07 14:42 ` Uros Bizjak
2024-10-11 10:22 ` Suthikulpanit, Suravee
2024-10-11 11:05 ` Uros Bizjak
2024-10-07 4:13 ` [PATCH v5 3/6] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers Suravee Suthikulpanit
2024-10-07 14:05 ` Jason Gunthorpe
2024-10-16 5:12 ` Suthikulpanit, Suravee
2024-10-07 14:17 ` Jason Gunthorpe
2024-10-16 5:15 ` Suthikulpanit, Suravee
2024-10-07 4:13 ` [PATCH v5 4/6] iommu/amd: Introduce helper function get_dte256() Suravee Suthikulpanit
2024-10-07 4:13 ` [PATCH v5 5/6] iommu/amd: Modify clear_dte_entry() to avoid in-place update Suravee Suthikulpanit
2024-10-07 14:10 ` Jason Gunthorpe
2024-10-07 4:13 ` [PATCH v5 6/6] iommu/amd: Lock DTE before updating the entry with WRITE_ONCE() Suravee Suthikulpanit
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox