* [PATCH v11 0/9] iommu/amd: Use 128-bit cmpxchg operation to update DTE
@ 2024-11-14 19:44 Suravee Suthikulpanit
2024-11-14 19:44 ` [PATCH v11 1/9] iommu/amd: Misc ACPI IVRS debug info clean up Suravee Suthikulpanit
` (8 more replies)
0 siblings, 9 replies; 12+ messages in thread
From: Suravee Suthikulpanit @ 2024-11-14 19:44 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, arnd, ubizjak, linux-arch, jgg,
kevin.tian, jon.grimm, santosh.shukla, pandoh, kumaranand,
Suravee Suthikulpanit
This series modifies current implementation to use 128-bit cmpxchg to
update DTE when needed as specified in the AMD I/O Virtualization
Techonology (IOMMU) Specification.
Please note that I have verified with the hardware designer, and they have
confirmed that the IOMMU hardware has always been implemented with 256-bit
read. The next revision of the IOMMU spec will be updated to correctly
describe this part. Therefore, I have updated the implementation to avoid
unnecessary flushing.
Also, this has been a long series. I would like to thank several folks who
have helped review and provide suggestions along the way :)
Changes in v11:
* Remove the patch to introduce __READ_ONCE() for 128-bit data type
since all 128-bit DTE access is currently done under per-DTE spin_lock.
This is to help avoid complicating __unqual_scalar_typeof() further (Per Arnd).
* Patch 4, 6:
- Replace spin_lock/unlock() with spin_lock_irqsave/spin_unlock_irqrestore()
due to the following dmesg warning:
=====================================================
WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
6.12.0-rc5+ #29 Not tainted
-----------------------------------------------------
cc1/145047 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
ffff93737e255cb8 (&dev_data->dte_lock){+.+.}-{2:2}, at: update_dte256+0x5f/0x1b0
and this task is already holding:
ffff937371782150 (&domain->lock){-.-.}-{2:2}, at: alloc_pte.constprop.0+0x175/0x5e0
which would create a new lock dependency:
(&domain->lock){-.-.}-{2:2} -> (&dev_data->dte_lock){+.+.}-{2:2}
but this new dependency connects a HARDIRQ-irq-safe lock:
(&domain->lock){-.-.}-{2:2}
... which became HARDIRQ-irq-safe at:
__lock_acquire+0x399/0xbb0
lock_acquire.part.0+0xb0/0x250
__raw_spin_lock_irqsave+0x49/0x90
amd_iommu_flush_iotlb_all+0x1b/0x50
fq_flush_iotlb+0x22/0x40
queue_iova+0x12d/0x150
__iommu_dma_unmap+0xc2/0x140
iommu_dma_unmap_page+0x44/0x90
dma_unmap_page_attrs+0x202/0x240
nvme_pci_complete_batch+0xb3/0xd0 [nvme]
nvme_irq+0x7f/0x90 [nvme]
__handle_irq_event_percpu+0x81/0x270
handle_irq_event+0x34/0x70
handle_edge_irq+0x9f/0x240
__common_interrupt+0x70/0x140
common_interrupt+0xb2/0xd0
asm_common_interrupt+0x22/0x40
cpuidle_enter_state+0x11d/0x540
cpuidle_enter+0x29/0x40
cpuidle_idle_call+0x100/0x170
do_idle+0x96/0xf0
cpu_startup_entry+0x25/0x30
start_secondary+0x11d/0x140
common_startup_64+0x13e/0x141
to a HARDIRQ-irq-unsafe lock:
(&dev_data->dte_lock){+.+.}-{2:2}
... which became HARDIRQ-irq-unsafe at:
...
__lock_acquire+0x399/0xbb0
lock_acquire.part.0+0xb0/0x250
_raw_spin_lock+0x34/0x80
update_dte256+0x5f/0x1b0
set_dte_entry+0x1d1/0x290
dev_update_dte+0x53/0x120
attach_device.isra.0+0x120/0x4f0
amd_iommu_attach_device+0x83/0xd0
__iommu_attach_device+0x1d/0xd0
__iommu_device_set_domain+0x5b/0xb0
__iommu_group_set_domain_internal+0x68/0x120
iommu_setup_default_domain+0x204/0x350
iommu_device_register+0x156/0x250
iommu_init_pci+0x18f/0x570
amd_iommu_init_pci+0xcb/0x2b0
state_next+0x7e5/0x8e0
amd_iommu_init+0x1f/0x80
pci_iommu_init+0xe/0x40
do_one_initcall+0x5f/0x2c0
do_initcalls+0xb9/0x180
kernel_init_freeable+0x149/0x230
kernel_init+0x16/0x1c0
ret_from_fork+0x2d/0x50
ret_from_fork_asm+0x1a/0x30
* Patch 4:
- Introduce helper function iommu_atomic128_set() (Per Uros)
- In write_dte_update128(), remove unnecessary do-while() loop for
try-cmpxchg and remove __READ_ONCE() since the function is called
under DTE spin_lock;
* Patch 6: In get_dte256(), remove __READ_ONCE(), since the 128-bit data read
is inside DTE spin_lock.
* Patch 8: Remove READ/WRITE_ONCE() from amd_iommu_set_dirty_tracking()
since called inside DTE spin_lock.
v10: https://lore.kernel.org/lkml/20241113120327.5239-1-suravee.suthikulpanit@amd.com/
v9: https://lore.kernel.org/lkml/20241101162304.4688-1-suravee.suthikulpanit@amd.com/
v8: https://lore.kernel.org/lkml/20241031184243.4184-1-suravee.suthikulpanit@amd.com/
v7: https://lore.kernel.org/lkml/20241031091624.4895-1-suravee.suthikulpanit@amd.com/
v6: https://lore.kernel.org/lkml/20241016051756.4317-1-suravee.suthikulpanit@amd.com/
v5: https://lore.kernel.org/lkml/20241007041353.4756-1-suravee.suthikulpanit@amd.com/
v4: https://lore.kernel.org/lkml/20240916171805.324292-1-suravee.suthikulpanit@amd.com/
v3: https://lore.kernel.org/lkml/20240906121308.5013-1-suravee.suthikulpanit@amd.com/
v2: https://lore.kernel.org/lkml/20240829180726.5022-1-suravee.suthikulpanit@amd.com/
v1: https://lore.kernel.org/lkml/20240819161839.4657-1-suravee.suthikulpanit@amd.com/
Thanks,
Suravee
Suravee Suthikulpanit (9):
iommu/amd: Misc ACPI IVRS debug info clean up
iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported
iommu/amd: Introduce struct ivhd_dte_flags to store persistent DTE
flags
iommu/amd: Introduce helper function to update 256-bit DTE
iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers
iommu/amd: Introduce helper function get_dte256()
iommu/amd: Modify clear_dte_entry() to avoid in-place update
iommu/amd: Lock DTE before updating the entry with WRITE_ONCE()
iommu/amd: Remove amd_iommu_apply_erratum_63()
drivers/iommu/amd/amd_iommu.h | 4 +-
drivers/iommu/amd/amd_iommu_types.h | 41 ++-
drivers/iommu/amd/init.c | 229 +++++++++--------
drivers/iommu/amd/iommu.c | 378 +++++++++++++++++++++-------
4 files changed, 440 insertions(+), 212 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v11 1/9] iommu/amd: Misc ACPI IVRS debug info clean up
2024-11-14 19:44 [PATCH v11 0/9] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
@ 2024-11-14 19:44 ` Suravee Suthikulpanit
2024-11-14 19:44 ` [PATCH v11 2/9] iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported Suravee Suthikulpanit
` (7 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Suravee Suthikulpanit @ 2024-11-14 19:44 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, arnd, ubizjak, linux-arch, jgg,
kevin.tian, jon.grimm, santosh.shukla, pandoh, kumaranand,
Suravee Suthikulpanit
* Remove redundant AMD-Vi prefix.
* Print IVHD device entry settings field using hex value.
* Print root device of IVHD ACPI device entry using hex value.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu_types.h | 2 +-
drivers/iommu/amd/init.c | 35 +++++++++++++----------------
2 files changed, 16 insertions(+), 21 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index fdb0357e0bb9..af87b1d094c1 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -468,7 +468,7 @@ extern bool amd_iommu_dump;
#define DUMP_printk(format, arg...) \
do { \
if (amd_iommu_dump) \
- pr_info("AMD-Vi: " format, ## arg); \
+ pr_info(format, ## arg); \
} while(0);
/* global flag if IOMMUs cache non-present entries */
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 0e0a531042ac..3a7b2b0472fa 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1239,7 +1239,7 @@ static int __init add_acpi_hid_device(u8 *hid, u8 *uid, u32 *devid,
entry->cmd_line = cmd_line;
entry->root_devid = (entry->devid & (~0x7));
- pr_info("%s, add hid:%s, uid:%s, rdevid:%d\n",
+ pr_info("%s, add hid:%s, uid:%s, rdevid:%#x\n",
entry->cmd_line ? "cmd" : "ivrs",
entry->hid, entry->uid, entry->root_devid);
@@ -1331,15 +1331,14 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
switch (e->type) {
case IVHD_DEV_ALL:
- DUMP_printk(" DEV_ALL\t\t\tflags: %02x\n", e->flags);
+ DUMP_printk(" DEV_ALL\t\t\tsetting: %#02x\n", e->flags);
for (dev_i = 0; dev_i <= pci_seg->last_bdf; ++dev_i)
set_dev_entry_from_acpi(iommu, dev_i, e->flags, 0);
break;
case IVHD_DEV_SELECT:
- DUMP_printk(" DEV_SELECT\t\t\t devid: %04x:%02x:%02x.%x "
- "flags: %02x\n",
+ DUMP_printk(" DEV_SELECT\t\t\tdevid: %04x:%02x:%02x.%x flags: %#02x\n",
seg_id, PCI_BUS_NUM(e->devid),
PCI_SLOT(e->devid),
PCI_FUNC(e->devid),
@@ -1350,8 +1349,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
break;
case IVHD_DEV_SELECT_RANGE_START:
- DUMP_printk(" DEV_SELECT_RANGE_START\t "
- "devid: %04x:%02x:%02x.%x flags: %02x\n",
+ DUMP_printk(" DEV_SELECT_RANGE_START\tdevid: %04x:%02x:%02x.%x flags: %#02x\n",
seg_id, PCI_BUS_NUM(e->devid),
PCI_SLOT(e->devid),
PCI_FUNC(e->devid),
@@ -1364,8 +1362,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
break;
case IVHD_DEV_ALIAS:
- DUMP_printk(" DEV_ALIAS\t\t\t devid: %04x:%02x:%02x.%x "
- "flags: %02x devid_to: %02x:%02x.%x\n",
+ DUMP_printk(" DEV_ALIAS\t\t\tdevid: %04x:%02x:%02x.%x flags: %#02x devid_to: %02x:%02x.%x\n",
seg_id, PCI_BUS_NUM(e->devid),
PCI_SLOT(e->devid),
PCI_FUNC(e->devid),
@@ -1382,9 +1379,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
break;
case IVHD_DEV_ALIAS_RANGE:
- DUMP_printk(" DEV_ALIAS_RANGE\t\t "
- "devid: %04x:%02x:%02x.%x flags: %02x "
- "devid_to: %04x:%02x:%02x.%x\n",
+ DUMP_printk(" DEV_ALIAS_RANGE\t\tdevid: %04x:%02x:%02x.%x flags: %#02x devid_to: %04x:%02x:%02x.%x\n",
seg_id, PCI_BUS_NUM(e->devid),
PCI_SLOT(e->devid),
PCI_FUNC(e->devid),
@@ -1401,8 +1396,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
break;
case IVHD_DEV_EXT_SELECT:
- DUMP_printk(" DEV_EXT_SELECT\t\t devid: %04x:%02x:%02x.%x "
- "flags: %02x ext: %08x\n",
+ DUMP_printk(" DEV_EXT_SELECT\t\tdevid: %04x:%02x:%02x.%x flags: %#02x ext: %08x\n",
seg_id, PCI_BUS_NUM(e->devid),
PCI_SLOT(e->devid),
PCI_FUNC(e->devid),
@@ -1414,8 +1408,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
break;
case IVHD_DEV_EXT_SELECT_RANGE:
- DUMP_printk(" DEV_EXT_SELECT_RANGE\t devid: "
- "%04x:%02x:%02x.%x flags: %02x ext: %08x\n",
+ DUMP_printk(" DEV_EXT_SELECT_RANGE\tdevid: %04x:%02x:%02x.%x flags: %#02x ext: %08x\n",
seg_id, PCI_BUS_NUM(e->devid),
PCI_SLOT(e->devid),
PCI_FUNC(e->devid),
@@ -1428,7 +1421,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
break;
case IVHD_DEV_RANGE_END:
- DUMP_printk(" DEV_RANGE_END\t\t devid: %04x:%02x:%02x.%x\n",
+ DUMP_printk(" DEV_RANGE_END\t\tdevid: %04x:%02x:%02x.%x\n",
seg_id, PCI_BUS_NUM(e->devid),
PCI_SLOT(e->devid),
PCI_FUNC(e->devid));
@@ -1461,11 +1454,12 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
else
var = "UNKNOWN";
- DUMP_printk(" DEV_SPECIAL(%s[%d])\t\tdevid: %04x:%02x:%02x.%x\n",
+ DUMP_printk(" DEV_SPECIAL(%s[%d])\t\tdevid: %04x:%02x:%02x.%x, flags: %#02x\n",
var, (int)handle,
seg_id, PCI_BUS_NUM(devid),
PCI_SLOT(devid),
- PCI_FUNC(devid));
+ PCI_FUNC(devid),
+ e->flags);
ret = add_special_device(type, handle, &devid, false);
if (ret)
@@ -1525,11 +1519,12 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
}
devid = PCI_SEG_DEVID_TO_SBDF(seg_id, e->devid);
- DUMP_printk(" DEV_ACPI_HID(%s[%s])\t\tdevid: %04x:%02x:%02x.%x\n",
+ DUMP_printk(" DEV_ACPI_HID(%s[%s])\t\tdevid: %04x:%02x:%02x.%x, flags: %#02x\n",
hid, uid, seg_id,
PCI_BUS_NUM(devid),
PCI_SLOT(devid),
- PCI_FUNC(devid));
+ PCI_FUNC(devid),
+ e->flags);
flags = e->flags;
--
2.34.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v11 2/9] iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported
2024-11-14 19:44 [PATCH v11 0/9] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
2024-11-14 19:44 ` [PATCH v11 1/9] iommu/amd: Misc ACPI IVRS debug info clean up Suravee Suthikulpanit
@ 2024-11-14 19:44 ` Suravee Suthikulpanit
2024-11-14 19:44 ` [PATCH v11 3/9] iommu/amd: Introduce struct ivhd_dte_flags to store persistent DTE flags Suravee Suthikulpanit
` (6 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Suravee Suthikulpanit @ 2024-11-14 19:44 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, arnd, ubizjak, linux-arch, jgg,
kevin.tian, jon.grimm, santosh.shukla, pandoh, kumaranand,
Suravee Suthikulpanit
According to the AMD IOMMU spec, IOMMU hardware reads the entire DTE
in a single 256-bit transaction. It is recommended to update DTE using
128-bit operation followed by an INVALIDATE_DEVTAB_ENTYRY command when
the IV=1b or V=1b before the change.
According to the AMD BIOS and Kernel Developer's Guide (BDKG) dated back
to family 10h Processor [1], which is the first introduction of AMD IOMMU,
AMD processor always has CPUID Fn0000_0001_ECX[CMPXCHG16B]=1.
Therefore, it is safe to assume cmpxchg128 is available with all AMD
processor w/ IOMMU.
In addition, the CMPXCHG16B feature has already been checked separately
before enabling the GA, XT, and GAM modes. Consolidate the detection logic,
and fail the IOMMU initialization if the feature is not supported.
[1] https://www.amd.com/content/dam/amd/en/documents/archived-tech-docs/programmer-references/31116.pdf
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/init.c | 23 +++++++++--------------
1 file changed, 9 insertions(+), 14 deletions(-)
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 3a7b2b0472fa..c1607b29ebf4 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1752,13 +1752,8 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h,
else
iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET;
- /*
- * Note: GA (128-bit IRTE) mode requires cmpxchg16b supports.
- * GAM also requires GA mode. Therefore, we need to
- * check cmpxchg16b support before enabling it.
- */
- if (!boot_cpu_has(X86_FEATURE_CX16) ||
- ((h->efr_attr & (0x1 << IOMMU_FEAT_GASUP_SHIFT)) == 0))
+ /* GAM requires GA mode. */
+ if ((h->efr_attr & (0x1 << IOMMU_FEAT_GASUP_SHIFT)) == 0)
amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY;
break;
case 0x11:
@@ -1768,13 +1763,8 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h,
else
iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET;
- /*
- * Note: GA (128-bit IRTE) mode requires cmpxchg16b supports.
- * XT, GAM also requires GA mode. Therefore, we need to
- * check cmpxchg16b support before enabling them.
- */
- if (!boot_cpu_has(X86_FEATURE_CX16) ||
- ((h->efr_reg & (0x1 << IOMMU_EFR_GASUP_SHIFT)) == 0)) {
+ /* XT and GAM require GA mode. */
+ if ((h->efr_reg & (0x1 << IOMMU_EFR_GASUP_SHIFT)) == 0) {
amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY;
break;
}
@@ -3028,6 +3018,11 @@ static int __init early_amd_iommu_init(void)
return -EINVAL;
}
+ if (!boot_cpu_has(X86_FEATURE_CX16)) {
+ pr_err("Failed to initialize. The CMPXCHG16B feature is required.\n");
+ return -EINVAL;
+ }
+
/*
* Validate checksum here so we don't need to do it when
* we actually parse the table
--
2.34.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v11 3/9] iommu/amd: Introduce struct ivhd_dte_flags to store persistent DTE flags
2024-11-14 19:44 [PATCH v11 0/9] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
2024-11-14 19:44 ` [PATCH v11 1/9] iommu/amd: Misc ACPI IVRS debug info clean up Suravee Suthikulpanit
2024-11-14 19:44 ` [PATCH v11 2/9] iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported Suravee Suthikulpanit
@ 2024-11-14 19:44 ` Suravee Suthikulpanit
2024-11-14 19:44 ` [PATCH v11 4/9] iommu/amd: Introduce helper function to update 256-bit DTE Suravee Suthikulpanit
` (5 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Suravee Suthikulpanit @ 2024-11-14 19:44 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, arnd, ubizjak, linux-arch, jgg,
kevin.tian, jon.grimm, santosh.shukla, pandoh, kumaranand,
Suravee Suthikulpanit
During early initialization, the driver parses IVRS IVHD block to get list
of downstream devices along with their DTE flags (i.e INITPass, EIntPass,
NMIPass, SysMgt, Lint0Pass, Lint1Pass). This information is currently
store in the device DTE, and needs to be preserved when clearing
and configuring each DTE, which makes it difficult to manage.
Introduce struct ivhd_dte_flags to store IVHD DTE settings for a device or
range of devices, which are stored in the amd_ivhd_dev_flags_list during
initial IVHD parsing.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu_types.h | 16 ++++
drivers/iommu/amd/init.c | 113 +++++++++++++++++++++-------
2 files changed, 100 insertions(+), 29 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index af87b1d094c1..ae5f1e031722 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -220,6 +220,8 @@
#define DEV_ENTRY_EX 0x67
#define DEV_ENTRY_SYSMGT1 0x68
#define DEV_ENTRY_SYSMGT2 0x69
+#define DTE_DATA1_SYSMGT_MASK GENMASK_ULL(41, 40)
+
#define DEV_ENTRY_IRQ_TBL_EN 0x80
#define DEV_ENTRY_INIT_PASS 0xb8
#define DEV_ENTRY_EINT_PASS 0xb9
@@ -516,6 +518,9 @@ extern struct kmem_cache *amd_iommu_irq_cache;
#define for_each_pdom_dev_data_safe(pdom_dev_data, next, pdom) \
list_for_each_entry_safe((pdom_dev_data), (next), &pdom->dev_data_list, list)
+#define for_each_ivhd_dte_flags(entry) \
+ list_for_each_entry((entry), &amd_ivhd_dev_flags_list, list)
+
struct amd_iommu;
struct iommu_domain;
struct irq_domain;
@@ -884,6 +889,17 @@ struct dev_table_entry {
u64 data[4];
};
+/*
+ * Structure to sture persistent DTE flags from IVHD
+ */
+struct ivhd_dte_flags {
+ struct list_head list;
+ u16 segid;
+ u16 devid_first;
+ u16 devid_last;
+ struct dev_table_entry dte;
+};
+
/*
* One entry for unity mappings parsed out of the ACPI table.
*/
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index c1607b29ebf4..015c9b045685 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -174,8 +174,8 @@ bool amd_iommu_snp_en;
EXPORT_SYMBOL(amd_iommu_snp_en);
LIST_HEAD(amd_iommu_pci_seg_list); /* list of all PCI segments */
-LIST_HEAD(amd_iommu_list); /* list of all AMD IOMMUs in the
- system */
+LIST_HEAD(amd_iommu_list); /* list of all AMD IOMMUs in the system */
+LIST_HEAD(amd_ivhd_dev_flags_list); /* list of all IVHD device entry settings */
/* Number of IOMMUs present in the system */
static int amd_iommus_present;
@@ -984,6 +984,14 @@ static void iommu_enable_gt(struct amd_iommu *iommu)
}
/* sets a specific bit in the device table entry. */
+static void set_dte_bit(struct dev_table_entry *dte, u8 bit)
+{
+ int i = (bit >> 6) & 0x03;
+ int _bit = bit & 0x3f;
+
+ dte->data[i] |= (1UL << _bit);
+}
+
static void __set_dev_entry_bit(struct dev_table_entry *dev_table,
u16 devid, u8 bit)
{
@@ -1136,6 +1144,19 @@ static bool copy_device_table(void)
return true;
}
+static bool search_ivhd_dte_flags(u16 segid, u16 first, u16 last)
+{
+ struct ivhd_dte_flags *e;
+
+ for_each_ivhd_dte_flags(e) {
+ if ((e->segid == segid) &&
+ (e->devid_first == first) &&
+ (e->devid_last == last))
+ return true;
+ }
+ return false;
+}
+
void amd_iommu_apply_erratum_63(struct amd_iommu *iommu, u16 devid)
{
int sysmgt;
@@ -1151,27 +1172,66 @@ void amd_iommu_apply_erratum_63(struct amd_iommu *iommu, u16 devid)
* This function takes the device specific flags read from the ACPI
* table and sets up the device table entry with that information
*/
-static void __init set_dev_entry_from_acpi(struct amd_iommu *iommu,
- u16 devid, u32 flags, u32 ext_flags)
+static void __init
+set_dev_entry_from_acpi_range(struct amd_iommu *iommu, u16 first, u16 last,
+ u32 flags, u32 ext_flags)
{
- if (flags & ACPI_DEVFLAG_INITPASS)
- set_dev_entry_bit(iommu, devid, DEV_ENTRY_INIT_PASS);
- if (flags & ACPI_DEVFLAG_EXTINT)
- set_dev_entry_bit(iommu, devid, DEV_ENTRY_EINT_PASS);
- if (flags & ACPI_DEVFLAG_NMI)
- set_dev_entry_bit(iommu, devid, DEV_ENTRY_NMI_PASS);
- if (flags & ACPI_DEVFLAG_SYSMGT1)
- set_dev_entry_bit(iommu, devid, DEV_ENTRY_SYSMGT1);
- if (flags & ACPI_DEVFLAG_SYSMGT2)
- set_dev_entry_bit(iommu, devid, DEV_ENTRY_SYSMGT2);
- if (flags & ACPI_DEVFLAG_LINT0)
- set_dev_entry_bit(iommu, devid, DEV_ENTRY_LINT0_PASS);
- if (flags & ACPI_DEVFLAG_LINT1)
- set_dev_entry_bit(iommu, devid, DEV_ENTRY_LINT1_PASS);
+ int i;
+ struct dev_table_entry dte = {};
+
+ /* Parse IVHD DTE setting flags and store information */
+ if (flags) {
+ struct ivhd_dte_flags *d;
- amd_iommu_apply_erratum_63(iommu, devid);
+ if (search_ivhd_dte_flags(iommu->pci_seg->id, first, last))
+ return;
- amd_iommu_set_rlookup_table(iommu, devid);
+ d = kzalloc(sizeof(struct ivhd_dte_flags), GFP_KERNEL);
+ if (!d)
+ return;
+
+ pr_debug("%s: devid range %#x:%#x\n", __func__, first, last);
+
+ if (flags & ACPI_DEVFLAG_INITPASS)
+ set_dte_bit(&dte, DEV_ENTRY_INIT_PASS);
+ if (flags & ACPI_DEVFLAG_EXTINT)
+ set_dte_bit(&dte, DEV_ENTRY_EINT_PASS);
+ if (flags & ACPI_DEVFLAG_NMI)
+ set_dte_bit(&dte, DEV_ENTRY_NMI_PASS);
+ if (flags & ACPI_DEVFLAG_SYSMGT1)
+ set_dte_bit(&dte, DEV_ENTRY_SYSMGT1);
+ if (flags & ACPI_DEVFLAG_SYSMGT2)
+ set_dte_bit(&dte, DEV_ENTRY_SYSMGT2);
+ if (flags & ACPI_DEVFLAG_LINT0)
+ set_dte_bit(&dte, DEV_ENTRY_LINT0_PASS);
+ if (flags & ACPI_DEVFLAG_LINT1)
+ set_dte_bit(&dte, DEV_ENTRY_LINT1_PASS);
+
+ /* Apply erratum 63, which needs info in initial_dte */
+ if (FIELD_GET(DTE_DATA1_SYSMGT_MASK, dte.data[1]) == 0x1)
+ dte.data[0] |= DTE_FLAG_IW;
+
+ memcpy(&d->dte, &dte, sizeof(dte));
+ d->segid = iommu->pci_seg->id;
+ d->devid_first = first;
+ d->devid_last = last;
+ list_add_tail(&d->list, &amd_ivhd_dev_flags_list);
+ }
+
+ for (i = first; i <= last; i++) {
+ if (flags) {
+ struct dev_table_entry *dev_table = get_dev_table(iommu);
+
+ memcpy(&dev_table[i], &dte, sizeof(dte));
+ }
+ amd_iommu_set_rlookup_table(iommu, i);
+ }
+}
+
+static void __init set_dev_entry_from_acpi(struct amd_iommu *iommu,
+ u16 devid, u32 flags, u32 ext_flags)
+{
+ set_dev_entry_from_acpi_range(iommu, devid, devid, flags, ext_flags);
}
int __init add_special_device(u8 type, u8 id, u32 *devid, bool cmd_line)
@@ -1332,9 +1392,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
case IVHD_DEV_ALL:
DUMP_printk(" DEV_ALL\t\t\tsetting: %#02x\n", e->flags);
-
- for (dev_i = 0; dev_i <= pci_seg->last_bdf; ++dev_i)
- set_dev_entry_from_acpi(iommu, dev_i, e->flags, 0);
+ set_dev_entry_from_acpi_range(iommu, 0, pci_seg->last_bdf, e->flags, 0);
break;
case IVHD_DEV_SELECT:
@@ -1428,14 +1486,11 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
devid = e->devid;
for (dev_i = devid_start; dev_i <= devid; ++dev_i) {
- if (alias) {
+ if (alias)
pci_seg->alias_table[dev_i] = devid_to;
- set_dev_entry_from_acpi(iommu,
- devid_to, flags, ext_flags);
- }
- set_dev_entry_from_acpi(iommu, dev_i,
- flags, ext_flags);
}
+ set_dev_entry_from_acpi_range(iommu, devid_start, devid, flags, ext_flags);
+ set_dev_entry_from_acpi(iommu, devid_to, flags, ext_flags);
break;
case IVHD_DEV_SPECIAL: {
u8 handle, type;
--
2.34.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v11 4/9] iommu/amd: Introduce helper function to update 256-bit DTE
2024-11-14 19:44 [PATCH v11 0/9] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
` (2 preceding siblings ...)
2024-11-14 19:44 ` [PATCH v11 3/9] iommu/amd: Introduce struct ivhd_dte_flags to store persistent DTE flags Suravee Suthikulpanit
@ 2024-11-14 19:44 ` Suravee Suthikulpanit
2024-11-14 20:13 ` Uros Bizjak
2024-11-14 19:44 ` [PATCH v11 5/9] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers Suravee Suthikulpanit
` (4 subsequent siblings)
8 siblings, 1 reply; 12+ messages in thread
From: Suravee Suthikulpanit @ 2024-11-14 19:44 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, arnd, ubizjak, linux-arch, jgg,
kevin.tian, jon.grimm, santosh.shukla, pandoh, kumaranand,
Suravee Suthikulpanit
The current implementation does not follow 128-bit write requirement
to update DTE as specified in the AMD I/O Virtualization Techonology
(IOMMU) Specification.
Therefore, modify the struct dev_table_entry to contain union of u128 data
array, and introduce a helper functions update_dte256() to update DTE using
two 128-bit cmpxchg operations to update 256-bit DTE with the modified
structure, and take into account the DTE[V, GV] bits when programming
the DTE to ensure proper order of DTE programming and flushing.
In addition, introduce a per-DTE spin_lock struct dev_data.dte_lock to
provide synchronization when updating the DTE to prevent cmpxchg128
failure.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu_types.h | 10 ++-
drivers/iommu/amd/iommu.c | 123 ++++++++++++++++++++++++++++
2 files changed, 132 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index ae5f1e031722..ea7922b06325 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -427,9 +427,13 @@
#define DTE_GCR3_SHIFT_C 43
#define DTE_GPT_LEVEL_SHIFT 54
+#define DTE_GPT_LEVEL_MASK GENMASK_ULL(55, 54)
#define GCR3_VALID 0x01ULL
+/* DTE[128:179] | DTE[184:191] */
+#define DTE_DATA2_INTR_MASK ~GENMASK_ULL(55, 52)
+
#define IOMMU_PAGE_MASK (((1ULL << 52) - 1) & ~0xfffULL)
#define IOMMU_PTE_PRESENT(pte) ((pte) & IOMMU_PTE_PR)
#define IOMMU_PTE_DIRTY(pte) ((pte) & IOMMU_PTE_HD)
@@ -842,6 +846,7 @@ struct devid_map {
struct iommu_dev_data {
/*Protect against attach/detach races */
struct mutex mutex;
+ spinlock_t dte_lock; /* DTE lock for 256-bit access */
struct list_head list; /* For domain->dev_list */
struct llist_node dev_data_list; /* For global dev_data_list */
@@ -886,7 +891,10 @@ extern struct list_head amd_iommu_list;
* Structure defining one entry in the device table
*/
struct dev_table_entry {
- u64 data[4];
+ union {
+ u64 data[4];
+ u128 data128[2];
+ };
};
/*
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 5ce8e6504ba7..7e0b62f2268a 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -83,12 +83,125 @@ static int amd_iommu_attach_device(struct iommu_domain *dom,
static void set_dte_entry(struct amd_iommu *iommu,
struct iommu_dev_data *dev_data);
+static void iommu_flush_dte_sync(struct amd_iommu *iommu, u16 devid);
+
/****************************************************************************
*
* Helper functions
*
****************************************************************************/
+static __always_inline void amd_iommu_atomic128_set(__int128 *ptr, __int128 val)
+{
+ /*
+ * Note:
+ * We use arch_try_cmpxchg128_local() because:
+ * - Need cmpxchg16b instruction mainly for 128-bit store to DTE
+ * (not necessary for cmpxchg since this function is already
+ * protected by a spin_lock for this DTE).
+ * - Neither need LOCK_PREFIX nor try loop because of the spin_lock.
+ */
+ arch_try_cmpxchg128_local(ptr, ptr, val);
+}
+
+static void write_dte_upper128(struct dev_table_entry *ptr, struct dev_table_entry *new)
+{
+ struct dev_table_entry old;
+
+ old.data128[1] = ptr->data128[1];
+ /*
+ * Preserve DTE_DATA2_INTR_MASK. This needs to be
+ * done here since it requires to be inside
+ * spin_lock(&dev_data->dte_lock) context.
+ */
+ new->data[2] &= ~DTE_DATA2_INTR_MASK;
+ new->data[2] |= old.data[2] & DTE_DATA2_INTR_MASK;
+
+ amd_iommu_atomic128_set(&ptr->data128[1], new->data128[1]);
+}
+
+static void write_dte_lower128(struct dev_table_entry *ptr, struct dev_table_entry *new)
+{
+ amd_iommu_atomic128_set(&ptr->data128[0], new->data128[0]);
+}
+
+/*
+ * Note:
+ * IOMMU reads the entire Device Table entry in a single 256-bit transaction
+ * but the driver is programming DTE using 2 128-bit cmpxchg. So, the driver
+ * need to ensure the following:
+ * - DTE[V|GV] bit is being written last when setting.
+ * - DTE[V|GV] bit is being written first when clearing.
+ *
+ * This function is used only by code, which updates DMA translation part of the DTE.
+ * So, only consider control bits related to DMA when updating the entry.
+ */
+static void update_dte256(struct amd_iommu *iommu, struct iommu_dev_data *dev_data,
+ struct dev_table_entry *new)
+{
+ unsigned long flags;
+ struct dev_table_entry *dev_table = get_dev_table(iommu);
+ struct dev_table_entry *ptr = &dev_table[dev_data->devid];
+
+ spin_lock_irqsave(&dev_data->dte_lock, flags);
+
+ if (!(ptr->data[0] & DTE_FLAG_V)) {
+ /* Existing DTE is not valid. */
+ write_dte_upper128(ptr, new);
+ write_dte_lower128(ptr, new);
+ iommu_flush_dte_sync(iommu, dev_data->devid);
+ } else if (!(new->data[0] & DTE_FLAG_V)) {
+ /* Existing DTE is valid. New DTE is not valid. */
+ write_dte_lower128(ptr, new);
+ write_dte_upper128(ptr, new);
+ iommu_flush_dte_sync(iommu, dev_data->devid);
+ } else if (!FIELD_GET(DTE_FLAG_GV, ptr->data[0])) {
+ /*
+ * Both DTEs are valid.
+ * Existing DTE has no guest page table.
+ */
+ write_dte_upper128(ptr, new);
+ write_dte_lower128(ptr, new);
+ iommu_flush_dte_sync(iommu, dev_data->devid);
+ } else if (!FIELD_GET(DTE_FLAG_GV, new->data[0])) {
+ /*
+ * Both DTEs are valid.
+ * Existing DTE has guest page table,
+ * new DTE has no guest page table,
+ */
+ write_dte_lower128(ptr, new);
+ write_dte_upper128(ptr, new);
+ iommu_flush_dte_sync(iommu, dev_data->devid);
+ } else if (FIELD_GET(DTE_GPT_LEVEL_MASK, ptr->data[2]) !=
+ FIELD_GET(DTE_GPT_LEVEL_MASK, new->data[2])) {
+ /*
+ * Both DTEs are valid and have guest page table,
+ * but have different number of levels. So, we need
+ * to upadte both upper and lower 128-bit value, which
+ * require disabling and flushing.
+ */
+ struct dev_table_entry clear = {};
+
+ /* First disable DTE */
+ write_dte_lower128(ptr, &clear);
+ iommu_flush_dte_sync(iommu, dev_data->devid);
+
+ /* Then update DTE */
+ write_dte_upper128(ptr, new);
+ write_dte_lower128(ptr, new);
+ iommu_flush_dte_sync(iommu, dev_data->devid);
+ } else {
+ /*
+ * Both DTEs are valid and have guest page table,
+ * and same number of levels. We just need to only
+ * update the lower 128-bit. So no need to disable DTE.
+ */
+ write_dte_lower128(ptr, new);
+ }
+
+ spin_unlock_irqrestore(&dev_data->dte_lock, flags);
+}
+
static inline bool pdom_is_v2_pgtbl_mode(struct protection_domain *pdom)
{
return (pdom && (pdom->pd_mode == PD_MODE_V2));
@@ -209,6 +322,7 @@ static struct iommu_dev_data *alloc_dev_data(struct amd_iommu *iommu, u16 devid)
return NULL;
mutex_init(&dev_data->mutex);
+ spin_lock_init(&dev_data->dte_lock);
dev_data->devid = devid;
ratelimit_default_init(&dev_data->rs);
@@ -1261,6 +1375,15 @@ static int iommu_flush_dte(struct amd_iommu *iommu, u16 devid)
return iommu_queue_command(iommu, &cmd);
}
+static void iommu_flush_dte_sync(struct amd_iommu *iommu, u16 devid)
+{
+ int ret;
+
+ ret = iommu_flush_dte(iommu, devid);
+ if (!ret)
+ iommu_completion_wait(iommu);
+}
+
static void amd_iommu_flush_dte_all(struct amd_iommu *iommu)
{
u32 devid;
--
2.34.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v11 5/9] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers
2024-11-14 19:44 [PATCH v11 0/9] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
` (3 preceding siblings ...)
2024-11-14 19:44 ` [PATCH v11 4/9] iommu/amd: Introduce helper function to update 256-bit DTE Suravee Suthikulpanit
@ 2024-11-14 19:44 ` Suravee Suthikulpanit
2024-11-14 20:26 ` Jason Gunthorpe
2024-11-14 19:44 ` [PATCH v11 6/9] iommu/amd: Introduce helper function get_dte256() Suravee Suthikulpanit
` (3 subsequent siblings)
8 siblings, 1 reply; 12+ messages in thread
From: Suravee Suthikulpanit @ 2024-11-14 19:44 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, arnd, ubizjak, linux-arch, jgg,
kevin.tian, jon.grimm, santosh.shukla, pandoh, kumaranand,
Suravee Suthikulpanit
Also, the set_dte_entry() is used to program several DTE fields (e.g.
stage1 table, stage2 table, domain id, and etc.), which is difficult
to keep track with current implementation.
Therefore, separate logic for clearing DTE (i.e. make_clear_dte) and
another function for setting up the GCR3 Table Root Pointer, GIOV, GV,
GLX, and GuestPagingMode into another function set_dte_gcr3_table().
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 2 +
drivers/iommu/amd/amd_iommu_types.h | 13 +--
drivers/iommu/amd/init.c | 30 ++++++-
drivers/iommu/amd/iommu.c | 129 ++++++++++++++++------------
4 files changed, 106 insertions(+), 68 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 38509e1019e9..f9260bb8fc85 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -183,3 +183,5 @@ void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
struct dev_table_entry *get_dev_table(struct amd_iommu *iommu);
#endif
+
+struct dev_table_entry *amd_iommu_get_ivhd_dte_flags(u16 segid, u16 devid);
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index ea7922b06325..0bbda60d3cdc 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -409,8 +409,7 @@
#define DTE_FLAG_HAD (3ULL << 7)
#define DTE_FLAG_GIOV BIT_ULL(54)
#define DTE_FLAG_GV BIT_ULL(55)
-#define DTE_GLX_SHIFT (56)
-#define DTE_GLX_MASK (3)
+#define DTE_GLX GENMASK_ULL(57, 56)
#define DTE_FLAG_IR BIT_ULL(61)
#define DTE_FLAG_IW BIT_ULL(62)
@@ -418,13 +417,9 @@
#define DTE_FLAG_MASK (0x3ffULL << 32)
#define DEV_DOMID_MASK 0xffffULL
-#define DTE_GCR3_VAL_A(x) (((x) >> 12) & 0x00007ULL)
-#define DTE_GCR3_VAL_B(x) (((x) >> 15) & 0x0ffffULL)
-#define DTE_GCR3_VAL_C(x) (((x) >> 31) & 0x1fffffULL)
-
-#define DTE_GCR3_SHIFT_A 58
-#define DTE_GCR3_SHIFT_B 16
-#define DTE_GCR3_SHIFT_C 43
+#define DTE_GCR3_14_12 GENMASK_ULL(60, 58)
+#define DTE_GCR3_30_15 GENMASK_ULL(31, 16)
+#define DTE_GCR3_51_31 GENMASK_ULL(63, 43)
#define DTE_GPT_LEVEL_SHIFT 54
#define DTE_GPT_LEVEL_MASK GENMASK_ULL(55, 54)
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 015c9b045685..1e4b8040c374 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1089,11 +1089,9 @@ static bool __copy_device_table(struct amd_iommu *iommu)
}
/* If gcr3 table existed, mask it out */
if (old_devtb[devid].data[0] & DTE_FLAG_GV) {
- tmp = DTE_GCR3_VAL_B(~0ULL) << DTE_GCR3_SHIFT_B;
- tmp |= DTE_GCR3_VAL_C(~0ULL) << DTE_GCR3_SHIFT_C;
+ tmp = (DTE_GCR3_30_15 | DTE_GCR3_51_31);
pci_seg->old_dev_tbl_cpy[devid].data[1] &= ~tmp;
- tmp = DTE_GCR3_VAL_A(~0ULL) << DTE_GCR3_SHIFT_A;
- tmp |= DTE_FLAG_GV;
+ tmp = (DTE_GCR3_14_12 | DTE_FLAG_GV);
pci_seg->old_dev_tbl_cpy[devid].data[0] &= ~tmp;
}
}
@@ -1144,6 +1142,30 @@ static bool copy_device_table(void)
return true;
}
+struct dev_table_entry *amd_iommu_get_ivhd_dte_flags(u16 segid, u16 devid)
+{
+ struct ivhd_dte_flags *e;
+ unsigned int best_len = UINT_MAX;
+ struct dev_table_entry *dte = NULL;
+
+ for_each_ivhd_dte_flags(e) {
+ /*
+ * Need to go through the whole list to find the smallest range,
+ * which contains the devid.
+ */
+ if ((e->segid == segid) &&
+ (e->devid_first <= devid) && (devid <= e->devid_last)) {
+ unsigned int len = e->devid_last - e->devid_first;
+
+ if (len < best_len) {
+ dte = &(e->dte);
+ best_len = len;
+ }
+ }
+ }
+ return dte;
+}
+
static bool search_ivhd_dte_flags(u16 segid, u16 first, u16 last)
{
struct ivhd_dte_flags *e;
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 7e0b62f2268a..cc778056bcd5 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1944,90 +1944,109 @@ int amd_iommu_clear_gcr3(struct iommu_dev_data *dev_data, ioasid_t pasid)
return ret;
}
+static void make_clear_dte(struct iommu_dev_data *dev_data, struct dev_table_entry *ptr,
+ struct dev_table_entry *new)
+{
+ /* All existing DTE must have V bit set */
+ new->data128[0] = DTE_FLAG_V;
+ new->data128[1] = 0;
+}
+
+/*
+ * Note:
+ * The old value for GCR3 table and GPT have been cleared from caller.
+ */
+static void set_dte_gcr3_table(struct amd_iommu *iommu,
+ struct iommu_dev_data *dev_data,
+ struct dev_table_entry *target)
+{
+ struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
+ u64 gcr3;
+
+ if (!gcr3_info->gcr3_tbl)
+ return;
+
+ pr_debug("%s: devid=%#x, glx=%#x, gcr3_tbl=%#llx\n",
+ __func__, dev_data->devid, gcr3_info->glx,
+ (unsigned long long)gcr3_info->gcr3_tbl);
+
+ gcr3 = iommu_virt_to_phys(gcr3_info->gcr3_tbl);
+
+ target->data[0] |= DTE_FLAG_GV |
+ FIELD_PREP(DTE_GLX, gcr3_info->glx) |
+ FIELD_PREP(DTE_GCR3_14_12, gcr3 >> 12);
+ if (pdom_is_v2_pgtbl_mode(dev_data->domain))
+ target->data[0] |= DTE_FLAG_GIOV;
+
+ target->data[1] |= FIELD_PREP(DTE_GCR3_30_15, gcr3 >> 15) |
+ FIELD_PREP(DTE_GCR3_51_31, gcr3 >> 31);
+
+ /* Guest page table can only support 4 and 5 levels */
+ if (amd_iommu_gpt_level == PAGE_MODE_5_LEVEL)
+ target->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_5_LEVEL);
+ else
+ target->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_4_LEVEL);
+}
+
static void set_dte_entry(struct amd_iommu *iommu,
struct iommu_dev_data *dev_data)
{
- u64 pte_root = 0;
- u64 flags = 0;
- u32 old_domid;
- u16 devid = dev_data->devid;
u16 domid;
+ u32 old_domid;
+ struct dev_table_entry *initial_dte;
+ struct dev_table_entry new = {};
struct protection_domain *domain = dev_data->domain;
- struct dev_table_entry *dev_table = get_dev_table(iommu);
struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
+ struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
if (gcr3_info && gcr3_info->gcr3_tbl)
domid = dev_data->gcr3_info.domid;
else
domid = domain->id;
+ make_clear_dte(dev_data, dte, &new);
+
if (domain->iop.mode != PAGE_MODE_NONE)
- pte_root = iommu_virt_to_phys(domain->iop.root);
+ new.data[0] = iommu_virt_to_phys(domain->iop.root);
- pte_root |= (domain->iop.mode & DEV_ENTRY_MODE_MASK)
+ new.data[0] |= (domain->iop.mode & DEV_ENTRY_MODE_MASK)
<< DEV_ENTRY_MODE_SHIFT;
- pte_root |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_V;
+ new.data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_V;
/*
- * When SNP is enabled, Only set TV bit when IOMMU
- * page translation is in use.
+ * When SNP is enabled, we can only support TV=1 with non-zero domain ID.
+ * This is prevented by the SNP-enable and IOMMU_DOMAIN_IDENTITY check in
+ * do_iommu_domain_alloc().
*/
- if (!amd_iommu_snp_en || (domid != 0))
- pte_root |= DTE_FLAG_TV;
-
- flags = dev_table[devid].data[1];
-
- if (dev_data->ats_enabled)
- flags |= DTE_FLAG_IOTLB;
+ WARN_ON(amd_iommu_snp_en && (domid == 0));
+ new.data[0] |= DTE_FLAG_TV;
if (dev_data->ppr)
- pte_root |= 1ULL << DEV_ENTRY_PPR;
+ new.data[0] |= 1ULL << DEV_ENTRY_PPR;
if (domain->dirty_tracking)
- pte_root |= DTE_FLAG_HAD;
-
- if (gcr3_info && gcr3_info->gcr3_tbl) {
- u64 gcr3 = iommu_virt_to_phys(gcr3_info->gcr3_tbl);
- u64 glx = gcr3_info->glx;
- u64 tmp;
-
- pte_root |= DTE_FLAG_GV;
- pte_root |= (glx & DTE_GLX_MASK) << DTE_GLX_SHIFT;
+ new.data[0] |= DTE_FLAG_HAD;
- /* First mask out possible old values for GCR3 table */
- tmp = DTE_GCR3_VAL_B(~0ULL) << DTE_GCR3_SHIFT_B;
- flags &= ~tmp;
-
- tmp = DTE_GCR3_VAL_C(~0ULL) << DTE_GCR3_SHIFT_C;
- flags &= ~tmp;
-
- /* Encode GCR3 table into DTE */
- tmp = DTE_GCR3_VAL_A(gcr3) << DTE_GCR3_SHIFT_A;
- pte_root |= tmp;
-
- tmp = DTE_GCR3_VAL_B(gcr3) << DTE_GCR3_SHIFT_B;
- flags |= tmp;
-
- tmp = DTE_GCR3_VAL_C(gcr3) << DTE_GCR3_SHIFT_C;
- flags |= tmp;
+ if (dev_data->ats_enabled)
+ new.data[1] |= DTE_FLAG_IOTLB;
- if (amd_iommu_gpt_level == PAGE_MODE_5_LEVEL) {
- dev_table[devid].data[2] |=
- ((u64)GUEST_PGTABLE_5_LEVEL << DTE_GPT_LEVEL_SHIFT);
- }
+ old_domid = READ_ONCE(dte->data[1]) & DEV_DOMID_MASK;
+ new.data[1] |= domid;
- /* GIOV is supported with V2 page table mode only */
- if (pdom_is_v2_pgtbl_mode(domain))
- pte_root |= DTE_FLAG_GIOV;
+ /*
+ * Restore cached persistent DTE bits, which can be set by information
+ * in IVRS table. See set_dev_entry_from_acpi().
+ */
+ initial_dte = amd_iommu_get_ivhd_dte_flags(iommu->pci_seg->id, dev_data->devid);
+ if (initial_dte) {
+ new.data128[0] |= initial_dte->data128[0];
+ new.data128[1] |= initial_dte->data128[1];
}
- flags &= ~DEV_DOMID_MASK;
- flags |= domid;
+ set_dte_gcr3_table(iommu, dev_data, &new);
- old_domid = dev_table[devid].data[1] & DEV_DOMID_MASK;
- dev_table[devid].data[1] = flags;
- dev_table[devid].data[0] = pte_root;
+ update_dte256(iommu, dev_data, &new);
/*
* A kdump kernel might be replacing a domain ID that was copied from
--
2.34.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v11 6/9] iommu/amd: Introduce helper function get_dte256()
2024-11-14 19:44 [PATCH v11 0/9] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
` (4 preceding siblings ...)
2024-11-14 19:44 ` [PATCH v11 5/9] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers Suravee Suthikulpanit
@ 2024-11-14 19:44 ` Suravee Suthikulpanit
2024-11-14 19:44 ` [PATCH v11 7/9] iommu/amd: Modify clear_dte_entry() to avoid in-place update Suravee Suthikulpanit
` (2 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Suravee Suthikulpanit @ 2024-11-14 19:44 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, arnd, ubizjak, linux-arch, jgg,
kevin.tian, jon.grimm, santosh.shukla, pandoh, kumaranand,
Suravee Suthikulpanit
And use it in clone_alias() along with update_dte256().
Also use get_dte256() in dump_dte_entry().
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/iommu.c | 62 ++++++++++++++++++++++++++++++++-------
1 file changed, 51 insertions(+), 11 deletions(-)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index cc778056bcd5..8f1de15c07a2 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -85,6 +85,8 @@ static void set_dte_entry(struct amd_iommu *iommu,
static void iommu_flush_dte_sync(struct amd_iommu *iommu, u16 devid);
+static struct iommu_dev_data *find_dev_data(struct amd_iommu *iommu, u16 devid);
+
/****************************************************************************
*
* Helper functions
@@ -202,6 +204,21 @@ static void update_dte256(struct amd_iommu *iommu, struct iommu_dev_data *dev_da
spin_unlock_irqrestore(&dev_data->dte_lock, flags);
}
+static void get_dte256(struct amd_iommu *iommu, struct iommu_dev_data *dev_data,
+ struct dev_table_entry *dte)
+{
+ unsigned long flags;
+ struct dev_table_entry *ptr;
+ struct dev_table_entry *dev_table = get_dev_table(iommu);
+
+ ptr = &dev_table[dev_data->devid];
+
+ spin_lock_irqsave(&dev_data->dte_lock, flags);
+ dte->data128[0] = ptr->data128[0];
+ dte->data128[1] = ptr->data128[1];
+ spin_unlock_irqrestore(&dev_data->dte_lock, flags);
+}
+
static inline bool pdom_is_v2_pgtbl_mode(struct protection_domain *pdom)
{
return (pdom && (pdom->pd_mode == PD_MODE_V2));
@@ -350,9 +367,11 @@ static struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid
static int clone_alias(struct pci_dev *pdev, u16 alias, void *data)
{
+ struct dev_table_entry new;
struct amd_iommu *iommu;
- struct dev_table_entry *dev_table;
+ struct iommu_dev_data *dev_data, *alias_data;
u16 devid = pci_dev_id(pdev);
+ int ret = 0;
if (devid == alias)
return 0;
@@ -361,13 +380,27 @@ static int clone_alias(struct pci_dev *pdev, u16 alias, void *data)
if (!iommu)
return 0;
- amd_iommu_set_rlookup_table(iommu, alias);
- dev_table = get_dev_table(iommu);
- memcpy(dev_table[alias].data,
- dev_table[devid].data,
- sizeof(dev_table[alias].data));
+ /* Copy the data from pdev */
+ dev_data = dev_iommu_priv_get(&pdev->dev);
+ if (!dev_data) {
+ pr_err("%s : Failed to get dev_data for 0x%x\n", __func__, devid);
+ ret = -EINVAL;
+ goto out;
+ }
+ get_dte256(iommu, dev_data, &new);
- return 0;
+ /* Setup alias */
+ alias_data = find_dev_data(iommu, alias);
+ if (!alias_data) {
+ pr_err("%s : Failed to get alias dev_data for 0x%x\n", __func__, alias);
+ ret = -EINVAL;
+ goto out;
+ }
+ update_dte256(iommu, alias_data, &new);
+
+ amd_iommu_set_rlookup_table(iommu, alias);
+out:
+ return ret;
}
static void clone_aliases(struct amd_iommu *iommu, struct device *dev)
@@ -640,6 +673,12 @@ static int iommu_init_device(struct amd_iommu *iommu, struct device *dev)
return -ENOMEM;
dev_data->dev = dev;
+
+ /*
+ * The dev_iommu_priv_set() needes to be called before setup_aliases.
+ * Otherwise, subsequent call to dev_iommu_priv_get() will fail.
+ */
+ dev_iommu_priv_set(dev, dev_data);
setup_aliases(iommu, dev);
/*
@@ -653,8 +692,6 @@ static int iommu_init_device(struct amd_iommu *iommu, struct device *dev)
dev_data->flags = pdev_get_caps(to_pci_dev(dev));
}
- dev_iommu_priv_set(dev, dev_data);
-
return 0;
}
@@ -685,10 +722,13 @@ static void iommu_ignore_device(struct amd_iommu *iommu, struct device *dev)
static void dump_dte_entry(struct amd_iommu *iommu, u16 devid)
{
int i;
- struct dev_table_entry *dev_table = get_dev_table(iommu);
+ struct dev_table_entry dte;
+ struct iommu_dev_data *dev_data = find_dev_data(iommu, devid);
+
+ get_dte256(iommu, dev_data, &dte);
for (i = 0; i < 4; ++i)
- pr_err("DTE[%d]: %016llx\n", i, dev_table[devid].data[i]);
+ pr_err("DTE[%d]: %016llx\n", i, dte.data[i]);
}
static void dump_command(unsigned long phys_addr)
--
2.34.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v11 7/9] iommu/amd: Modify clear_dte_entry() to avoid in-place update
2024-11-14 19:44 [PATCH v11 0/9] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
` (5 preceding siblings ...)
2024-11-14 19:44 ` [PATCH v11 6/9] iommu/amd: Introduce helper function get_dte256() Suravee Suthikulpanit
@ 2024-11-14 19:44 ` Suravee Suthikulpanit
2024-11-14 19:44 ` [PATCH v11 8/9] iommu/amd: Lock DTE before updating the entry with WRITE_ONCE() Suravee Suthikulpanit
2024-11-14 19:44 ` [PATCH v11 9/9] iommu/amd: Remove amd_iommu_apply_erratum_63() Suravee Suthikulpanit
8 siblings, 0 replies; 12+ messages in thread
From: Suravee Suthikulpanit @ 2024-11-14 19:44 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, arnd, ubizjak, linux-arch, jgg,
kevin.tian, jon.grimm, santosh.shukla, pandoh, kumaranand,
Suravee Suthikulpanit
By reusing the make_clear_dte() and update_dte256().
Also, there is no need to set TV bit for non-SNP system when clearing DTE
for blocked domain, and no longer need to apply erratum 63 in clear_dte()
since it is already stored in struct ivhd_dte_flags and apply in
set_dte_entry().
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/iommu.c | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 8f1de15c07a2..70002fe3994f 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2098,19 +2098,16 @@ static void set_dte_entry(struct amd_iommu *iommu,
}
}
-static void clear_dte_entry(struct amd_iommu *iommu, u16 devid)
+/*
+ * Clear DMA-remap related flags to block all DMA (blockeded domain)
+ */
+static void clear_dte_entry(struct amd_iommu *iommu, struct iommu_dev_data *dev_data)
{
- struct dev_table_entry *dev_table = get_dev_table(iommu);
-
- /* remove entry from the device table seen by the hardware */
- dev_table[devid].data[0] = DTE_FLAG_V;
-
- if (!amd_iommu_snp_en)
- dev_table[devid].data[0] |= DTE_FLAG_TV;
-
- dev_table[devid].data[1] &= DTE_FLAG_MASK;
+ struct dev_table_entry new = {};
+ struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
- amd_iommu_apply_erratum_63(iommu, devid);
+ make_clear_dte(dev_data, dte, &new);
+ update_dte256(iommu, dev_data, &new);
}
/* Update and flush DTE for the given device */
@@ -2121,7 +2118,7 @@ static void dev_update_dte(struct iommu_dev_data *dev_data, bool set)
if (set)
set_dte_entry(iommu, dev_data);
else
- clear_dte_entry(iommu, dev_data->devid);
+ clear_dte_entry(iommu, dev_data);
clone_aliases(iommu, dev_data->dev);
device_flush_dte(dev_data);
--
2.34.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v11 8/9] iommu/amd: Lock DTE before updating the entry with WRITE_ONCE()
2024-11-14 19:44 [PATCH v11 0/9] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
` (6 preceding siblings ...)
2024-11-14 19:44 ` [PATCH v11 7/9] iommu/amd: Modify clear_dte_entry() to avoid in-place update Suravee Suthikulpanit
@ 2024-11-14 19:44 ` Suravee Suthikulpanit
2024-11-14 19:44 ` [PATCH v11 9/9] iommu/amd: Remove amd_iommu_apply_erratum_63() Suravee Suthikulpanit
8 siblings, 0 replies; 12+ messages in thread
From: Suravee Suthikulpanit @ 2024-11-14 19:44 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, arnd, ubizjak, linux-arch, jgg,
kevin.tian, jon.grimm, santosh.shukla, pandoh, kumaranand,
Suravee Suthikulpanit
When updating only within a 64-bit tuple of a DTE, just lock the DTE and
use WRITE_ONCE() because it is writing to memory read back by HW.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 1 +
drivers/iommu/amd/iommu.c | 43 +++++++++++++++++++----------------
2 files changed, 25 insertions(+), 19 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index f9260bb8fc85..7b43894f6b90 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -185,3 +185,4 @@ struct dev_table_entry *get_dev_table(struct amd_iommu *iommu);
#endif
struct dev_table_entry *amd_iommu_get_ivhd_dte_flags(u16 segid, u16 devid);
+struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 70002fe3994f..65225100cb23 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -347,7 +347,7 @@ static struct iommu_dev_data *alloc_dev_data(struct amd_iommu *iommu, u16 devid)
return dev_data;
}
-static struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid)
+struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid)
{
struct iommu_dev_data *dev_data;
struct llist_node *node;
@@ -2838,12 +2838,12 @@ static int amd_iommu_set_dirty_tracking(struct iommu_domain *domain,
bool enable)
{
struct protection_domain *pdomain = to_pdomain(domain);
- struct dev_table_entry *dev_table;
+ struct dev_table_entry *dte;
struct iommu_dev_data *dev_data;
bool domain_flush = false;
struct amd_iommu *iommu;
unsigned long flags;
- u64 pte_root;
+ u64 new;
spin_lock_irqsave(&pdomain->lock, flags);
if (!(pdomain->dirty_tracking ^ enable)) {
@@ -2852,16 +2852,15 @@ static int amd_iommu_set_dirty_tracking(struct iommu_domain *domain,
}
list_for_each_entry(dev_data, &pdomain->dev_list, list) {
+ spin_lock(&dev_data->dte_lock);
iommu = get_amd_iommu_from_dev_data(dev_data);
-
- dev_table = get_dev_table(iommu);
- pte_root = dev_table[dev_data->devid].data[0];
-
- pte_root = (enable ? pte_root | DTE_FLAG_HAD :
- pte_root & ~DTE_FLAG_HAD);
+ dte = &get_dev_table(iommu)[dev_data->devid];
+ new = dte->data[0];
+ new = (enable ? new | DTE_FLAG_HAD : new & ~DTE_FLAG_HAD);
+ dte->data[0] = new;
+ spin_unlock(&dev_data->dte_lock);
/* Flush device DTE */
- dev_table[dev_data->devid].data[0] = pte_root;
device_flush_dte(dev_data);
domain_flush = true;
}
@@ -3128,17 +3127,23 @@ static void iommu_flush_irt_and_complete(struct amd_iommu *iommu, u16 devid)
static void set_dte_irq_entry(struct amd_iommu *iommu, u16 devid,
struct irq_remap_table *table)
{
- u64 dte;
- struct dev_table_entry *dev_table = get_dev_table(iommu);
+ u64 new;
+ struct dev_table_entry *dte = &get_dev_table(iommu)[devid];
+ struct iommu_dev_data *dev_data = search_dev_data(iommu, devid);
+
+ if (dev_data)
+ spin_lock(&dev_data->dte_lock);
- dte = dev_table[devid].data[2];
- dte &= ~DTE_IRQ_PHYS_ADDR_MASK;
- dte |= iommu_virt_to_phys(table->table);
- dte |= DTE_IRQ_REMAP_INTCTL;
- dte |= DTE_INTTABLEN;
- dte |= DTE_IRQ_REMAP_ENABLE;
+ new = READ_ONCE(dte->data[2]);
+ new &= ~DTE_IRQ_PHYS_ADDR_MASK;
+ new |= iommu_virt_to_phys(table->table);
+ new |= DTE_IRQ_REMAP_INTCTL;
+ new |= DTE_INTTABLEN;
+ new |= DTE_IRQ_REMAP_ENABLE;
+ WRITE_ONCE(dte->data[2], new);
- dev_table[devid].data[2] = dte;
+ if (dev_data)
+ spin_unlock(&dev_data->dte_lock);
}
static struct irq_remap_table *get_irq_table(struct amd_iommu *iommu, u16 devid)
--
2.34.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v11 9/9] iommu/amd: Remove amd_iommu_apply_erratum_63()
2024-11-14 19:44 [PATCH v11 0/9] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
` (7 preceding siblings ...)
2024-11-14 19:44 ` [PATCH v11 8/9] iommu/amd: Lock DTE before updating the entry with WRITE_ONCE() Suravee Suthikulpanit
@ 2024-11-14 19:44 ` Suravee Suthikulpanit
8 siblings, 0 replies; 12+ messages in thread
From: Suravee Suthikulpanit @ 2024-11-14 19:44 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, arnd, ubizjak, linux-arch, jgg,
kevin.tian, jon.grimm, santosh.shukla, pandoh, kumaranand,
Suravee Suthikulpanit
Also replace __set_dev_entry_bit() with set_dte_bit() and remove unused
helper functions.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 1 -
drivers/iommu/amd/init.c | 50 +++--------------------------------
2 files changed, 3 insertions(+), 48 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 7b43894f6b90..621ffb5d8669 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -16,7 +16,6 @@ irqreturn_t amd_iommu_int_thread_evtlog(int irq, void *data);
irqreturn_t amd_iommu_int_thread_pprlog(int irq, void *data);
irqreturn_t amd_iommu_int_thread_galog(int irq, void *data);
irqreturn_t amd_iommu_int_handler(int irq, void *data);
-void amd_iommu_apply_erratum_63(struct amd_iommu *iommu, u16 devid);
void amd_iommu_restart_log(struct amd_iommu *iommu, const char *evt_type,
u8 cntrl_intr, u8 cntrl_log,
u32 status_run_mask, u32 status_overflow_mask);
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 1e4b8040c374..41294807452d 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -992,38 +992,6 @@ static void set_dte_bit(struct dev_table_entry *dte, u8 bit)
dte->data[i] |= (1UL << _bit);
}
-static void __set_dev_entry_bit(struct dev_table_entry *dev_table,
- u16 devid, u8 bit)
-{
- int i = (bit >> 6) & 0x03;
- int _bit = bit & 0x3f;
-
- dev_table[devid].data[i] |= (1UL << _bit);
-}
-
-static void set_dev_entry_bit(struct amd_iommu *iommu, u16 devid, u8 bit)
-{
- struct dev_table_entry *dev_table = get_dev_table(iommu);
-
- return __set_dev_entry_bit(dev_table, devid, bit);
-}
-
-static int __get_dev_entry_bit(struct dev_table_entry *dev_table,
- u16 devid, u8 bit)
-{
- int i = (bit >> 6) & 0x03;
- int _bit = bit & 0x3f;
-
- return (dev_table[devid].data[i] & (1UL << _bit)) >> _bit;
-}
-
-static int get_dev_entry_bit(struct amd_iommu *iommu, u16 devid, u8 bit)
-{
- struct dev_table_entry *dev_table = get_dev_table(iommu);
-
- return __get_dev_entry_bit(dev_table, devid, bit);
-}
-
static bool __copy_device_table(struct amd_iommu *iommu)
{
u64 int_ctl, int_tab_len, entry = 0;
@@ -1179,17 +1147,6 @@ static bool search_ivhd_dte_flags(u16 segid, u16 first, u16 last)
return false;
}
-void amd_iommu_apply_erratum_63(struct amd_iommu *iommu, u16 devid)
-{
- int sysmgt;
-
- sysmgt = get_dev_entry_bit(iommu, devid, DEV_ENTRY_SYSMGT1) |
- (get_dev_entry_bit(iommu, devid, DEV_ENTRY_SYSMGT2) << 1);
-
- if (sysmgt == 0x01)
- set_dev_entry_bit(iommu, devid, DEV_ENTRY_IW);
-}
-
/*
* This function takes the device specific flags read from the ACPI
* table and sets up the device table entry with that information
@@ -2637,9 +2594,9 @@ static void init_device_table_dma(struct amd_iommu_pci_seg *pci_seg)
return;
for (devid = 0; devid <= pci_seg->last_bdf; ++devid) {
- __set_dev_entry_bit(dev_table, devid, DEV_ENTRY_VALID);
+ set_dte_bit(&dev_table[devid], DEV_ENTRY_VALID);
if (!amd_iommu_snp_en)
- __set_dev_entry_bit(dev_table, devid, DEV_ENTRY_TRANSLATION);
+ set_dte_bit(&dev_table[devid], DEV_ENTRY_TRANSLATION);
}
}
@@ -2667,8 +2624,7 @@ static void init_device_table(void)
for_each_pci_segment(pci_seg) {
for (devid = 0; devid <= pci_seg->last_bdf; ++devid)
- __set_dev_entry_bit(pci_seg->dev_table,
- devid, DEV_ENTRY_IRQ_TBL_EN);
+ set_dte_bit(&pci_seg->dev_table[devid], DEV_ENTRY_IRQ_TBL_EN);
}
}
--
2.34.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v11 4/9] iommu/amd: Introduce helper function to update 256-bit DTE
2024-11-14 19:44 ` [PATCH v11 4/9] iommu/amd: Introduce helper function to update 256-bit DTE Suravee Suthikulpanit
@ 2024-11-14 20:13 ` Uros Bizjak
0 siblings, 0 replies; 12+ messages in thread
From: Uros Bizjak @ 2024-11-14 20:13 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, arnd,
linux-arch, jgg, kevin.tian, jon.grimm, santosh.shukla, pandoh,
kumaranand
On Thu, Nov 14, 2024 at 8:45 PM Suravee Suthikulpanit
<suravee.suthikulpanit@amd.com> wrote:
>
> The current implementation does not follow 128-bit write requirement
> to update DTE as specified in the AMD I/O Virtualization Techonology
> (IOMMU) Specification.
>
> Therefore, modify the struct dev_table_entry to contain union of u128 data
> array, and introduce a helper functions update_dte256() to update DTE using
> two 128-bit cmpxchg operations to update 256-bit DTE with the modified
> structure, and take into account the DTE[V, GV] bits when programming
> the DTE to ensure proper order of DTE programming and flushing.
>
> In addition, introduce a per-DTE spin_lock struct dev_data.dte_lock to
> provide synchronization when updating the DTE to prevent cmpxchg128
> failure.
>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Suggested-by: Uros Bizjak <ubizjak@gmail.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
> drivers/iommu/amd/amd_iommu_types.h | 10 ++-
> drivers/iommu/amd/iommu.c | 123 ++++++++++++++++++++++++++++
> 2 files changed, 132 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
> index ae5f1e031722..ea7922b06325 100644
> --- a/drivers/iommu/amd/amd_iommu_types.h
> +++ b/drivers/iommu/amd/amd_iommu_types.h
> @@ -427,9 +427,13 @@
> #define DTE_GCR3_SHIFT_C 43
>
> #define DTE_GPT_LEVEL_SHIFT 54
> +#define DTE_GPT_LEVEL_MASK GENMASK_ULL(55, 54)
>
> #define GCR3_VALID 0x01ULL
>
> +/* DTE[128:179] | DTE[184:191] */
> +#define DTE_DATA2_INTR_MASK ~GENMASK_ULL(55, 52)
> +
> #define IOMMU_PAGE_MASK (((1ULL << 52) - 1) & ~0xfffULL)
> #define IOMMU_PTE_PRESENT(pte) ((pte) & IOMMU_PTE_PR)
> #define IOMMU_PTE_DIRTY(pte) ((pte) & IOMMU_PTE_HD)
> @@ -842,6 +846,7 @@ struct devid_map {
> struct iommu_dev_data {
> /*Protect against attach/detach races */
> struct mutex mutex;
> + spinlock_t dte_lock; /* DTE lock for 256-bit access */
>
> struct list_head list; /* For domain->dev_list */
> struct llist_node dev_data_list; /* For global dev_data_list */
> @@ -886,7 +891,10 @@ extern struct list_head amd_iommu_list;
> * Structure defining one entry in the device table
> */
> struct dev_table_entry {
> - u64 data[4];
> + union {
> + u64 data[4];
> + u128 data128[2];
> + };
> };
>
> /*
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 5ce8e6504ba7..7e0b62f2268a 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -83,12 +83,125 @@ static int amd_iommu_attach_device(struct iommu_domain *dom,
> static void set_dte_entry(struct amd_iommu *iommu,
> struct iommu_dev_data *dev_data);
>
> +static void iommu_flush_dte_sync(struct amd_iommu *iommu, u16 devid);
> +
> /****************************************************************************
> *
> * Helper functions
> *
> ****************************************************************************/
>
> +static __always_inline void amd_iommu_atomic128_set(__int128 *ptr, __int128 val)
> +{
> + /*
> + * Note:
> + * We use arch_try_cmpxchg128_local() because:
> + * - Need cmpxchg16b instruction mainly for 128-bit store to DTE
> + * (not necessary for cmpxchg since this function is already
> + * protected by a spin_lock for this DTE).
> + * - Neither need LOCK_PREFIX nor try loop because of the spin_lock.
> + */
> + arch_try_cmpxchg128_local(ptr, ptr, val);
Just use arch_cmpxchg128_local() here.
With that:
Acked-by: Uros Bizjak <ubizjak@gmail.com>.
Uros.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v11 5/9] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers
2024-11-14 19:44 ` [PATCH v11 5/9] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers Suravee Suthikulpanit
@ 2024-11-14 20:26 ` Jason Gunthorpe
0 siblings, 0 replies; 12+ messages in thread
From: Jason Gunthorpe @ 2024-11-14 20:26 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, arnd,
ubizjak, linux-arch, kevin.tian, jon.grimm, santosh.shukla,
pandoh, kumaranand
On Thu, Nov 14, 2024 at 07:44:32PM +0000, Suravee Suthikulpanit wrote:
> Also, the set_dte_entry() is used to program several DTE fields (e.g.
> stage1 table, stage2 table, domain id, and etc.), which is difficult
> to keep track with current implementation.
>
> Therefore, separate logic for clearing DTE (i.e. make_clear_dte) and
> another function for setting up the GCR3 Table Root Pointer, GIOV, GV,
> GLX, and GuestPagingMode into another function set_dte_gcr3_table().
>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
> drivers/iommu/amd/amd_iommu.h | 2 +
> drivers/iommu/amd/amd_iommu_types.h | 13 +--
> drivers/iommu/amd/init.c | 30 ++++++-
> drivers/iommu/amd/iommu.c | 129 ++++++++++++++++------------
> 4 files changed, 106 insertions(+), 68 deletions(-)
Got lost on v10:
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2024-11-14 20:27 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-14 19:44 [PATCH v11 0/9] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
2024-11-14 19:44 ` [PATCH v11 1/9] iommu/amd: Misc ACPI IVRS debug info clean up Suravee Suthikulpanit
2024-11-14 19:44 ` [PATCH v11 2/9] iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported Suravee Suthikulpanit
2024-11-14 19:44 ` [PATCH v11 3/9] iommu/amd: Introduce struct ivhd_dte_flags to store persistent DTE flags Suravee Suthikulpanit
2024-11-14 19:44 ` [PATCH v11 4/9] iommu/amd: Introduce helper function to update 256-bit DTE Suravee Suthikulpanit
2024-11-14 20:13 ` Uros Bizjak
2024-11-14 19:44 ` [PATCH v11 5/9] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers Suravee Suthikulpanit
2024-11-14 20:26 ` Jason Gunthorpe
2024-11-14 19:44 ` [PATCH v11 6/9] iommu/amd: Introduce helper function get_dte256() Suravee Suthikulpanit
2024-11-14 19:44 ` [PATCH v11 7/9] iommu/amd: Modify clear_dte_entry() to avoid in-place update Suravee Suthikulpanit
2024-11-14 19:44 ` [PATCH v11 8/9] iommu/amd: Lock DTE before updating the entry with WRITE_ONCE() Suravee Suthikulpanit
2024-11-14 19:44 ` [PATCH v11 9/9] iommu/amd: Remove amd_iommu_apply_erratum_63() Suravee Suthikulpanit
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).