* [PATCH v4 1/6] iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported
2024-09-16 17:17 [PATCH v4 0/6] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
@ 2024-09-16 17:18 ` Suravee Suthikulpanit
2024-09-16 17:18 ` [PATCH v4 2/6] iommu/amd: Introduce helper function to update 256-bit DTE Suravee Suthikulpanit
` (5 subsequent siblings)
6 siblings, 0 replies; 18+ messages in thread
From: Suravee Suthikulpanit @ 2024-09-16 17:18 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, jgg, kevin.tian, jon.grimm,
santosh.shukla, pandoh, kumaranand, Suravee Suthikulpanit
According to the AMD IOMMU spec, IOMMU hardware reads the entire DTE
in a single 256-bit transaction. It is recommended to update DTE using
128-bit operation followed by an INVALIDATE_DEVTAB_ENTYRY command when
the IV=1b or V=1b before the change.
According to the AMD BIOS and Kernel Developer's Guide (BDKG) dated back
to family 10h Processor [1], which is the first introduction of AMD IOMMU,
AMD processor always has CPUID Fn0000_0001_ECX[CMPXCHG16B]=1.
Therefore, it is safe to assume cmpxchg128 is available with all AMD
processor w/ IOMMU.
In addition, the CMPXCHG16B feature has already been checked separately
before enabling the GA, XT, and GAM modes. Consolidate the detection logic,
and fail the IOMMU initialization if the feature is not supported.
[1] https://www.amd.com/content/dam/amd/en/documents/archived-tech-docs/programmer-references/31116.pdf
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/init.c | 23 +++++++++--------------
1 file changed, 9 insertions(+), 14 deletions(-)
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 6b15ce09e78d..983c09898a10 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1762,13 +1762,8 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h,
else
iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET;
- /*
- * Note: GA (128-bit IRTE) mode requires cmpxchg16b supports.
- * GAM also requires GA mode. Therefore, we need to
- * check cmpxchg16b support before enabling it.
- */
- if (!boot_cpu_has(X86_FEATURE_CX16) ||
- ((h->efr_attr & (0x1 << IOMMU_FEAT_GASUP_SHIFT)) == 0))
+ /* GAM requires GA mode. */
+ if ((h->efr_attr & (0x1 << IOMMU_FEAT_GASUP_SHIFT)) == 0)
amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY;
break;
case 0x11:
@@ -1778,13 +1773,8 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h,
else
iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET;
- /*
- * Note: GA (128-bit IRTE) mode requires cmpxchg16b supports.
- * XT, GAM also requires GA mode. Therefore, we need to
- * check cmpxchg16b support before enabling them.
- */
- if (!boot_cpu_has(X86_FEATURE_CX16) ||
- ((h->efr_reg & (0x1 << IOMMU_EFR_GASUP_SHIFT)) == 0)) {
+ /* XT and GAM require GA mode. */
+ if ((h->efr_reg & (0x1 << IOMMU_EFR_GASUP_SHIFT)) == 0) {
amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY;
break;
}
@@ -3049,6 +3039,11 @@ static int __init early_amd_iommu_init(void)
return -EINVAL;
}
+ if (!boot_cpu_has(X86_FEATURE_CX16)) {
+ pr_err("Failed to initialize. The CMPXCHG16B feature is required.\n");
+ return -EINVAL;
+ }
+
/*
* Validate checksum here so we don't need to do it when
* we actually parse the table
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH v4 2/6] iommu/amd: Introduce helper function to update 256-bit DTE
2024-09-16 17:17 [PATCH v4 0/6] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
2024-09-16 17:18 ` [PATCH v4 1/6] iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported Suravee Suthikulpanit
@ 2024-09-16 17:18 ` Suravee Suthikulpanit
2024-09-26 19:46 ` Jason Gunthorpe
2024-09-16 17:18 ` [PATCH v4 3/6] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers Suravee Suthikulpanit
` (4 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: Suravee Suthikulpanit @ 2024-09-16 17:18 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, jgg, kevin.tian, jon.grimm,
santosh.shukla, pandoh, kumaranand, Suravee Suthikulpanit
The current implementation does not follow 128-bit write requirement
to update DTE as specified in the AMD I/O Virtualization Techonology
(IOMMU) Specification.
Therefore, modify the struct dev_table_entry to contain union of u128 data
array, and introduce a helper functions update_dte256() to update DTE using
two 128-bit cmpxchg operations to update 256-bit DTE with the modified
structure, and take into account the DTE[V, GV] bits when programming
the DTE to ensure proper order of DTE programming and flushing.
In addition, introduce a per-DTE spin_lock struct dev_data.dte_lock to
provide synchronization when updating the DTE to prevent cmpxchg128
failure.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu.h | 2 +
drivers/iommu/amd/amd_iommu_types.h | 8 ++-
drivers/iommu/amd/iommu.c | 96 +++++++++++++++++++++++++++++
3 files changed, 105 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 52e18b5f99fd..14a153c7bc12 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -28,6 +28,8 @@ void iommu_feature_enable(struct amd_iommu *iommu, u8 bit);
void *__init iommu_alloc_4k_pages(struct amd_iommu *iommu,
gfp_t gfp, size_t size);
+int iommu_flush_sync_dte(struct amd_iommu *iommu, u16 devid);
+
#ifdef CONFIG_AMD_IOMMU_DEBUGFS
void amd_iommu_debugfs_setup(struct amd_iommu *iommu);
#else
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index c9f9a598eb82..fea7544f8c55 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -427,6 +427,8 @@
#define GCR3_VALID 0x01ULL
+#define DTE_INTR_MASK (~GENMASK_ULL(55, 52))
+
#define IOMMU_PAGE_MASK (((1ULL << 52) - 1) & ~0xfffULL)
#define IOMMU_PTE_PRESENT(pte) ((pte) & IOMMU_PTE_PR)
#define IOMMU_PTE_DIRTY(pte) ((pte) & IOMMU_PTE_HD)
@@ -833,6 +835,7 @@ struct devid_map {
struct iommu_dev_data {
/*Protect against attach/detach races */
spinlock_t lock;
+ spinlock_t dte_lock; /* DTE lock for 256-bit access */
struct list_head list; /* For domain->dev_list */
struct llist_node dev_data_list; /* For global dev_data_list */
@@ -883,7 +886,10 @@ extern struct amd_iommu *amd_iommus[MAX_IOMMUS];
* Structure defining one entry in the device table
*/
struct dev_table_entry {
- u64 data[4];
+ union {
+ u64 data[4];
+ u128 data128[2];
+ };
};
/*
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 87c5385ce3f2..48a721d10f06 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -85,6 +85,91 @@ static void set_dte_entry(struct amd_iommu *iommu,
*
****************************************************************************/
+static void write_upper(struct dev_table_entry *ptr, struct dev_table_entry *new)
+{
+ struct dev_table_entry old = {};
+
+ do {
+ old.data128[1] = ptr->data128[1];
+ new->data[2] &= ~DTE_INTR_MASK;
+ new->data[2] |= (old.data[2] & DTE_INTR_MASK);
+ } while (!try_cmpxchg128(&ptr->data128[1], &old.data128[1], new->data128[1]));
+}
+
+static void write_lower(struct dev_table_entry *ptr, struct dev_table_entry *new)
+{
+ struct dev_table_entry old = {};
+
+ do {
+ old.data128[0] = ptr->data128[0];
+ } while (!try_cmpxchg128(&ptr->data128[0], &old.data128[0], new->data128[0]));
+}
+
+/*
+ * Note:
+ * IOMMU reads the entire Device Table entry in a single 256-bit transaction
+ * but the driver is programming DTE using 2 128-bit cmpxchg. So, the driver
+ * need to ensure the following:
+ * - DTE[V|GV] bit is being written last when setting.
+ * - DTE[V|GV] bit is being written first when clearing.
+ *
+ * This function is used only by code, which updates DMA translation part of the DTE.
+ * So, only consider control bits related to DMA when updating the entry.
+ */
+static void update_dte256(struct amd_iommu *iommu, struct iommu_dev_data *dev_data,
+ struct dev_table_entry *new)
+{
+ struct dev_table_entry *dev_table = get_dev_table(iommu);
+ struct dev_table_entry *ptr = &dev_table[dev_data->devid];
+
+ spin_lock(&dev_data->dte_lock);
+
+ if (!(ptr->data[0] & DTE_FLAG_V)) {
+ /* Existing DTE is not valid. */
+ write_upper(ptr, new);
+ write_lower(ptr, new);
+ iommu_flush_sync_dte(iommu, dev_data->devid);
+ } else if (!(new->data[0] & DTE_FLAG_V)) {
+ /* Existing DTE is valid. New DTE is not valid. */
+ write_lower(ptr, new);
+ write_upper(ptr, new);
+ iommu_flush_sync_dte(iommu, dev_data->devid);
+ } else {
+ /* Existing & new DTEs are valid. */
+ if (!FIELD_GET(DTE_FLAG_GV, ptr->data[0])) {
+ /* Existing DTE has no guest page table. */
+ write_upper(ptr, new);
+ write_lower(ptr, new);
+ iommu_flush_sync_dte(iommu, dev_data->devid);
+ } else if (!FIELD_GET(DTE_FLAG_GV, new->data[0])) {
+ /*
+ * Existing DTE has guest page table,
+ * new DTE has no guest page table,
+ */
+ write_lower(ptr, new);
+ write_upper(ptr, new);
+ iommu_flush_sync_dte(iommu, dev_data->devid);
+ } else {
+ /*
+ * Existing DTE has guest page table,
+ * new DTE has guest page table.
+ */
+ struct dev_table_entry clear = {};
+
+ /* First disable DTE */
+ write_lower(ptr, &clear);
+ iommu_flush_sync_dte(iommu, dev_data->devid);
+
+ /* Then update DTE */
+ write_upper(ptr, new);
+ write_lower(ptr, new);
+ iommu_flush_sync_dte(iommu, dev_data->devid);
+ }
+ }
+
+ spin_unlock(&dev_data->dte_lock);
+}
+
static inline bool pdom_is_v2_pgtbl_mode(struct protection_domain *pdom)
{
return (pdom && (pdom->pd_mode == PD_MODE_V2));
@@ -205,6 +290,7 @@ static struct iommu_dev_data *alloc_dev_data(struct amd_iommu *iommu, u16 devid)
return NULL;
spin_lock_init(&dev_data->lock);
+ spin_lock_init(&dev_data->dte_lock);
dev_data->devid = devid;
ratelimit_default_init(&dev_data->rs);
@@ -1256,6 +1342,16 @@ static int iommu_flush_dte(struct amd_iommu *iommu, u16 devid)
return iommu_queue_command(iommu, &cmd);
}
+int iommu_flush_sync_dte(struct amd_iommu *iommu, u16 devid)
+{
+ int ret;
+
+ ret = iommu_flush_dte(iommu, devid);
+ if (!ret)
+ iommu_completion_wait(iommu);
+ return ret;
+}
+
static void amd_iommu_flush_dte_all(struct amd_iommu *iommu)
{
u32 devid;
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH v4 2/6] iommu/amd: Introduce helper function to update 256-bit DTE
2024-09-16 17:18 ` [PATCH v4 2/6] iommu/amd: Introduce helper function to update 256-bit DTE Suravee Suthikulpanit
@ 2024-09-26 19:46 ` Jason Gunthorpe
2024-10-03 16:15 ` Suthikulpanit, Suravee
0 siblings, 1 reply; 18+ messages in thread
From: Jason Gunthorpe @ 2024-09-26 19:46 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, kevin.tian,
jon.grimm, santosh.shukla, pandoh, kumaranand
On Mon, Sep 16, 2024 at 05:18:01PM +0000, Suravee Suthikulpanit wrote:
> +static void write_lower(struct dev_table_entry *ptr, struct dev_table_entry *new)
> +{
> + struct dev_table_entry old = {};
> +
> + do {
> + old.data128[0] = ptr->data128[0];
> + } while (!try_cmpxchg128(&ptr->data128[0], &old.data128[0], new->data128[0]));
> +}
> +
> +/*
> + * Note:
> + * IOMMU reads the entire Device Table entry in a single 256-bit transaction
> + * but the driver is programming DTE using 2 128-bit cmpxchg. So, the driver
> + * need to ensure the following:
I wonder if the intention here was to use a SSE operation to do the
256 bit store from the CPU side too? Just thinking aloud
> + if (!(ptr->data[0] & DTE_FLAG_V)) {
> + /* Existing DTE is not valid. */
> + write_upper(ptr, new);
> + write_lower(ptr, new);
> + iommu_flush_sync_dte(iommu, dev_data->devid);
> + } else if (!(new->data[0] & DTE_FLAG_V)) {
> + /* Existing DTE is valid. New DTE is not valid. */
> + write_lower(ptr, new);
> + write_upper(ptr, new);
> + iommu_flush_sync_dte(iommu, dev_data->devid);
> + } else {
> + /* Existing & new DTEs are valid. */
> + if (!FIELD_GET(DTE_FLAG_GV, ptr->data[0])) {
> + /* Existing DTE has no guest page table. */
> + write_upper(ptr, new);
> + write_lower(ptr, new);
> + iommu_flush_sync_dte(iommu, dev_data->devid);
> + } else if (!FIELD_GET(DTE_FLAG_GV, new->data[0])) {
> + /*
> + * Existing DTE has guest page table,
> + * new DTE has no guest page table,
> + */
> + write_lower(ptr, new);
> + write_upper(ptr, new);
> + iommu_flush_sync_dte(iommu, dev_data->devid);
> + } else {
> + /*
> + * Existing DTE has guest page table,
> + * new DTE has guest page table.
> + */
> + struct dev_table_entry clear = {};
> +
> + /* First disable DTE */
> + write_lower(ptr, &clear);
> + iommu_flush_sync_dte(iommu, dev_data->devid);
> +
> + /* Then update DTE */
> + write_upper(ptr, new);
> + write_lower(ptr, new);
> + iommu_flush_sync_dte(iommu, dev_data->devid);
> + }
There is one branch missing where GV is valid in both and the [1]
doesn't change. Ie atomic replace of a GCR3 table.
And maybe this will need more branches later for the viommu stuff?
But otherwise yes this captures what is needed just fine.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> @@ -1256,6 +1342,16 @@ static int iommu_flush_dte(struct amd_iommu *iommu, u16 devid)
> +int iommu_flush_sync_dte(struct amd_iommu *iommu, u16 devid)
> +{
> + int ret;
> +
> + ret = iommu_flush_dte(iommu, devid);
> + if (!ret)
> + iommu_completion_wait(iommu);
> + return ret;
> +}
Maybe this doesn't need to return an error since we can't handle
failure to flush DTE tables..
Jason
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH v4 2/6] iommu/amd: Introduce helper function to update 256-bit DTE
2024-09-26 19:46 ` Jason Gunthorpe
@ 2024-10-03 16:15 ` Suthikulpanit, Suravee
2024-10-03 18:54 ` Jason Gunthorpe
0 siblings, 1 reply; 18+ messages in thread
From: Suthikulpanit, Suravee @ 2024-10-03 16:15 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, kevin.tian,
jon.grimm, santosh.shukla, pandoh, kumaranand
On 9/27/2024 2:46 AM, Jason Gunthorpe wrote:
> On Mon, Sep 16, 2024 at 05:18:01PM +0000, Suravee Suthikulpanit wrote:
>
> ....
>
>> + if (!(ptr->data[0] & DTE_FLAG_V)) {
>> + /* Existing DTE is not valid. */
>> + write_upper(ptr, new);
>> + write_lower(ptr, new);
>> + iommu_flush_sync_dte(iommu, dev_data->devid);
>> + } else if (!(new->data[0] & DTE_FLAG_V)) {
>> + /* Existing DTE is valid. New DTE is not valid. */
>> + write_lower(ptr, new);
>> + write_upper(ptr, new);
>> + iommu_flush_sync_dte(iommu, dev_data->devid);
>> + } else {
>> + /* Existing & new DTEs are valid. */
>> + if (!FIELD_GET(DTE_FLAG_GV, ptr->data[0])) {
>> + /* Existing DTE has no guest page table. */
>> + write_upper(ptr, new);
>> + write_lower(ptr, new);
>> + iommu_flush_sync_dte(iommu, dev_data->devid);
>> + } else if (!FIELD_GET(DTE_FLAG_GV, new->data[0])) {
>> + /*
>> + * Existing DTE has guest page table,
>> + * new DTE has no guest page table,
>> + */
>> + write_lower(ptr, new);
>> + write_upper(ptr, new);
>> + iommu_flush_sync_dte(iommu, dev_data->devid);
>> + } else {
>> + /*
>> + * Existing DTE has guest page table,
>> + * new DTE has guest page table.
>> + */
>> + struct dev_table_entry clear = {};
>> +
>> + /* First disable DTE */
>> + write_lower(ptr, &clear);
>> + iommu_flush_sync_dte(iommu, dev_data->devid);
>> +
>> + /* Then update DTE */
>> + write_upper(ptr, new);
>> + write_lower(ptr, new);
>> + iommu_flush_sync_dte(iommu, dev_data->devid);
>> + }
>
> There is one branch missing where GV is valid in both and the [1]
> doesn't change. Ie atomic replace of a GCR3 table.
Not sure if I follow this.
> And maybe this will need more branches later for the viommu stuff?
I will take care of this later once we introduce the change for vIOMMU
stuff.
> But otherwise yes this captures what is needed just fine.
>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>
>> @@ -1256,6 +1342,16 @@ static int iommu_flush_dte(struct amd_iommu *iommu, u16 devid)
>> +int iommu_flush_sync_dte(struct amd_iommu *iommu, u16 devid)
>> +{
>> + int ret;
>> +
>> + ret = iommu_flush_dte(iommu, devid);
>> + if (!ret)
>> + iommu_completion_wait(iommu);
>> + return ret;
>> +}
>
> Maybe this doesn't need to return an error since we can't handle
> failure to flush DTE tables..
Okey.
Thanks,
Suravee
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH v4 2/6] iommu/amd: Introduce helper function to update 256-bit DTE
2024-10-03 16:15 ` Suthikulpanit, Suravee
@ 2024-10-03 18:54 ` Jason Gunthorpe
0 siblings, 0 replies; 18+ messages in thread
From: Jason Gunthorpe @ 2024-10-03 18:54 UTC (permalink / raw)
To: Suthikulpanit, Suravee
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, kevin.tian,
jon.grimm, santosh.shukla, pandoh, kumaranand
On Thu, Oct 03, 2024 at 11:15:53PM +0700, Suthikulpanit, Suravee wrote:
> On 9/27/2024 2:46 AM, Jason Gunthorpe wrote:
> > On Mon, Sep 16, 2024 at 05:18:01PM +0000, Suravee Suthikulpanit wrote:
> >
> > ....
> >
> > > + if (!(ptr->data[0] & DTE_FLAG_V)) {
> > > + /* Existing DTE is not valid. */
> > > + write_upper(ptr, new);
> > > + write_lower(ptr, new);
> > > + iommu_flush_sync_dte(iommu, dev_data->devid);
> > > + } else if (!(new->data[0] & DTE_FLAG_V)) {
> > > + /* Existing DTE is valid. New DTE is not valid. */
> > > + write_lower(ptr, new);
> > > + write_upper(ptr, new);
> > > + iommu_flush_sync_dte(iommu, dev_data->devid);
> > > + } else {
> > > + /* Existing & new DTEs are valid. */
> > > + if (!FIELD_GET(DTE_FLAG_GV, ptr->data[0])) {
> > > + /* Existing DTE has no guest page table. */
> > > + write_upper(ptr, new);
> > > + write_lower(ptr, new);
> > > + iommu_flush_sync_dte(iommu, dev_data->devid);
> > > + } else if (!FIELD_GET(DTE_FLAG_GV, new->data[0])) {
> > > + /*
> > > + * Existing DTE has guest page table,
> > > + * new DTE has no guest page table,
> > > + */
> > > + write_lower(ptr, new);
> > > + write_upper(ptr, new);
> > > + iommu_flush_sync_dte(iommu, dev_data->devid);
> > > + } else {
> > > + /*
> > > + * Existing DTE has guest page table,
> > > + * new DTE has guest page table.
> > > + */
> > > + struct dev_table_entry clear = {};
> > > +
> > > + /* First disable DTE */
> > > + write_lower(ptr, &clear);
> > > + iommu_flush_sync_dte(iommu, dev_data->devid);
> > > +
> > > + /* Then update DTE */
> > > + write_upper(ptr, new);
> > > + write_lower(ptr, new);
> > > + iommu_flush_sync_dte(iommu, dev_data->devid);
> > > + }
> >
> > There is one branch missing where GV is valid in both and the [1]
> > doesn't change. Ie atomic replace of a GCR3 table.
>
> Not sure if I follow this.
Something like:
if (FIELD_GET(DTE_FLAG_GV, ptr->data[0]) &&
FIELD_GET(DTE_FLAG_GV, new->data[0]) &&
(ptr->data[2] & DTE_INTR_MASK) == (new->data[2] & DTE_INTR_MASK)) {
/* GCR3 table has changed, but the same number of levels, no need to disable DTE */
write_lower(ptr, new);
iommu_flush_sync_dte(iommu, dev_data->devid);
}
Jason
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v4 3/6] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers
2024-09-16 17:17 [PATCH v4 0/6] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
2024-09-16 17:18 ` [PATCH v4 1/6] iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported Suravee Suthikulpanit
2024-09-16 17:18 ` [PATCH v4 2/6] iommu/amd: Introduce helper function to update 256-bit DTE Suravee Suthikulpanit
@ 2024-09-16 17:18 ` Suravee Suthikulpanit
2024-09-26 19:56 ` Jason Gunthorpe
2024-09-16 17:18 ` [PATCH v4 4/6] iommu/amd: Introduce helper function get_dte256() Suravee Suthikulpanit
` (3 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: Suravee Suthikulpanit @ 2024-09-16 17:18 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, jgg, kevin.tian, jon.grimm,
santosh.shukla, pandoh, kumaranand, Suravee Suthikulpanit
Also, the set_dte_entry() is used to program several DTE fields (e.g.
stage1 table, stage2 table, domain id, and etc.), which is difficult
to keep track with current implementation.
Therefore, separate logic for setting up the GCR3 Table Root Pointer,
GIOV, GV, GLX, and GuestPagingMode into another helper function
set_dte_gcr3_table().
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/iommu.c | 117 +++++++++++++++++++++-----------------
1 file changed, 65 insertions(+), 52 deletions(-)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 48a721d10f06..12f27061680d 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1947,17 +1947,58 @@ int amd_iommu_clear_gcr3(struct iommu_dev_data *dev_data, ioasid_t pasid)
return ret;
}
+static void set_dte_gcr3_table(struct amd_iommu *iommu,
+ struct iommu_dev_data *dev_data,
+ struct dev_table_entry *target)
+{
+ struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
+ u64 tmp, gcr3;
+
+ if (!gcr3_info->gcr3_tbl)
+ return;
+
+ pr_debug("%s: devid=%#x, glx=%#x, gcr3_tbl=%#llx\n",
+ __func__, dev_data->devid, gcr3_info->glx,
+ (unsigned long long)gcr3_info->gcr3_tbl);
+
+ tmp = gcr3_info->glx;
+ target->data[0] |= (tmp & DTE_GLX_MASK) << DTE_GLX_SHIFT;
+ if (pdom_is_v2_pgtbl_mode(dev_data->domain))
+ target->data[0] |= DTE_FLAG_GIOV;
+ target->data[0] |= DTE_FLAG_GV;
+
+ /* First mask out possible old values for GCR3 table */
+ tmp = DTE_GCR3_VAL_A(~0ULL) << DTE_GCR3_SHIFT_A;
+ target->data[0] &= ~tmp;
+ tmp = DTE_GCR3_VAL_B(~0ULL) << DTE_GCR3_SHIFT_B;
+ tmp |= DTE_GCR3_VAL_C(~0ULL) << DTE_GCR3_SHIFT_C;
+ target->data[1] &= ~tmp;
+
+ gcr3 = iommu_virt_to_phys(gcr3_info->gcr3_tbl);
+
+ /* Encode GCR3 table into DTE */
+ tmp = DTE_GCR3_VAL_A(gcr3) << DTE_GCR3_SHIFT_A;
+ target->data[0] |= tmp;
+ tmp = DTE_GCR3_VAL_B(gcr3) << DTE_GCR3_SHIFT_B;
+ tmp |= DTE_GCR3_VAL_C(gcr3) << DTE_GCR3_SHIFT_C;
+ target->data[1] |= tmp;
+
+ /* Mask out old values for GuestPagingMode */
+ target->data[2] &= ~(0x3ULL << DTE_GPT_LEVEL_SHIFT);
+ /* Guest page table can only support 4 and 5 levels */
+ if (amd_iommu_gpt_level == PAGE_MODE_5_LEVEL)
+ target->data[2] |= ((u64)GUEST_PGTABLE_5_LEVEL << DTE_GPT_LEVEL_SHIFT);
+}
+
static void set_dte_entry(struct amd_iommu *iommu,
struct iommu_dev_data *dev_data)
{
- u64 pte_root = 0;
- u64 flags = 0;
- u32 old_domid;
- u16 devid = dev_data->devid;
u16 domid;
+ u32 old_domid;
+ struct dev_table_entry new = {};
struct protection_domain *domain = dev_data->domain;
- struct dev_table_entry *dev_table = get_dev_table(iommu);
struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
+ struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
if (gcr3_info && gcr3_info->gcr3_tbl)
domid = dev_data->gcr3_info.domid;
@@ -1965,72 +2006,44 @@ static void set_dte_entry(struct amd_iommu *iommu,
domid = domain->id;
if (domain->iop.mode != PAGE_MODE_NONE)
- pte_root = iommu_virt_to_phys(domain->iop.root);
+ new.data[0] = iommu_virt_to_phys(domain->iop.root);
- pte_root |= (domain->iop.mode & DEV_ENTRY_MODE_MASK)
+ new.data[0] |= (domain->iop.mode & DEV_ENTRY_MODE_MASK)
<< DEV_ENTRY_MODE_SHIFT;
- pte_root |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_V;
+ new.data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_V;
/*
* When SNP is enabled, Only set TV bit when IOMMU
* page translation is in use.
*/
if (!amd_iommu_snp_en || (domid != 0))
- pte_root |= DTE_FLAG_TV;
-
- flags = dev_table[devid].data[1];
-
- if (dev_data->ats_enabled)
- flags |= DTE_FLAG_IOTLB;
+ new.data[0] |= DTE_FLAG_TV;
if (dev_data->ppr)
- pte_root |= 1ULL << DEV_ENTRY_PPR;
+ new.data[0] |= 1ULL << DEV_ENTRY_PPR;
if (domain->dirty_tracking)
- pte_root |= DTE_FLAG_HAD;
-
- if (gcr3_info && gcr3_info->gcr3_tbl) {
- u64 gcr3 = iommu_virt_to_phys(gcr3_info->gcr3_tbl);
- u64 glx = gcr3_info->glx;
- u64 tmp;
-
- pte_root |= DTE_FLAG_GV;
- pte_root |= (glx & DTE_GLX_MASK) << DTE_GLX_SHIFT;
-
- /* First mask out possible old values for GCR3 table */
- tmp = DTE_GCR3_VAL_B(~0ULL) << DTE_GCR3_SHIFT_B;
- flags &= ~tmp;
+ new.data[0] |= DTE_FLAG_HAD;
- tmp = DTE_GCR3_VAL_C(~0ULL) << DTE_GCR3_SHIFT_C;
- flags &= ~tmp;
-
- /* Encode GCR3 table into DTE */
- tmp = DTE_GCR3_VAL_A(gcr3) << DTE_GCR3_SHIFT_A;
- pte_root |= tmp;
-
- tmp = DTE_GCR3_VAL_B(gcr3) << DTE_GCR3_SHIFT_B;
- flags |= tmp;
+ if (dev_data->ats_enabled)
+ new.data[1] |= DTE_FLAG_IOTLB;
+ else
+ new.data[1] &= ~DTE_FLAG_IOTLB;
- tmp = DTE_GCR3_VAL_C(gcr3) << DTE_GCR3_SHIFT_C;
- flags |= tmp;
+ old_domid = new.data[1] & DEV_DOMID_MASK;
+ new.data[1] &= ~DEV_DOMID_MASK;
+ new.data[1] |= domid;
- if (amd_iommu_gpt_level == PAGE_MODE_5_LEVEL) {
- dev_table[devid].data[2] |=
- ((u64)GUEST_PGTABLE_5_LEVEL << DTE_GPT_LEVEL_SHIFT);
- }
+ /* Need to preserve DTE[96:106] */
+ new.data[1] |= dte->data[1] & DTE_FLAG_MASK;
- /* GIOV is supported with V2 page table mode only */
- if (pdom_is_v2_pgtbl_mode(domain))
- pte_root |= DTE_FLAG_GIOV;
- }
+ /* Need to preserve interrupt remapping information in DTE[128:255] */
+ new.data128[1] = dte->data128[1];
- flags &= ~DEV_DOMID_MASK;
- flags |= domid;
+ set_dte_gcr3_table(iommu, dev_data, &new);
- old_domid = dev_table[devid].data[1] & DEV_DOMID_MASK;
- dev_table[devid].data[1] = flags;
- dev_table[devid].data[0] = pte_root;
+ update_dte256(iommu, dev_data, &new);
/*
* A kdump kernel might be replacing a domain ID that was copied from
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH v4 3/6] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers
2024-09-16 17:18 ` [PATCH v4 3/6] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers Suravee Suthikulpanit
@ 2024-09-26 19:56 ` Jason Gunthorpe
2024-10-03 16:16 ` Suthikulpanit, Suravee
0 siblings, 1 reply; 18+ messages in thread
From: Jason Gunthorpe @ 2024-09-26 19:56 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, kevin.tian,
jon.grimm, santosh.shukla, pandoh, kumaranand
On Mon, Sep 16, 2024 at 05:18:02PM +0000, Suravee Suthikulpanit wrote:
> Also, the set_dte_entry() is used to program several DTE fields (e.g.
> stage1 table, stage2 table, domain id, and etc.), which is difficult
> to keep track with current implementation.
>
> Therefore, separate logic for setting up the GCR3 Table Root Pointer,
> GIOV, GV, GLX, and GuestPagingMode into another helper function
> set_dte_gcr3_table().
>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
> drivers/iommu/amd/iommu.c | 117 +++++++++++++++++++++-----------------
> 1 file changed, 65 insertions(+), 52 deletions(-)
>
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 48a721d10f06..12f27061680d 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -1947,17 +1947,58 @@ int amd_iommu_clear_gcr3(struct iommu_dev_data *dev_data, ioasid_t pasid)
> return ret;
> }
>
> +static void set_dte_gcr3_table(struct amd_iommu *iommu,
> + struct iommu_dev_data *dev_data,
> + struct dev_table_entry *target)
> +{
> + struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
> + u64 tmp, gcr3;
> +
> + if (!gcr3_info->gcr3_tbl)
> + return;
> +
> + pr_debug("%s: devid=%#x, glx=%#x, gcr3_tbl=%#llx\n",
> + __func__, dev_data->devid, gcr3_info->glx,
> + (unsigned long long)gcr3_info->gcr3_tbl);
> +
> + tmp = gcr3_info->glx;
> + target->data[0] |= (tmp & DTE_GLX_MASK) << DTE_GLX_SHIFT;
> + if (pdom_is_v2_pgtbl_mode(dev_data->domain))
> + target->data[0] |= DTE_FLAG_GIOV;
When does this get called to install a gcr3 table without a v2 domain?
Other than my remark on patch 5 this looks Ok to me and making a
helper function for the gcr3 case is a good step forward.
Suggest you follow up with helper functions for blocking, identity and
v1 as well :) Then it will be really easy to follow.
Thanks,
Jason
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH v4 3/6] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers
2024-09-26 19:56 ` Jason Gunthorpe
@ 2024-10-03 16:16 ` Suthikulpanit, Suravee
2024-10-03 18:49 ` Jason Gunthorpe
0 siblings, 1 reply; 18+ messages in thread
From: Suthikulpanit, Suravee @ 2024-10-03 16:16 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, kevin.tian,
jon.grimm, santosh.shukla, pandoh, kumaranand
On 9/27/2024 2:56 AM, Jason Gunthorpe wrote:
> On Mon, Sep 16, 2024 at 05:18:02PM +0000, Suravee Suthikulpanit wrote:
>> Also, the set_dte_entry() is used to program several DTE fields (e.g.
>> stage1 table, stage2 table, domain id, and etc.), which is difficult
>> to keep track with current implementation.
>>
>> Therefore, separate logic for setting up the GCR3 Table Root Pointer,
>> GIOV, GV, GLX, and GuestPagingMode into another helper function
>> set_dte_gcr3_table().
>>
>> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>> ---
>> drivers/iommu/amd/iommu.c | 117 +++++++++++++++++++++-----------------
>> 1 file changed, 65 insertions(+), 52 deletions(-)
>>
>> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
>> index 48a721d10f06..12f27061680d 100644
>> --- a/drivers/iommu/amd/iommu.c
>> +++ b/drivers/iommu/amd/iommu.c
>> @@ -1947,17 +1947,58 @@ int amd_iommu_clear_gcr3(struct iommu_dev_data *dev_data, ioasid_t pasid)
>> return ret;
>> }
>>
>> +static void set_dte_gcr3_table(struct amd_iommu *iommu,
>> + struct iommu_dev_data *dev_data,
>> + struct dev_table_entry *target)
>> +{
>> + struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
>> + u64 tmp, gcr3;
>> +
>> + if (!gcr3_info->gcr3_tbl)
>> + return;
>> +
>> + pr_debug("%s: devid=%#x, glx=%#x, gcr3_tbl=%#llx\n",
>> + __func__, dev_data->devid, gcr3_info->glx,
>> + (unsigned long long)gcr3_info->gcr3_tbl);
>> +
>> + tmp = gcr3_info->glx;
>> + target->data[0] |= (tmp & DTE_GLX_MASK) << DTE_GLX_SHIFT;
>> + if (pdom_is_v2_pgtbl_mode(dev_data->domain))
>> + target->data[0] |= DTE_FLAG_GIOV;
>
> When does this get called to install a gcr3 table without a v2 domain?
The GCR3 table is also used when we setup v2 table for SVA stuff. In
such case, we would be setting up w/ PASID. Therefore, the GIOV bit is
not needed.
> Other than my remark on patch 5 this looks Ok to me and making a
> helper function for the gcr3 case is a good step forward.
>
> Suggest you follow up with helper functions for blocking, identity and
> v1 as well :) Then it will be really easy to follow.
We will look to simplify the code in the future.
Thanks,
Suravee
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH v4 3/6] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers
2024-10-03 16:16 ` Suthikulpanit, Suravee
@ 2024-10-03 18:49 ` Jason Gunthorpe
0 siblings, 0 replies; 18+ messages in thread
From: Jason Gunthorpe @ 2024-10-03 18:49 UTC (permalink / raw)
To: Suthikulpanit, Suravee
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, kevin.tian,
jon.grimm, santosh.shukla, pandoh, kumaranand
On Thu, Oct 03, 2024 at 11:16:19PM +0700, Suthikulpanit, Suravee wrote:
> > > + tmp = gcr3_info->glx;
> > > + target->data[0] |= (tmp & DTE_GLX_MASK) << DTE_GLX_SHIFT;
> > > + if (pdom_is_v2_pgtbl_mode(dev_data->domain))
> > > + target->data[0] |= DTE_FLAG_GIOV;
> >
> > When does this get called to install a gcr3 table without a v2 domain?
>
> The GCR3 table is also used when we setup v2 table for SVA stuff. In such
> case, we would be setting up w/ PASID. Therefore, the GIOV bit is not
> needed.
If I understand the manual right this should be written as:
if (dev_data->domain->type != IDENTITY)
target->data[0] |= DTE_FLAG_GIOV;
Ie everything on the RID except identity should be translated through
PASID 0 and if PASID 0 is a V2 page table then it will translate and
if PASID 0 is non-valid then it will block?
Identity needs to not use GIOV otherwise it will be blocking?
Jason
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v4 4/6] iommu/amd: Introduce helper function get_dte256()
2024-09-16 17:17 [PATCH v4 0/6] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
` (2 preceding siblings ...)
2024-09-16 17:18 ` [PATCH v4 3/6] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers Suravee Suthikulpanit
@ 2024-09-16 17:18 ` Suravee Suthikulpanit
2024-09-26 19:49 ` Jason Gunthorpe
2024-09-16 17:18 ` [PATCH v4 5/6] iommu/amd: Modify clear_dte_entry() to avoid in-place update Suravee Suthikulpanit
` (2 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: Suravee Suthikulpanit @ 2024-09-16 17:18 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, jgg, kevin.tian, jon.grimm,
santosh.shukla, pandoh, kumaranand, Suravee Suthikulpanit
And use it in clone_alias() along with update_dte256().
Also use get_dte256() in dump_dte_entry().
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/iommu.c | 49 ++++++++++++++++++++++++++++++++-------
1 file changed, 40 insertions(+), 9 deletions(-)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 12f27061680d..676742d6f19a 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -170,6 +170,20 @@ static void update_dte256(struct amd_iommu *iommu, struct iommu_dev_data *dev_da
spin_unlock(&dev_data->dte_lock);
}
+static void get_dte256(struct amd_iommu *iommu, struct iommu_dev_data *dev_data,
+ struct dev_table_entry *dte)
+{
+ struct dev_table_entry *ptr;
+ struct dev_table_entry *dev_table = get_dev_table(iommu);
+
+ ptr = &dev_table[dev_data->devid];
+
+ spin_lock(&dev_data->dte_lock);
+ dte->data128[0] = ptr->data128[0];
+ dte->data128[1] = ptr->data128[1];
+ spin_unlock(&dev_data->dte_lock);
+}
+
static inline bool pdom_is_v2_pgtbl_mode(struct protection_domain *pdom)
{
return (pdom && (pdom->pd_mode == PD_MODE_V2));
@@ -318,9 +332,11 @@ static struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid
static int clone_alias(struct pci_dev *pdev, u16 alias, void *data)
{
+ struct dev_table_entry new;
struct amd_iommu *iommu;
- struct dev_table_entry *dev_table;
+ struct iommu_dev_data *dev_data, *alias_data;
u16 devid = pci_dev_id(pdev);
+ int ret = 0;
if (devid == alias)
return 0;
@@ -329,13 +345,25 @@ static int clone_alias(struct pci_dev *pdev, u16 alias, void *data)
if (!iommu)
return 0;
- amd_iommu_set_rlookup_table(iommu, alias);
- dev_table = get_dev_table(iommu);
- memcpy(dev_table[alias].data,
- dev_table[devid].data,
- sizeof(dev_table[alias].data));
+ /* Copy the data from pdev */
+ dev_data = dev_iommu_priv_get(&pdev->dev);
+ if (!dev_data) {
+ ret = -EINVAL;
+ goto out;
+ }
+ get_dte256(iommu, dev_data, &new);
- return 0;
+ /* Setup alias */
+ alias_data = search_dev_data(iommu, alias);
+ if (!alias_data) {
+ ret = -EINVAL;
+ goto out;
+ }
+ update_dte256(iommu, alias_data, &new);
+
+ amd_iommu_set_rlookup_table(iommu, alias);
+out:
+ return ret;
}
static void clone_aliases(struct amd_iommu *iommu, struct device *dev)
@@ -669,10 +697,13 @@ static void amd_iommu_uninit_device(struct device *dev)
static void dump_dte_entry(struct amd_iommu *iommu, u16 devid)
{
int i;
- struct dev_table_entry *dev_table = get_dev_table(iommu);
+ struct dev_table_entry dte;
+ struct iommu_dev_data *dev_data = find_dev_data(iommu, devid);
+
+ get_dte256(iommu, dev_data, &dte);
for (i = 0; i < 4; ++i)
- pr_err("DTE[%d]: %016llx\n", i, dev_table[devid].data[i]);
+ pr_err("DTE[%d]: %016llx\n", i, dte.data[i]);
}
static void dump_command(unsigned long phys_addr)
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH v4 4/6] iommu/amd: Introduce helper function get_dte256()
2024-09-16 17:18 ` [PATCH v4 4/6] iommu/amd: Introduce helper function get_dte256() Suravee Suthikulpanit
@ 2024-09-26 19:49 ` Jason Gunthorpe
0 siblings, 0 replies; 18+ messages in thread
From: Jason Gunthorpe @ 2024-09-26 19:49 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, kevin.tian,
jon.grimm, santosh.shukla, pandoh, kumaranand
On Mon, Sep 16, 2024 at 05:18:03PM +0000, Suravee Suthikulpanit wrote:
> And use it in clone_alias() along with update_dte256().
> Also use get_dte256() in dump_dte_entry().
>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
> drivers/iommu/amd/iommu.c | 49 ++++++++++++++++++++++++++++++++-------
> 1 file changed, 40 insertions(+), 9 deletions(-)
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v4 5/6] iommu/amd: Modify clear_dte_entry() to avoid in-place update
2024-09-16 17:17 [PATCH v4 0/6] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
` (3 preceding siblings ...)
2024-09-16 17:18 ` [PATCH v4 4/6] iommu/amd: Introduce helper function get_dte256() Suravee Suthikulpanit
@ 2024-09-16 17:18 ` Suravee Suthikulpanit
2024-09-26 19:54 ` Jason Gunthorpe
2024-09-16 17:18 ` [PATCH v4 6/6] iommu/amd: Lock DTE before updating the entry with WRITE_ONCE() Suravee Suthikulpanit
2024-09-23 15:03 ` [PATCH v4 0/6] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suthikulpanit, Suravee
6 siblings, 1 reply; 18+ messages in thread
From: Suravee Suthikulpanit @ 2024-09-16 17:18 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, jgg, kevin.tian, jon.grimm,
santosh.shukla, pandoh, kumaranand, Suravee Suthikulpanit
Lock DTE and copy value to a temporary storage before update using
cmpxchg128.
Also, refactor the function to simplify logic for applying erratum 63.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/amd_iommu_types.h | 2 ++
drivers/iommu/amd/iommu.c | 28 ++++++++++++++++++++--------
2 files changed, 22 insertions(+), 8 deletions(-)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index fea7544f8c55..db3ee094a144 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -425,6 +425,8 @@
#define DTE_GPT_LEVEL_SHIFT 54
+#define DTE_SYSMGT_MASK GENMASK_ULL(41, 40)
+
#define GCR3_VALID 0x01ULL
#define DTE_INTR_MASK (~GENMASK_ULL(55, 52))
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 676742d6f19a..2df679eb61c9 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2086,19 +2086,31 @@ static void set_dte_entry(struct amd_iommu *iommu,
}
}
-static void clear_dte_entry(struct amd_iommu *iommu, u16 devid)
+static void clear_dte_entry(struct amd_iommu *iommu, struct iommu_dev_data *dev_data)
{
- struct dev_table_entry *dev_table = get_dev_table(iommu);
+ struct dev_table_entry new;
+ struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
+
+ /*
+ * Need to preserve DTE[96:106] because certain fields are
+ * programmed using value in IVRS table from early init phase.
+ */
+ new.data[0] = DTE_FLAG_V;
- /* remove entry from the device table seen by the hardware */
- dev_table[devid].data[0] = DTE_FLAG_V;
+ /* Apply erratum 63 */
+ if (FIELD_GET(DTE_SYSMGT_MASK, dte->data[1]) == 0x01)
+ new.data[0] |= BIT_ULL(DEV_ENTRY_IW);
if (!amd_iommu_snp_en)
- dev_table[devid].data[0] |= DTE_FLAG_TV;
+ new.data[0] |= DTE_FLAG_TV;
+
+ /* Need to preserve DTE[96:106] */
+ new.data[1] = dte->data[1] & DTE_FLAG_MASK;
- dev_table[devid].data[1] &= DTE_FLAG_MASK;
+ /* Need to preserve interrupt remapping information in DTE[128:255] */
+ new.data128[1] = dte->data128[1];
- amd_iommu_apply_erratum_63(iommu, devid);
+ update_dte256(iommu, dev_data, &new);
}
/* Update and flush DTE for the given device */
@@ -2109,7 +2121,7 @@ void amd_iommu_dev_update_dte(struct iommu_dev_data *dev_data, bool set)
if (set)
set_dte_entry(iommu, dev_data);
else
- clear_dte_entry(iommu, dev_data->devid);
+ clear_dte_entry(iommu, dev_data);
clone_aliases(iommu, dev_data->dev);
device_flush_dte(dev_data);
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH v4 5/6] iommu/amd: Modify clear_dte_entry() to avoid in-place update
2024-09-16 17:18 ` [PATCH v4 5/6] iommu/amd: Modify clear_dte_entry() to avoid in-place update Suravee Suthikulpanit
@ 2024-09-26 19:54 ` Jason Gunthorpe
2024-10-03 16:15 ` Suthikulpanit, Suravee
0 siblings, 1 reply; 18+ messages in thread
From: Jason Gunthorpe @ 2024-09-26 19:54 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, kevin.tian,
jon.grimm, santosh.shukla, pandoh, kumaranand
On Mon, Sep 16, 2024 at 05:18:04PM +0000, Suravee Suthikulpanit wrote:
> -static void clear_dte_entry(struct amd_iommu *iommu, u16 devid)
> +static void clear_dte_entry(struct amd_iommu *iommu, struct iommu_dev_data *dev_data)
> {
> - struct dev_table_entry *dev_table = get_dev_table(iommu);
> + struct dev_table_entry new;
> + struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
> +
> + /*
> + * Need to preserve DTE[96:106] because certain fields are
> + * programmed using value in IVRS table from early init phase.
> + */
> + new.data[0] = DTE_FLAG_V;
>
> - /* remove entry from the device table seen by the hardware */
> - dev_table[devid].data[0] = DTE_FLAG_V;
> + /* Apply erratum 63 */
> + if (FIELD_GET(DTE_SYSMGT_MASK, dte->data[1]) == 0x01)
> + new.data[0] |= BIT_ULL(DEV_ENTRY_IW);
>
> if (!amd_iommu_snp_en)
> - dev_table[devid].data[0] |= DTE_FLAG_TV;
> + new.data[0] |= DTE_FLAG_TV;
> +
> + /* Need to preserve DTE[96:106] */
> + new.data[1] = dte->data[1] & DTE_FLAG_MASK;
>
> - dev_table[devid].data[1] &= DTE_FLAG_MASK;
> + /* Need to preserve interrupt remapping information in DTE[128:255] */
> + new.data128[1] = dte->data128[1];
It doesn't, update_dte256() does this automatically. Just leave it
zero here.
> - amd_iommu_apply_erratum_63(iommu, devid);
> + update_dte256(iommu, dev_data, &new);
> }
I suggest you change this slightly so the flow is more like
make_clear_dte(..., struct dev_table_entry *entry) {..}
Which would have most of the above. Then:
clear_dte_entry()
{
struct dev_table_entry target;
make_clear_dte(.., &target);
update_dte256(iommu, dev_data, &new);
}
And then in the prior patches you can write like:
static void make_dte_gcr3_table(struct amd_iommu *iommu,
struct iommu_dev_data *dev_data,
struct dev_table_entry *target)
{
make_clear_dte(.., &target);
...
}
And drop all the wild masking:
+ /* First mask out possible old values for GCR3 table */
+ tmp = DTE_GCR3_VAL_A(~0ULL) << DTE_GCR3_SHIFT_A;
+ target->data[0] &= ~tmp;
+ tmp = DTE_GCR3_VAL_B(~0ULL) << DTE_GCR3_SHIFT_B;
+ tmp |= DTE_GCR3_VAL_C(~0ULL) << DTE_GCR3_SHIFT_C;
+ target->data[1] &= ~tmp;
Since make_clear_dte() already zero'd these fields.
Jason
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH v4 5/6] iommu/amd: Modify clear_dte_entry() to avoid in-place update
2024-09-26 19:54 ` Jason Gunthorpe
@ 2024-10-03 16:15 ` Suthikulpanit, Suravee
0 siblings, 0 replies; 18+ messages in thread
From: Suthikulpanit, Suravee @ 2024-10-03 16:15 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, kevin.tian,
jon.grimm, santosh.shukla, pandoh, kumaranand
On 9/27/2024 2:54 AM, Jason Gunthorpe wrote:
> On Mon, Sep 16, 2024 at 05:18:04PM +0000, Suravee Suthikulpanit wrote:
>
> ....
>
> I suggest you change this slightly so the flow is more like
>
> make_clear_dte(..., struct dev_table_entry *entry) {..}
>
> Which would have most of the above. Then:
>
> clear_dte_entry()
> {
> struct dev_table_entry target;
>
> make_clear_dte(.., &target);
> update_dte256(iommu, dev_data, &new);
> }
>
> And then in the prior patches you can write like:
>
> static void make_dte_gcr3_table(struct amd_iommu *iommu,
> struct iommu_dev_data *dev_data,
> struct dev_table_entry *target)
> {
> make_clear_dte(.., &target);
> ...
> }
>
> And drop all the wild masking:
>
> + /* First mask out possible old values for GCR3 table */
> + tmp = DTE_GCR3_VAL_A(~0ULL) << DTE_GCR3_SHIFT_A;
> + target->data[0] &= ~tmp;
> + tmp = DTE_GCR3_VAL_B(~0ULL) << DTE_GCR3_SHIFT_B;
> + tmp |= DTE_GCR3_VAL_C(~0ULL) << DTE_GCR3_SHIFT_C;
> + target->data[1] &= ~tmp;
>
> Since make_clear_dte() already zero'd these fields.
Thanks for suggestion. I'll clean this up.
Suravee
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v4 6/6] iommu/amd: Lock DTE before updating the entry with WRITE_ONCE()
2024-09-16 17:17 [PATCH v4 0/6] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
` (4 preceding siblings ...)
2024-09-16 17:18 ` [PATCH v4 5/6] iommu/amd: Modify clear_dte_entry() to avoid in-place update Suravee Suthikulpanit
@ 2024-09-16 17:18 ` Suravee Suthikulpanit
2024-09-26 19:58 ` Jason Gunthorpe
2024-09-23 15:03 ` [PATCH v4 0/6] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suthikulpanit, Suravee
6 siblings, 1 reply; 18+ messages in thread
From: Suravee Suthikulpanit @ 2024-09-16 17:18 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, jgg, kevin.tian, jon.grimm,
santosh.shukla, pandoh, kumaranand, Suravee Suthikulpanit
When updating only within a 64-bit tuple of a DTE, just lock the DTE and
use WRITE_ONCE() because it is writing to memory read back by HW.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
drivers/iommu/amd/iommu.c | 42 ++++++++++++++++++++++-----------------
1 file changed, 24 insertions(+), 18 deletions(-)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 2df679eb61c9..d69b0d41e93f 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2813,12 +2813,12 @@ static int amd_iommu_set_dirty_tracking(struct iommu_domain *domain,
bool enable)
{
struct protection_domain *pdomain = to_pdomain(domain);
- struct dev_table_entry *dev_table;
+ struct dev_table_entry *dte;
struct iommu_dev_data *dev_data;
bool domain_flush = false;
struct amd_iommu *iommu;
unsigned long flags;
- u64 pte_root;
+ u64 new;
spin_lock_irqsave(&pdomain->lock, flags);
if (!(pdomain->dirty_tracking ^ enable)) {
@@ -2827,16 +2827,15 @@ static int amd_iommu_set_dirty_tracking(struct iommu_domain *domain,
}
list_for_each_entry(dev_data, &pdomain->dev_list, list) {
+ spin_lock(&dev_data->dte_lock);
iommu = get_amd_iommu_from_dev_data(dev_data);
-
- dev_table = get_dev_table(iommu);
- pte_root = dev_table[dev_data->devid].data[0];
-
- pte_root = (enable ? pte_root | DTE_FLAG_HAD :
- pte_root & ~DTE_FLAG_HAD);
+ dte = &get_dev_table(iommu)[dev_data->devid];
+ new = dte->data[0];
+ new = (enable ? new | DTE_FLAG_HAD : new & ~DTE_FLAG_HAD);
+ WRITE_ONCE(dte->data[0], new);
+ spin_unlock(&dev_data->dte_lock);
/* Flush device DTE */
- dev_table[dev_data->devid].data[0] = pte_root;
device_flush_dte(dev_data);
domain_flush = true;
}
@@ -3102,17 +3101,24 @@ static void iommu_flush_irt_and_complete(struct amd_iommu *iommu, u16 devid)
static void set_dte_irq_entry(struct amd_iommu *iommu, u16 devid,
struct irq_remap_table *table)
{
- u64 dte;
- struct dev_table_entry *dev_table = get_dev_table(iommu);
+ u64 new;
+ struct dev_table_entry *dte = &get_dev_table(iommu)[devid];
+ struct iommu_dev_data *dev_data = search_dev_data(iommu, devid);
+
+ if (dev_data)
+ spin_lock(&dev_data->dte_lock);
+
+ new = dte->data[2];
+ new &= ~DTE_IRQ_PHYS_ADDR_MASK;
+ new |= iommu_virt_to_phys(table->table);
+ new |= DTE_IRQ_REMAP_INTCTL;
+ new |= DTE_INTTABLEN;
+ new |= DTE_IRQ_REMAP_ENABLE;
- dte = dev_table[devid].data[2];
- dte &= ~DTE_IRQ_PHYS_ADDR_MASK;
- dte |= iommu_virt_to_phys(table->table);
- dte |= DTE_IRQ_REMAP_INTCTL;
- dte |= DTE_INTTABLEN;
- dte |= DTE_IRQ_REMAP_ENABLE;
+ WRITE_ONCE(dte->data[2], new);
- dev_table[devid].data[2] = dte;
+ if (dev_data)
+ spin_unlock(&dev_data->dte_lock);
}
static struct irq_remap_table *get_irq_table(struct amd_iommu *iommu, u16 devid)
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH v4 6/6] iommu/amd: Lock DTE before updating the entry with WRITE_ONCE()
2024-09-16 17:18 ` [PATCH v4 6/6] iommu/amd: Lock DTE before updating the entry with WRITE_ONCE() Suravee Suthikulpanit
@ 2024-09-26 19:58 ` Jason Gunthorpe
0 siblings, 0 replies; 18+ messages in thread
From: Jason Gunthorpe @ 2024-09-26 19:58 UTC (permalink / raw)
To: Suravee Suthikulpanit
Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, kevin.tian,
jon.grimm, santosh.shukla, pandoh, kumaranand
On Mon, Sep 16, 2024 at 05:18:05PM +0000, Suravee Suthikulpanit wrote:
> When updating only within a 64-bit tuple of a DTE, just lock the DTE and
> use WRITE_ONCE() because it is writing to memory read back by HW.
>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
> drivers/iommu/amd/iommu.c | 42 ++++++++++++++++++++++-----------------
> 1 file changed, 24 insertions(+), 18 deletions(-)
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v4 0/6] iommu/amd: Use 128-bit cmpxchg operation to update DTE
2024-09-16 17:17 [PATCH v4 0/6] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
` (5 preceding siblings ...)
2024-09-16 17:18 ` [PATCH v4 6/6] iommu/amd: Lock DTE before updating the entry with WRITE_ONCE() Suravee Suthikulpanit
@ 2024-09-23 15:03 ` Suthikulpanit, Suravee
6 siblings, 0 replies; 18+ messages in thread
From: Suthikulpanit, Suravee @ 2024-09-23 15:03 UTC (permalink / raw)
To: linux-kernel, iommu
Cc: joro, robin.murphy, vasant.hegde, jgg, kevin.tian, jon.grimm,
santosh.shukla, pandoh, kumaranand
On 9/17/2024 12:17 AM, Suravee Suthikulpanit wrote:
> This series modifies current implementation to use 128-bit cmpxchg to
> update DTE when needed as specified in the AMD I/O Virtualization
> Techonology (IOMMU) Specification.
>
> Please note that I have verified with the hardware designer, and they have
> confirmed that the IOMMU hardware has always been implemented with 256-bit
> read. The next revision of the IOMMU spec will be updated to correctly
> describe this part. Therefore, I have updated the implementation to avoid
> unnecessary flushing.
>
> Changes in v4:
> * Patch 1: Update commit message
> * Patch 2:
> - Move get_dte256, clone_alias, dump_dte to patch 4
> - Introduce write_upper, write_lower
> - Introduce iommu_flush_sync_dte
> - Reimplement update_dte256
> * Patch 3: Remove spinlock since it is moved inside update_dte256()
> * Patch 4: Update clone_alias() and dump_dte()
> * Patch 5: Update clear_dte_entry()
>
> v3: https://lore.kernel.org/lkml/20240906121308.5013-1-suravee.suthikulpanit@amd.com/
> v2: https://lore.kernel.org/lkml/20240829180726.5022-1-suravee.suthikulpanit@amd.com/
> v1: https://lore.kernel.org/lkml/20240819161839.4657-1-suravee.suthikulpanit@amd.com/
>
> Thanks,
> Suravee
>
> Suravee Suthikulpanit (6):
> iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported
> iommu/amd: Introduce helper function to update 256-bit DTE
> iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers
> iommu/amd: Introduce helper function get_dte256()
> iommu/amd: Modify clear_dte_entry() to avoid in-place update
> iommu/amd: Lock DTE before updating the entry with WRITE_ONCE()
>
> drivers/iommu/amd/amd_iommu.h | 2 +
> drivers/iommu/amd/amd_iommu_types.h | 10 +-
> drivers/iommu/amd/init.c | 23 +-
> drivers/iommu/amd/iommu.c | 332 ++++++++++++++++++++--------
> 4 files changed, 265 insertions(+), 102 deletions(-)
>
I am rebasing and will resend this patch series.
Suravee
^ permalink raw reply [flat|nested] 18+ messages in thread