All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 00/14] iommu/amd: SVA Support (Part 4) - SVA and IOPF
@ 2024-01-18  7:33 Vasant Hegde
  2024-01-18  7:33 ` [PATCH v5 01/14] iommu/amd: Rename amd_iommu_v2_supported() as amd_iommu_pasid_supported() Vasant Hegde
                   ` (14 more replies)
  0 siblings, 15 replies; 45+ messages in thread
From: Vasant Hegde @ 2024-01-18  7:33 UTC (permalink / raw)
  To: iommu, joro
  Cc: suravee.suthikulpanit, wei.huang2, jsnitsel, jgg, Vasant Hegde

This is part 4 of the 4-part series to introduce Share Virtual Address
(SVA) support for devices, which can support PCI ATS, PASID and PRI
capabilities. These devices are referred to as SVA-capable devices in
this series.

It contains the following enhancements:

* Patch 1 - 3:
  Rename, add support to enable/disable features, update DTE etc.

* Patch 4 - 10:
  Add IOPF support

* Patch 11 - 14:
  Introduce SVA support


This patch series is based on top of SVA Part 3 v5.
  https://lore.kernel.org/linux-iommu/20240116165335.6043-1-vasant.hegde@amd.com/T/#t

Also depends on :
  1) Baolu's IOPF improvement
     https://lore.kernel.org/linux-iommu/20231220012332.168188-1-baolu.lu@linux.intel.com/

This is also available at github :
  https://github.com/AMDESE/linux/tree/iommu_sva_part4_v4_v6.7_rc8

Thanks everyone who reviewed previous version and provided valuable feedbacks.


Changes from v4 -> v5:
  - Rebased on top of v6.7-rc8 + SVA Part3 patches
  - Few minor improvements like renaming structure name for better, introducing macros, etc

V4: https://lore.kernel.org/linux-iommu/20231221111558.64652-1-vasant.hegde@amd.com/

Changes from v3 -> v4:
  - Moved amd_iommu_dev_update_dte() after set/clear_dte() so that we can avoid
    forward declaration
  - Dropped "iommu/amd: Do not override PASID entry in GCR3 table"
  - Added patch to fix PPR interrupt processing logic
  - Renamed enable_iommus_v2() -> enable_iommus_ppr()
  - Added ops->domain_alloc_sva()
  - Added domain_alloc_sva() support and reorganize SVA patches
  - In error path iommu_call_iopf_notifier() calls amd_iommu_complete_ppr()
    instead of amd_iommu_page_response()


v3: https://lore.kernel.org/linux-iommu/20231016104351.5749-1-vasant.hegde@amd.com/T/#t

Changes from v2 -> v3:
  - Rename sva.c -> pasid.c
  - Changed amd_iommu_sva_supported() -> amd_iommu_pasid_supported()
  - Added patch to update/flush DTE
  - Rework part of SVA support
  - Move IOPF enablement to PASID bind time

v2: https://lore.kernel.org/linux-iommu/20230911121046.1025732-1-vasant.hegde@amd.com/T/#u

Changes from v1 -> v2:
  - Added new patch to fix PASID override issue in GCR3 table
  - Complete rework of SVA code on top of Tina's SVA series.
  - Rework SVA enable code
  - Reworked IOPF handler code on top of Baolu's IOPF improvement series.


v1: https://lore.kernel.org/linux-iommu/20230823140415.729050-1-vasant.hegde@amd.com/T/#t

Thank you,
Vasant / Suravee

Jason Gunthorpe (1):
  iommu: Add ops->domain_alloc_sva()

Suravee Suthikulpanit (5):
  iommu/amd: Move PPR-related functions into ppr.c
  iommu/amd: Define per-IOMMU iopf_queue
  iommu/amd: Add support for page response
  iommu/amd: Introduce logic to enable/disable IOPF
  iommu/amd: Add GCR3 [un]initialization function

Vasant Hegde (5):
  iommu/amd: Rename amd_iommu_v2_supported() as
    amd_iommu_pasid_supported()
  iommu/amd: Introduce per device DTE update function
  iommu/amd: Fix PPR interrupt processing logic
  iommu/amd: Initial SVA support for AMD IOMMU
  iommu/amd: Add SVA domain support

Wei Huang (3):
  iommu/amd: Add support for enabling/disabling IOMMU features
  iommu/amd: Add support for add/remove device for IOPF
  iommu/amd: Add IO page fault notifier handler

 drivers/iommu/amd/Kconfig           |   2 +
 drivers/iommu/amd/Makefile          |   2 +-
 drivers/iommu/amd/amd_iommu.h       |  51 +++-
 drivers/iommu/amd/amd_iommu_types.h |  32 +++
 drivers/iommu/amd/init.c            |  79 ++-----
 drivers/iommu/amd/iommu.c           | 190 +++++++++------
 drivers/iommu/amd/pasid.c           | 271 +++++++++++++++++++++
 drivers/iommu/amd/ppr.c             | 354 ++++++++++++++++++++++++++++
 drivers/iommu/iommu-sva.c           |  16 +-
 include/linux/iommu.h               |   3 +
 10 files changed, 858 insertions(+), 142 deletions(-)
 create mode 100644 drivers/iommu/amd/pasid.c
 create mode 100644 drivers/iommu/amd/ppr.c

-- 
2.31.1


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v5 01/14] iommu/amd: Rename amd_iommu_v2_supported() as amd_iommu_pasid_supported()
  2024-01-18  7:33 [PATCH v5 00/14] iommu/amd: SVA Support (Part 4) - SVA and IOPF Vasant Hegde
@ 2024-01-18  7:33 ` Vasant Hegde
  2024-01-18  7:33 ` [PATCH v5 02/14] iommu/amd: Introduce per device DTE update function Vasant Hegde
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 45+ messages in thread
From: Vasant Hegde @ 2024-01-18  7:33 UTC (permalink / raw)
  To: iommu, joro
  Cc: suravee.suthikulpanit, wei.huang2, jsnitsel, jgg, Vasant Hegde,
	Jason Gunthorpe

To reflect its usage. No functional changes intended.

Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/amd/amd_iommu.h | 2 +-
 drivers/iommu/amd/init.c      | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index b61293f2aef6..334394d14753 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -38,7 +38,7 @@ extern int amd_iommu_guest_ir;
 extern enum io_pgtable_fmt amd_iommu_pgtable;
 extern int amd_iommu_gpt_level;
 
-bool amd_iommu_v2_supported(void);
+bool amd_iommu_pasid_supported(void);
 struct amd_iommu *get_amd_iommu(unsigned int idx);
 u8 amd_iommu_pc_get_max_banks(unsigned int idx);
 bool amd_iommu_pc_supported(void);
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 959820ccfbcc..62f695c519b5 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3652,7 +3652,7 @@ __setup("ivrs_ioapic",		parse_ivrs_ioapic);
 __setup("ivrs_hpet",		parse_ivrs_hpet);
 __setup("ivrs_acpihid",		parse_ivrs_acpihid);
 
-bool amd_iommu_v2_supported(void)
+bool amd_iommu_pasid_supported(void)
 {
 	/* CPU page table size should match IOMMU guest page table size */
 	if (cpu_feature_enabled(X86_FEATURE_LA57) &&
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v5 02/14] iommu/amd: Introduce per device DTE update function
  2024-01-18  7:33 [PATCH v5 00/14] iommu/amd: SVA Support (Part 4) - SVA and IOPF Vasant Hegde
  2024-01-18  7:33 ` [PATCH v5 01/14] iommu/amd: Rename amd_iommu_v2_supported() as amd_iommu_pasid_supported() Vasant Hegde
@ 2024-01-18  7:33 ` Vasant Hegde
  2024-02-02 15:29   ` Jason Gunthorpe
  2024-01-18  7:33 ` [PATCH v5 03/14] iommu/amd: Add support for enabling/disabling IOMMU features Vasant Hegde
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 45+ messages in thread
From: Vasant Hegde @ 2024-01-18  7:33 UTC (permalink / raw)
  To: iommu, joro
  Cc: suravee.suthikulpanit, wei.huang2, jsnitsel, jgg, Vasant Hegde

Consolidate per device update and flush logic into separate function.
Also make it as global function as it will be used in subsequent series
to update the DTE.

Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
---
 drivers/iommu/amd/amd_iommu.h |  1 +
 drivers/iommu/amd/iommu.c     | 26 ++++++++++++++++++--------
 2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 334394d14753..24a04731ac8b 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -64,6 +64,7 @@ int amd_iommu_clear_gcr3(struct iommu_dev_data *dev_data, ioasid_t pasid);
 void amd_iommu_flush_all_caches(struct amd_iommu *iommu);
 void amd_iommu_update_and_flush_device_table(struct protection_domain *domain);
 void amd_iommu_domain_update(struct protection_domain *domain);
+void amd_iommu_dev_update_dte(struct iommu_dev_data *dev_data, bool set);
 void amd_iommu_domain_flush_complete(struct protection_domain *domain);
 void amd_iommu_domain_flush_pages(struct protection_domain *domain,
 				  u64 address, size_t size);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 256b5507272f..68fd1251b981 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2000,6 +2000,21 @@ static void clear_dte_entry(struct amd_iommu *iommu, u16 devid)
 	amd_iommu_apply_erratum_63(iommu, devid);
 }
 
+/* Update and flush DTE for the given device */
+void amd_iommu_dev_update_dte(struct iommu_dev_data *dev_data, bool set)
+{
+	struct amd_iommu *iommu = get_amd_iommu_from_dev(dev_data->dev);
+
+	if (set)
+		set_dte_entry(iommu, dev_data);
+	else
+		clear_dte_entry(iommu, dev_data->devid);
+
+	clone_aliases(iommu, dev_data->dev);
+	device_flush_dte(dev_data);
+	iommu_completion_wait(iommu);
+}
+
 static int do_attach(struct iommu_dev_data *dev_data,
 		     struct protection_domain *domain)
 {
@@ -2039,10 +2054,7 @@ static int do_attach(struct iommu_dev_data *dev_data,
 	}
 
 	/* Update device table */
-	set_dte_entry(iommu, dev_data);
-	clone_aliases(iommu, dev_data->dev);
-
-	device_flush_dte(dev_data);
+	amd_iommu_dev_update_dte(dev_data, true);
 
 	return ret;
 }
@@ -2061,11 +2073,9 @@ static void do_detach(struct iommu_dev_data *dev_data)
 	/* Update data structures */
 	dev_data->domain = NULL;
 	list_del(&dev_data->list);
-	clear_dte_entry(iommu, dev_data->devid);
-	clone_aliases(iommu, dev_data->dev);
 
-	/* Flush the DTE entry */
-	device_flush_dte(dev_data);
+	/* Clear DTE and flush the entry */
+	amd_iommu_dev_update_dte(dev_data, false);
 
 	/* Flush IOTLB and wait for the flushes to finish */
 	amd_iommu_domain_flush_all(domain);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v5 03/14] iommu/amd: Add support for enabling/disabling IOMMU features
  2024-01-18  7:33 [PATCH v5 00/14] iommu/amd: SVA Support (Part 4) - SVA and IOPF Vasant Hegde
  2024-01-18  7:33 ` [PATCH v5 01/14] iommu/amd: Rename amd_iommu_v2_supported() as amd_iommu_pasid_supported() Vasant Hegde
  2024-01-18  7:33 ` [PATCH v5 02/14] iommu/amd: Introduce per device DTE update function Vasant Hegde
@ 2024-01-18  7:33 ` Vasant Hegde
  2024-01-18  7:33 ` [PATCH v5 04/14] iommu/amd: Move PPR-related functions into ppr.c Vasant Hegde
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 45+ messages in thread
From: Vasant Hegde @ 2024-01-18  7:33 UTC (permalink / raw)
  To: iommu, joro
  Cc: suravee.suthikulpanit, wei.huang2, jsnitsel, jgg, Vasant Hegde,
	Jason Gunthorpe

From: Wei Huang <wei.huang2@amd.com>

Add support for struct iommu_ops.dev_{enable/disable}_feat. Please note
that the empty feature switches will be populated by subsequent patches.

Signed-off-by: Wei Huang <wei.huang2@amd.com>
Co-developed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/amd/iommu.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 68fd1251b981..ca4dd90c875d 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2799,6 +2799,32 @@ static const struct iommu_dirty_ops amd_dirty_ops = {
 	.read_and_clear_dirty = amd_iommu_read_and_clear_dirty,
 };
 
+static int amd_iommu_dev_enable_feature(struct device *dev,
+					enum iommu_dev_features feat)
+{
+	int ret;
+
+	switch (feat) {
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	return ret;
+}
+
+static int amd_iommu_dev_disable_feature(struct device *dev,
+					 enum iommu_dev_features feat)
+{
+	int ret;
+
+	switch (feat) {
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	return ret;
+}
+
 const struct iommu_ops amd_iommu_ops = {
 	.capable = amd_iommu_capable,
 	.domain_alloc = amd_iommu_domain_alloc,
@@ -2811,6 +2837,8 @@ const struct iommu_ops amd_iommu_ops = {
 	.is_attach_deferred = amd_iommu_is_attach_deferred,
 	.pgsize_bitmap	= AMD_IOMMU_PGSIZES,
 	.def_domain_type = amd_iommu_def_domain_type,
+	.dev_enable_feat = amd_iommu_dev_enable_feature,
+	.dev_disable_feat = amd_iommu_dev_disable_feature,
 	.default_domain_ops = &(const struct iommu_domain_ops) {
 		.attach_dev	= amd_iommu_attach_device,
 		.map_pages	= amd_iommu_map_pages,
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v5 04/14] iommu/amd: Move PPR-related functions into ppr.c
  2024-01-18  7:33 [PATCH v5 00/14] iommu/amd: SVA Support (Part 4) - SVA and IOPF Vasant Hegde
                   ` (2 preceding siblings ...)
  2024-01-18  7:33 ` [PATCH v5 03/14] iommu/amd: Add support for enabling/disabling IOMMU features Vasant Hegde
@ 2024-01-18  7:33 ` Vasant Hegde
  2024-02-02 15:29   ` Jason Gunthorpe
  2024-01-18  7:33 ` [PATCH v5 05/14] iommu/amd: Fix PPR interrupt processing logic Vasant Hegde
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 45+ messages in thread
From: Vasant Hegde @ 2024-01-18  7:33 UTC (permalink / raw)
  To: iommu, joro
  Cc: suravee.suthikulpanit, wei.huang2, jsnitsel, jgg, Vasant Hegde

From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

In preparation to subsequent PPR-related patches, and also remove static
declaration for certain helper functions so that it can be reused in other
files.

Also rename below functions:
  alloc_ppr_log        -> amd_iommu_alloc_ppr_log
  iommu_enable_ppr_log -> amd_iommu_enable_ppr_log
  free_ppr_log         -> amd_iommu_free_ppr_log
  iommu_poll_ppr_log   -> amd_iommu_poll_ppr_log

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Co-developed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
---
 drivers/iommu/amd/Makefile    |   2 +-
 drivers/iommu/amd/amd_iommu.h |  17 +++++-
 drivers/iommu/amd/init.c      |  65 +++-----------------
 drivers/iommu/amd/iommu.c     |  55 +----------------
 drivers/iommu/amd/ppr.c       | 112 ++++++++++++++++++++++++++++++++++
 5 files changed, 137 insertions(+), 114 deletions(-)
 create mode 100644 drivers/iommu/amd/ppr.c

diff --git a/drivers/iommu/amd/Makefile b/drivers/iommu/amd/Makefile
index f454fbb1569e..93b11b6d764f 100644
--- a/drivers/iommu/amd/Makefile
+++ b/drivers/iommu/amd/Makefile
@@ -1,3 +1,3 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-$(CONFIG_AMD_IOMMU) += iommu.o init.o quirks.o io_pgtable.o io_pgtable_v2.o
+obj-$(CONFIG_AMD_IOMMU) += iommu.o init.o quirks.o io_pgtable.o io_pgtable_v2.o ppr.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += debugfs.o
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 24a04731ac8b..353d59111cb2 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -17,10 +17,16 @@ irqreturn_t amd_iommu_int_thread_pprlog(int irq, void *data);
 irqreturn_t amd_iommu_int_thread_galog(int irq, void *data);
 irqreturn_t amd_iommu_int_handler(int irq, void *data);
 void amd_iommu_apply_erratum_63(struct amd_iommu *iommu, u16 devid);
+void amd_iommu_restart_log(struct amd_iommu *iommu, const char *evt_type,
+			   u8 cntrl_intr, u8 cntrl_log,
+			   u32 status_run_mask, u32 status_overflow_mask);
 void amd_iommu_restart_event_logging(struct amd_iommu *iommu);
 void amd_iommu_restart_ga_log(struct amd_iommu *iommu);
 void amd_iommu_restart_ppr_log(struct amd_iommu *iommu);
 void amd_iommu_set_rlookup_table(struct amd_iommu *iommu, u16 devid);
+void iommu_feature_enable(struct amd_iommu *iommu, u8 bit);
+void *__init iommu_alloc_4k_pages(struct amd_iommu *iommu,
+				  gfp_t gfp, size_t size);
 
 #ifdef CONFIG_AMD_IOMMU_DEBUGFS
 void amd_iommu_debugfs_setup(struct amd_iommu *iommu);
@@ -57,6 +63,14 @@ int amd_iommu_set_gcr3(struct iommu_dev_data *dev_data,
 		       ioasid_t pasid, unsigned long gcr3);
 int amd_iommu_clear_gcr3(struct iommu_dev_data *dev_data, ioasid_t pasid);
 
+/* PPR */
+int __init amd_iommu_alloc_ppr_log(struct amd_iommu *iommu);
+void __init amd_iommu_free_ppr_log(struct amd_iommu *iommu);
+void amd_iommu_enable_ppr_log(struct amd_iommu *iommu);
+void amd_iommu_poll_ppr_log(struct amd_iommu *iommu);
+int amd_iommu_complete_ppr(struct pci_dev *pdev, u32 pasid,
+			   int status, int tag);
+
 /*
  * This function flushes all internal caches of
  * the IOMMU used by this driver.
@@ -86,9 +100,6 @@ static inline int amd_iommu_create_irq_domain(struct amd_iommu *iommu)
 #define PPR_INVALID			0x1
 #define PPR_FAILURE			0xf
 
-int amd_iommu_complete_ppr(struct pci_dev *pdev, u32 pasid,
-			   int status, int tag);
-
 static inline bool is_rd890_iommu(struct pci_dev *pdev)
 {
 	return (pdev->vendor == PCI_VENDOR_ID_ATI) &&
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 62f695c519b5..d25a706adcfc 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -418,7 +418,7 @@ static void iommu_set_device_table(struct amd_iommu *iommu)
 }
 
 /* Generic functions to enable/disable certain features of the IOMMU. */
-static void iommu_feature_enable(struct amd_iommu *iommu, u8 bit)
+void iommu_feature_enable(struct amd_iommu *iommu, u8 bit)
 {
 	u64 ctrl;
 
@@ -745,9 +745,9 @@ static int __init alloc_command_buffer(struct amd_iommu *iommu)
  * Interrupt handler has processed all pending events and adjusted head
  * and tail pointer. Reset overflow mask and restart logging again.
  */
-static void amd_iommu_restart_log(struct amd_iommu *iommu, const char *evt_type,
-				  u8 cntrl_intr, u8 cntrl_log,
-				  u32 status_run_mask, u32 status_overflow_mask)
+void amd_iommu_restart_log(struct amd_iommu *iommu, const char *evt_type,
+			   u8 cntrl_intr, u8 cntrl_log,
+			   u32 status_run_mask, u32 status_overflow_mask)
 {
 	u32 status;
 
@@ -788,17 +788,6 @@ void amd_iommu_restart_ga_log(struct amd_iommu *iommu)
 			      MMIO_STATUS_GALOG_OVERFLOW_MASK);
 }
 
-/*
- * This function restarts ppr logging in case the IOMMU experienced
- * PPR log overflow.
- */
-void amd_iommu_restart_ppr_log(struct amd_iommu *iommu)
-{
-	amd_iommu_restart_log(iommu, "PPR", CONTROL_PPRINT_EN,
-			      CONTROL_PPRLOG_EN, MMIO_STATUS_PPR_RUN_MASK,
-			      MMIO_STATUS_PPR_OVERFLOW_MASK);
-}
-
 /*
  * This function resets the command buffer if the IOMMU stopped fetching
  * commands from it.
@@ -847,8 +836,8 @@ static void __init free_command_buffer(struct amd_iommu *iommu)
 	free_pages((unsigned long)iommu->cmd_buf, get_order(CMD_BUFFER_SIZE));
 }
 
-static void *__init iommu_alloc_4k_pages(struct amd_iommu *iommu,
-					 gfp_t gfp, size_t size)
+void *__init iommu_alloc_4k_pages(struct amd_iommu *iommu, gfp_t gfp,
+				  size_t size)
 {
 	int order = get_order(size);
 	void *buf = (void *)__get_free_pages(gfp, order);
@@ -903,42 +892,6 @@ static void __init free_event_buffer(struct amd_iommu *iommu)
 	free_pages((unsigned long)iommu->evt_buf, get_order(EVT_BUFFER_SIZE));
 }
 
-/* allocates the memory where the IOMMU will log its events to */
-static int __init alloc_ppr_log(struct amd_iommu *iommu)
-{
-	iommu->ppr_log = iommu_alloc_4k_pages(iommu, GFP_KERNEL | __GFP_ZERO,
-					      PPR_LOG_SIZE);
-
-	return iommu->ppr_log ? 0 : -ENOMEM;
-}
-
-static void iommu_enable_ppr_log(struct amd_iommu *iommu)
-{
-	u64 entry;
-
-	if (iommu->ppr_log == NULL)
-		return;
-
-	iommu_feature_enable(iommu, CONTROL_PPR_EN);
-
-	entry = iommu_virt_to_phys(iommu->ppr_log) | PPR_LOG_SIZE_512;
-
-	memcpy_toio(iommu->mmio_base + MMIO_PPR_LOG_OFFSET,
-		    &entry, sizeof(entry));
-
-	/* set head and tail to zero manually */
-	writel(0x00, iommu->mmio_base + MMIO_PPR_HEAD_OFFSET);
-	writel(0x00, iommu->mmio_base + MMIO_PPR_TAIL_OFFSET);
-
-	iommu_feature_enable(iommu, CONTROL_PPRLOG_EN);
-	iommu_feature_enable(iommu, CONTROL_PPRINT_EN);
-}
-
-static void __init free_ppr_log(struct amd_iommu *iommu)
-{
-	free_pages((unsigned long)iommu->ppr_log, get_order(PPR_LOG_SIZE));
-}
-
 static void free_ga_log(struct amd_iommu *iommu)
 {
 #ifdef CONFIG_IRQ_REMAP
@@ -1682,7 +1635,7 @@ static void __init free_iommu_one(struct amd_iommu *iommu)
 	free_cwwb_sem(iommu);
 	free_command_buffer(iommu);
 	free_event_buffer(iommu);
-	free_ppr_log(iommu);
+	amd_iommu_free_ppr_log(iommu);
 	free_ga_log(iommu);
 	iommu_unmap_mmio_space(iommu);
 }
@@ -2095,7 +2048,7 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
 			amd_iommu_max_glx_val = min(amd_iommu_max_glx_val, glxval);
 	}
 
-	if (check_feature(FEATURE_PPR) && alloc_ppr_log(iommu))
+	if (check_feature(FEATURE_PPR) && amd_iommu_alloc_ppr_log(iommu))
 		return -ENOMEM;
 
 	if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE)) {
@@ -2841,7 +2794,7 @@ static void enable_iommus_v2(void)
 	struct amd_iommu *iommu;
 
 	for_each_iommu(iommu)
-		iommu_enable_ppr_log(iommu);
+		amd_iommu_enable_ppr_log(iommu);
 }
 
 static void enable_iommus_vapic(void)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index ca4dd90c875d..405609bd9290 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -830,59 +830,6 @@ static void iommu_poll_events(struct amd_iommu *iommu)
 	writel(head, iommu->mmio_base + MMIO_EVT_HEAD_OFFSET);
 }
 
-static void iommu_poll_ppr_log(struct amd_iommu *iommu)
-{
-	u32 head, tail;
-
-	if (iommu->ppr_log == NULL)
-		return;
-
-	head = readl(iommu->mmio_base + MMIO_PPR_HEAD_OFFSET);
-	tail = readl(iommu->mmio_base + MMIO_PPR_TAIL_OFFSET);
-
-	while (head != tail) {
-		volatile u64 *raw;
-		u64 entry[2];
-		int i;
-
-		raw = (u64 *)(iommu->ppr_log + head);
-
-		/*
-		 * Hardware bug: Interrupt may arrive before the entry is
-		 * written to memory. If this happens we need to wait for the
-		 * entry to arrive.
-		 */
-		for (i = 0; i < LOOP_TIMEOUT; ++i) {
-			if (PPR_REQ_TYPE(raw[0]) != 0)
-				break;
-			udelay(1);
-		}
-
-		/* Avoid memcpy function-call overhead */
-		entry[0] = raw[0];
-		entry[1] = raw[1];
-
-		/*
-		 * To detect the hardware errata 733 we need to clear the
-		 * entry back to zero. This issue does not exist on SNP
-		 * enabled system. Also this buffer is not writeable on
-		 * SNP enabled system.
-		 */
-		if (!amd_iommu_snp_en)
-			raw[0] = raw[1] = 0UL;
-
-		/* Update head pointer of hardware ring-buffer */
-		head = (head + PPR_ENTRY_SIZE) % PPR_LOG_SIZE;
-		writel(head, iommu->mmio_base + MMIO_PPR_HEAD_OFFSET);
-
-		/* TODO: PPR Handler will be added when we add IOPF support */
-
-		/* Refresh ring-buffer information */
-		head = readl(iommu->mmio_base + MMIO_PPR_HEAD_OFFSET);
-		tail = readl(iommu->mmio_base + MMIO_PPR_TAIL_OFFSET);
-	}
-}
-
 #ifdef CONFIG_IRQ_REMAP
 static int (*iommu_ga_log_notifier)(u32);
 
@@ -1003,7 +950,7 @@ irqreturn_t amd_iommu_int_thread_pprlog(int irq, void *data)
 {
 	amd_iommu_handle_irq(data, "PPR", MMIO_STATUS_PPR_INT_MASK,
 			     MMIO_STATUS_PPR_OVERFLOW_MASK,
-			     iommu_poll_ppr_log, amd_iommu_restart_ppr_log);
+			     amd_iommu_poll_ppr_log, amd_iommu_restart_ppr_log);
 
 	return IRQ_HANDLED;
 }
diff --git a/drivers/iommu/amd/ppr.c b/drivers/iommu/amd/ppr.c
new file mode 100644
index 000000000000..673fcc30f9dc
--- /dev/null
+++ b/drivers/iommu/amd/ppr.c
@@ -0,0 +1,112 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023 Advanced Micro Devices, Inc.
+ */
+
+#define pr_fmt(fmt)     "AMD-Vi: " fmt
+#define dev_fmt(fmt)    pr_fmt(fmt)
+
+#include <linux/amd-iommu.h>
+#include <linux/delay.h>
+#include <linux/mmu_notifier.h>
+
+#include "amd_iommu.h"
+#include "amd_iommu_types.h"
+
+int __init amd_iommu_alloc_ppr_log(struct amd_iommu *iommu)
+{
+	iommu->ppr_log = iommu_alloc_4k_pages(iommu, GFP_KERNEL | __GFP_ZERO,
+					      PPR_LOG_SIZE);
+	return iommu->ppr_log ? 0 : -ENOMEM;
+}
+
+void amd_iommu_enable_ppr_log(struct amd_iommu *iommu)
+{
+	u64 entry;
+
+	if (iommu->ppr_log == NULL)
+		return;
+
+	iommu_feature_enable(iommu, CONTROL_PPR_EN);
+
+	entry = iommu_virt_to_phys(iommu->ppr_log) | PPR_LOG_SIZE_512;
+
+	memcpy_toio(iommu->mmio_base + MMIO_PPR_LOG_OFFSET,
+		    &entry, sizeof(entry));
+
+	/* set head and tail to zero manually */
+	writel(0x00, iommu->mmio_base + MMIO_PPR_HEAD_OFFSET);
+	writel(0x00, iommu->mmio_base + MMIO_PPR_TAIL_OFFSET);
+
+	iommu_feature_enable(iommu, CONTROL_PPRINT_EN);
+	iommu_feature_enable(iommu, CONTROL_PPRLOG_EN);
+}
+
+void __init amd_iommu_free_ppr_log(struct amd_iommu *iommu)
+{
+	free_pages((unsigned long)iommu->ppr_log, get_order(PPR_LOG_SIZE));
+}
+
+/*
+ * This function restarts ppr logging in case the IOMMU experienced
+ * PPR log overflow.
+ */
+void amd_iommu_restart_ppr_log(struct amd_iommu *iommu)
+{
+	amd_iommu_restart_log(iommu, "PPR", CONTROL_PPRINT_EN,
+			      CONTROL_PPRLOG_EN, MMIO_STATUS_PPR_RUN_MASK,
+			      MMIO_STATUS_PPR_OVERFLOW_MASK);
+}
+
+void amd_iommu_poll_ppr_log(struct amd_iommu *iommu)
+{
+	u32 head, tail;
+
+	if (iommu->ppr_log == NULL)
+		return;
+
+	head = readl(iommu->mmio_base + MMIO_PPR_HEAD_OFFSET);
+	tail = readl(iommu->mmio_base + MMIO_PPR_TAIL_OFFSET);
+
+	while (head != tail) {
+		volatile u64 *raw;
+		u64 entry[2];
+		int i;
+
+		raw = (u64 *)(iommu->ppr_log + head);
+
+		/*
+		 * Hardware bug: Interrupt may arrive before the entry is
+		 * written to memory. If this happens we need to wait for the
+		 * entry to arrive.
+		 */
+		for (i = 0; i < LOOP_TIMEOUT; ++i) {
+			if (PPR_REQ_TYPE(raw[0]) != 0)
+				break;
+			udelay(1);
+		}
+
+		/* Avoid memcpy function-call overhead */
+		entry[0] = raw[0];
+		entry[1] = raw[1];
+
+		/*
+		 * To detect the hardware errata 733 we need to clear the
+		 * entry back to zero. This issue does not exist on SNP
+		 * enabled system. Also this buffer is not writeable on
+		 * SNP enabled system.
+		 */
+		if (!amd_iommu_snp_en)
+			raw[0] = raw[1] = 0UL;
+
+		/* Update head pointer of hardware ring-buffer */
+		head = (head + PPR_ENTRY_SIZE) % PPR_LOG_SIZE;
+		writel(head, iommu->mmio_base + MMIO_PPR_HEAD_OFFSET);
+
+		/* TODO: PPR Handler will be added when we add IOPF support */
+
+		/* Refresh ring-buffer information */
+		head = readl(iommu->mmio_base + MMIO_PPR_HEAD_OFFSET);
+		tail = readl(iommu->mmio_base + MMIO_PPR_TAIL_OFFSET);
+	}
+}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v5 05/14] iommu/amd: Fix PPR interrupt processing logic
  2024-01-18  7:33 [PATCH v5 00/14] iommu/amd: SVA Support (Part 4) - SVA and IOPF Vasant Hegde
                   ` (3 preceding siblings ...)
  2024-01-18  7:33 ` [PATCH v5 04/14] iommu/amd: Move PPR-related functions into ppr.c Vasant Hegde
@ 2024-01-18  7:33 ` Vasant Hegde
  2024-02-02 15:30   ` Jason Gunthorpe
  2024-01-18  7:33 ` [PATCH v5 06/14] iommu/amd: Define per-IOMMU iopf_queue Vasant Hegde
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 45+ messages in thread
From: Vasant Hegde @ 2024-01-18  7:33 UTC (permalink / raw)
  To: iommu, joro
  Cc: suravee.suthikulpanit, wei.huang2, jsnitsel, jgg, Vasant Hegde,
	Tom Lendacky

* Do not re-read ppr head pointer as its just updated by the driver.

* Do not read PPR buffer tail pointer inside while loop. If IOMMU
  generates PPR events continuously then completing interrupt processing
  takes long time. In worst case it may cause infinite loop.

Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
---
 drivers/iommu/amd/ppr.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/iommu/amd/ppr.c b/drivers/iommu/amd/ppr.c
index 673fcc30f9dc..d43a616c0c36 100644
--- a/drivers/iommu/amd/ppr.c
+++ b/drivers/iommu/amd/ppr.c
@@ -104,9 +104,5 @@ void amd_iommu_poll_ppr_log(struct amd_iommu *iommu)
 		writel(head, iommu->mmio_base + MMIO_PPR_HEAD_OFFSET);
 
 		/* TODO: PPR Handler will be added when we add IOPF support */
-
-		/* Refresh ring-buffer information */
-		head = readl(iommu->mmio_base + MMIO_PPR_HEAD_OFFSET);
-		tail = readl(iommu->mmio_base + MMIO_PPR_TAIL_OFFSET);
 	}
 }
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v5 06/14] iommu/amd: Define per-IOMMU iopf_queue
  2024-01-18  7:33 [PATCH v5 00/14] iommu/amd: SVA Support (Part 4) - SVA and IOPF Vasant Hegde
                   ` (4 preceding siblings ...)
  2024-01-18  7:33 ` [PATCH v5 05/14] iommu/amd: Fix PPR interrupt processing logic Vasant Hegde
@ 2024-01-18  7:33 ` Vasant Hegde
  2024-02-02 15:30   ` Jason Gunthorpe
  2024-01-18  7:33 ` [PATCH v5 07/14] iommu/amd: Add support for page response Vasant Hegde
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 45+ messages in thread
From: Vasant Hegde @ 2024-01-18  7:33 UTC (permalink / raw)
  To: iommu, joro
  Cc: suravee.suthikulpanit, wei.huang2, jsnitsel, jgg, Vasant Hegde

From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

AMD IOMMU hardware supports PCI Peripheral Paging Request (PPR) using
a PPR log, which is a circular buffer containing requests from downstream
end-point devices.

There is one PPR log per IOMMU instance. Therefore, allocate an iopf_queue
per IOMMU instance during driver initialization, and free the queue during
driver deinitialization.

Also rename enable_iommus_v2() -> enable_iommus_ppr() to reflect its
usage.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Co-developed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
---
 drivers/iommu/amd/amd_iommu.h       |  5 ++++
 drivers/iommu/amd/amd_iommu_types.h |  4 +++
 drivers/iommu/amd/init.c            | 12 ++++++---
 drivers/iommu/amd/ppr.c             | 42 +++++++++++++++++++++++++++++
 4 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 353d59111cb2..f9e921cf97ca 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -45,6 +45,11 @@ extern enum io_pgtable_fmt amd_iommu_pgtable;
 extern int amd_iommu_gpt_level;
 
 bool amd_iommu_pasid_supported(void);
+
+/* IOPF */
+int amd_iommu_iopf_init(struct amd_iommu *iommu);
+void amd_iommu_iopf_uninit(struct amd_iommu *iommu);
+
 struct amd_iommu *get_amd_iommu(unsigned int idx);
 u8 amd_iommu_pc_get_max_banks(unsigned int idx);
 bool amd_iommu_pc_supported(void);
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index d5786c33aee6..f3a0c3756d0d 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -770,6 +770,10 @@ struct amd_iommu {
 	/* DebugFS Info */
 	struct dentry *debugfs;
 #endif
+
+	/* IOPF support */
+	struct iopf_queue *iopf_queue;
+	unsigned char iopfq_name[32];
 };
 
 static inline struct amd_iommu *dev_to_amd_iommu(struct device *dev)
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index d25a706adcfc..426db9ea3718 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1638,6 +1638,7 @@ static void __init free_iommu_one(struct amd_iommu *iommu)
 	amd_iommu_free_ppr_log(iommu);
 	free_ga_log(iommu);
 	iommu_unmap_mmio_space(iommu);
+	amd_iommu_iopf_uninit(iommu);
 }
 
 static void __init free_iommu_all(void)
@@ -2789,12 +2790,17 @@ static void early_enable_iommus(void)
 	}
 }
 
-static void enable_iommus_v2(void)
+static void enable_iommus_ppr(void)
 {
 	struct amd_iommu *iommu;
 
-	for_each_iommu(iommu)
+	if (!amd_iommu_gt_ppr_supported())
+		return;
+
+	for_each_iommu(iommu) {
 		amd_iommu_enable_ppr_log(iommu);
+		amd_iommu_iopf_init(iommu);
+	}
 }
 
 static void enable_iommus_vapic(void)
@@ -3130,7 +3136,7 @@ static int amd_iommu_enable_interrupts(void)
 	 * PPR and GA log interrupt for all IOMMUs.
 	 */
 	enable_iommus_vapic();
-	enable_iommus_v2();
+	enable_iommus_ppr();
 
 out:
 	return ret;
diff --git a/drivers/iommu/amd/ppr.c b/drivers/iommu/amd/ppr.c
index d43a616c0c36..1eb0b737afb5 100644
--- a/drivers/iommu/amd/ppr.c
+++ b/drivers/iommu/amd/ppr.c
@@ -106,3 +106,45 @@ void amd_iommu_poll_ppr_log(struct amd_iommu *iommu)
 		/* TODO: PPR Handler will be added when we add IOPF support */
 	}
 }
+
+/**************************************************************
+ *
+ * IOPF handling stuff
+ */
+
+/* Setup per-IOMMU IOPF queue if not exist. */
+int amd_iommu_iopf_init(struct amd_iommu *iommu)
+{
+	unsigned long flags;
+	int ret = 0;
+
+	raw_spin_lock_irqsave(&iommu->lock, flags);
+
+	if (iommu->iopf_queue)
+		goto out;
+
+	snprintf(iommu->iopfq_name, sizeof(iommu->iopfq_name),
+		 "amdiommu-%#x-iopfq",
+		 PCI_SEG_DEVID_TO_SBDF(iommu->pci_seg->id, iommu->devid));
+
+	iommu->iopf_queue = iopf_queue_alloc(iommu->iopfq_name);
+	if (!iommu->iopf_queue)
+		ret = -ENOMEM;
+
+out:
+	raw_spin_unlock_irqrestore(&iommu->lock, flags);
+	return ret;
+}
+
+/* Destroy per-IOMMU IOPF queue if no longer needed. */
+void amd_iommu_iopf_uninit(struct amd_iommu *iommu)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&iommu->lock, flags);
+
+	iopf_queue_free(iommu->iopf_queue);
+	iommu->iopf_queue = NULL;
+
+	raw_spin_unlock_irqrestore(&iommu->lock, flags);
+}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v5 07/14] iommu/amd: Add support for page response
  2024-01-18  7:33 [PATCH v5 00/14] iommu/amd: SVA Support (Part 4) - SVA and IOPF Vasant Hegde
                   ` (5 preceding siblings ...)
  2024-01-18  7:33 ` [PATCH v5 06/14] iommu/amd: Define per-IOMMU iopf_queue Vasant Hegde
@ 2024-01-18  7:33 ` Vasant Hegde
  2024-02-01 20:20   ` Jason Gunthorpe
  2024-01-18  7:33 ` [PATCH v5 08/14] iommu/amd: Add support for add/remove device for IOPF Vasant Hegde
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 45+ messages in thread
From: Vasant Hegde @ 2024-01-18  7:33 UTC (permalink / raw)
  To: iommu, joro
  Cc: suravee.suthikulpanit, wei.huang2, jsnitsel, jgg, Vasant Hegde

From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

This generates AMD IOMMU COMPLETE_PPR_REQUEST for the specified device
with the specified PRI Response Code.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
---
 drivers/iommu/amd/amd_iommu.h |  3 +++
 drivers/iommu/amd/iommu.c     |  1 +
 drivers/iommu/amd/ppr.c       | 15 +++++++++++++++
 3 files changed, 19 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index f9e921cf97ca..c8deb22955d7 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -49,6 +49,9 @@ bool amd_iommu_pasid_supported(void);
 /* IOPF */
 int amd_iommu_iopf_init(struct amd_iommu *iommu);
 void amd_iommu_iopf_uninit(struct amd_iommu *iommu);
+int amd_iommu_page_response(struct device *dev,
+			    struct iopf_fault *evt,
+			    struct iommu_page_response *resp);
 
 struct amd_iommu *get_amd_iommu(unsigned int idx);
 u8 amd_iommu_pc_get_max_banks(unsigned int idx);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 405609bd9290..d300456e8984 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2786,6 +2786,7 @@ const struct iommu_ops amd_iommu_ops = {
 	.def_domain_type = amd_iommu_def_domain_type,
 	.dev_enable_feat = amd_iommu_dev_enable_feature,
 	.dev_disable_feat = amd_iommu_dev_disable_feature,
+	.page_response = amd_iommu_page_response,
 	.default_domain_ops = &(const struct iommu_domain_ops) {
 		.attach_dev	= amd_iommu_attach_device,
 		.map_pages	= amd_iommu_map_pages,
diff --git a/drivers/iommu/amd/ppr.c b/drivers/iommu/amd/ppr.c
index 1eb0b737afb5..b1751eb495fd 100644
--- a/drivers/iommu/amd/ppr.c
+++ b/drivers/iommu/amd/ppr.c
@@ -148,3 +148,18 @@ void amd_iommu_iopf_uninit(struct amd_iommu *iommu)
 
 	raw_spin_unlock_irqrestore(&iommu->lock, flags);
 }
+
+int amd_iommu_page_response(struct device *dev,
+			    struct iopf_fault *evt,
+			    struct iommu_page_response *resp)
+{
+	struct pci_dev *pdev;
+
+	if (!dev || !dev_is_pci(dev))
+		return -ENODEV;
+
+	pdev = to_pci_dev(dev);
+
+	return amd_iommu_complete_ppr(pdev, resp->pasid, resp->code,
+				      resp->grpid);
+}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v5 08/14] iommu/amd: Add support for add/remove device for IOPF
  2024-01-18  7:33 [PATCH v5 00/14] iommu/amd: SVA Support (Part 4) - SVA and IOPF Vasant Hegde
                   ` (6 preceding siblings ...)
  2024-01-18  7:33 ` [PATCH v5 07/14] iommu/amd: Add support for page response Vasant Hegde
@ 2024-01-18  7:33 ` Vasant Hegde
  2024-02-01 21:46   ` Jason Gunthorpe
  2024-01-18  7:33 ` [PATCH v5 09/14] iommu/amd: Add IO page fault notifier handler Vasant Hegde
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 45+ messages in thread
From: Vasant Hegde @ 2024-01-18  7:33 UTC (permalink / raw)
  To: iommu, joro
  Cc: suravee.suthikulpanit, wei.huang2, jsnitsel, jgg, Vasant Hegde

From: Wei Huang <wei.huang2@amd.com>

Which adds/removes the device to the corresponding per-IOMMU iopf_queue.
It also registers device fault handler. These interfaces are called
from IOPF feature enable/disable path.

Signed-off-by: Wei Huang <wei.huang2@amd.com>
Co-developed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Co-developed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
---
 drivers/iommu/amd/amd_iommu.h |  2 ++
 drivers/iommu/amd/ppr.c       | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index c8deb22955d7..ce23ce1a1a16 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -49,6 +49,8 @@ bool amd_iommu_pasid_supported(void);
 /* IOPF */
 int amd_iommu_iopf_init(struct amd_iommu *iommu);
 void amd_iommu_iopf_uninit(struct amd_iommu *iommu);
+int amd_iommu_iopf_add_device(struct amd_iommu *iommu, struct device *dev);
+int amd_iommu_iopf_remove_device(struct amd_iommu *iommu, struct device *dev);
 int amd_iommu_page_response(struct device *dev,
 			    struct iopf_fault *evt,
 			    struct iommu_page_response *resp);
diff --git a/drivers/iommu/amd/ppr.c b/drivers/iommu/amd/ppr.c
index b1751eb495fd..6b1fdda95514 100644
--- a/drivers/iommu/amd/ppr.c
+++ b/drivers/iommu/amd/ppr.c
@@ -163,3 +163,38 @@ int amd_iommu_page_response(struct device *dev,
 	return amd_iommu_complete_ppr(pdev, resp->pasid, resp->code,
 				      resp->grpid);
 }
+
+int amd_iommu_iopf_add_device(struct amd_iommu *iommu, struct device *dev)
+{
+	unsigned long flags;
+	int ret = -EINVAL;
+
+	raw_spin_lock_irqsave(&iommu->lock, flags);
+
+	if (!iommu->iopf_queue) {
+		raw_spin_unlock_irqrestore(&iommu->lock, flags);
+		return ret;
+	}
+
+	ret = iopf_queue_add_device(iommu->iopf_queue, dev);
+
+	raw_spin_unlock_irqrestore(&iommu->lock, flags);
+	return ret;
+}
+
+int amd_iommu_iopf_remove_device(struct amd_iommu *iommu, struct device *dev)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&iommu->lock, flags);
+
+	if (!iommu->iopf_queue) {
+		raw_spin_unlock_irqrestore(&iommu->lock, flags);
+		return -EINVAL;
+	}
+
+	iopf_queue_remove_device(iommu->iopf_queue, dev);
+
+	raw_spin_unlock_irqrestore(&iommu->lock, flags);
+	return 0;
+}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v5 09/14] iommu/amd: Add IO page fault notifier handler
  2024-01-18  7:33 [PATCH v5 00/14] iommu/amd: SVA Support (Part 4) - SVA and IOPF Vasant Hegde
                   ` (7 preceding siblings ...)
  2024-01-18  7:33 ` [PATCH v5 08/14] iommu/amd: Add support for add/remove device for IOPF Vasant Hegde
@ 2024-01-18  7:33 ` Vasant Hegde
  2024-01-18  7:33 ` [PATCH v5 10/14] iommu/amd: Introduce logic to enable/disable IOPF Vasant Hegde
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 45+ messages in thread
From: Vasant Hegde @ 2024-01-18  7:33 UTC (permalink / raw)
  To: iommu, joro
  Cc: suravee.suthikulpanit, wei.huang2, jsnitsel, jgg, Vasant Hegde

From: Wei Huang <wei.huang2@amd.com>

Whenever there is a page fault IOMMU logs entry to ppr log and sends
interrupt to host. We have to handle the page fault and respond to IOMMU.

Add support to validate page fault request and hook it to core iommu
page fault handler.

Signed-off-by: Wei Huang <wei.huang2@amd.com>
Co-developed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Co-developed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
---
 drivers/iommu/amd/amd_iommu_types.h |  8 +++
 drivers/iommu/amd/ppr.c             | 99 ++++++++++++++++++++++++++++-
 2 files changed, 106 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index f3a0c3756d0d..feafa868bcbd 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -251,6 +251,14 @@
 #define PPR_ENTRY_SIZE		16
 #define PPR_LOG_SIZE		(PPR_ENTRY_SIZE * PPR_LOG_ENTRIES)
 
+/* PAGE_SERVICE_REQUEST PPR Log Buffer Entry flags */
+#define PPR_FLAG_EXEC		0x002	/* Execute permission requested */
+#define PPR_FLAG_READ		0x004	/* Read permission requested */
+#define PPR_FLAG_WRITE		0x020	/* Write permission requested */
+#define PPR_FLAG_US		0x040	/* 1: User, 0: Supervisor */
+#define PPR_FLAG_RVSD		0x080	/* Reserved bit not zero */
+#define PPR_FLAG_GN		0x100	/* GVA and PASID is valid */
+
 #define PPR_REQ_TYPE(x)		(((x) >> 60) & 0xfULL)
 #define PPR_FLAGS(x)		(((x) >> 48) & 0xfffULL)
 #define PPR_DEVID(x)		((x) & 0xffffULL)
diff --git a/drivers/iommu/amd/ppr.c b/drivers/iommu/amd/ppr.c
index 6b1fdda95514..02d912b9f618 100644
--- a/drivers/iommu/amd/ppr.c
+++ b/drivers/iommu/amd/ppr.c
@@ -58,6 +58,102 @@ void amd_iommu_restart_ppr_log(struct amd_iommu *iommu)
 			      MMIO_STATUS_PPR_OVERFLOW_MASK);
 }
 
+static inline u32 ppr_flag_to_fault_perm(u16 flag)
+{
+	int perm = 0;
+
+	if (flag & PPR_FLAG_READ)
+		perm |= IOMMU_FAULT_PERM_READ;
+	if (flag & PPR_FLAG_WRITE)
+		perm |= IOMMU_FAULT_PERM_WRITE;
+	if (flag & PPR_FLAG_EXEC)
+		perm |= IOMMU_FAULT_PERM_EXEC;
+	if (!(flag & PPR_FLAG_US))
+		perm |= IOMMU_FAULT_PERM_PRIV;
+
+	return perm;
+}
+
+static bool ppr_is_valid(struct amd_iommu *iommu, u64 *raw)
+{
+	struct device *dev = iommu->iommu.dev;
+	u16 devid = PPR_DEVID(raw[0]);
+
+	if (!(PPR_FLAGS(raw[0]) & PPR_FLAG_GN)) {
+		dev_dbg(dev, "PPR logged [Request ignored due to GN=0 (device=%04x:%02x:%02x.%x "
+			"pasid=0x%05llx address=0x%llx flags=0x%04llx tag=0x%03llx]\n",
+			iommu->pci_seg->id, PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
+			PPR_PASID(raw[0]), raw[1], PPR_FLAGS(raw[0]), PPR_TAG(raw[0]));
+		return false;
+	}
+
+	if (PPR_FLAGS(raw[0]) & PPR_FLAG_RVSD) {
+		dev_dbg(dev, "PPR logged [Invalid request format (device=%04x:%02x:%02x.%x "
+			"pasid=0x%05llx address=0x%llx flags=0x%04llx tag=0x%03llx]\n",
+			iommu->pci_seg->id, PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
+			PPR_PASID(raw[0]), raw[1], PPR_FLAGS(raw[0]), PPR_TAG(raw[0]));
+		return false;
+	}
+
+	return true;
+}
+
+static void iommu_call_iopf_notifier(struct amd_iommu *iommu, u64 *raw)
+{
+	struct iopf_fault event;
+	struct pci_dev *pdev;
+	int ret = -EINVAL;
+	u16 devid = PPR_DEVID(raw[0]);
+
+	if (PPR_REQ_TYPE(raw[0]) != PPR_REQ_FAULT) {
+		pr_info_ratelimited("Unknown PPR request received\n");
+		return;
+	}
+
+	pdev = pci_get_domain_bus_and_slot(iommu->pci_seg->id,
+					   PCI_BUS_NUM(devid), devid & 0xff);
+	if (!pdev)
+		return;
+
+	if (!ppr_is_valid(iommu, raw))
+		goto out;
+
+	memset(&event, 0, sizeof(struct iopf_fault));
+
+	event.fault.type = IOMMU_FAULT_PAGE_REQ;
+	event.fault.prm.perm = ppr_flag_to_fault_perm(PPR_FLAGS(raw[0]));
+	event.fault.prm.addr = (u64)(raw[1] & PAGE_MASK);
+	event.fault.prm.pasid = PPR_PASID(raw[0]);
+	event.fault.prm.grpid = PPR_TAG(raw[0]) & 0x1FF;
+
+	/*
+	 * PASID zero is used for requests from the I/O device without
+	 * a PASID
+	 */
+	if (event.fault.prm.pasid == 0 ||
+	    event.fault.prm.pasid >= pdev->dev.iommu->max_pasids) {
+		pr_info_ratelimited("Invalid PASID : 0x%x, device : 0x%x\n",
+				    event.fault.prm.pasid, pdev->dev.id);
+		goto out;
+	}
+
+	event.fault.prm.flags |= IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID;
+	event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
+	if (PPR_TAG(raw[0]) & 0x200)
+		event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
+
+	/* Submit event */
+	ret = iommu_report_device_fault(&pdev->dev, &event);
+
+out:
+	if (ret) {
+		/* Nobody cared, abort */
+		amd_iommu_complete_ppr(pdev, PPR_PASID(raw[0]),
+				       IOMMU_PAGE_RESP_FAILURE,
+				       PPR_TAG(raw[0]) & 0x1FF);
+	}
+}
+
 void amd_iommu_poll_ppr_log(struct amd_iommu *iommu)
 {
 	u32 head, tail;
@@ -103,7 +199,8 @@ void amd_iommu_poll_ppr_log(struct amd_iommu *iommu)
 		head = (head + PPR_ENTRY_SIZE) % PPR_LOG_SIZE;
 		writel(head, iommu->mmio_base + MMIO_PPR_HEAD_OFFSET);
 
-		/* TODO: PPR Handler will be added when we add IOPF support */
+		/* Handle PPR entry */
+		iommu_call_iopf_notifier(iommu, entry);
 	}
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v5 10/14] iommu/amd: Introduce logic to enable/disable IOPF
  2024-01-18  7:33 [PATCH v5 00/14] iommu/amd: SVA Support (Part 4) - SVA and IOPF Vasant Hegde
                   ` (8 preceding siblings ...)
  2024-01-18  7:33 ` [PATCH v5 09/14] iommu/amd: Add IO page fault notifier handler Vasant Hegde
@ 2024-01-18  7:33 ` Vasant Hegde
  2024-02-01 21:49   ` Jason Gunthorpe
  2024-01-18  7:33 ` [PATCH v5 11/14] iommu/amd: Add GCR3 [un]initialization function Vasant Hegde
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 45+ messages in thread
From: Vasant Hegde @ 2024-01-18  7:33 UTC (permalink / raw)
  To: iommu, joro
  Cc: suravee.suthikulpanit, wei.huang2, jsnitsel, jgg, Vasant Hegde

From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

Add support to enable/disable IOPF feature. So that PRI capable device
can make use of page fault handler in SVA mode.

Note that enable_feature(IOPF) just returns success, as this interface
is going away. IOPF handler will be setup in PASID bind path. Following
patch will add this support.

Also add IOMMU_IOPF as dependency to AMD_IOMMU driver.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Co-developed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
---
 drivers/iommu/amd/Kconfig     |  1 +
 drivers/iommu/amd/amd_iommu.h |  2 ++
 drivers/iommu/amd/iommu.c     |  8 +++--
 drivers/iommu/amd/ppr.c       | 57 +++++++++++++++++++++++++++++++++++
 4 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/Kconfig b/drivers/iommu/amd/Kconfig
index 443b2c13c37b..d563f6d496ca 100644
--- a/drivers/iommu/amd/Kconfig
+++ b/drivers/iommu/amd/Kconfig
@@ -10,6 +10,7 @@ config AMD_IOMMU
 	select IOMMU_API
 	select IOMMU_IOVA
 	select IOMMU_IO_PGTABLE
+	select IOMMU_IOPF
 	select IOMMUFD_DRIVER if IOMMUFD
 	depends on X86_64 && PCI && ACPI && HAVE_CMPXCHG_DOUBLE
 	help
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index ce23ce1a1a16..d5e6315ef9dd 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -54,6 +54,8 @@ int amd_iommu_iopf_remove_device(struct amd_iommu *iommu, struct device *dev);
 int amd_iommu_page_response(struct device *dev,
 			    struct iopf_fault *evt,
 			    struct iommu_page_response *resp);
+int amd_iommu_iopf_enable_device(struct device *dev);
+int amd_iommu_iopf_disable_device(struct device *dev);
 
 struct amd_iommu *get_amd_iommu(unsigned int idx);
 u8 amd_iommu_pc_get_max_banks(unsigned int idx);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index d300456e8984..36a5458a7946 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2749,9 +2749,11 @@ static const struct iommu_dirty_ops amd_dirty_ops = {
 static int amd_iommu_dev_enable_feature(struct device *dev,
 					enum iommu_dev_features feat)
 {
-	int ret;
+	int ret = 0;
 
 	switch (feat) {
+	case IOMMU_DEV_FEAT_IOPF:
+		break;
 	default:
 		ret = -EINVAL;
 		break;
@@ -2762,9 +2764,11 @@ static int amd_iommu_dev_enable_feature(struct device *dev,
 static int amd_iommu_dev_disable_feature(struct device *dev,
 					 enum iommu_dev_features feat)
 {
-	int ret;
+	int ret = 0;
 
 	switch (feat) {
+	case IOMMU_DEV_FEAT_IOPF:
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/iommu/amd/ppr.c b/drivers/iommu/amd/ppr.c
index 02d912b9f618..e1b3d579772f 100644
--- a/drivers/iommu/amd/ppr.c
+++ b/drivers/iommu/amd/ppr.c
@@ -295,3 +295,60 @@ int amd_iommu_iopf_remove_device(struct amd_iommu *iommu, struct device *dev)
 	raw_spin_unlock_irqrestore(&iommu->lock, flags);
 	return 0;
 }
+
+static int iopf_update_device(struct device *dev, bool enable)
+{
+	struct pci_dev *pdev = dev_is_pci(dev) ? to_pci_dev(dev) : NULL;
+	struct amd_iommu *iommu = get_amd_iommu_from_dev(dev);
+	struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
+	struct protection_domain *pdom = dev_data->domain;
+	int ret = 0;
+
+	if (!pdev || !pdom)
+		return -EINVAL;
+
+	spin_lock(&dev_data->lock);
+
+	if (enable) {
+		if (dev_data->ppr)
+			goto out;
+
+		ret = amd_iommu_iopf_add_device(iommu, dev);
+		if (ret)
+			goto out;
+
+		dev_data->ppr = true;
+	} else {
+		ret = amd_iommu_iopf_remove_device(iommu, dev);
+		dev_data->ppr = false;
+	}
+
+	amd_iommu_dev_update_dte(dev_data, true);
+
+out:
+	spin_unlock(&dev_data->lock);
+	return ret;
+}
+
+int amd_iommu_iopf_enable_device(struct device *dev)
+{
+	struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
+
+	if (!dev_data)
+		return -EINVAL;
+
+	if (!dev_data->ats_enabled || !dev_data->pri_enabled)
+		return -EINVAL;
+
+	return iopf_update_device(dev, true);
+}
+
+int amd_iommu_iopf_disable_device(struct device *dev)
+{
+	struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
+
+	if (!dev_data || !dev_data->pri_enabled)
+		return -EINVAL;
+
+	return iopf_update_device(dev, false);
+}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v5 11/14] iommu/amd: Add GCR3 [un]initialization function
  2024-01-18  7:33 [PATCH v5 00/14] iommu/amd: SVA Support (Part 4) - SVA and IOPF Vasant Hegde
                   ` (9 preceding siblings ...)
  2024-01-18  7:33 ` [PATCH v5 10/14] iommu/amd: Introduce logic to enable/disable IOPF Vasant Hegde
@ 2024-01-18  7:33 ` Vasant Hegde
  2024-02-02 15:17   ` Jason Gunthorpe
  2024-01-18  7:33 ` [PATCH v5 12/14] iommu/amd: Initial SVA support for AMD IOMMU Vasant Hegde
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 45+ messages in thread
From: Vasant Hegde @ 2024-01-18  7:33 UTC (permalink / raw)
  To: iommu, joro
  Cc: suravee.suthikulpanit, wei.huang2, jsnitsel, jgg, Vasant Hegde

From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

These functions are used in PASID to device bind path. In this path
it checks whether device GCR3 table is setup or not. If not it will
setup GCR3 table.

Also in attach device path change default GCR3 table to use MAX
supported PASIDs. Ideally it should use 1 level PASID table as its
using PASID zero only. But we don't have support to extend PASID table
yet. We will fix this later.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Co-developed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
---
 drivers/iommu/amd/amd_iommu.h |  3 ++
 drivers/iommu/amd/iommu.c     | 52 +++++++++++++++++++++++++++++++++--
 2 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index d5e6315ef9dd..2d099c54b941 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -44,7 +44,10 @@ extern int amd_iommu_guest_ir;
 extern enum io_pgtable_fmt amd_iommu_pgtable;
 extern int amd_iommu_gpt_level;
 
+/* SVA/PASID */
 bool amd_iommu_pasid_supported(void);
+int amd_iommu_gcr3_init(struct iommu_dev_data *dev_data, ioasid_t pasids);
+void amd_iommu_gcr3_uninit(struct iommu_dev_data *dev_data);
 
 /* IOPF */
 int amd_iommu_iopf_init(struct amd_iommu *iommu);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 36a5458a7946..6f5900333946 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -93,6 +93,11 @@ static inline bool pdom_is_v2_pgtbl_mode(struct protection_domain *pdom)
 	return (pdom && (pdom->pd_mode == PD_MODE_V2));
 }
 
+static inline bool pdom_is_in_pt_mode(struct protection_domain *pdom)
+{
+	return (pdom->domain.type == IOMMU_DOMAIN_IDENTITY);
+}
+
 /*
  * Allocate per device domain ID when using V2 page table
  */
@@ -1837,6 +1842,46 @@ int amd_iommu_clear_gcr3(struct iommu_dev_data *dev_data, ioasid_t pasid)
 	return ret;
 }
 
+int amd_iommu_gcr3_init(struct iommu_dev_data *dev_data, ioasid_t pasids)
+{
+	struct protection_domain *pdom = dev_data->domain;
+	int ret = 0;
+
+	lockdep_assert_held(&dev_data->lock);
+
+	/*
+	 * We cannot support PASID w/ existing v1 page table in the same domain
+	 * since it will be nested. However, existing domain w/ v2 page table
+	 * can be used for PASID.
+	 */
+	if (!pdom_is_v2_pgtbl_mode(pdom) && !pdom_is_in_pt_mode(pdom))
+		return -EOPNOTSUPP;
+
+	/* Allocate GCR3 table */
+	if (pdom_is_in_pt_mode(pdom) && dev_data->gcr3_info.gcr3_tbl == NULL) {
+		ret = setup_gcr3_table(&dev_data->gcr3_info,
+				       dev_to_node(dev_data->dev), pasids);
+
+		/* Update device table */
+		amd_iommu_dev_update_dte(dev_data, true);
+	}
+
+	return ret;
+}
+
+void amd_iommu_gcr3_uninit(struct iommu_dev_data *dev_data)
+{
+	lockdep_assert_held(&dev_data->lock);
+
+	/* Free GCR3 table */
+	if (pdom_is_in_pt_mode(dev_data->domain)) {
+		free_gcr3_table(&dev_data->gcr3_info);
+
+		/* Update device table */
+		amd_iommu_dev_update_dte(dev_data, true);
+	}
+}
+
 static void set_dte_entry(struct amd_iommu *iommu,
 			  struct iommu_dev_data *dev_data)
 {
@@ -1986,9 +2031,12 @@ static int do_attach(struct iommu_dev_data *dev_data,
 
 	/* Init GCR3 table and update device table */
 	if (domain->pd_mode == PD_MODE_V2) {
-		/* By default, setup GCR3 table to support single PASID */
+		/*
+		 * By default, setup GCR3 table to support MAX PASIDs
+		 * supported by the IOMMU HW.
+		 */
 		ret = setup_gcr3_table(&dev_data->gcr3_info,
-				       dev_to_node(dev_data->dev), 1);
+				       dev_to_node(dev_data->dev), -1);
 		if (ret)
 			return ret;
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v5 12/14] iommu/amd: Initial SVA support for AMD IOMMU
  2024-01-18  7:33 [PATCH v5 00/14] iommu/amd: SVA Support (Part 4) - SVA and IOPF Vasant Hegde
                   ` (10 preceding siblings ...)
  2024-01-18  7:33 ` [PATCH v5 11/14] iommu/amd: Add GCR3 [un]initialization function Vasant Hegde
@ 2024-01-18  7:33 ` Vasant Hegde
  2024-02-02 15:25   ` Jason Gunthorpe
  2024-01-18  7:33 ` [PATCH v5 13/14] iommu: Add ops->domain_alloc_sva() Vasant Hegde
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 45+ messages in thread
From: Vasant Hegde @ 2024-01-18  7:33 UTC (permalink / raw)
  To: iommu, joro
  Cc: suravee.suthikulpanit, wei.huang2, jsnitsel, jgg, Vasant Hegde

This includes :
  - Add data structure to track per protection domain dev/pasid binding details
    protection_domain->dev_data_list will track attached list of
    dev_data/PASIDs.

  - Add support to check PASID is supported or not. Also setup gcr3
    table and if PRI is supported then enable IOPF.

  - Move 'to_pdomain()' to header file

  - Add iommu_sva_set_dev_pasid(). It will check whether PASID is supported
    or not. Also adds PASID to SVA protection domain list as well as to
    device GCR3 table.

  - Add iommu_ops.remove_dev_pasid support. It will unbind PASID from
    device. Also remove pasid data from protection domain.

  - Add IOMMU_SVA as dependency to AMD_IOMMU driver

For a given PASID, iommu_set_dev_pasid() will bind all devices to same
SVA protection domain (1 PASID : 1 SVA protection domain : N devices).
This protection domain is different from device protection domain (one
that's mapped in attach_device() path). IOMMU uses domain ID for caching,
invalidation, etc. In SVA mode it will use per-device-domain-ID. Hence in
invalidation path we retrieve domain ID from gcr3_info_table structure and
use that for invalidation.

Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Co-developed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
---
 drivers/iommu/amd/Kconfig           |   1 +
 drivers/iommu/amd/Makefile          |   2 +-
 drivers/iommu/amd/amd_iommu.h       |  11 ++
 drivers/iommu/amd/amd_iommu_types.h |  19 +++
 drivers/iommu/amd/iommu.c           |  14 +-
 drivers/iommu/amd/pasid.c           | 191 ++++++++++++++++++++++++++++
 6 files changed, 232 insertions(+), 6 deletions(-)
 create mode 100644 drivers/iommu/amd/pasid.c

diff --git a/drivers/iommu/amd/Kconfig b/drivers/iommu/amd/Kconfig
index d563f6d496ca..68d8fc107cb9 100644
--- a/drivers/iommu/amd/Kconfig
+++ b/drivers/iommu/amd/Kconfig
@@ -10,6 +10,7 @@ config AMD_IOMMU
 	select IOMMU_API
 	select IOMMU_IOVA
 	select IOMMU_IO_PGTABLE
+	select IOMMU_SVA
 	select IOMMU_IOPF
 	select IOMMUFD_DRIVER if IOMMUFD
 	depends on X86_64 && PCI && ACPI && HAVE_CMPXCHG_DOUBLE
diff --git a/drivers/iommu/amd/Makefile b/drivers/iommu/amd/Makefile
index 93b11b6d764f..9de33b2d42f5 100644
--- a/drivers/iommu/amd/Makefile
+++ b/drivers/iommu/amd/Makefile
@@ -1,3 +1,3 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-$(CONFIG_AMD_IOMMU) += iommu.o init.o quirks.o io_pgtable.o io_pgtable_v2.o ppr.o
+obj-$(CONFIG_AMD_IOMMU) += iommu.o init.o quirks.o io_pgtable.o io_pgtable_v2.o ppr.o pasid.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += debugfs.o
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 2d099c54b941..1344b48caa46 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -44,6 +44,11 @@ extern int amd_iommu_guest_ir;
 extern enum io_pgtable_fmt amd_iommu_pgtable;
 extern int amd_iommu_gpt_level;
 
+/* Protection domain ops */
+int iommu_sva_set_dev_pasid(struct iommu_domain *domain,
+			    struct device *dev, ioasid_t pasid);
+void amd_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid);
+
 /* SVA/PASID */
 bool amd_iommu_pasid_supported(void);
 int amd_iommu_gcr3_init(struct iommu_dev_data *dev_data, ioasid_t pasids);
@@ -72,6 +77,7 @@ int amd_iommu_pc_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr,
 /* Device capabilities */
 int amd_iommu_pdev_enable_cap_pri(struct pci_dev *pdev);
 void amd_iommu_pdev_disable_cap_pri(struct pci_dev *pdev);
+bool amd_iommu_pdev_pri_supported(struct pci_dev *pdev);
 
 /* GCR3 setup */
 int amd_iommu_set_gcr3(struct iommu_dev_data *dev_data,
@@ -196,6 +202,11 @@ static inline struct amd_iommu *get_amd_iommu_from_dev_data(struct iommu_dev_dat
 	return iommu_get_iommu_dev(dev_data->dev, struct amd_iommu, iommu);
 }
 
+static inline struct protection_domain *to_pdomain(struct iommu_domain *dom)
+{
+	return container_of(dom, struct protection_domain, domain);
+}
+
 bool translation_pre_enabled(struct amd_iommu *iommu);
 bool amd_iommu_is_attach_deferred(struct device *dev);
 int __init add_special_device(u8 type, u8 id, u32 *devid, bool cmd_line);
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index feafa868bcbd..7609fc1c5aee 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -8,7 +8,9 @@
 #ifndef _ASM_X86_AMD_IOMMU_TYPES_H
 #define _ASM_X86_AMD_IOMMU_TYPES_H
 
+#include <linux/iommu.h>
 #include <linux/types.h>
+#include <linux/mmu_notifier.h>
 #include <linux/mutex.h>
 #include <linux/msi.h>
 #include <linux/list.h>
@@ -511,6 +513,11 @@ extern struct kmem_cache *amd_iommu_irq_cache;
 	list_for_each_entry((iommu), &amd_iommu_list, list)
 #define for_each_iommu_safe(iommu, next) \
 	list_for_each_entry_safe((iommu), (next), &amd_iommu_list, list)
+/* Making iterating over protection_domain->dev_data_list easier */
+#define for_each_pdom_dev_data(pdom_dev_data, pdom) \
+	list_for_each_entry(pdom_dev_data, &pdom->dev_data_list, list)
+#define for_each_pdom_dev_data_safe(pdom_dev_data, next, pdom) \
+	list_for_each_entry_safe((pdom_dev_data), (next), &pdom->dev_data_list, list)
 
 #define APERTURE_RANGE_SHIFT	27	/* 128 MB */
 #define APERTURE_RANGE_SIZE	(1ULL << APERTURE_RANGE_SHIFT)
@@ -560,6 +567,16 @@ enum protection_domain_mode {
 	PD_MODE_V2,
 };
 
+/* Track dev_data/PASID list for the protection domain */
+struct pdom_dev_data {
+	/* Points to attached device data */
+	struct iommu_dev_data *dev_data;
+	/* PASID attached to the protection domain */
+	ioasid_t pasid;
+	/* For protection_domain->dev_data_list */
+	struct list_head list;
+};
+
 /*
  * This structure contains generic data for  IOMMU protection domains
  * independent of their use.
@@ -576,6 +593,8 @@ struct protection_domain {
 	bool dirty_tracking;	/* dirty tracking is enabled in the domain */
 	unsigned dev_cnt;	/* devices assigned to this domain */
 	unsigned dev_iommu[MAX_IOMMUS]; /* per-IOMMU reference count */
+
+	struct list_head dev_data_list; /* List of pdom_dev_data */
 };
 
 /*
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 6f5900333946..703f2c081ed0 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -196,11 +196,6 @@ static struct amd_iommu *rlookup_amd_iommu(struct device *dev)
 	return __rlookup_amd_iommu(seg, PCI_SBDF_TO_DEVID(devid));
 }
 
-static struct protection_domain *to_pdomain(struct iommu_domain *dom)
-{
-	return container_of(dom, struct protection_domain, domain);
-}
-
 static struct iommu_dev_data *alloc_dev_data(struct amd_iommu *iommu, u16 devid)
 {
 	struct iommu_dev_data *dev_data;
@@ -346,6 +341,13 @@ static inline bool pdev_pasid_supported(struct iommu_dev_data *dev_data)
 	return (dev_data->flags & AMD_IOMMU_DEVICE_FLAG_PASID_SUP);
 }
 
+inline bool amd_iommu_pdev_pri_supported(struct pci_dev *pdev)
+{
+	struct iommu_dev_data *dev_data = dev_iommu_priv_get(&pdev->dev);
+
+	return (dev_data->flags & AMD_IOMMU_DEVICE_FLAG_PRI_SUP);
+}
+
 static u32 pdev_get_caps(struct pci_dev *pdev)
 {
 	int features;
@@ -2309,6 +2311,7 @@ static struct protection_domain *protection_domain_alloc(unsigned int type)
 
 	spin_lock_init(&domain->lock);
 	INIT_LIST_HEAD(&domain->dev_list);
+	INIT_LIST_HEAD(&domain->dev_data_list);
 	domain->nid = NUMA_NO_NODE;
 
 	switch (type) {
@@ -2838,6 +2841,7 @@ const struct iommu_ops amd_iommu_ops = {
 	.def_domain_type = amd_iommu_def_domain_type,
 	.dev_enable_feat = amd_iommu_dev_enable_feature,
 	.dev_disable_feat = amd_iommu_dev_disable_feature,
+	.remove_dev_pasid = amd_iommu_remove_dev_pasid,
 	.page_response = amd_iommu_page_response,
 	.default_domain_ops = &(const struct iommu_domain_ops) {
 		.attach_dev	= amd_iommu_attach_device,
diff --git a/drivers/iommu/amd/pasid.c b/drivers/iommu/amd/pasid.c
new file mode 100644
index 000000000000..6b9d1917aa4c
--- /dev/null
+++ b/drivers/iommu/amd/pasid.c
@@ -0,0 +1,191 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2024 Advanced Micro Devices, Inc.
+ */
+
+#define pr_fmt(fmt)     "AMD-Vi: " fmt
+#define dev_fmt(fmt)    pr_fmt(fmt)
+
+#include <linux/iommu.h>
+#include <linux/mm_types.h>
+
+#include "amd_iommu.h"
+
+static inline bool is_gcr3_table_empty(struct iommu_dev_data *dev_data)
+{
+	return (dev_data->gcr3_info.pasid_cnt == 0);
+}
+
+static inline bool is_pasid_enabled(struct iommu_dev_data *dev_data)
+{
+	if (dev_data->gcr3_info.gcr3_tbl != NULL &&
+	    !is_gcr3_table_empty(dev_data)) {
+		return true;
+	}
+
+	return false;
+}
+
+static int iommu_pasid_enable(struct iommu_dev_data *dev_data)
+{
+	struct device *dev = dev_data->dev;
+	int ret = 0;
+
+	spin_lock(&dev_data->lock);
+
+	if (is_pasid_enabled(dev_data))
+		goto out;
+
+	if (!amd_iommu_pasid_supported()) {
+		ret = -ENODEV;
+		goto out;
+	}
+
+	/* attach_device path enables device PASID feature */
+	if (!dev_data->pasid_enabled) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = amd_iommu_gcr3_init(dev_data, dev->iommu->max_pasids);
+
+out:
+	spin_unlock(&dev_data->lock);
+	return ret;
+}
+
+static void iommu_pasid_disable(struct iommu_dev_data *dev_data)
+{
+	spin_lock(&dev_data->lock);
+
+	if (!is_gcr3_table_empty(dev_data))
+		goto out;
+
+	if (dev_data->gcr3_info.gcr3_tbl == NULL)
+		goto out;
+
+	amd_iommu_gcr3_uninit(dev_data);
+
+out:
+	spin_unlock(&dev_data->lock);
+}
+
+static int iommu_setup_pasid_pri(struct iommu_dev_data *dev_data)
+{
+	struct pci_dev *pdev;
+	int ret;
+
+	if (is_pasid_enabled(dev_data))
+		return 0;
+
+	ret = iommu_pasid_enable(dev_data);
+	if (ret)
+		return ret;
+
+	pdev = dev_is_pci(dev_data->dev) ? to_pci_dev(dev_data->dev) : NULL;
+	if (!pdev || !amd_iommu_pdev_pri_supported(pdev))
+		return 0;
+
+	if (!dev_data->pri_enabled)
+		return -EINVAL;
+
+	ret = amd_iommu_iopf_enable_device(dev_data->dev);
+
+	return ret;
+}
+
+static void remove_dev_pasid(struct pdom_dev_data *pdom_dev_data)
+{
+	/* Update GCR3 table and flush IOTLB */
+	amd_iommu_clear_gcr3(pdom_dev_data->dev_data, pdom_dev_data->pasid);
+
+	list_del(&pdom_dev_data->list);
+	kfree(pdom_dev_data);
+}
+
+/* Clear PASID from device GCR3 table and remove pdom_dev_data from list */
+static void remove_pdom_dev_pasid(struct protection_domain *pdom,
+				  struct device *dev, ioasid_t pasid)
+{
+	struct pdom_dev_data *pdom_dev_data;
+	struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
+
+	lockdep_assert_held(&pdom->lock);
+
+	for_each_pdom_dev_data(pdom_dev_data, pdom) {
+		if (pdom_dev_data->dev_data == dev_data &&
+		    pdom_dev_data->pasid == pasid) {
+			remove_dev_pasid(pdom_dev_data);
+			break;
+		}
+	}
+}
+
+int iommu_sva_set_dev_pasid(struct iommu_domain *domain,
+			    struct device *dev, ioasid_t pasid)
+{
+	struct pdom_dev_data *pdom_dev_data;
+	struct protection_domain *sva_pdom = to_pdomain(domain);
+	struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
+	unsigned long flags;
+	int ret = -EINVAL;
+
+	/* PASID zero is used for requests from the I/O device without PASID */
+	if (pasid == 0 || pasid >= dev->iommu->max_pasids)
+		return ret;
+
+	/* Make sure PASID/PRI is enabled */
+	ret = iommu_setup_pasid_pri(dev_data);
+	if (ret)
+		return ret;
+
+	/* Add PASID to protection domain pasid list */
+	pdom_dev_data = kzalloc(sizeof(*pdom_dev_data), GFP_KERNEL);
+	if (pdom_dev_data == NULL)
+		return ret;
+
+	pdom_dev_data->pasid = pasid;
+	pdom_dev_data->dev_data = dev_data;
+
+	/* Setup GCR3 table */
+	ret = amd_iommu_set_gcr3(dev_data, pasid,
+				 iommu_virt_to_phys(domain->mm->pgd));
+	if (ret) {
+		kfree(pdom_dev_data);
+		return ret;
+	}
+
+	spin_lock_irqsave(&sva_pdom->lock, flags);
+	list_add(&pdom_dev_data->list, &sva_pdom->dev_data_list);
+	spin_unlock_irqrestore(&sva_pdom->lock, flags);
+
+	return ret;
+}
+
+void amd_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
+{
+	struct protection_domain *sva_pdom;
+	struct iommu_domain *domain;
+	struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
+	unsigned long flags;
+
+	if (pasid == 0 || pasid >= dev->iommu->max_pasids)
+		return;
+
+	/* Get protection domain */
+	domain = iommu_get_domain_for_dev_pasid(dev, pasid, IOMMU_DOMAIN_SVA);
+	if (!domain)
+		return;
+	sva_pdom = to_pdomain(domain);
+
+	spin_lock_irqsave(&sva_pdom->lock, flags);
+
+	/* Remove PASID from dev_data_list */
+	remove_pdom_dev_pasid(sva_pdom, dev, pasid);
+
+	/* Remove GCR3 table */
+	if (is_gcr3_table_empty(dev_data))
+		iommu_pasid_disable(dev_data);
+
+	spin_unlock_irqrestore(&sva_pdom->lock, flags);
+}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v5 13/14] iommu: Add ops->domain_alloc_sva()
  2024-01-18  7:33 [PATCH v5 00/14] iommu/amd: SVA Support (Part 4) - SVA and IOPF Vasant Hegde
                   ` (11 preceding siblings ...)
  2024-01-18  7:33 ` [PATCH v5 12/14] iommu/amd: Initial SVA support for AMD IOMMU Vasant Hegde
@ 2024-01-18  7:33 ` Vasant Hegde
  2024-01-18  7:33 ` [PATCH v5 14/14] iommu/amd: Add SVA domain support Vasant Hegde
  2024-01-18  7:40 ` [PATCH v5 00/14] iommu/amd: SVA Support (Part 4) - SVA and IOPF Vasant Hegde
  14 siblings, 0 replies; 45+ messages in thread
From: Vasant Hegde @ 2024-01-18  7:33 UTC (permalink / raw)
  To: iommu, joro
  Cc: suravee.suthikulpanit, wei.huang2, jsnitsel, jgg, Jason Gunthorpe,
	Vasant Hegde, Tina Zhang

From: Jason Gunthorpe <jgg@nvidia.com>

Make a new op that receives the device and the mm_struct that the SVA
domain should be created for. Unlike domain_alloc_paging() the dev
argument is never NULL here.

This allows drivers to fully initialize the SVA domain and allocate the
mmu_notifier during allocation. It allows the notifier lifetime to follow
the lifetime of the iommu_domain.

Since we have only one call site, upgrade the new op to return ERR_PTR
instead of NULL.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[Removed smmu3 related changes - Vasant]
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Tina Zhang <tina.zhang@intel.com>
---
 drivers/iommu/iommu-sva.c | 16 +++++++++++-----
 include/linux/iommu.h     |  3 +++
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index b51995b4fe90..c2e3a083ea43 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -99,8 +99,8 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
 
 	/* Allocate a new domain and set it on device pasid. */
 	domain = iommu_sva_domain_alloc(dev, mm);
-	if (!domain) {
-		ret = -ENOMEM;
+	if (IS_ERR(domain)) {
+		ret = PTR_ERR(domain);
 		goto out_free_handle;
 	}
 
@@ -266,9 +266,15 @@ struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
 	const struct iommu_ops *ops = dev_iommu_ops(dev);
 	struct iommu_domain *domain;
 
-	domain = ops->domain_alloc(IOMMU_DOMAIN_SVA);
-	if (!domain)
-		return NULL;
+	if (ops->domain_alloc_sva) {
+		domain = ops->domain_alloc_sva(dev, mm);
+		if (IS_ERR(domain))
+			return domain;
+	} else {
+		domain = ops->domain_alloc(IOMMU_DOMAIN_SVA);
+		if (!domain)
+			return ERR_PTR(-ENOMEM);
+	}
 
 	domain->type = IOMMU_DOMAIN_SVA;
 	mmgrab(mm);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index c983b6a1ebce..8019712c3f6d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -454,6 +454,7 @@ static inline int __iommu_copy_struct_from_user(
  *                     Upon failure, ERR_PTR must be returned.
  * @domain_alloc_paging: Allocate an iommu_domain that can be used for
  *                       UNMANAGED, DMA, and DMA_FQ domain types.
+ * @domain_alloc_sva: Allocate an iommu_domain for Shared Virtual Addressing.
  * @probe_device: Add device to iommu driver handling
  * @release_device: Remove device from iommu driver handling
  * @probe_finalize: Do final setup work after the device is added to an IOMMU
@@ -494,6 +495,8 @@ struct iommu_ops {
 		struct device *dev, u32 flags, struct iommu_domain *parent,
 		const struct iommu_user_data *user_data);
 	struct iommu_domain *(*domain_alloc_paging)(struct device *dev);
+	struct iommu_domain *(*domain_alloc_sva)(struct device *dev,
+						 struct mm_struct *mm);
 
 	struct iommu_device *(*probe_device)(struct device *dev);
 	void (*release_device)(struct device *dev);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v5 14/14] iommu/amd: Add SVA domain support
  2024-01-18  7:33 [PATCH v5 00/14] iommu/amd: SVA Support (Part 4) - SVA and IOPF Vasant Hegde
                   ` (12 preceding siblings ...)
  2024-01-18  7:33 ` [PATCH v5 13/14] iommu: Add ops->domain_alloc_sva() Vasant Hegde
@ 2024-01-18  7:33 ` Vasant Hegde
  2024-02-02 15:28   ` Jason Gunthorpe
  2024-01-18  7:40 ` [PATCH v5 00/14] iommu/amd: SVA Support (Part 4) - SVA and IOPF Vasant Hegde
  14 siblings, 1 reply; 45+ messages in thread
From: Vasant Hegde @ 2024-01-18  7:33 UTC (permalink / raw)
  To: iommu, joro
  Cc: suravee.suthikulpanit, wei.huang2, jsnitsel, jgg, Vasant Hegde

- Allocate SVA domain and setup mmu notifier. In free path unregister
  mmu notifier and free protection domain.

- Add mmu notifier callback function. It will retrieve SVA protection
  domain and invalidates IO/TLB.

Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
---
 drivers/iommu/amd/amd_iommu.h       |  5 ++
 drivers/iommu/amd/amd_iommu_types.h |  1 +
 drivers/iommu/amd/iommu.c           | 10 ++--
 drivers/iommu/amd/pasid.c           | 80 +++++++++++++++++++++++++++++
 4 files changed, 93 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 1344b48caa46..3008503513df 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -45,6 +45,11 @@ extern enum io_pgtable_fmt amd_iommu_pgtable;
 extern int amd_iommu_gpt_level;
 
 /* Protection domain ops */
+struct protection_domain *protection_domain_alloc(unsigned int type);
+void protection_domain_free(struct protection_domain *domain);
+struct iommu_domain *amd_iommu_domain_alloc_sva(struct device *dev,
+						struct mm_struct *mm);
+void amd_iommu_domain_free(struct iommu_domain *dom);
 int iommu_sva_set_dev_pasid(struct iommu_domain *domain,
 			    struct device *dev, ioasid_t pasid);
 void amd_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid);
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 7609fc1c5aee..722120f95647 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -594,6 +594,7 @@ struct protection_domain {
 	unsigned dev_cnt;	/* devices assigned to this domain */
 	unsigned dev_iommu[MAX_IOMMUS]; /* per-IOMMU reference count */
 
+	struct mmu_notifier mn;	/* mmu notifier for the SVA domain */
 	struct list_head dev_data_list; /* List of pdom_dev_data */
 };
 
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 703f2c081ed0..4280d97fa6ba 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2251,7 +2251,7 @@ static void cleanup_domain(struct protection_domain *domain)
 	WARN_ON(domain->dev_cnt != 0);
 }
 
-static void protection_domain_free(struct protection_domain *domain)
+void protection_domain_free(struct protection_domain *domain)
 {
 	if (!domain)
 		return;
@@ -2294,7 +2294,7 @@ static int protection_domain_init_v2(struct protection_domain *pdom)
 	return 0;
 }
 
-static struct protection_domain *protection_domain_alloc(unsigned int type)
+struct protection_domain *protection_domain_alloc(unsigned int type)
 {
 	struct io_pgtable_ops *pgtbl_ops;
 	struct protection_domain *domain;
@@ -2317,6 +2317,7 @@ static struct protection_domain *protection_domain_alloc(unsigned int type)
 	switch (type) {
 	/* No need to allocate io pgtable ops in passthrough mode */
 	case IOMMU_DOMAIN_IDENTITY:
+	case IOMMU_DOMAIN_SVA:
 		return domain;
 	case IOMMU_DOMAIN_DMA:
 		pgtable = amd_iommu_pgtable;
@@ -2436,7 +2437,7 @@ amd_iommu_domain_alloc_user(struct device *dev, u32 flags,
 	return do_iommu_domain_alloc(type, dev, flags);
 }
 
-static void amd_iommu_domain_free(struct iommu_domain *dom)
+void amd_iommu_domain_free(struct iommu_domain *dom)
 {
 	struct protection_domain *domain;
 	unsigned long flags;
@@ -2804,6 +2805,7 @@ static int amd_iommu_dev_enable_feature(struct device *dev,
 
 	switch (feat) {
 	case IOMMU_DEV_FEAT_IOPF:
+	case IOMMU_DEV_FEAT_SVA:
 		break;
 	default:
 		ret = -EINVAL;
@@ -2819,6 +2821,7 @@ static int amd_iommu_dev_disable_feature(struct device *dev,
 
 	switch (feat) {
 	case IOMMU_DEV_FEAT_IOPF:
+	case IOMMU_DEV_FEAT_SVA:
 		break;
 	default:
 		ret = -EINVAL;
@@ -2831,6 +2834,7 @@ const struct iommu_ops amd_iommu_ops = {
 	.capable = amd_iommu_capable,
 	.domain_alloc = amd_iommu_domain_alloc,
 	.domain_alloc_user = amd_iommu_domain_alloc_user,
+	.domain_alloc_sva = amd_iommu_domain_alloc_sva,
 	.probe_device = amd_iommu_probe_device,
 	.release_device = amd_iommu_release_device,
 	.probe_finalize = amd_iommu_probe_finalize,
diff --git a/drivers/iommu/amd/pasid.c b/drivers/iommu/amd/pasid.c
index 6b9d1917aa4c..f47feacfe92a 100644
--- a/drivers/iommu/amd/pasid.c
+++ b/drivers/iommu/amd/pasid.c
@@ -121,6 +121,49 @@ static void remove_pdom_dev_pasid(struct protection_domain *pdom,
 	}
 }
 
+static void sva_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn,
+				    struct mm_struct *mm,
+				    unsigned long start, unsigned long end)
+{
+	struct pdom_dev_data *pdom_dev_data;
+	struct protection_domain *sva_pdom;
+	unsigned long flags;
+
+	sva_pdom = container_of(mn, struct protection_domain, mn);
+
+	spin_lock_irqsave(&sva_pdom->lock, flags);
+
+	for_each_pdom_dev_data(pdom_dev_data, sva_pdom) {
+		amd_iommu_dev_flush_pasid_pages(pdom_dev_data->dev_data,
+						pdom_dev_data->pasid,
+						start, end - start);
+	}
+
+	spin_unlock_irqrestore(&sva_pdom->lock, flags);
+}
+
+static void sva_mn_release(struct mmu_notifier *mn, struct mm_struct *mm)
+{
+	struct pdom_dev_data *pdom_dev_data, *next;
+	struct protection_domain *sva_pdom;
+	unsigned long flags;
+
+	sva_pdom = container_of(mn, struct protection_domain, mn);
+
+	spin_lock_irqsave(&sva_pdom->lock, flags);
+
+	/* Assume dev_data_list contains same PASID with different devices */
+	for_each_pdom_dev_data_safe(pdom_dev_data, next, sva_pdom)
+		remove_dev_pasid(pdom_dev_data);
+
+	spin_unlock_irqrestore(&sva_pdom->lock, flags);
+}
+
+static const struct mmu_notifier_ops sva_mn = {
+	.arch_invalidate_secondary_tlbs = sva_arch_invalidate_secondary_tlbs,
+	.release = sva_mn_release,
+};
+
 int iommu_sva_set_dev_pasid(struct iommu_domain *domain,
 			    struct device *dev, ioasid_t pasid)
 {
@@ -189,3 +232,40 @@ void amd_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
 
 	spin_unlock_irqrestore(&sva_pdom->lock, flags);
 }
+
+static void iommu_sva_domain_free(struct iommu_domain *domain)
+{
+	struct protection_domain *sva_pdom = to_pdomain(domain);
+
+	if (sva_pdom->mn.ops)
+		mmu_notifier_unregister(&sva_pdom->mn, domain->mm);
+
+	amd_iommu_domain_free(domain);
+}
+
+static const struct iommu_domain_ops amd_sva_domain_ops = {
+	.set_dev_pasid = iommu_sva_set_dev_pasid,
+	.free	       = iommu_sva_domain_free
+};
+
+struct iommu_domain *amd_iommu_domain_alloc_sva(struct device *dev,
+						struct mm_struct *mm)
+{
+	struct protection_domain *pdom;
+	int ret;
+
+	pdom = protection_domain_alloc(IOMMU_DOMAIN_SVA);
+	if (!pdom)
+		return ERR_PTR(-ENOMEM);
+
+	pdom->domain.ops = &amd_sva_domain_ops;
+	pdom->mn.ops = &sva_mn;
+
+	ret = mmu_notifier_register(&pdom->mn, mm);
+	if (ret) {
+		protection_domain_free(pdom);
+		return ERR_PTR(ret);
+	}
+
+	return &pdom->domain;
+}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 00/14] iommu/amd: SVA Support (Part 4) - SVA and IOPF
  2024-01-18  7:33 [PATCH v5 00/14] iommu/amd: SVA Support (Part 4) - SVA and IOPF Vasant Hegde
                   ` (13 preceding siblings ...)
  2024-01-18  7:33 ` [PATCH v5 14/14] iommu/amd: Add SVA domain support Vasant Hegde
@ 2024-01-18  7:40 ` Vasant Hegde
  14 siblings, 0 replies; 45+ messages in thread
From: Vasant Hegde @ 2024-01-18  7:40 UTC (permalink / raw)
  To: iommu, joro; +Cc: suravee.suthikulpanit, wei.huang2, jsnitsel, jgg



On 1/18/2024 1:03 PM, Vasant Hegde wrote:
> This is part 4 of the 4-part series to introduce Share Virtual Address
> (SVA) support for devices, which can support PCI ATS, PASID and PRI
> capabilities. These devices are referred to as SVA-capable devices in
> this series.
> 
> It contains the following enhancements:
> 
> * Patch 1 - 3:
>   Rename, add support to enable/disable features, update DTE etc.
> 
> * Patch 4 - 10:
>   Add IOPF support
> 
> * Patch 11 - 14:
>   Introduce SVA support
> 
> 
> This patch series is based on top of SVA Part 3 v5.
>   https://lore.kernel.org/linux-iommu/20240116165335.6043-1-vasant.hegde@amd.com/T/#t
> 
> Also depends on :
>   1) Baolu's IOPF improvement
>      https://lore.kernel.org/linux-iommu/20231220012332.168188-1-baolu.lu@linux.intel.com/
> 
> This is also available at github :
>   https://github.com/AMDESE/linux/tree/iommu_sva_part4_v4_v6.7_rc8

Sorry for the typo. Correct link :
  https://github.com/AMDESE/linux/tree/iommu_sva_part4_v5_v6.7_rc8

-Vasant


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 07/14] iommu/amd: Add support for page response
  2024-01-18  7:33 ` [PATCH v5 07/14] iommu/amd: Add support for page response Vasant Hegde
@ 2024-02-01 20:20   ` Jason Gunthorpe
  2024-02-06 15:39     ` Vasant Hegde
  0 siblings, 1 reply; 45+ messages in thread
From: Jason Gunthorpe @ 2024-02-01 20:20 UTC (permalink / raw)
  To: Vasant Hegde; +Cc: iommu, joro, suravee.suthikulpanit, wei.huang2, jsnitsel

On Thu, Jan 18, 2024 at 07:33:32AM +0000, Vasant Hegde wrote:

> diff --git a/drivers/iommu/amd/ppr.c b/drivers/iommu/amd/ppr.c
> index 1eb0b737afb5..b1751eb495fd 100644
> --- a/drivers/iommu/amd/ppr.c
> +++ b/drivers/iommu/amd/ppr.c
> @@ -148,3 +148,18 @@ void amd_iommu_iopf_uninit(struct amd_iommu *iommu)
>  
>  	raw_spin_unlock_irqrestore(&iommu->lock, flags);
>  }
> +
> +int amd_iommu_page_response(struct device *dev,
> +			    struct iopf_fault *evt,
> +			    struct iommu_page_response *resp)
> +{
> +	struct pci_dev *pdev;
> +
> +	if (!dev || !dev_is_pci(dev))
> +		return -ENODEV;
> +
> +	pdev = to_pci_dev(dev);
> +
> +	return amd_iommu_complete_ppr(pdev, resp->pasid, resp->code,
> +				      resp->grpid);

Converting this to a pci_dev seems pointless? Nothing reaches into the
PCI struct. Why do it?

We don't need sanity checking here on the fast path..

Jason

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 08/14] iommu/amd: Add support for add/remove device for IOPF
  2024-01-18  7:33 ` [PATCH v5 08/14] iommu/amd: Add support for add/remove device for IOPF Vasant Hegde
@ 2024-02-01 21:46   ` Jason Gunthorpe
  2024-02-06 16:02     ` Vasant Hegde
  0 siblings, 1 reply; 45+ messages in thread
From: Jason Gunthorpe @ 2024-02-01 21:46 UTC (permalink / raw)
  To: Vasant Hegde; +Cc: iommu, joro, suravee.suthikulpanit, wei.huang2, jsnitsel

On Thu, Jan 18, 2024 at 07:33:33AM +0000, Vasant Hegde wrote:
> From: Wei Huang <wei.huang2@amd.com>
> 
> Which adds/removes the device to the corresponding per-IOMMU iopf_queue.
> It also registers device fault handler. These interfaces are called
> from IOPF feature enable/disable path.

"fault handler"? Not anymore right?

They should not be called from the iopf feature path :(
 
> diff --git a/drivers/iommu/amd/ppr.c b/drivers/iommu/amd/ppr.c
> index b1751eb495fd..6b1fdda95514 100644
> --- a/drivers/iommu/amd/ppr.c
> +++ b/drivers/iommu/amd/ppr.c
> @@ -163,3 +163,38 @@ int amd_iommu_page_response(struct device *dev,
>  	return amd_iommu_complete_ppr(pdev, resp->pasid, resp->code,
>  				      resp->grpid);
>  }
> +
> +int amd_iommu_iopf_add_device(struct amd_iommu *iommu, struct device *dev)
> +{
> +	unsigned long flags;
> +	int ret = -EINVAL;
> +
> +	raw_spin_lock_irqsave(&iommu->lock, flags);
> +
> +	if (!iommu->iopf_queue) {
> +		raw_spin_unlock_irqrestore(&iommu->lock, flags);
> +		return ret;

goto out_unlock if you are going to write it like that

Jason

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 10/14] iommu/amd: Introduce logic to enable/disable IOPF
  2024-01-18  7:33 ` [PATCH v5 10/14] iommu/amd: Introduce logic to enable/disable IOPF Vasant Hegde
@ 2024-02-01 21:49   ` Jason Gunthorpe
  2024-02-06 16:19     ` Vasant Hegde
  0 siblings, 1 reply; 45+ messages in thread
From: Jason Gunthorpe @ 2024-02-01 21:49 UTC (permalink / raw)
  To: Vasant Hegde; +Cc: iommu, joro, suravee.suthikulpanit, wei.huang2, jsnitsel

On Thu, Jan 18, 2024 at 07:33:35AM +0000, Vasant Hegde wrote:
> diff --git a/drivers/iommu/amd/ppr.c b/drivers/iommu/amd/ppr.c
> index 02d912b9f618..e1b3d579772f 100644
> --- a/drivers/iommu/amd/ppr.c
> +++ b/drivers/iommu/amd/ppr.c
> @@ -295,3 +295,60 @@ int amd_iommu_iopf_remove_device(struct amd_iommu *iommu, struct device *dev)
>  	raw_spin_unlock_irqrestore(&iommu->lock, flags);
>  	return 0;
>  }
> +
> +static int iopf_update_device(struct device *dev, bool enable)
> +{
> +	struct pci_dev *pdev = dev_is_pci(dev) ? to_pci_dev(dev) : NULL;
> +	struct amd_iommu *iommu = get_amd_iommu_from_dev(dev);
> +	struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
> +	struct protection_domain *pdom = dev_data->domain;
> +	int ret = 0;
> +
> +	if (!pdev || !pdom)
> +		return -EINVAL;
> +
> +	spin_lock(&dev_data->lock);
> +
> +	if (enable) {
> +		if (dev_data->ppr)
> +			goto out;
> +
> +		ret = amd_iommu_iopf_add_device(iommu, dev);
> +		if (ret)
> +			goto out;
> +
> +		dev_data->ppr = true;
> +	} else {
> +		ret = amd_iommu_iopf_remove_device(iommu, dev);
> +		dev_data->ppr = false;
> +	}
> +
> +	amd_iommu_dev_update_dte(dev_data, true);

This dte change is in the wrong place. When the domain is first
attached we know if it requires PRI or not. At that moment the DTE
should be set properly, it should not be set wrong and then changed
later.

Remember what I said earlier there should be one DTE write on the
attach paths.

So probably all of this should go away and be moved into set dte

(and again the whole dte setting flow needs cleaning, I think you
should do that before trying to build more complex stuff on top)

Jason

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 11/14] iommu/amd: Add GCR3 [un]initialization function
  2024-01-18  7:33 ` [PATCH v5 11/14] iommu/amd: Add GCR3 [un]initialization function Vasant Hegde
@ 2024-02-02 15:17   ` Jason Gunthorpe
  2024-02-06 17:00     ` Vasant Hegde
  0 siblings, 1 reply; 45+ messages in thread
From: Jason Gunthorpe @ 2024-02-02 15:17 UTC (permalink / raw)
  To: Vasant Hegde; +Cc: iommu, joro, suravee.suthikulpanit, wei.huang2, jsnitsel

On Thu, Jan 18, 2024 at 07:33:36AM +0000, Vasant Hegde wrote:
> From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> 
> These functions are used in PASID to device bind path. In this path
> it checks whether device GCR3 table is setup or not. If not it will
> setup GCR3 table.
> 
> Also in attach device path change default GCR3 table to use MAX
> supported PASIDs. Ideally it should use 1 level PASID table as its
> using PASID zero only. But we don't have support to extend PASID table
> yet. We will fix this later.
> 
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> Co-developed-by: Vasant Hegde <vasant.hegde@amd.com>
> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
> ---
>  drivers/iommu/amd/amd_iommu.h |  3 ++
>  drivers/iommu/amd/iommu.c     | 52 +++++++++++++++++++++++++++++++++--
>  2 files changed, 53 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
> index d5e6315ef9dd..2d099c54b941 100644
> --- a/drivers/iommu/amd/amd_iommu.h
> +++ b/drivers/iommu/amd/amd_iommu.h
> @@ -44,7 +44,10 @@ extern int amd_iommu_guest_ir;
>  extern enum io_pgtable_fmt amd_iommu_pgtable;
>  extern int amd_iommu_gpt_level;
>  
> +/* SVA/PASID */
>  bool amd_iommu_pasid_supported(void);
> +int amd_iommu_gcr3_init(struct iommu_dev_data *dev_data, ioasid_t pasids);
> +void amd_iommu_gcr3_uninit(struct iommu_dev_data *dev_data);
>  
>  /* IOPF */
>  int amd_iommu_iopf_init(struct amd_iommu *iommu);
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 36a5458a7946..6f5900333946 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -93,6 +93,11 @@ static inline bool pdom_is_v2_pgtbl_mode(struct protection_domain *pdom)
>  	return (pdom && (pdom->pd_mode == PD_MODE_V2));
>  }
>  
> +static inline bool pdom_is_in_pt_mode(struct protection_domain *pdom)
> +{
> +	return (pdom->domain.type == IOMMU_DOMAIN_IDENTITY);
> +}

Why? And what about blocking someday?

Just use dev_data->gcr3_info.gcr3_tbl == NULL to indicate if GCR3 is
loaded or not?

>  /*
>   * Allocate per device domain ID when using V2 page table
>   */
> @@ -1837,6 +1842,46 @@ int amd_iommu_clear_gcr3(struct iommu_dev_data *dev_data, ioasid_t pasid)
>  	return ret;
>  }
>  
> +int amd_iommu_gcr3_init(struct iommu_dev_data *dev_data, ioasid_t pasids)
> +{
> +	struct protection_domain *pdom = dev_data->domain;
> +	int ret = 0;
> +
> +	lockdep_assert_held(&dev_data->lock);
> +
> +	/*
> +	 * We cannot support PASID w/ existing v1 page table in the same domain
> +	 * since it will be nested. However, existing domain w/ v2 page table
> +	 * can be used for PASID.
> +	 */
> +	if (!pdom_is_v2_pgtbl_mode(pdom) && !pdom_is_in_pt_mode(pdom))
> +		return -EOPNOTSUPP;

Put the test directly in the set dev pasid op and it should be more
like

if (!dev_data->gcr3_info.gcr3_tbl && dev_data->domain->type & _IOMMU_DOMAIN_PAGING)
    return -EINVAL;


And you need the mirroring test in alloc_dev to prevent changing the
RID to things the driver does not yet support, I didn't notice that?

> +	/* Allocate GCR3 table */
> +	if (pdom_is_in_pt_mode(pdom) && dev_data->gcr3_info.gcr3_tbl == NULL) {
> +		ret = setup_gcr3_table(&dev_data->gcr3_info,
> +				       dev_to_node(dev_data->dev), pasids);
> +
> +		/* Update device table */
> +		amd_iommu_dev_update_dte(dev_data, true);
> +	}

This shouldn't be any different from do_attach()

> +
> +	return ret;

success oritened flow please

> +void amd_iommu_gcr3_uninit(struct iommu_dev_data *dev_data)
> +{
> +	lockdep_assert_held(&dev_data->lock);
> +
> +	/* Free GCR3 table */
> +	if (pdom_is_in_pt_mode(dev_data->domain)) {
> +		free_gcr3_table(&dev_data->gcr3_info);
> +
> +		/* Update device table */
> +		amd_iommu_dev_update_dte(dev_data, true);
> +	}
> +}

Is it really worth freeing it?

>  static void set_dte_entry(struct amd_iommu *iommu,
>  			  struct iommu_dev_data *dev_data)
>  {
> @@ -1986,9 +2031,12 @@ static int do_attach(struct iommu_dev_data *dev_data,
>  
>  	/* Init GCR3 table and update device table */
>  	if (domain->pd_mode == PD_MODE_V2) {
> -		/* By default, setup GCR3 table to support single PASID */
> +		/*
> +		 * By default, setup GCR3 table to support MAX PASIDs
> +		 * supported by the IOMMU HW.
> +		 */
>  		ret = setup_gcr3_table(&dev_data->gcr3_info,
> -				       dev_to_node(dev_data->dev), 1);
> +				       dev_to_node(dev_data->dev), -1);

-1 should be dev->iommu->max_pasids, but better would be this specific
device's max pasid size, cached from pci_max_pasids()

Jason

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 12/14] iommu/amd: Initial SVA support for AMD IOMMU
  2024-01-18  7:33 ` [PATCH v5 12/14] iommu/amd: Initial SVA support for AMD IOMMU Vasant Hegde
@ 2024-02-02 15:25   ` Jason Gunthorpe
  2024-02-06 17:16     ` Vasant Hegde
  0 siblings, 1 reply; 45+ messages in thread
From: Jason Gunthorpe @ 2024-02-02 15:25 UTC (permalink / raw)
  To: Vasant Hegde; +Cc: iommu, joro, suravee.suthikulpanit, wei.huang2, jsnitsel

On Thu, Jan 18, 2024 at 07:33:37AM +0000, Vasant Hegde wrote:

> +static int iommu_pasid_enable(struct iommu_dev_data *dev_data)
> +{
> +	struct device *dev = dev_data->dev;
> +	int ret = 0;
> +
> +	spin_lock(&dev_data->lock);
> +
> +	if (is_pasid_enabled(dev_data))
> +		goto out;
> +
> +	if (!amd_iommu_pasid_supported()) {
> +		ret = -ENODEV;
> +		goto out;
> +	}
> +
> +	/* attach_device path enables device PASID feature */
> +	if (!dev_data->pasid_enabled) {

How many times are we testing for this? Just check if the gcr3 table
is installed once

> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	ret = amd_iommu_gcr3_init(dev_data, dev->iommu->max_pasids);
> +
> +out:
> +	spin_unlock(&dev_data->lock);
> +	return ret;
> +}

This seems like too much, and it doesn't need to be in a function..

1) Check directly if the gcr3 table is installed otherwise try to
install it. That might be a usefull function

2) Precompute the max pasids during device probe and store it in
iommu_dev_data. Just check if PASID >= max_pasid and fail

> +static void iommu_pasid_disable(struct iommu_dev_data *dev_data)
> +{
> +	spin_lock(&dev_data->lock);
> +
> +	if (!is_gcr3_table_empty(dev_data))
> +		goto out;

Caller already checked this

> +
> +	if (dev_data->gcr3_info.gcr3_tbl == NULL)
> +		goto out;

Can't happen, right?

> +
> +	amd_iommu_gcr3_uninit(dev_data);

Just write it out clearly in remove dev pasid

if (is_gcr3_table_empty(dev_data))
   amd_iommu_gcr3_uninit(dev_data);

> +static int iommu_setup_pasid_pri(struct iommu_dev_data *dev_data)
> +{
> +	struct pci_dev *pdev;
> +	int ret;
> +
> +	if (is_pasid_enabled(dev_data))
> +		return 0;
> +
> +	ret = iommu_pasid_enable(dev_data);
> +	if (ret)
> +		return ret;
> +
> +	pdev = dev_is_pci(dev_data->dev) ? to_pci_dev(dev_data->dev) : NULL;
> +	if (!pdev || !amd_iommu_pdev_pri_supported(pdev))
> +		return 0;
> +
> +	if (!dev_data->pri_enabled)
> +		return -EINVAL;
> +
> +	ret = amd_iommu_iopf_enable_device(dev_data->dev);
> +
> +	return ret;
> +}
> +
> +static void remove_dev_pasid(struct pdom_dev_data *pdom_dev_data)
> +{
> +	/* Update GCR3 table and flush IOTLB */
> +	amd_iommu_clear_gcr3(pdom_dev_data->dev_data, pdom_dev_data->pasid);

This is in the wrong place, there is only one DTE/GCR3 remove_dev is
touching. The list iteration below is just to clean up the tracking
list it should not touch HW. Move this up into
amd_iommu_remove_dev_pasid.

And these functions related to the tracking list should be more
general and called in more places. Just the tracking list itself
should have a few patches to create it and situate it generically in
the driver. Even the RID should be using the same mechanism.

> +int iommu_sva_set_dev_pasid(struct iommu_domain *domain,
> +			    struct device *dev, ioasid_t pasid)
> +{
> +	struct pdom_dev_data *pdom_dev_data;
> +	struct protection_domain *sva_pdom = to_pdomain(domain);
> +	struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
> +	unsigned long flags;
> +	int ret = -EINVAL;
> +
> +	/* PASID zero is used for requests from the I/O device without PASID */
> +	if (pasid == 0 || pasid >= dev->iommu->max_pasids)
> +		return ret;
> +
> +	/* Make sure PASID/PRI is enabled */
> +	ret = iommu_setup_pasid_pri(dev_data);
> +	if (ret)
> +		return ret;
> +
> +	/* Add PASID to protection domain pasid list */
> +	pdom_dev_data = kzalloc(sizeof(*pdom_dev_data), GFP_KERNEL);
> +	if (pdom_dev_data == NULL)
> +		return ret;
> +
> +	pdom_dev_data->pasid = pasid;
> +	pdom_dev_data->dev_data = dev_data;
> +
> +	/* Setup GCR3 table */
> +	ret = amd_iommu_set_gcr3(dev_data, pasid,
> +				 iommu_virt_to_phys(domain->mm->pgd));
> +	if (ret) {
> +		kfree(pdom_dev_data);
> +		return ret;
> +	}
> +
> +	spin_lock_irqsave(&sva_pdom->lock, flags);
> +	list_add(&pdom_dev_data->list, &sva_pdom->dev_data_list);
> +	spin_unlock_irqrestore(&sva_pdom->lock, flags);

This doesn't seem right? The tracking list should be loaded before any
change is made visible to the HW, or at least under the same lock.

Jason

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 14/14] iommu/amd: Add SVA domain support
  2024-01-18  7:33 ` [PATCH v5 14/14] iommu/amd: Add SVA domain support Vasant Hegde
@ 2024-02-02 15:28   ` Jason Gunthorpe
  0 siblings, 0 replies; 45+ messages in thread
From: Jason Gunthorpe @ 2024-02-02 15:28 UTC (permalink / raw)
  To: Vasant Hegde; +Cc: iommu, joro, suravee.suthikulpanit, wei.huang2, jsnitsel

On Thu, Jan 18, 2024 at 07:33:39AM +0000, Vasant Hegde wrote:
> - Allocate SVA domain and setup mmu notifier. In free path unregister
>   mmu notifier and free protection domain.
> 
> - Add mmu notifier callback function. It will retrieve SVA protection
>   domain and invalidates IO/TLB.
> 
> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
> ---
>  drivers/iommu/amd/amd_iommu.h       |  5 ++
>  drivers/iommu/amd/amd_iommu_types.h |  1 +
>  drivers/iommu/amd/iommu.c           | 10 ++--
>  drivers/iommu/amd/pasid.c           | 80 +++++++++++++++++++++++++++++
>  4 files changed, 93 insertions(+), 3 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 02/14] iommu/amd: Introduce per device DTE update function
  2024-01-18  7:33 ` [PATCH v5 02/14] iommu/amd: Introduce per device DTE update function Vasant Hegde
@ 2024-02-02 15:29   ` Jason Gunthorpe
  0 siblings, 0 replies; 45+ messages in thread
From: Jason Gunthorpe @ 2024-02-02 15:29 UTC (permalink / raw)
  To: Vasant Hegde; +Cc: iommu, joro, suravee.suthikulpanit, wei.huang2, jsnitsel

On Thu, Jan 18, 2024 at 07:33:27AM +0000, Vasant Hegde wrote:
> Consolidate per device update and flush logic into separate function.
> Also make it as global function as it will be used in subsequent series
> to update the DTE.
> 
> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
> ---
>  drivers/iommu/amd/amd_iommu.h |  1 +
>  drivers/iommu/amd/iommu.c     | 26 ++++++++++++++++++--------
>  2 files changed, 19 insertions(+), 8 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 04/14] iommu/amd: Move PPR-related functions into ppr.c
  2024-01-18  7:33 ` [PATCH v5 04/14] iommu/amd: Move PPR-related functions into ppr.c Vasant Hegde
@ 2024-02-02 15:29   ` Jason Gunthorpe
  0 siblings, 0 replies; 45+ messages in thread
From: Jason Gunthorpe @ 2024-02-02 15:29 UTC (permalink / raw)
  To: Vasant Hegde; +Cc: iommu, joro, suravee.suthikulpanit, wei.huang2, jsnitsel

On Thu, Jan 18, 2024 at 07:33:29AM +0000, Vasant Hegde wrote:
> From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> 
> In preparation to subsequent PPR-related patches, and also remove static
> declaration for certain helper functions so that it can be reused in other
> files.
> 
> Also rename below functions:
>   alloc_ppr_log        -> amd_iommu_alloc_ppr_log
>   iommu_enable_ppr_log -> amd_iommu_enable_ppr_log
>   free_ppr_log         -> amd_iommu_free_ppr_log
>   iommu_poll_ppr_log   -> amd_iommu_poll_ppr_log
> 
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> Co-developed-by: Vasant Hegde <vasant.hegde@amd.com>
> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
> ---
>  drivers/iommu/amd/Makefile    |   2 +-
>  drivers/iommu/amd/amd_iommu.h |  17 +++++-
>  drivers/iommu/amd/init.c      |  65 +++-----------------
>  drivers/iommu/amd/iommu.c     |  55 +----------------
>  drivers/iommu/amd/ppr.c       | 112 ++++++++++++++++++++++++++++++++++
>  5 files changed, 137 insertions(+), 114 deletions(-)
>  create mode 100644 drivers/iommu/amd/ppr.c

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 05/14] iommu/amd: Fix PPR interrupt processing logic
  2024-01-18  7:33 ` [PATCH v5 05/14] iommu/amd: Fix PPR interrupt processing logic Vasant Hegde
@ 2024-02-02 15:30   ` Jason Gunthorpe
  0 siblings, 0 replies; 45+ messages in thread
From: Jason Gunthorpe @ 2024-02-02 15:30 UTC (permalink / raw)
  To: Vasant Hegde
  Cc: iommu, joro, suravee.suthikulpanit, wei.huang2, jsnitsel,
	Tom Lendacky

On Thu, Jan 18, 2024 at 07:33:30AM +0000, Vasant Hegde wrote:
> * Do not re-read ppr head pointer as its just updated by the driver.
> 
> * Do not read PPR buffer tail pointer inside while loop. If IOMMU
>   generates PPR events continuously then completing interrupt processing
>   takes long time. In worst case it may cause infinite loop.
> 
> Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
> ---
>  drivers/iommu/amd/ppr.c | 4 ----
>  1 file changed, 4 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 06/14] iommu/amd: Define per-IOMMU iopf_queue
  2024-01-18  7:33 ` [PATCH v5 06/14] iommu/amd: Define per-IOMMU iopf_queue Vasant Hegde
@ 2024-02-02 15:30   ` Jason Gunthorpe
  0 siblings, 0 replies; 45+ messages in thread
From: Jason Gunthorpe @ 2024-02-02 15:30 UTC (permalink / raw)
  To: Vasant Hegde; +Cc: iommu, joro, suravee.suthikulpanit, wei.huang2, jsnitsel

On Thu, Jan 18, 2024 at 07:33:31AM +0000, Vasant Hegde wrote:
> From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> 
> AMD IOMMU hardware supports PCI Peripheral Paging Request (PPR) using
> a PPR log, which is a circular buffer containing requests from downstream
> end-point devices.
> 
> There is one PPR log per IOMMU instance. Therefore, allocate an iopf_queue
> per IOMMU instance during driver initialization, and free the queue during
> driver deinitialization.
> 
> Also rename enable_iommus_v2() -> enable_iommus_ppr() to reflect its
> usage.
> 
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> Co-developed-by: Vasant Hegde <vasant.hegde@amd.com>
> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
> ---
>  drivers/iommu/amd/amd_iommu.h       |  5 ++++
>  drivers/iommu/amd/amd_iommu_types.h |  4 +++
>  drivers/iommu/amd/init.c            | 12 ++++++---
>  drivers/iommu/amd/ppr.c             | 42 +++++++++++++++++++++++++++++
>  4 files changed, 60 insertions(+), 3 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 07/14] iommu/amd: Add support for page response
  2024-02-01 20:20   ` Jason Gunthorpe
@ 2024-02-06 15:39     ` Vasant Hegde
  0 siblings, 0 replies; 45+ messages in thread
From: Vasant Hegde @ 2024-02-06 15:39 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: iommu, joro, suravee.suthikulpanit, wei.huang2, jsnitsel

Jason,


On 2/2/2024 1:50 AM, Jason Gunthorpe wrote:
> On Thu, Jan 18, 2024 at 07:33:32AM +0000, Vasant Hegde wrote:
> 
>> diff --git a/drivers/iommu/amd/ppr.c b/drivers/iommu/amd/ppr.c
>> index 1eb0b737afb5..b1751eb495fd 100644
>> --- a/drivers/iommu/amd/ppr.c
>> +++ b/drivers/iommu/amd/ppr.c
>> @@ -148,3 +148,18 @@ void amd_iommu_iopf_uninit(struct amd_iommu *iommu)
>>  
>>  	raw_spin_unlock_irqrestore(&iommu->lock, flags);
>>  }
>> +
>> +int amd_iommu_page_response(struct device *dev,
>> +			    struct iopf_fault *evt,
>> +			    struct iommu_page_response *resp)
>> +{
>> +	struct pci_dev *pdev;
>> +
>> +	if (!dev || !dev_is_pci(dev))
>> +		return -ENODEV;
>> +
>> +	pdev = to_pci_dev(dev);
>> +
>> +	return amd_iommu_complete_ppr(pdev, resp->pasid, resp->code,
>> +				      resp->grpid);
> 
> Converting this to a pci_dev seems pointless? Nothing reaches into the
> PCI struct. Why do it?

I can update amd_iommu_complete_ppr() to accept 'struct device' instead of pdev
as we don't need pdev reference inside amd_iommu_complete_ppr().

Will fix it in v6.

-Vasant


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 08/14] iommu/amd: Add support for add/remove device for IOPF
  2024-02-01 21:46   ` Jason Gunthorpe
@ 2024-02-06 16:02     ` Vasant Hegde
  0 siblings, 0 replies; 45+ messages in thread
From: Vasant Hegde @ 2024-02-06 16:02 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: iommu, joro, suravee.suthikulpanit, wei.huang2, jsnitsel

Jason,


On 2/2/2024 3:16 AM, Jason Gunthorpe wrote:
> On Thu, Jan 18, 2024 at 07:33:33AM +0000, Vasant Hegde wrote:
>> From: Wei Huang <wei.huang2@amd.com>
>>
>> Which adds/removes the device to the corresponding per-IOMMU iopf_queue.
>> It also registers device fault handler. These interfaces are called
>> from IOPF feature enable/disable path.
> 
> "fault handler"? Not anymore right?

Yes. It adds device to IOMMU IOPF queue so that we can handler faults from device.

> 
> They should not be called from the iopf feature path :(

We enable IOPF feature (and device PASID) in set_dev_pasid() path when we attach
first PASID to device.


>  
>> diff --git a/drivers/iommu/amd/ppr.c b/drivers/iommu/amd/ppr.c
>> index b1751eb495fd..6b1fdda95514 100644
>> --- a/drivers/iommu/amd/ppr.c
>> +++ b/drivers/iommu/amd/ppr.c
>> @@ -163,3 +163,38 @@ int amd_iommu_page_response(struct device *dev,
>>  	return amd_iommu_complete_ppr(pdev, resp->pasid, resp->code,
>>  				      resp->grpid);
>>  }
>> +
>> +int amd_iommu_iopf_add_device(struct amd_iommu *iommu, struct device *dev)
>> +{
>> +	unsigned long flags;
>> +	int ret = -EINVAL;
>> +
>> +	raw_spin_lock_irqsave(&iommu->lock, flags);
>> +
>> +	if (!iommu->iopf_queue) {
>> +		raw_spin_unlock_irqrestore(&iommu->lock, flags);
>> +		return ret;
> 
> goto out_unlock if you are going to write it like that

Ok.

-Vasant

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 10/14] iommu/amd: Introduce logic to enable/disable IOPF
  2024-02-01 21:49   ` Jason Gunthorpe
@ 2024-02-06 16:19     ` Vasant Hegde
  2024-02-06 16:36       ` Jason Gunthorpe
  0 siblings, 1 reply; 45+ messages in thread
From: Vasant Hegde @ 2024-02-06 16:19 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: iommu, joro, suravee.suthikulpanit, wei.huang2, jsnitsel

Jason,


On 2/2/2024 3:19 AM, Jason Gunthorpe wrote:
> On Thu, Jan 18, 2024 at 07:33:35AM +0000, Vasant Hegde wrote:
>> diff --git a/drivers/iommu/amd/ppr.c b/drivers/iommu/amd/ppr.c
>> index 02d912b9f618..e1b3d579772f 100644
>> --- a/drivers/iommu/amd/ppr.c
>> +++ b/drivers/iommu/amd/ppr.c
>> @@ -295,3 +295,60 @@ int amd_iommu_iopf_remove_device(struct amd_iommu *iommu, struct device *dev)
>>  	raw_spin_unlock_irqrestore(&iommu->lock, flags);
>>  	return 0;
>>  }
>> +
>> +static int iopf_update_device(struct device *dev, bool enable)
>> +{
>> +	struct pci_dev *pdev = dev_is_pci(dev) ? to_pci_dev(dev) : NULL;
>> +	struct amd_iommu *iommu = get_amd_iommu_from_dev(dev);
>> +	struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
>> +	struct protection_domain *pdom = dev_data->domain;
>> +	int ret = 0;
>> +
>> +	if (!pdev || !pdom)
>> +		return -EINVAL;
>> +
>> +	spin_lock(&dev_data->lock);
>> +
>> +	if (enable) {
>> +		if (dev_data->ppr)
>> +			goto out;
>> +
>> +		ret = amd_iommu_iopf_add_device(iommu, dev);
>> +		if (ret)
>> +			goto out;
>> +
>> +		dev_data->ppr = true;
>> +	} else {
>> +		ret = amd_iommu_iopf_remove_device(iommu, dev);
>> +		dev_data->ppr = false;
>> +	}
>> +
>> +	amd_iommu_dev_update_dte(dev_data, true);
> 
> This dte change is in the wrong place. When the domain is first
> attached we know if it requires PRI or not. At that moment the DTE
> should be set properly, it should not be set wrong and then changed
> later.

We want to enable PPR support only after setting up the handler.


> 
> Remember what I said earlier there should be one DTE write on the
> attach paths.
> 
> So probably all of this should go away and be moved into set dte
> 
> (and again the whole dte setting flow needs cleaning, I think you
> should do that before trying to build more complex stuff on top)

Yeah. I want to fix few things in that path. But that's outside this series.

-Vasant

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 10/14] iommu/amd: Introduce logic to enable/disable IOPF
  2024-02-06 16:19     ` Vasant Hegde
@ 2024-02-06 16:36       ` Jason Gunthorpe
  2024-02-06 17:29         ` Vasant Hegde
  0 siblings, 1 reply; 45+ messages in thread
From: Jason Gunthorpe @ 2024-02-06 16:36 UTC (permalink / raw)
  To: Vasant Hegde; +Cc: iommu, joro, suravee.suthikulpanit, wei.huang2, jsnitsel

On Tue, Feb 06, 2024 at 09:49:36PM +0530, Vasant Hegde wrote:

> > This dte change is in the wrong place. When the domain is first
> > attached we know if it requires PRI or not. At that moment the DTE
> > should be set properly, it should not be set wrong and then changed
> > later.
> 
> We want to enable PPR support only after setting up the handler.

That's backwards. It means error handling can't really be sane..

> > (and again the whole dte setting flow needs cleaning, I think you
> > should do that before trying to build more complex stuff on top)
> 
> Yeah. I want to fix few things in that path. But that's outside this series.

I was looking at it and there are many security bugs in here now that
iommufd can change the DTE at any time. The current design assumes DMA
will be stopped and ignores the spec guidance on how to do a safe DTE
update :(

I'm really not comfortable with adding more stuff here until the
security issue is solved. Especially if the more stuff is drifting
further from being correct. If you can keep the updates in set_dte
then maybe with some reluctance. But not like this with random touches
to the DTE all over the place.

I was looking at what it would take and got part way through. It is a
very similar problem to ARM and I drafted a similar solution.

Jason

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 11/14] iommu/amd: Add GCR3 [un]initialization function
  2024-02-02 15:17   ` Jason Gunthorpe
@ 2024-02-06 17:00     ` Vasant Hegde
  0 siblings, 0 replies; 45+ messages in thread
From: Vasant Hegde @ 2024-02-06 17:00 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: iommu, joro, suravee.suthikulpanit, wei.huang2, jsnitsel

Jason,


On 2/2/2024 8:47 PM, Jason Gunthorpe wrote:
> On Thu, Jan 18, 2024 at 07:33:36AM +0000, Vasant Hegde wrote:
>> From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>>
>> These functions are used in PASID to device bind path. In this path
>> it checks whether device GCR3 table is setup or not. If not it will
>> setup GCR3 table.
>>
>> Also in attach device path change default GCR3 table to use MAX
>> supported PASIDs. Ideally it should use 1 level PASID table as its
>> using PASID zero only. But we don't have support to extend PASID table
>> yet. We will fix this later.
>>
>> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>> Co-developed-by: Vasant Hegde <vasant.hegde@amd.com>
>> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
>> ---
>>  drivers/iommu/amd/amd_iommu.h |  3 ++
>>  drivers/iommu/amd/iommu.c     | 52 +++++++++++++++++++++++++++++++++--
>>  2 files changed, 53 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
>> index d5e6315ef9dd..2d099c54b941 100644
>> --- a/drivers/iommu/amd/amd_iommu.h
>> +++ b/drivers/iommu/amd/amd_iommu.h
>> @@ -44,7 +44,10 @@ extern int amd_iommu_guest_ir;
>>  extern enum io_pgtable_fmt amd_iommu_pgtable;
>>  extern int amd_iommu_gpt_level;
>>  
>> +/* SVA/PASID */
>>  bool amd_iommu_pasid_supported(void);
>> +int amd_iommu_gcr3_init(struct iommu_dev_data *dev_data, ioasid_t pasids);
>> +void amd_iommu_gcr3_uninit(struct iommu_dev_data *dev_data);
>>  
>>  /* IOPF */
>>  int amd_iommu_iopf_init(struct amd_iommu *iommu);
>> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
>> index 36a5458a7946..6f5900333946 100644
>> --- a/drivers/iommu/amd/iommu.c
>> +++ b/drivers/iommu/amd/iommu.c
>> @@ -93,6 +93,11 @@ static inline bool pdom_is_v2_pgtbl_mode(struct protection_domain *pdom)
>>  	return (pdom && (pdom->pd_mode == PD_MODE_V2));
>>  }
>>  
>> +static inline bool pdom_is_in_pt_mode(struct protection_domain *pdom)
>> +{
>> +	return (pdom->domain.type == IOMMU_DOMAIN_IDENTITY);
>> +}
> 
> Why? And what about blocking someday?

Today model allocates GCR3 if its in PT mode only. If we support PASID with
BLOCKing domain then I will add that support.

> 
> Just use dev_data->gcr3_info.gcr3_tbl == NULL to indicate if GCR3 is
> loaded or not?

Given number of combination (like v2, v1, pt) above check is not sufficient.

> 
>>  /*
>>   * Allocate per device domain ID when using V2 page table
>>   */
>> @@ -1837,6 +1842,46 @@ int amd_iommu_clear_gcr3(struct iommu_dev_data *dev_data, ioasid_t pasid)
>>  	return ret;
>>  }
>>  
>> +int amd_iommu_gcr3_init(struct iommu_dev_data *dev_data, ioasid_t pasids)
>> +{
>> +	struct protection_domain *pdom = dev_data->domain;
>> +	int ret = 0;
>> +
>> +	lockdep_assert_held(&dev_data->lock);
>> +
>> +	/*
>> +	 * We cannot support PASID w/ existing v1 page table in the same domain
>> +	 * since it will be nested. However, existing domain w/ v2 page table
>> +	 * can be used for PASID.
>> +	 */
>> +	if (!pdom_is_v2_pgtbl_mode(pdom) && !pdom_is_in_pt_mode(pdom))
>> +		return -EOPNOTSUPP;
> 
> Put the test directly in the set dev pasid op and it should be more
> like
> 
> if (!dev_data->gcr3_info.gcr3_tbl && dev_data->domain->type & _IOMMU_DOMAIN_PAGING)
>     return -EINVAL;

I don't think that's right place. sev_dev pasid doesn't need to know all domain
combination, setting up gcr3 table, etc.

> 
> 
> And you need the mirroring test in alloc_dev to prevent changing the
> RID to things the driver does not yet support, I didn't notice that?

You mean attach_dev() path?

> 
>> +	/* Allocate GCR3 table */
>> +	if (pdom_is_in_pt_mode(pdom) && dev_data->gcr3_info.gcr3_tbl == NULL) {
>> +		ret = setup_gcr3_table(&dev_data->gcr3_info,
>> +				       dev_to_node(dev_data->dev), pasids);
>> +
>> +		/* Update device table */
>> +		amd_iommu_dev_update_dte(dev_data, true);
>> +	}
> 
> This shouldn't be any different from do_attach()

We have allocated new GCR3 table and DTE needs to be updated.

> 
>> +
>> +	return ret;
> 
> success oritened flow please

Ok.

> 
>> +void amd_iommu_gcr3_uninit(struct iommu_dev_data *dev_data)
>> +{
>> +	lockdep_assert_held(&dev_data->lock);
>> +
>> +	/* Free GCR3 table */
>> +	if (pdom_is_in_pt_mode(dev_data->domain)) {
>> +		free_gcr3_table(&dev_data->gcr3_info);
>> +
>> +		/* Update device table */
>> +		amd_iommu_dev_update_dte(dev_data, true);
>> +	}
>> +}
> 
> Is it really worth freeing it?

Why not? But I don't like the domain ID changes every time. So may be OK to drop
the uninit. Will look into it.


> 
>>  static void set_dte_entry(struct amd_iommu *iommu,
>>  			  struct iommu_dev_data *dev_data)
>>  {
>> @@ -1986,9 +2031,12 @@ static int do_attach(struct iommu_dev_data *dev_data,
>>  
>>  	/* Init GCR3 table and update device table */
>>  	if (domain->pd_mode == PD_MODE_V2) {
>> -		/* By default, setup GCR3 table to support single PASID */
>> +		/*
>> +		 * By default, setup GCR3 table to support MAX PASIDs
>> +		 * supported by the IOMMU HW.
>> +		 */
>>  		ret = setup_gcr3_table(&dev_data->gcr3_info,
>> -				       dev_to_node(dev_data->dev), 1);
>> +				       dev_to_node(dev_data->dev), -1);
> 
> -1 should be dev->iommu->max_pasids, but better would be this specific
> device's max pasid size, cached from pci_max_pasids()

Ok will fix it.

-Vasant

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 12/14] iommu/amd: Initial SVA support for AMD IOMMU
  2024-02-02 15:25   ` Jason Gunthorpe
@ 2024-02-06 17:16     ` Vasant Hegde
  2024-02-06 17:34       ` Jason Gunthorpe
  0 siblings, 1 reply; 45+ messages in thread
From: Vasant Hegde @ 2024-02-06 17:16 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: iommu, joro, suravee.suthikulpanit, wei.huang2, jsnitsel

Jason,


On 2/2/2024 8:55 PM, Jason Gunthorpe wrote:
> On Thu, Jan 18, 2024 at 07:33:37AM +0000, Vasant Hegde wrote:
> 
>> +static int iommu_pasid_enable(struct iommu_dev_data *dev_data)
>> +{
>> +	struct device *dev = dev_data->dev;
>> +	int ret = 0;
>> +
>> +	spin_lock(&dev_data->lock);
>> +
>> +	if (is_pasid_enabled(dev_data))
>> +		goto out;
>> +
>> +	if (!amd_iommu_pasid_supported()) {
>> +		ret = -ENODEV;
>> +		goto out;
>> +	}
>> +
>> +	/* attach_device path enables device PASID feature */
>> +	if (!dev_data->pasid_enabled) {
> 
> How many times are we testing for this? Just check if the gcr3 table
> is installed once

One time for IOMMU capability (amd_iommu_pasid_supported()) and one time for
device capability.

is_pasid_enabled() is checked twice (once without lock, so that we can avoid
lock in most cases and one inside lock to be sure no one else entered and
enabled gcr3).

> 
>> +		ret = -EINVAL;
>> +		goto out;
>> +	}
>> +
>> +	ret = amd_iommu_gcr3_init(dev_data, dev->iommu->max_pasids);
>> +
>> +out:
>> +	spin_unlock(&dev_data->lock);
>> +	return ret;
>> +}
> 
> This seems like too much, and it doesn't need to be in a function..
> 
> 1) Check directly if the gcr3 table is installed otherwise try to
> install it. That might be a usefull function

We need other checks to make sure both IOMMU and device is capable of PASID.
Hence its a separate function.


> 
> 2) Precompute the max pasids during device probe and store it in
> iommu_dev_data. Just check if PASID >= max_pasid and fail

This is not required .. as set_dev_pasid() can directly check
dev->iommu->max_pasids.

> 
>> +static void iommu_pasid_disable(struct iommu_dev_data *dev_data)
>> +{
>> +	spin_lock(&dev_data->lock);
>> +
>> +	if (!is_gcr3_table_empty(dev_data))
>> +		goto out;
> 
> Caller already checked this

Yes. Will drop this check.

> 
>> +
>> +	if (dev_data->gcr3_info.gcr3_tbl == NULL)
>> +		goto out;
> 
> Can't happen, right?

Yes. Will drop.

> 
>> +
>> +	amd_iommu_gcr3_uninit(dev_data);
> 
> Just write it out clearly in remove dev pasid
> 
> if (is_gcr3_table_empty(dev_data))
>    amd_iommu_gcr3_uninit(dev_data);
> 

Ok.

>> +static int iommu_setup_pasid_pri(struct iommu_dev_data *dev_data)
>> +{
>> +	struct pci_dev *pdev;
>> +	int ret;
>> +
>> +	if (is_pasid_enabled(dev_data))
>> +		return 0;
>> +
>> +	ret = iommu_pasid_enable(dev_data);
>> +	if (ret)
>> +		return ret;
>> +
>> +	pdev = dev_is_pci(dev_data->dev) ? to_pci_dev(dev_data->dev) : NULL;
>> +	if (!pdev || !amd_iommu_pdev_pri_supported(pdev))
>> +		return 0;
>> +
>> +	if (!dev_data->pri_enabled)
>> +		return -EINVAL;
>> +
>> +	ret = amd_iommu_iopf_enable_device(dev_data->dev);
>> +
>> +	return ret;
>> +}
>> +
>> +static void remove_dev_pasid(struct pdom_dev_data *pdom_dev_data)
>> +{
>> +	/* Update GCR3 table and flush IOTLB */
>> +	amd_iommu_clear_gcr3(pdom_dev_data->dev_data, pdom_dev_data->pasid);
> 
> This is in the wrong place, there is only one DTE/GCR3 remove_dev is
> touching. The list iteration below is just to clean up the tracking
> list it should not touch HW. Move this up into
> amd_iommu_remove_dev_pasid.

This is used in remove_dev_pasid and sva_mn_release path.

This just clears GCR3[pasid] entry. Its not updating DTE.

> 
> And these functions related to the tracking list should be more
> general and called in more places. Just the tracking list itself
> should have a few patches to create it and situate it generically in
> the driver. Even the RID should be using the same mechanism.

That's after reworking protection domain structure. Not in this series.

> 
>> +int iommu_sva_set_dev_pasid(struct iommu_domain *domain,
>> +			    struct device *dev, ioasid_t pasid)
>> +{
>> +	struct pdom_dev_data *pdom_dev_data;
>> +	struct protection_domain *sva_pdom = to_pdomain(domain);
>> +	struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
>> +	unsigned long flags;
>> +	int ret = -EINVAL;
>> +
>> +	/* PASID zero is used for requests from the I/O device without PASID */
>> +	if (pasid == 0 || pasid >= dev->iommu->max_pasids)
>> +		return ret;
>> +
>> +	/* Make sure PASID/PRI is enabled */
>> +	ret = iommu_setup_pasid_pri(dev_data);
>> +	if (ret)
>> +		return ret;
>> +
>> +	/* Add PASID to protection domain pasid list */
>> +	pdom_dev_data = kzalloc(sizeof(*pdom_dev_data), GFP_KERNEL);
>> +	if (pdom_dev_data == NULL)
>> +		return ret;
>> +
>> +	pdom_dev_data->pasid = pasid;
>> +	pdom_dev_data->dev_data = dev_data;
>> +
>> +	/* Setup GCR3 table */
>> +	ret = amd_iommu_set_gcr3(dev_data, pasid,
>> +				 iommu_virt_to_phys(domain->mm->pgd));
>> +	if (ret) {
>> +		kfree(pdom_dev_data);
>> +		return ret;
>> +	}
>> +
>> +	spin_lock_irqsave(&sva_pdom->lock, flags);
>> +	list_add(&pdom_dev_data->list, &sva_pdom->dev_data_list);
>> +	spin_unlock_irqrestore(&sva_pdom->lock, flags);
> 
> This doesn't seem right? The tracking list should be loaded before any
> change is made visible to the HW, or at least under the same lock.

Ok. I can move that up and then have error handler path to remove it.

-Vasant

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 10/14] iommu/amd: Introduce logic to enable/disable IOPF
  2024-02-06 16:36       ` Jason Gunthorpe
@ 2024-02-06 17:29         ` Vasant Hegde
  2024-02-06 17:58           ` Jason Gunthorpe
  0 siblings, 1 reply; 45+ messages in thread
From: Vasant Hegde @ 2024-02-06 17:29 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: iommu, joro, suravee.suthikulpanit, wei.huang2, jsnitsel

Jason,


On 2/6/2024 10:06 PM, Jason Gunthorpe wrote:
> On Tue, Feb 06, 2024 at 09:49:36PM +0530, Vasant Hegde wrote:
> 
>>> This dte change is in the wrong place. When the domain is first
>>> attached we know if it requires PRI or not. At that moment the DTE
>>> should be set properly, it should not be set wrong and then changed
>>> later.
>>
>> We want to enable PPR support only after setting up the handler.
> 
> That's backwards. It means error handling can't really be sane..

Why do you think enabling feature only when we are really going to use it backwards?



> 
>>> (and again the whole dte setting flow needs cleaning, I think you
>>> should do that before trying to build more complex stuff on top)
>>
>> Yeah. I want to fix few things in that path. But that's outside this series.
> 
> I was looking at it and there are many security bugs in here now that
> iommufd can change the DTE at any time. The current design assumes DMA
> will be stopped and ignores the spec guidance on how to do a safe DTE
> update :(

What security issues are you referring? Can you elaborate?

> 
> I'm really not comfortable with adding more stuff here until the
> security issue is solved. Especially if the more stuff is drifting
> further from being correct. If you can keep the updates in set_dte
> then maybe with some reluctance. But not like this with random touches
> to the DTE all over the place.

Currently all DTE update is happening inside set_dte only (dirty bit enable is
an exception that may need to moved inside set_dte). This patch just invokes
that set_dte and invalidates cache.

-Vasant

> 
> I was looking at what it would take and got part way through. It is a
> very similar problem to ARM and I drafted a similar solution.
> 
> Jason

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 12/14] iommu/amd: Initial SVA support for AMD IOMMU
  2024-02-06 17:16     ` Vasant Hegde
@ 2024-02-06 17:34       ` Jason Gunthorpe
  2024-02-07  9:31         ` Vasant Hegde
  0 siblings, 1 reply; 45+ messages in thread
From: Jason Gunthorpe @ 2024-02-06 17:34 UTC (permalink / raw)
  To: Vasant Hegde; +Cc: iommu, joro, suravee.suthikulpanit, wei.huang2, jsnitsel

On Tue, Feb 06, 2024 at 10:46:58PM +0530, Vasant Hegde wrote:
> Jason,
> 
> 
> On 2/2/2024 8:55 PM, Jason Gunthorpe wrote:
> > On Thu, Jan 18, 2024 at 07:33:37AM +0000, Vasant Hegde wrote:
> > 
> >> +static int iommu_pasid_enable(struct iommu_dev_data *dev_data)
> >> +{
> >> +	struct device *dev = dev_data->dev;
> >> +	int ret = 0;
> >> +
> >> +	spin_lock(&dev_data->lock);
> >> +
> >> +	if (is_pasid_enabled(dev_data))
> >> +		goto out;
> >> +
> >> +	if (!amd_iommu_pasid_supported()) {
> >> +		ret = -ENODEV;
> >> +		goto out;
> >> +	}
> >> +
> >> +	/* attach_device path enables device PASID feature */
> >> +	if (!dev_data->pasid_enabled) {
> > 
> > How many times are we testing for this? Just check if the gcr3 table
> > is installed once
> 
> One time for IOMMU capability (amd_iommu_pasid_supported()) and one time for
> device capability.

Again I think this whole thing is out of sequence. The main focus
should be on the gcr3 table. You should dirctly know if it has been
installed or not via some direct means. Test all this stuff when you
go to install it the first time.

> is_pasid_enabled() is checked twice (once without lock, so that we can avoid
> lock in most cases and one inside lock to be sure no one else entered and
> enabled gcr3).

That never works, don't do that.

> 
> > 
> >> +		ret = -EINVAL;
> >> +		goto out;
> >> +	}
> >> +
> >> +	ret = amd_iommu_gcr3_init(dev_data, dev->iommu->max_pasids);
> >> +
> >> +out:
> >> +	spin_unlock(&dev_data->lock);
> >> +	return ret;
> >> +}
> > 
> > This seems like too much, and it doesn't need to be in a function..
> > 
> > 1) Check directly if the gcr3 table is installed otherwise try to
> > install it. That might be a usefull function
> 
> We need other checks to make sure both IOMMU and device is capable of PASID.
> Hence its a separate function.

That should be done at probe time and cached in the per-device max
pasid valid. It is 0 if there is no pasid support.

> > 2) Precompute the max pasids during device probe and store it in
> > iommu_dev_data. Just check if PASID >= max_pasid and fail
> 
> This is not required .. as set_dev_pasid() can directly check
> dev->iommu->max_pasids.

That is not the same thing, dev->iommu->max_pasids is the PCI/DT
capability only.

The driver still has to keep its own internal limit.

It is goofy and we should probably merge the driver limit and the core
limit at some point - until then it is better to follow the pattern
and keep a driver limit in the driver, doing all the work at probe
time not during attach.

> >> +static void remove_dev_pasid(struct pdom_dev_data *pdom_dev_data)
> >> +{
> >> +	/* Update GCR3 table and flush IOTLB */
> >> +	amd_iommu_clear_gcr3(pdom_dev_data->dev_data, pdom_dev_data->pasid);
> > 
> > This is in the wrong place, there is only one DTE/GCR3 remove_dev is
> > touching. The list iteration below is just to clean up the tracking
> > list it should not touch HW. Move this up into
> > amd_iommu_remove_dev_pasid.
> 
> This is used in remove_dev_pasid and sva_mn_release path.
> 
> This just clears GCR3[pasid] entry. Its not updating DTE.

Same argument, it should not be iterating, there is only ever one.

The release path is not removing the PASID, it is just disabling the
GCR3 entry. The domain is still logically connected to that PASID as
far as everything else is concerned.

> > And these functions related to the tracking list should be more
> > general and called in more places. Just the tracking list itself
> > should have a few patches to create it and situate it generically in
> > the driver. Even the RID should be using the same mechanism.
> 
> That's after reworking protection domain structure. Not in this series.

:( I wish you'd get this stuff cleaned up properly first instead of
building more mess on top of the wrong design.
 
> >> +	/* Setup GCR3 table */
> >> +	ret = amd_iommu_set_gcr3(dev_data, pasid,
> >> +				 iommu_virt_to_phys(domain->mm->pgd));
> >> +	if (ret) {
> >> +		kfree(pdom_dev_data);
> >> +		return ret;
> >> +	}
> >> +
> >> +	spin_lock_irqsave(&sva_pdom->lock, flags);
> >> +	list_add(&pdom_dev_data->list, &sva_pdom->dev_data_list);
> >> +	spin_unlock_irqrestore(&sva_pdom->lock, flags);
> > 
> > This doesn't seem right? The tracking list should be loaded before any
> > change is made visible to the HW, or at least under the same lock.
> 
> Ok. I can move that up and then have error handler path to remove it.

Because, like this shows, it is hard to get it all right and it is
even harder when everything is not consistent and there are two
versions of the same flows.

Jason

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 10/14] iommu/amd: Introduce logic to enable/disable IOPF
  2024-02-06 17:29         ` Vasant Hegde
@ 2024-02-06 17:58           ` Jason Gunthorpe
  2024-02-07  8:58             ` Vasant Hegde
  0 siblings, 1 reply; 45+ messages in thread
From: Jason Gunthorpe @ 2024-02-06 17:58 UTC (permalink / raw)
  To: Vasant Hegde; +Cc: iommu, joro, suravee.suthikulpanit, wei.huang2, jsnitsel

On Tue, Feb 06, 2024 at 10:59:03PM +0530, Vasant Hegde wrote:
> On 2/6/2024 10:06 PM, Jason Gunthorpe wrote:
> > On Tue, Feb 06, 2024 at 09:49:36PM +0530, Vasant Hegde wrote:
> > 
> >>> This dte change is in the wrong place. When the domain is first
> >>> attached we know if it requires PRI or not. At that moment the DTE
> >>> should be set properly, it should not be set wrong and then changed
> >>> later.
> >>
> >> We want to enable PPR support only after setting up the handler.
> > 
> > That's backwards. It means error handling can't really be sane..
> 
> Why do you think enabling feature only when we are really going to
> use it backwards?

Because you touch the DTE twice, it means the domain is installed in
an inconsistent state where it is not actually working
properly. Domain updates should not "tear" like that.

You already know what is going to happen at the very start of attach,
you don't need to "enable it after" just do it right the first time
through.

There is a clear protocol and ordering requirement for the PRI
enablement. Lu described it in a comment, make sure you follow it.

You also have to think about what happens during detach and what
happens on all the pairs of attach -> attach.

The error unwinds are tricky, the only way I could make it all be
correct for ARM was to fix the attach handling so that there is no
failure scenario after the DTE is updated. ie the attach functions
either do nothing or fully succeed.

The situation where attach fails and leaves the HW in an unknown state
is really hard to deal with - and without the reliable global blocked
domain the core code can't 100% rescue it either.

> >>> (and again the whole dte setting flow needs cleaning, I think you
> >>> should do that before trying to build more complex stuff on top)
> >>
> >> Yeah. I want to fix few things in that path. But that's outside this series.
> > 
> > I was looking at it and there are many security bugs in here now that
> > iommufd can change the DTE at any time. The current design assumes DMA
> > will be stopped and ignores the spec guidance on how to do a safe DTE
> > update :(
> 
> What security issues are you referring? Can you elaborate?

The DTE is not updated correctly. The HW can read inconsistent
versions of it with unpredictable - and possibly security bad -
results.

Like it doesn't even write the two qwords of the DTE in a predicatble
order! Let alone worrying about the 3 qw update or being correct with
races during an ITE touch :(

This doesn't matter so much if there is no DMA active while the DTE is
being changed, which could sort of reasonably be assumed up till
iommufd allowed it to happen under userspace control.

Now a driver cannot make the assumption that DMA is halted. It must
follow all the protocols to ensure that HW observes only exactly the
DTEs/etc it is trying to build and not something random.

The documentation is pretty clear how this is supposed to work. It is
the same as ARM. Use atomic 64/128 bit stores, rely on 'ignored
behavior' or use the valid bit.

This also means, broadly, you can't allow the DTE to evolve during the
operation of attach/detach as the in-between states may become
userspace visible and may be harmful in some way.

> > I'm really not comfortable with adding more stuff here until the
> > security issue is solved. Especially if the more stuff is drifting
> > further from being correct. If you can keep the updates in set_dte
> > then maybe with some reluctance. But not like this with random touches
> > to the DTE all over the place.
>
> Currently all DTE update is happening inside set_dte only (dirty bit enable is
> an exception that may need to moved inside set_dte). This patch just invokes
> that set_dte and invalidates cache.

So then why all this strangeness?? Just set dev_data->ppr earlier in
attach and order the handler setup properly.

It should be really simple:

 // All protected by the core's group mutex

 if (domain->needs_pri)  {
     dev_data->ppr = true;
     if (!dev_data->num_pri_domains)
       // enable fault queues for the device
     dev_data->num_pri_domains++
 }

 if (old_domain->needs_pri) {
     dev_data->num_pri_domains--;
     if (!dev_data->num_pri_domains) {
         dev_data->ppr = false;
         disable pri at PCI()
     }
 }

 set_dte()

 if (domain->needs_pr)
    enable pri at PCI()

The order here is really important too!

Since PRI can only be supported when a GCR3 is present, this should
all be part of some generic 'install GCR3 table DTE' routine that is
called on all the attach paths.

Jason

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 10/14] iommu/amd: Introduce logic to enable/disable IOPF
  2024-02-06 17:58           ` Jason Gunthorpe
@ 2024-02-07  8:58             ` Vasant Hegde
  2024-02-07 12:36               ` Baolu Lu
  2024-02-08 17:31               ` Jason Gunthorpe
  0 siblings, 2 replies; 45+ messages in thread
From: Vasant Hegde @ 2024-02-07  8:58 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: iommu, joro, suravee.suthikulpanit, wei.huang2, jsnitsel



On 2/6/2024 11:28 PM, Jason Gunthorpe wrote:
> On Tue, Feb 06, 2024 at 10:59:03PM +0530, Vasant Hegde wrote:
>> On 2/6/2024 10:06 PM, Jason Gunthorpe wrote:
>>> On Tue, Feb 06, 2024 at 09:49:36PM +0530, Vasant Hegde wrote:
>>>
>>>>> This dte change is in the wrong place. When the domain is first
>>>>> attached we know if it requires PRI or not. At that moment the DTE
>>>>> should be set properly, it should not be set wrong and then changed
>>>>> later.
>>>>
>>>> We want to enable PPR support only after setting up the handler.
>>>
>>> That's backwards. It means error handling can't really be sane..
>>
>> Why do you think enabling feature only when we are really going to
>> use it backwards?
> 
> Because you touch the DTE twice, it means the domain is installed in
> an inconsistent state where it is not actually working
> properly. Domain updates should not "tear" like that.
> 
> You already know what is going to happen at the very start of attach,
> you don't need to "enable it after" just do it right the first time
> through.

First time we will not know whether device will actually use fault handler or
not. All we will know is whether IOMMU and device is capable of PRI or not.

We can make assumption that it may use and just enable it.


> 
> There is a clear protocol and ordering requirement for the PRI
> enablement. Lu described it in a comment, make sure you follow it.

Where? in intel driver? (they seems to be using feature_enable() path)

I did look into latest "iommu: Prepare to deliver page faults to user space"
series. I don't see anything specific to PRI enablement flow.


> 
> You also have to think about what happens during detach and what
> happens on all the pairs of attach -> attach.

ok.

> 
> The error unwinds are tricky, the only way I could make it all be
> correct for ARM was to fix the attach handling so that there is no
> failure scenario after the DTE is updated. ie the attach functions
> either do nothing or fully succeed.
> 
> The situation where attach fails and leaves the HW in an unknown state
> is really hard to deal with - and without the reliable global blocked
> domain the core code can't 100% rescue it either.

If attach fails we throw error message and skip updating DTE. I believe core
layer understands that driver failed to attach device and puts device/group to
its original domain. So things should work fine.

> 
>>>>> (and again the whole dte setting flow needs cleaning, I think you
>>>>> should do that before trying to build more complex stuff on top)
>>>>
>>>> Yeah. I want to fix few things in that path. But that's outside this series.
>>>
>>> I was looking at it and there are many security bugs in here now that
>>> iommufd can change the DTE at any time. The current design assumes DMA
>>> will be stopped and ignores the spec guidance on how to do a safe DTE
>>> update :(
>>
>> What security issues are you referring? Can you elaborate?
> 
> The DTE is not updated correctly. The HW can read inconsistent
> versions of it with unpredictable - and possibly security bad -
> results.
> 
> Like it doesn't even write the two qwords of the DTE in a predicatble
> order! Let alone worrying about the 3 qw update or being correct with
> races during an ITE touch :(
> 
> This doesn't matter so much if there is no DMA active while the DTE is
> being changed, which could sort of reasonably be assumed up till
> iommufd allowed it to happen under userspace control.
> 
> Now a driver cannot make the assumption that DMA is halted. It must
> follow all the protocols to ensure that HW observes only exactly the
> DTEs/etc it is trying to build and not something random.
> 
> The documentation is pretty clear how this is supposed to work. It is
> the same as ARM. Use atomic 64/128 bit stores, rely on 'ignored
> behavior' or use the valid bit.
> 
> This also means, broadly, you can't allow the DTE to evolve during the
> operation of attach/detach as the in-between states may become
> userspace visible and may be harmful in some way.

We make DTE changes in set_dte() function only (except dirty bit change that
will be consolidated).


> 
>>> I'm really not comfortable with adding more stuff here until the
>>> security issue is solved. Especially if the more stuff is drifting
>>> further from being correct. If you can keep the updates in set_dte
>>> then maybe with some reluctance. But not like this with random touches
>>> to the DTE all over the place.
>>
>> Currently all DTE update is happening inside set_dte only (dirty bit enable is
>> an exception that may need to moved inside set_dte). This patch just invokes
>> that set_dte and invalidates cache.
> 
> So then why all this strangeness?? Just set dev_data->ppr earlier in
> attach and order the handler setup properly.
> 
> It should be really simple:
> 
>  // All protected by the core's group mutex
> 
>  if (domain->needs_pri)  {
>      dev_data->ppr = true;
>      if (!dev_data->num_pri_domains)

What is PRI domain?

If I have to enable PRI in attach path then I don't need to track number of
domain stuff. I can simply do something like
	if (pdom_is_sva_capable(pdom))
		// enable PRI in IOMMU
		// enable device PRI

and in detach path,
	if (PRI is enabled)
		// disable IOMMU/device PRI stuff


-Vasant

>        // enable fault queues for the device
>      dev_data->num_pri_domains++
>  }
> 
>  if (old_domain->needs_pri) {
>      dev_data->num_pri_domains--;
>      if (!dev_data->num_pri_domains) {
>          dev_data->ppr = false;
>          disable pri at PCI()
>      }
>  }
> 
>  set_dte()
> 
>  if (domain->needs_pr)
>     enable pri at PCI()
> 
> The order here is really important too!
> 
> Since PRI can only be supported when a GCR3 is present, this should
> all be part of some generic 'install GCR3 table DTE' routine that is
> called on all the attach paths.
> 
> Jason

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 12/14] iommu/amd: Initial SVA support for AMD IOMMU
  2024-02-06 17:34       ` Jason Gunthorpe
@ 2024-02-07  9:31         ` Vasant Hegde
  2024-02-08 17:41           ` Jason Gunthorpe
  0 siblings, 1 reply; 45+ messages in thread
From: Vasant Hegde @ 2024-02-07  9:31 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: iommu, joro, suravee.suthikulpanit, wei.huang2, jsnitsel

Jason,


On 2/6/2024 11:04 PM, Jason Gunthorpe wrote:
> On Tue, Feb 06, 2024 at 10:46:58PM +0530, Vasant Hegde wrote:
>> Jason,
>>
>>
>> On 2/2/2024 8:55 PM, Jason Gunthorpe wrote:
>>> On Thu, Jan 18, 2024 at 07:33:37AM +0000, Vasant Hegde wrote:
>>>
>>>> +static int iommu_pasid_enable(struct iommu_dev_data *dev_data)
>>>> +{
>>>> +	struct device *dev = dev_data->dev;
>>>> +	int ret = 0;
>>>> +
>>>> +	spin_lock(&dev_data->lock);
>>>> +
>>>> +	if (is_pasid_enabled(dev_data))
>>>> +		goto out;
>>>> +
>>>> +	if (!amd_iommu_pasid_supported()) {
>>>> +		ret = -ENODEV;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	/* attach_device path enables device PASID feature */
>>>> +	if (!dev_data->pasid_enabled) {
>>>
>>> How many times are we testing for this? Just check if the gcr3 table
>>> is installed once
>>
>> One time for IOMMU capability (amd_iommu_pasid_supported()) and one time for
>> device capability.
> 
> Again I think this whole thing is out of sequence. The main focus
> should be on the gcr3 table. You should dirctly know if it has been
> installed or not via some direct means. Test all this stuff when you
> go to install it the first time.

We are doing 1 SVA domain : 1 PASID : N devices model. That means we need to
check all these things while attaching *first* PASID to device. (That's what we
had discussed sometime back when I had all these things in feature_enable(SVA)
path).

Ex: If device is in domain with V1 page table tries to attach PASID it should fail
   If device is in domain of PT mode, then we should setup gcr3.

So I don't see how we can infer these things unless we have something like
enable_feature(PASID).. which would have taken care of setting up things.


> 
>> is_pasid_enabled() is checked twice (once without lock, so that we can avoid
>> lock in most cases and one inside lock to be sure no one else entered and
>> enabled gcr3).
> 
> That never works, don't do that.
> 

It does work as second one is lock protected. And the intention was to avoid
locking for attaching second PASID onwards. Only for first time attaching PASID
to device we will have check for two times.


>>
>>>
>>>> +		ret = -EINVAL;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	ret = amd_iommu_gcr3_init(dev_data, dev->iommu->max_pasids);
>>>> +
>>>> +out:
>>>> +	spin_unlock(&dev_data->lock);
>>>> +	return ret;
>>>> +}
>>>
>>> This seems like too much, and it doesn't need to be in a function..
>>>
>>> 1) Check directly if the gcr3 table is installed otherwise try to
>>> install it. That might be a usefull function
>>
>> We need other checks to make sure both IOMMU and device is capable of PASID.
>> Hence its a separate function.
> 
> That should be done at probe time and cached in the per-device max
> pasid valid. It is 0 if there is no pasid support.

Ok. Yet another variable!

> 
>>> 2) Precompute the max pasids during device probe and store it in
>>> iommu_dev_data. Just check if PASID >= max_pasid and fail
>>
>> This is not required .. as set_dev_pasid() can directly check
>> dev->iommu->max_pasids.
> 
> That is not the same thing, dev->iommu->max_pasids is the PCI/DT
> capability only.

Its min of PCI device and corresponding IOMMU (see dev_iommu_get_max_pasids()).

> 
> The driver still has to keep its own internal limit.
> 
> It is goofy and we should probably merge the driver limit and the core
> limit at some point - until then it is better to follow the pattern
> and keep a driver limit in the driver, doing all the work at probe
> time not during attach.
> 
>>>> +static void remove_dev_pasid(struct pdom_dev_data *pdom_dev_data)
>>>> +{
>>>> +	/* Update GCR3 table and flush IOTLB */
>>>> +	amd_iommu_clear_gcr3(pdom_dev_data->dev_data, pdom_dev_data->pasid);
>>>
>>> This is in the wrong place, there is only one DTE/GCR3 remove_dev is
>>> touching. The list iteration below is just to clean up the tracking
>>> list it should not touch HW. Move this up into
>>> amd_iommu_remove_dev_pasid.
>>
>> This is used in remove_dev_pasid and sva_mn_release path.
>>
>> This just clears GCR3[pasid] entry. Its not updating DTE.
> 
> Same argument, it should not be iterating, there is only ever one.

We can have N device in same SVA domain. So we need to iterate to find the right
entry in the list. So the flow is :
   remove_dev_pasid()
	// iterate over dev_data_list to find dev/pasid combination
	// Update gcr3[pasid]
	// remove entry from the list

I can take out 'update gcr3[pasid]' from remove_dev_pasid() and put it in
sva_mn_release() and amd_iommu_remove_dev_pasid().. Just that its duplicate code!


> 
> The release path is not removing the PASID, it is just disabling the
> GCR3 entry. The domain is still logically connected to that PASID as
> far as everything else is concerned.

Right.. But for device side we need to clear the entry.

> 
>>> And these functions related to the tracking list should be more
>>> general and called in more places. Just the tracking list itself
>>> should have a few patches to create it and situate it generically in
>>> the driver. Even the RID should be using the same mechanism.
>>
>> That's after reworking protection domain structure. Not in this series.
> 
> :( I wish you'd get this stuff cleaned up properly first instead of
> building more mess on top of the wrong design.
>  
>>>> +	/* Setup GCR3 table */
>>>> +	ret = amd_iommu_set_gcr3(dev_data, pasid,
>>>> +				 iommu_virt_to_phys(domain->mm->pgd));
>>>> +	if (ret) {
>>>> +		kfree(pdom_dev_data);
>>>> +		return ret;
>>>> +	}
>>>> +
>>>> +	spin_lock_irqsave(&sva_pdom->lock, flags);
>>>> +	list_add(&pdom_dev_data->list, &sva_pdom->dev_data_list);
>>>> +	spin_unlock_irqrestore(&sva_pdom->lock, flags);
>>>
>>> This doesn't seem right? The tracking list should be loaded before any
>>> change is made visible to the HW, or at least under the same lock.
>>
>> Ok. I can move that up and then have error handler path to remove it.
> 
> Because, like this shows, it is hard to get it all right and it is
> even harder when everything is not consistent and there are two
> versions of the same flows.

Ok.

-Vasant

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 10/14] iommu/amd: Introduce logic to enable/disable IOPF
  2024-02-07  8:58             ` Vasant Hegde
@ 2024-02-07 12:36               ` Baolu Lu
  2024-02-07 18:00                 ` Vasant Hegde
  2024-02-08 17:31               ` Jason Gunthorpe
  1 sibling, 1 reply; 45+ messages in thread
From: Baolu Lu @ 2024-02-07 12:36 UTC (permalink / raw)
  To: Vasant Hegde, Jason Gunthorpe
  Cc: baolu.lu, iommu, joro, suravee.suthikulpanit, wei.huang2,
	jsnitsel

On 2024/2/7 16:58, Vasant Hegde wrote:
>> There is a clear protocol and ordering requirement for the PRI
>> enablement. Lu described it in a comment, make sure you follow it.
> Where? in intel driver? (they seems to be using feature_enable() path)
> 
> I did look into latest "iommu: Prepare to deliver page faults to user space"
> series. I don't see anything specific to PRI enablement flow.

The latest version is here.

https://lore.kernel.org/linux-iommu/20240207013325.95182-14-baolu.lu@linux.intel.com/

Best regards,
baolu

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 10/14] iommu/amd: Introduce logic to enable/disable IOPF
  2024-02-07 12:36               ` Baolu Lu
@ 2024-02-07 18:00                 ` Vasant Hegde
  0 siblings, 0 replies; 45+ messages in thread
From: Vasant Hegde @ 2024-02-07 18:00 UTC (permalink / raw)
  To: Baolu Lu, Jason Gunthorpe
  Cc: iommu, joro, suravee.suthikulpanit, wei.huang2, jsnitsel

Baolu,

On 2/7/2024 6:06 PM, Baolu Lu wrote:
> On 2024/2/7 16:58, Vasant Hegde wrote:
>>> There is a clear protocol and ordering requirement for the PRI
>>> enablement. Lu described it in a comment, make sure you follow it.
>> Where? in intel driver? (they seems to be using feature_enable() path)
>>
>> I did look into latest "iommu: Prepare to deliver page faults to user space"
>> series. I don't see anything specific to PRI enablement flow.
> 
> The latest version is here.
> 
> https://lore.kernel.org/linux-iommu/20240207013325.95182-14-baolu.lu@linux.intel.com/
> 

Thanks!

-Vasant

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 10/14] iommu/amd: Introduce logic to enable/disable IOPF
  2024-02-07  8:58             ` Vasant Hegde
  2024-02-07 12:36               ` Baolu Lu
@ 2024-02-08 17:31               ` Jason Gunthorpe
  2024-02-08 18:37                 ` Vasant Hegde
  1 sibling, 1 reply; 45+ messages in thread
From: Jason Gunthorpe @ 2024-02-08 17:31 UTC (permalink / raw)
  To: Vasant Hegde; +Cc: iommu, joro, suravee.suthikulpanit, wei.huang2, jsnitsel

On Wed, Feb 07, 2024 at 02:28:08PM +0530, Vasant Hegde wrote:

> > You already know what is going to happen at the very start of attach,
> > you don't need to "enable it after" just do it right the first time
> > through.
> 
> First time we will not know whether device will actually use fault handler or
> not. All we will know is whether IOMMU and device is capable of PRI or not.

I don't understand this, you should know all of this before you get to
setting the DTE. What is missing?

> > The situation where attach fails and leaves the HW in an unknown state
> > is really hard to deal with - and without the reliable global blocked
> > domain the core code can't 100% rescue it either.
> 
> If attach fails we throw error message and skip updating DTE. I believe core
> layer understands that driver failed to attach device and puts device/group to
> its original domain. So things should work fine.

It is not just the DTE, the whole thing including changing PCIE config
space and so forth has to be kept correct.

> > This also means, broadly, you can't allow the DTE to evolve during the
> > operation of attach/detach as the in-between states may become
> > userspace visible and may be harmful in some way.
> 
> We make DTE changes in set_dte() function only (except dirty bit change that
> will be consolidated).

Sure, but set_dte doesn't take care to sequence the update.

> >>> security issue is solved. Especially if the more stuff is drifting
> >>> further from being correct. If you can keep the updates in set_dte
> >>> then maybe with some reluctance. But not like this with random touches
> >>> to the DTE all over the place.
> >>
> >> Currently all DTE update is happening inside set_dte only (dirty bit enable is
> >> an exception that may need to moved inside set_dte). This patch just invokes
> >> that set_dte and invalidates cache.
> > 
> > So then why all this strangeness?? Just set dev_data->ppr earlier in
> > attach and order the handler setup properly.
> > 
> > It should be really simple:
> > 
> >  // All protected by the core's group mutex
> > 
> >  if (domain->needs_pri)  {
> >      dev_data->ppr = true;
> >      if (!dev_data->num_pri_domains)
> 
> What is PRI domain?

Right now it is only a SVA domain, it is a domain that wishes to use
PRI. Quite soon we are going to expand this to PAGING domains as well.

> If I have to enable PRI in attach path then I don't need to track number of
> domain stuff. I can simply do something like
> 	if (pdom_is_sva_capable(pdom))
> 		// enable PRI in IOMMU
> 		// enable device PRI
> 
> and in detach path,
> 	if (PRI is enabled)
> 		// disable IOMMU/device PRI stuff

Each PASID can have PRI on or not, so you need to keep track of how
many PASIDs are using PRI at any moment and keep things in sync that
way.

If there are no PRI handlers installed then the PCI config space
should disable PRI and all the PRI bits flushed and disabled.

At some point you need to determine if any PASIDs have domains that
need PRI. A counter is a simple solution.

Jason

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 12/14] iommu/amd: Initial SVA support for AMD IOMMU
  2024-02-07  9:31         ` Vasant Hegde
@ 2024-02-08 17:41           ` Jason Gunthorpe
  2024-02-08 18:23             ` Vasant Hegde
  0 siblings, 1 reply; 45+ messages in thread
From: Jason Gunthorpe @ 2024-02-08 17:41 UTC (permalink / raw)
  To: Vasant Hegde; +Cc: iommu, joro, suravee.suthikulpanit, wei.huang2, jsnitsel

On Wed, Feb 07, 2024 at 03:01:22PM +0530, Vasant Hegde wrote:
> Jason,
> 
> 
> On 2/6/2024 11:04 PM, Jason Gunthorpe wrote:
> > On Tue, Feb 06, 2024 at 10:46:58PM +0530, Vasant Hegde wrote:
> >> Jason,
> >>
> >>
> >> On 2/2/2024 8:55 PM, Jason Gunthorpe wrote:
> >>> On Thu, Jan 18, 2024 at 07:33:37AM +0000, Vasant Hegde wrote:
> >>>
> >>>> +static int iommu_pasid_enable(struct iommu_dev_data *dev_data)
> >>>> +{
> >>>> +	struct device *dev = dev_data->dev;
> >>>> +	int ret = 0;
> >>>> +
> >>>> +	spin_lock(&dev_data->lock);
> >>>> +
> >>>> +	if (is_pasid_enabled(dev_data))
> >>>> +		goto out;
> >>>> +
> >>>> +	if (!amd_iommu_pasid_supported()) {
> >>>> +		ret = -ENODEV;
> >>>> +		goto out;
> >>>> +	}
> >>>> +
> >>>> +	/* attach_device path enables device PASID feature */
> >>>> +	if (!dev_data->pasid_enabled) {
> >>>
> >>> How many times are we testing for this? Just check if the gcr3 table
> >>> is installed once
> >>
> >> One time for IOMMU capability (amd_iommu_pasid_supported()) and one time for
> >> device capability.
> > 
> > Again I think this whole thing is out of sequence. The main focus
> > should be on the gcr3 table. You should dirctly know if it has been
> > installed or not via some direct means. Test all this stuff when you
> > go to install it the first time.
> 
> We are doing 1 SVA domain : 1 PASID : N devices model. That means we need to
> check all these things while attaching *first* PASID to device. (That's what we
> had discussed sometime back when I had all these things in feature_enable(SVA)
> path).

I thought I said you 1 PASID is not technically correct, but you could
get away it it for a short term. You should be planning to support N
PASIDs because that is what the API defines.

> Ex: If device is in domain with V1 page table tries to attach PASID it should fail
>    If device is in domain of PT mode, then we should setup gcr3.

Yes
 
> So I don't see how we can infer these things unless we have something like
> enable_feature(PASID).. which would have taken care of setting up things.

You just trivially check these conditions at the top of set_dev_pasid, look at
what I did for ARM:

	if (smmu_domain->smmu != master->smmu || pasid == IOMMU_NO_PASID)
		return -EINVAL;

	if (!master->cd_table.in_ste &&
	    sid_domain->type != IOMMU_DOMAIN_IDENTITY &&
	    sid_domain->type != IOMMU_DOMAIN_BLOCKED)
		return -EINVAL;

For AMD "cd_table in_ste" means that the DTE points to a GCR3 table
already. This is trivially tracked in the DTE programming in set_dte
because you know if the dte is being configured with a GCR3 or not.

The other cases represent the situations where we know how to upgrade a DTE
for those domains to one with a GCR3. Everything else is blocked.

> >> is_pasid_enabled() is checked twice (once without lock, so that we can avoid
> >> lock in most cases and one inside lock to be sure no one else entered and
> >> enabled gcr3).
> > 
> > That never works, don't do that.
> 
> It does work as second one is lock protected. And the intention was to avoid
> locking for attaching second PASID onwards. Only for first time attaching PASID
> to device we will have check for two times.

You can get racy false negatives, that are hard to reason about.

 CPU0                               CPU 1
                                    lock()
                                    change state to !a
if (!a)
  forget it
                                    a = false
                                    unlock()
lock()
if (!a)
   forget it
change state to a
a = true
unlock()

There are ways with atomics you can build those kinds of lockless
patterns, but they are tricky and not justified here.

Jason

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 12/14] iommu/amd: Initial SVA support for AMD IOMMU
  2024-02-08 17:41           ` Jason Gunthorpe
@ 2024-02-08 18:23             ` Vasant Hegde
  2024-02-08 18:48               ` Jason Gunthorpe
  0 siblings, 1 reply; 45+ messages in thread
From: Vasant Hegde @ 2024-02-08 18:23 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: iommu, joro, suravee.suthikulpanit, wei.huang2, jsnitsel



On 2/8/2024 11:11 PM, Jason Gunthorpe wrote:
> On Wed, Feb 07, 2024 at 03:01:22PM +0530, Vasant Hegde wrote:
>> Jason,
>>
>>
>> On 2/6/2024 11:04 PM, Jason Gunthorpe wrote:
>>> On Tue, Feb 06, 2024 at 10:46:58PM +0530, Vasant Hegde wrote:
>>>> Jason,
>>>>
>>>>
>>>> On 2/2/2024 8:55 PM, Jason Gunthorpe wrote:
>>>>> On Thu, Jan 18, 2024 at 07:33:37AM +0000, Vasant Hegde wrote:
>>>>>
>>>>>> +static int iommu_pasid_enable(struct iommu_dev_data *dev_data)
>>>>>> +{
>>>>>> +	struct device *dev = dev_data->dev;
>>>>>> +	int ret = 0;
>>>>>> +
>>>>>> +	spin_lock(&dev_data->lock);
>>>>>> +
>>>>>> +	if (is_pasid_enabled(dev_data))
>>>>>> +		goto out;
>>>>>> +
>>>>>> +	if (!amd_iommu_pasid_supported()) {
>>>>>> +		ret = -ENODEV;
>>>>>> +		goto out;
>>>>>> +	}
>>>>>> +
>>>>>> +	/* attach_device path enables device PASID feature */
>>>>>> +	if (!dev_data->pasid_enabled) {
>>>>>
>>>>> How many times are we testing for this? Just check if the gcr3 table
>>>>> is installed once
>>>>
>>>> One time for IOMMU capability (amd_iommu_pasid_supported()) and one time for
>>>> device capability.
>>>
>>> Again I think this whole thing is out of sequence. The main focus
>>> should be on the gcr3 table. You should dirctly know if it has been
>>> installed or not via some direct means. Test all this stuff when you
>>> go to install it the first time.
>>
>> We are doing 1 SVA domain : 1 PASID : N devices model. That means we need to
>> check all these things while attaching *first* PASID to device. (That's what we
>> had discussed sometime back when I had all these things in feature_enable(SVA)
>> path).
> 
> I thought I said you 1 PASID is not technically correct, but you could
> get away it it for a short term. You should be planning to support N
> PASIDs because that is what the API defines.

I am describing API in SVA context only. In that flow once we allocate a SVA
domain for a PASID, then all devices for that PASID is attached to same domain.

> 
>> Ex: If device is in domain with V1 page table tries to attach PASID it should fail
>>    If device is in domain of PT mode, then we should setup gcr3.
> 
> Yes
>  
>> So I don't see how we can infer these things unless we have something like
>> enable_feature(PASID).. which would have taken care of setting up things.
> 
> You just trivially check these conditions at the top of set_dev_pasid, look at
> what I did for ARM:
> 
> 	if (smmu_domain->smmu != master->smmu || pasid == IOMMU_NO_PASID)
> 		return -EINVAL;
> 
> 	if (!master->cd_table.in_ste &&
> 	    sid_domain->type != IOMMU_DOMAIN_IDENTITY &&
> 	    sid_domain->type != IOMMU_DOMAIN_BLOCKED)
> 		return -EINVAL;
> 
> For AMD "cd_table in_ste" means that the DTE points to a GCR3 table
> already. This is trivially tracked in the DTE programming in set_dte
> because you know if the dte is being configured with a GCR3 or not.
> 
> The other cases represent the situations where we know how to upgrade a DTE
> for those domains to one with a GCR3. Everything else is blocked.

I got something similar. I have reworked set/remove pasid path completely.
I will try to post it soon.

> 
>>>> is_pasid_enabled() is checked twice (once without lock, so that we can avoid
>>>> lock in most cases and one inside lock to be sure no one else entered and
>>>> enabled gcr3).
>>>
>>> That never works, don't do that.
>>
>> It does work as second one is lock protected. And the intention was to avoid
>> locking for attaching second PASID onwards. Only for first time attaching PASID
>> to device we will have check for two times.
> 
> You can get racy false negatives, that are hard to reason about.
> 
>  CPU0                               CPU 1
>                                     lock()
>                                     change state to !a
> if (!a)
>   forget it
>                                     a = false
>                                     unlock()
> lock()
> if (!a)
>    forget it
> change state to a
> a = true
> unlock()

I don't hit above like scenario as its enable once/check awlays until all PASID
binding is removed.  Anyway I have reworked this part completely and removed
most of these checks.


-Vasant

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 10/14] iommu/amd: Introduce logic to enable/disable IOPF
  2024-02-08 17:31               ` Jason Gunthorpe
@ 2024-02-08 18:37                 ` Vasant Hegde
  2024-02-08 19:03                   ` Jason Gunthorpe
  0 siblings, 1 reply; 45+ messages in thread
From: Vasant Hegde @ 2024-02-08 18:37 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: iommu, joro, suravee.suthikulpanit, wei.huang2, jsnitsel

Jason,


On 2/8/2024 11:01 PM, Jason Gunthorpe wrote:
> On Wed, Feb 07, 2024 at 02:28:08PM +0530, Vasant Hegde wrote:
> 
>>> You already know what is going to happen at the very start of attach,
>>> you don't need to "enable it after" just do it right the first time
>>> through.
>>
>> First time we will not know whether device will actually use fault handler or
>> not. All we will know is whether IOMMU and device is capable of PRI or not.
> 
> I don't understand this, you should know all of this before you get to
> setting the DTE. What is missing?

In attach path we will know the device capabilities (like PRI) but we will not
know whether device is going to use it or not. Its like device has capability,
let us enable it without assuming device may use it.

IMO enable_feature() was better place as device had a control and we would have
enabled only when its needed.

Anyway for now I have moved device PRI and IOPF handler to attach device path.
Will try to post it soon.

> 
>>> The situation where attach fails and leaves the HW in an unknown state
>>> is really hard to deal with - and without the reliable global blocked
>>> domain the core code can't 100% rescue it either.
>>
>> If attach fails we throw error message and skip updating DTE. I believe core
>> layer understands that driver failed to attach device and puts device/group to
>> its original domain. So things should work fine.
> 
> It is not just the DTE, the whole thing including changing PCIE config
> space and so forth has to be kept correct.
> 
>>> This also means, broadly, you can't allow the DTE to evolve during the
>>> operation of attach/detach as the in-between states may become
>>> userspace visible and may be harmful in some way.
>>
>> We make DTE changes in set_dte() function only (except dirty bit change that
>> will be consolidated).
> 
> Sure, but set_dte doesn't take care to sequence the update.
> 
>>>>> security issue is solved. Especially if the more stuff is drifting
>>>>> further from being correct. If you can keep the updates in set_dte
>>>>> then maybe with some reluctance. But not like this with random touches
>>>>> to the DTE all over the place.
>>>>
>>>> Currently all DTE update is happening inside set_dte only (dirty bit enable is
>>>> an exception that may need to moved inside set_dte). This patch just invokes
>>>> that set_dte and invalidates cache.
>>>
>>> So then why all this strangeness?? Just set dev_data->ppr earlier in
>>> attach and order the handler setup properly.
>>>
>>> It should be really simple:
>>>
>>>  // All protected by the core's group mutex
>>>
>>>  if (domain->needs_pri)  {
>>>      dev_data->ppr = true;
>>>      if (!dev_data->num_pri_domains)
>>
>> What is PRI domain?
> 
> Right now it is only a SVA domain, it is a domain that wishes to use
> PRI. Quite soon we are going to expand this to PAGING domains as well.
> 
>> If I have to enable PRI in attach path then I don't need to track number of
>> domain stuff. I can simply do something like
>> 	if (pdom_is_sva_capable(pdom))
>> 		// enable PRI in IOMMU
>> 		// enable device PRI
>>
>> and in detach path,
>> 	if (PRI is enabled)
>> 		// disable IOMMU/device PRI stuff
> 
> Each PASID can have PRI on or not, so you need to keep track of how
> many PASIDs are using PRI at any moment and keep things in sync that
> way.
> 
> If there are no PRI handlers installed then the PCI config space
> should disable PRI and all the PRI bits flushed and disabled.

If we don't have handler then PRI is disabled in attach device path only.

So in detach path, if number of PASIDs are zero then we are good to disable PRI
(at least for the current usage model).

> 
> At some point you need to determine if any PASIDs have domains that
> need PRI. A counter is a simple solution.

That we can implement it as use case evolves.

-Vasant

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 12/14] iommu/amd: Initial SVA support for AMD IOMMU
  2024-02-08 18:23             ` Vasant Hegde
@ 2024-02-08 18:48               ` Jason Gunthorpe
  0 siblings, 0 replies; 45+ messages in thread
From: Jason Gunthorpe @ 2024-02-08 18:48 UTC (permalink / raw)
  To: Vasant Hegde; +Cc: iommu, joro, suravee.suthikulpanit, wei.huang2, jsnitsel

On Thu, Feb 08, 2024 at 11:53:58PM +0530, Vasant Hegde wrote:
> >> We are doing 1 SVA domain : 1 PASID : N devices model. That means we need to
> >> check all these things while attaching *first* PASID to device. (That's what we
> >> had discussed sometime back when I had all these things in feature_enable(SVA)
> >> path).
> > 
> > I thought I said you 1 PASID is not technically correct, but you could
> > get away it it for a short term. You should be planning to support N
> > PASIDs because that is what the API defines.
> 
> I am describing API in SVA context only. In that flow once we allocate a SVA
> domain for a PASID, then all devices for that PASID is attached to
> same domain.

It is OK to take a shortcut here if it helps, but that is not the API
model. Domains, SVA or not, do not have PASIDs. The only place a PASID
comes is via the set_dev_pasid() argument and any domain should be
attachable to any combination of PASIDs.

So the only place a PASID should be stored is in the elements of the
invalidation tracking list of the domain. When you get that all
completed it should all be uniform and a limitation like this should
go away. But this is fine to hold over for a followup that completes
the PASID support.

> > You just trivially check these conditions at the top of set_dev_pasid, look at
> > what I did for ARM:
> > 
> > 	if (smmu_domain->smmu != master->smmu || pasid == IOMMU_NO_PASID)
> > 		return -EINVAL;
> > 
> > 	if (!master->cd_table.in_ste &&
> > 	    sid_domain->type != IOMMU_DOMAIN_IDENTITY &&
> > 	    sid_domain->type != IOMMU_DOMAIN_BLOCKED)
> > 		return -EINVAL;
> > 
> > For AMD "cd_table in_ste" means that the DTE points to a GCR3 table
> > already. This is trivially tracked in the DTE programming in set_dte
> > because you know if the dte is being configured with a GCR3 or not.
> > 
> > The other cases represent the situations where we know how to upgrade a DTE
> > for those domains to one with a GCR3. Everything else is blocked.
> 
> I got something similar. I have reworked set/remove pasid path completely.
> I will try to post it soon.

Nice!

Jason

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 10/14] iommu/amd: Introduce logic to enable/disable IOPF
  2024-02-08 18:37                 ` Vasant Hegde
@ 2024-02-08 19:03                   ` Jason Gunthorpe
  0 siblings, 0 replies; 45+ messages in thread
From: Jason Gunthorpe @ 2024-02-08 19:03 UTC (permalink / raw)
  To: Vasant Hegde; +Cc: iommu, joro, suravee.suthikulpanit, wei.huang2, jsnitsel

On Fri, Feb 09, 2024 at 12:07:16AM +0530, Vasant Hegde wrote:
> On 2/8/2024 11:01 PM, Jason Gunthorpe wrote:
> > On Wed, Feb 07, 2024 at 02:28:08PM +0530, Vasant Hegde wrote:
> > 
> >>> You already know what is going to happen at the very start of attach,
> >>> you don't need to "enable it after" just do it right the first time
> >>> through.
> >>
> >> First time we will not know whether device will actually use fault handler or
> >> not. All we will know is whether IOMMU and device is capable of PRI or not.
> > 
> > I don't understand this, you should know all of this before you get to
> > setting the DTE. What is missing?
> 
> In attach path we will know the device capabilities (like PRI) but we will not
> know whether device is going to use it or not. Its like device has capability,
> let us enable it without assuming device may use it.

I don't understand "device may use it"

At domain pasid attach time you have the domain being attached and
information the device capabilities.

The decision is simple, if the domain is using PRI and the device
supports PRI you enable PRI.

This means you enable PRI when you attach the first PRI domain to the
GCR3, which will require rewriting the DTE during set_dev_pasid.

This is part of the same logical flow where you'd rewrite the DTE
during set_dev_pasid to do things like IDENTITY and BLOCKING RID
support.

> >> If I have to enable PRI in attach path then I don't need to track number of
> >> domain stuff. I can simply do something like
> >> 	if (pdom_is_sva_capable(pdom))
> >> 		// enable PRI in IOMMU
> >> 		// enable device PRI
> >>
> >> and in detach path,
> >> 	if (PRI is enabled)
> >> 		// disable IOMMU/device PRI stuff
> > 
> > Each PASID can have PRI on or not, so you need to keep track of how
> > many PASIDs are using PRI at any moment and keep things in sync that
> > way.
> > 
> > If there are no PRI handlers installed then the PCI config space
> > should disable PRI and all the PRI bits flushed and disabled.
> 
> If we don't have handler then PRI is disabled in attach device path only.
>
> So in detach path, if number of PASIDs are zero then we are good to disable PRI
> (at least for the current usage model).

So this is why you are having the problem above, the PRI logic should
not be in the attach_dev path because that does not take in a PRI
capable domaine (currently only SVA)

It should be triggered in set_dev_pasid and remove_dev_pasid.

I'm trying to get you to organize things so they are ready for the
next steps and things are not in the wrong spot.

Jason

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2024-02-08 19:03 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-18  7:33 [PATCH v5 00/14] iommu/amd: SVA Support (Part 4) - SVA and IOPF Vasant Hegde
2024-01-18  7:33 ` [PATCH v5 01/14] iommu/amd: Rename amd_iommu_v2_supported() as amd_iommu_pasid_supported() Vasant Hegde
2024-01-18  7:33 ` [PATCH v5 02/14] iommu/amd: Introduce per device DTE update function Vasant Hegde
2024-02-02 15:29   ` Jason Gunthorpe
2024-01-18  7:33 ` [PATCH v5 03/14] iommu/amd: Add support for enabling/disabling IOMMU features Vasant Hegde
2024-01-18  7:33 ` [PATCH v5 04/14] iommu/amd: Move PPR-related functions into ppr.c Vasant Hegde
2024-02-02 15:29   ` Jason Gunthorpe
2024-01-18  7:33 ` [PATCH v5 05/14] iommu/amd: Fix PPR interrupt processing logic Vasant Hegde
2024-02-02 15:30   ` Jason Gunthorpe
2024-01-18  7:33 ` [PATCH v5 06/14] iommu/amd: Define per-IOMMU iopf_queue Vasant Hegde
2024-02-02 15:30   ` Jason Gunthorpe
2024-01-18  7:33 ` [PATCH v5 07/14] iommu/amd: Add support for page response Vasant Hegde
2024-02-01 20:20   ` Jason Gunthorpe
2024-02-06 15:39     ` Vasant Hegde
2024-01-18  7:33 ` [PATCH v5 08/14] iommu/amd: Add support for add/remove device for IOPF Vasant Hegde
2024-02-01 21:46   ` Jason Gunthorpe
2024-02-06 16:02     ` Vasant Hegde
2024-01-18  7:33 ` [PATCH v5 09/14] iommu/amd: Add IO page fault notifier handler Vasant Hegde
2024-01-18  7:33 ` [PATCH v5 10/14] iommu/amd: Introduce logic to enable/disable IOPF Vasant Hegde
2024-02-01 21:49   ` Jason Gunthorpe
2024-02-06 16:19     ` Vasant Hegde
2024-02-06 16:36       ` Jason Gunthorpe
2024-02-06 17:29         ` Vasant Hegde
2024-02-06 17:58           ` Jason Gunthorpe
2024-02-07  8:58             ` Vasant Hegde
2024-02-07 12:36               ` Baolu Lu
2024-02-07 18:00                 ` Vasant Hegde
2024-02-08 17:31               ` Jason Gunthorpe
2024-02-08 18:37                 ` Vasant Hegde
2024-02-08 19:03                   ` Jason Gunthorpe
2024-01-18  7:33 ` [PATCH v5 11/14] iommu/amd: Add GCR3 [un]initialization function Vasant Hegde
2024-02-02 15:17   ` Jason Gunthorpe
2024-02-06 17:00     ` Vasant Hegde
2024-01-18  7:33 ` [PATCH v5 12/14] iommu/amd: Initial SVA support for AMD IOMMU Vasant Hegde
2024-02-02 15:25   ` Jason Gunthorpe
2024-02-06 17:16     ` Vasant Hegde
2024-02-06 17:34       ` Jason Gunthorpe
2024-02-07  9:31         ` Vasant Hegde
2024-02-08 17:41           ` Jason Gunthorpe
2024-02-08 18:23             ` Vasant Hegde
2024-02-08 18:48               ` Jason Gunthorpe
2024-01-18  7:33 ` [PATCH v5 13/14] iommu: Add ops->domain_alloc_sva() Vasant Hegde
2024-01-18  7:33 ` [PATCH v5 14/14] iommu/amd: Add SVA domain support Vasant Hegde
2024-02-02 15:28   ` Jason Gunthorpe
2024-01-18  7:40 ` [PATCH v5 00/14] iommu/amd: SVA Support (Part 4) - SVA and IOPF Vasant Hegde

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.