virtualization.lists.linux-foundation.org archive mirror
* [PATCH v8 00/10] IOMMUFD: Deliver IO page faults to user space
@ 2024-07-02  6:34 Lu Baolu
  2024-07-02  6:34 ` [PATCH v8 01/10] iommu: Introduce domain attachment handle Lu Baolu
                   ` (10 more replies)
  0 siblings, 11 replies; 46+ messages in thread
From: Lu Baolu @ 2024-07-02  6:34 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu

This series implements the functionality of delivering IO page faults to
user space through the IOMMUFD framework. One feasible use case is
nested translation, a hardware feature that supports two-stage
translation tables in the IOMMU. The second-stage translation table is
managed by the host VMM, while the first-stage translation table is
owned by user space. This allows user space to control the IOMMU
mappings for its devices.

When an IO page fault occurs on the first-stage translation table, the
IOMMU hardware can deliver the page fault to user space through the
IOMMUFD framework. User space can then handle the page fault and respond
to the device through the IOMMUFD framework. This allows user space to
implement its own IO page fault handling policies.

A user space application capable of handling IO page faults should
allocate a fault object and bind it to any domain whose generated
faults it is willing to handle. On successful allocation of the fault
object, the user can retrieve and respond to page faults by reading
from or writing to the file descriptor (FD) returned.

The iommu selftest framework has been updated to test the IO page fault
delivery and response functionality.

The series and related patches are available on GitHub:
https://github.com/LuBaolu/intel-iommu/commits/iommufd-io-pgfault-delivery-v8

Change log:
v8:
 - Update an out-of-date commit message.
 - Fix an error unwind issue by storing to a reserved entry.
 - Add a fail_nth selftest case to test iommufd_ucmd_respond() failure
   path.
 - Fix a refcounting issue in hwpt allocation with fault object.
 - Miscellaneous cleanup.

v7: https://lore.kernel.org/linux-iommu/20240616061155.169343-1-baolu.lu@linux.intel.com/
 - Move the setting of handle.domain into the helpers.
 - Return value of copy_to_user() should be converted to -EFAULT.
 - Add more checks on a fetched dma handle in the sva bind path.
 - Add a constant flag in iommu_ops to replace
   IOMMU_CAP_USER_IOASID_TABLE.
 - Simplify iommu_hwpt_pgfault and iommu_hwpt_page_response by removing
   some unnecessary fields.
 - Address the wrong order between dec users and ctx_put in fault FD
   release path.
 - Miscellaneous cleanup.

v6: https://lore.kernel.org/linux-iommu/20240527040517.38561-1-baolu.lu@linux.intel.com/
 - Refine the attach handle code by shifting the handle allocation to
   the caller. The caller will then provide the allocated handle to the
   domain attachment interfaces.
 - Add reference counter in iommufd_fault_iopf_enable/disable() helpers.
 - Fix the return values of fault FD's read/write fops.
 - Add the IOMMU_CAP_USER_IOASID_TABLE capability and check it before
   rolling back to the RID's attach_handle.
 - Move the iopf respond queue from iommufd device to iommufd fault.
 - Disallow PRI enablement on SR-IOV VF devices.
 - Miscellaneous cleanup.

v5: https://lore.kernel.org/linux-iommu/20240430145710.68112-1-baolu.lu@linux.intel.com/
 - Removed attach handle reference count from the core. Drivers will now
   synchronize their use of handles and domain attach/detach.
 - Automatically responds to all outstanding faults in hwpt detach or
   replace paths.
 - Supports getting a domain-type specific attach handle.
 - Reorganized the series by changing the patch order.
 - Miscellaneous cleanup.

v4: https://lore.kernel.org/linux-iommu/20240403011519.78512-1-baolu.lu@linux.intel.com/
 - Add the iommu domain attachment handle to replace the iopf-specific
   domain attachment interfaces introduced in the previous v3.
 - Replace the iommu_sva with iommu domain attachment handle.
 - Refine some fields in the fault and response message encoding
   according to feedback collected during v3 review period.
 - Refine and fix some problems in the fault FD implementation.
 - Miscellaneous cleanup.

v3: https://lore.kernel.org/linux-iommu/20240122073903.24406-1-baolu.lu@linux.intel.com/
 - Add iopf domain attach/detach/replace interfaces to manage the
   reference counters of hwpt and device, ensuring that both can only be
   destroyed after all outstanding IOPFs have been responded to. 
 - Relocate the fault handling file descriptor from hwpt to a fault
   object to enable a single fault handling object to be utilized
   across multiple domains.
 - Miscellaneous cleanup and performance improvements.

v2: https://lore.kernel.org/linux-iommu/20231026024930.382898-1-baolu.lu@linux.intel.com/
 - Move all iommu refactoring patches into a separate series and discuss
   it in a different thread. The latest patch series [v6] is available at
   https://lore.kernel.org/linux-iommu/20230928042734.16134-1-baolu.lu@linux.intel.com/
 - We discussed the timeout of the pending page fault messages. We
   agreed that we shouldn't apply any timeout policy for the page fault
   handling in user space.
   https://lore.kernel.org/linux-iommu/20230616113232.GA84678@myrica/
 - Jason suggested that we adopt a simple file descriptor interface for
   reading and responding to I/O page requests, so that user space
   applications can improve performance using io_uring.
   https://lore.kernel.org/linux-iommu/ZJWjD1ajeem6pK3I@ziepe.ca/

v1: https://lore.kernel.org/linux-iommu/20230530053724.232765-1-baolu.lu@linux.intel.com/

Lu Baolu (10):
  iommu: Introduce domain attachment handle
  iommu: Remove sva handle list
  iommu: Add attach handle to struct iopf_group
  iommu: Extend domain attach group with handle support
  iommufd: Add fault and response message definitions
  iommufd: Add iommufd fault object
  iommufd: Fault-capable hwpt attach/detach/replace
  iommufd: Associate fault object with iommufd_hw_pgtable
  iommufd/selftest: Add IOPF support for mock device
  iommufd/selftest: Add coverage for IOPF test

 include/linux/iommu.h                         |  41 +-
 drivers/iommu/iommu-priv.h                    |  11 +
 drivers/iommu/iommufd/iommufd_private.h       |  80 ++++
 drivers/iommu/iommufd/iommufd_test.h          |   8 +
 include/uapi/linux/iommufd.h                  | 109 +++++
 tools/testing/selftests/iommu/iommufd_utils.h |  86 +++-
 drivers/dma/idxd/init.c                       |   2 +-
 drivers/iommu/io-pgfault.c                    |  63 +--
 drivers/iommu/iommu-sva.c                     |  42 +-
 drivers/iommu/iommu.c                         | 185 ++++++--
 drivers/iommu/iommufd/device.c                |   7 +-
 drivers/iommu/iommufd/fault.c                 | 433 ++++++++++++++++++
 drivers/iommu/iommufd/hw_pagetable.c          |  38 +-
 drivers/iommu/iommufd/main.c                  |   6 +
 drivers/iommu/iommufd/selftest.c              |  64 +++
 tools/testing/selftests/iommu/iommufd.c       |  22 +
 .../selftests/iommu/iommufd_fail_nth.c        |   2 +-
 drivers/iommu/iommufd/Makefile                |   1 +
 18 files changed, 1083 insertions(+), 117 deletions(-)
 create mode 100644 drivers/iommu/iommufd/fault.c

-- 
2.34.1


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v8 01/10] iommu: Introduce domain attachment handle
  2024-07-02  6:34 [PATCH v8 00/10] IOMMUFD: Deliver IO page faults to user space Lu Baolu
@ 2024-07-02  6:34 ` Lu Baolu
  2024-07-02  6:34 ` [PATCH v8 02/10] iommu: Remove sva handle list Lu Baolu
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 46+ messages in thread
From: Lu Baolu @ 2024-07-02  6:34 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu, Jason Gunthorpe

Currently, when attaching a domain to a device or its PASID, the domain
pointer is stored within the iommu group. It can be retrieved for use
during the window between attachment and detachment.

With new features introduced, there's a need to store more information
than just a domain pointer. This information essentially represents the
association between a domain and a device. For example, the SVA code
already has a custom struct iommu_sva which represents a bond between an
SVA domain and a PASID of a device. Looking forward, IOMMUFD needs a
place to store the iommufd_device pointer in the core, so that the
device object ID can be quickly retrieved in the critical fault handling
path.

Introduce domain attachment handle that explicitly represents the
attachment relationship between a domain and a device or its PASID.

Co-developed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 include/linux/iommu.h     | 18 +++++++++++++++---
 drivers/dma/idxd/init.c   |  2 +-
 drivers/iommu/iommu-sva.c | 13 ++++++++-----
 drivers/iommu/iommu.c     | 26 ++++++++++++++++----------
 4 files changed, 40 insertions(+), 19 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 17b3f36ad843..afc5af0069bb 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -989,12 +989,22 @@ struct iommu_fwspec {
 /* ATS is supported */
 #define IOMMU_FWSPEC_PCI_RC_ATS			(1 << 0)
 
+/*
+ * An iommu attach handle represents a relationship between an iommu domain
+ * and a PASID or RID of a device. It is allocated and managed by the component
+ * that manages the domain and is stored in the iommu group during the time the
+ * domain is attached.
+ */
+struct iommu_attach_handle {
+	struct iommu_domain		*domain;
+};
+
 /**
  * struct iommu_sva - handle to a device-mm bond
  */
 struct iommu_sva {
+	struct iommu_attach_handle	handle;
 	struct device			*dev;
-	struct iommu_domain		*domain;
 	struct list_head		handle_item;
 	refcount_t			users;
 };
@@ -1052,7 +1062,8 @@ int iommu_device_claim_dma_owner(struct device *dev, void *owner);
 void iommu_device_release_dma_owner(struct device *dev);
 
 int iommu_attach_device_pasid(struct iommu_domain *domain,
-			      struct device *dev, ioasid_t pasid);
+			      struct device *dev, ioasid_t pasid,
+			      struct iommu_attach_handle *handle);
 void iommu_detach_device_pasid(struct iommu_domain *domain,
 			       struct device *dev, ioasid_t pasid);
 struct iommu_domain *
@@ -1388,7 +1399,8 @@ static inline int iommu_device_claim_dma_owner(struct device *dev, void *owner)
 }
 
 static inline int iommu_attach_device_pasid(struct iommu_domain *domain,
-					    struct device *dev, ioasid_t pasid)
+					    struct device *dev, ioasid_t pasid,
+					    struct iommu_attach_handle *handle)
 {
 	return -ENODEV;
 }
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index a7295943fa22..385c488c9cd1 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -584,7 +584,7 @@ static int idxd_enable_system_pasid(struct idxd_device *idxd)
 	 * DMA domain is owned by the driver, it should support all valid
 	 * types such as DMA-FQ, identity, etc.
 	 */
-	ret = iommu_attach_device_pasid(domain, dev, pasid);
+	ret = iommu_attach_device_pasid(domain, dev, pasid, NULL);
 	if (ret) {
 		dev_err(dev, "failed to attach device pasid %d, domain type %d",
 			pasid, domain->type);
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 18a35e798b72..0fb923254062 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -99,7 +99,9 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
 
 	/* Search for an existing domain. */
 	list_for_each_entry(domain, &mm->iommu_mm->sva_domains, next) {
-		ret = iommu_attach_device_pasid(domain, dev, iommu_mm->pasid);
+		handle->handle.domain = domain;
+		ret = iommu_attach_device_pasid(domain, dev, iommu_mm->pasid,
+						&handle->handle);
 		if (!ret) {
 			domain->users++;
 			goto out;
@@ -113,7 +115,9 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
 		goto out_free_handle;
 	}
 
-	ret = iommu_attach_device_pasid(domain, dev, iommu_mm->pasid);
+	handle->handle.domain = domain;
+	ret = iommu_attach_device_pasid(domain, dev, iommu_mm->pasid,
+					&handle->handle);
 	if (ret)
 		goto out_free_domain;
 	domain->users = 1;
@@ -124,7 +128,6 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
 	list_add(&handle->handle_item, &mm->iommu_mm->sva_handles);
 	mutex_unlock(&iommu_sva_lock);
 	handle->dev = dev;
-	handle->domain = domain;
 	return handle;
 
 out_free_domain:
@@ -147,7 +150,7 @@ EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
  */
 void iommu_sva_unbind_device(struct iommu_sva *handle)
 {
-	struct iommu_domain *domain = handle->domain;
+	struct iommu_domain *domain = handle->handle.domain;
 	struct iommu_mm_data *iommu_mm = domain->mm->iommu_mm;
 	struct device *dev = handle->dev;
 
@@ -170,7 +173,7 @@ EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
 
 u32 iommu_sva_get_pasid(struct iommu_sva *handle)
 {
-	struct iommu_domain *domain = handle->domain;
+	struct iommu_domain *domain = handle->handle.domain;
 
 	return mm_get_enqcmd_pasid(domain->mm);
 }
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 9df7cc75c1bc..a712b0cc3a1d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3352,16 +3352,17 @@ static void __iommu_remove_group_pasid(struct iommu_group *group,
  * @domain: the iommu domain.
  * @dev: the attached device.
  * @pasid: the pasid of the device.
+ * @handle: the attach handle.
  *
  * Return: 0 on success, or an error.
  */
 int iommu_attach_device_pasid(struct iommu_domain *domain,
-			      struct device *dev, ioasid_t pasid)
+			      struct device *dev, ioasid_t pasid,
+			      struct iommu_attach_handle *handle)
 {
 	/* Caller must be a probed driver on dev */
 	struct iommu_group *group = dev->iommu_group;
 	struct group_device *device;
-	void *curr;
 	int ret;
 
 	if (!domain->ops->set_dev_pasid)
@@ -3382,11 +3383,12 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
 		}
 	}
 
-	curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL);
-	if (curr) {
-		ret = xa_err(curr) ? : -EBUSY;
+	if (handle)
+		handle->domain = domain;
+
+	ret = xa_insert(&group->pasid_array, pasid, handle, GFP_KERNEL);
+	if (ret)
 		goto out_unlock;
-	}
 
 	ret = __iommu_set_group_pasid(domain, group, pasid);
 	if (ret)
@@ -3414,7 +3416,7 @@ void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
 
 	mutex_lock(&group->mutex);
 	__iommu_remove_group_pasid(group, pasid, domain);
-	WARN_ON(xa_erase(&group->pasid_array, pasid) != domain);
+	xa_erase(&group->pasid_array, pasid);
 	mutex_unlock(&group->mutex);
 }
 EXPORT_SYMBOL_GPL(iommu_detach_device_pasid);
@@ -3439,15 +3441,19 @@ struct iommu_domain *iommu_get_domain_for_dev_pasid(struct device *dev,
 {
 	/* Caller must be a probed driver on dev */
 	struct iommu_group *group = dev->iommu_group;
-	struct iommu_domain *domain;
+	struct iommu_attach_handle *handle;
+	struct iommu_domain *domain = NULL;
 
 	if (!group)
 		return NULL;
 
 	xa_lock(&group->pasid_array);
-	domain = xa_load(&group->pasid_array, pasid);
+	handle = xa_load(&group->pasid_array, pasid);
+	if (handle)
+		domain = handle->domain;
+
 	if (type && domain && domain->type != type)
-		domain = ERR_PTR(-EBUSY);
+		domain = NULL;
 	xa_unlock(&group->pasid_array);
 
 	return domain;
-- 
2.34.1



* [PATCH v8 02/10] iommu: Remove sva handle list
  2024-07-02  6:34 [PATCH v8 00/10] IOMMUFD: Deliver IO page faults to user space Lu Baolu
  2024-07-02  6:34 ` [PATCH v8 01/10] iommu: Introduce domain attachment handle Lu Baolu
@ 2024-07-02  6:34 ` Lu Baolu
  2024-07-02  6:34 ` [PATCH v8 03/10] iommu: Add attach handle to struct iopf_group Lu Baolu
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 46+ messages in thread
From: Lu Baolu @ 2024-07-02  6:34 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu, Jason Gunthorpe

The struct iommu_sva represents an association between an SVA domain
and a PASID of a device. It's stored in the iommu group's pasid array
and also tracked by a list in the per-mm data structure. Remove the
duplicate tracking of iommu_sva by eliminating the list.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 include/linux/iommu.h      |  2 --
 drivers/iommu/iommu-priv.h |  3 +++
 drivers/iommu/iommu-sva.c  | 30 ++++++++++++++++++++----------
 drivers/iommu/iommu.c      | 31 +++++++++++++++++++++++++++++++
 4 files changed, 54 insertions(+), 12 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index afc5af0069bb..87ebcc29020e 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -1005,14 +1005,12 @@ struct iommu_attach_handle {
 struct iommu_sva {
 	struct iommu_attach_handle	handle;
 	struct device			*dev;
-	struct list_head		handle_item;
 	refcount_t			users;
 };
 
 struct iommu_mm_data {
 	u32			pasid;
 	struct list_head	sva_domains;
-	struct list_head	sva_handles;
 };
 
 int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
index 5f731d994803..f1536a5ebb0d 100644
--- a/drivers/iommu/iommu-priv.h
+++ b/drivers/iommu/iommu-priv.h
@@ -28,4 +28,7 @@ void iommu_device_unregister_bus(struct iommu_device *iommu,
 				 const struct bus_type *bus,
 				 struct notifier_block *nb);
 
+struct iommu_attach_handle *iommu_attach_handle_get(struct iommu_group *group,
+						    ioasid_t pasid,
+						    unsigned int type);
 #endif /* __LINUX_IOMMU_PRIV_H */
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 0fb923254062..9b7f62517419 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -41,7 +41,6 @@ static struct iommu_mm_data *iommu_alloc_mm_data(struct mm_struct *mm, struct de
 	}
 	iommu_mm->pasid = pasid;
 	INIT_LIST_HEAD(&iommu_mm->sva_domains);
-	INIT_LIST_HEAD(&iommu_mm->sva_handles);
 	/*
 	 * Make sure the write to mm->iommu_mm is not reordered in front of
 	 * initialization to iommu_mm fields. If it does, readers may see a
@@ -69,11 +68,16 @@ static struct iommu_mm_data *iommu_alloc_mm_data(struct mm_struct *mm, struct de
  */
 struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
 {
+	struct iommu_group *group = dev->iommu_group;
+	struct iommu_attach_handle *attach_handle;
 	struct iommu_mm_data *iommu_mm;
 	struct iommu_domain *domain;
 	struct iommu_sva *handle;
 	int ret;
 
+	if (!group)
+		return ERR_PTR(-ENODEV);
+
 	mutex_lock(&iommu_sva_lock);
 
 	/* Allocate mm->pasid if necessary. */
@@ -83,12 +87,22 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
 		goto out_unlock;
 	}
 
-	list_for_each_entry(handle, &mm->iommu_mm->sva_handles, handle_item) {
-		if (handle->dev == dev) {
-			refcount_inc(&handle->users);
-			mutex_unlock(&iommu_sva_lock);
-			return handle;
+	/* A bond already exists, just take a reference. */
+	attach_handle = iommu_attach_handle_get(group, iommu_mm->pasid, IOMMU_DOMAIN_SVA);
+	if (!IS_ERR(attach_handle)) {
+		handle = container_of(attach_handle, struct iommu_sva, handle);
+		if (attach_handle->domain->mm != mm) {
+			ret = -EBUSY;
+			goto out_unlock;
 		}
+		refcount_inc(&handle->users);
+		mutex_unlock(&iommu_sva_lock);
+		return handle;
+	}
+
+	if (PTR_ERR(attach_handle) != -ENOENT) {
+		ret = PTR_ERR(attach_handle);
+		goto out_unlock;
 	}
 
 	handle = kzalloc(sizeof(*handle), GFP_KERNEL);
@@ -99,7 +113,6 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
 
 	/* Search for an existing domain. */
 	list_for_each_entry(domain, &mm->iommu_mm->sva_domains, next) {
-		handle->handle.domain = domain;
 		ret = iommu_attach_device_pasid(domain, dev, iommu_mm->pasid,
 						&handle->handle);
 		if (!ret) {
@@ -115,7 +128,6 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
 		goto out_free_handle;
 	}
 
-	handle->handle.domain = domain;
 	ret = iommu_attach_device_pasid(domain, dev, iommu_mm->pasid,
 					&handle->handle);
 	if (ret)
@@ -125,7 +137,6 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
 
 out:
 	refcount_set(&handle->users, 1);
-	list_add(&handle->handle_item, &mm->iommu_mm->sva_handles);
 	mutex_unlock(&iommu_sva_lock);
 	handle->dev = dev;
 	return handle;
@@ -159,7 +170,6 @@ void iommu_sva_unbind_device(struct iommu_sva *handle)
 		mutex_unlock(&iommu_sva_lock);
 		return;
 	}
-	list_del(&handle->handle_item);
 
 	iommu_detach_device_pasid(domain, dev, iommu_mm->pasid);
 	if (--domain->users == 0) {
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index a712b0cc3a1d..7890bd21dff6 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3486,3 +3486,34 @@ void iommu_free_global_pasid(ioasid_t pasid)
 	ida_free(&iommu_global_pasid_ida, pasid);
 }
 EXPORT_SYMBOL_GPL(iommu_free_global_pasid);
+
+/**
+ * iommu_attach_handle_get - Return the attach handle
+ * @group: the iommu group that domain was attached to
+ * @pasid: the pasid within the group
+ * @type: matched domain type, 0 for any match
+ *
+ * Return handle or ERR_PTR(-ENOENT) on none, ERR_PTR(-EBUSY) on mismatch.
+ *
+ * Return the attach handle to the caller. The life cycle of an iommu attach
+ * handle is from the time when the domain is attached to the time when the
+ * domain is detached. Callers are required to synchronize the call of
+ * iommu_attach_handle_get() with domain attachment and detachment. The attach
+ * handle can only be used during its life cycle.
+ */
+struct iommu_attach_handle *
+iommu_attach_handle_get(struct iommu_group *group, ioasid_t pasid, unsigned int type)
+{
+	struct iommu_attach_handle *handle;
+
+	xa_lock(&group->pasid_array);
+	handle = xa_load(&group->pasid_array, pasid);
+	if (!handle)
+		handle = ERR_PTR(-ENOENT);
+	else if (type && handle->domain->type != type)
+		handle = ERR_PTR(-EBUSY);
+	xa_unlock(&group->pasid_array);
+
+	return handle;
+}
+EXPORT_SYMBOL_NS_GPL(iommu_attach_handle_get, IOMMUFD_INTERNAL);
-- 
2.34.1



* [PATCH v8 03/10] iommu: Add attach handle to struct iopf_group
  2024-07-02  6:34 [PATCH v8 00/10] IOMMUFD: Deliver IO page faults to user space Lu Baolu
  2024-07-02  6:34 ` [PATCH v8 01/10] iommu: Introduce domain attachment handle Lu Baolu
  2024-07-02  6:34 ` [PATCH v8 02/10] iommu: Remove sva handle list Lu Baolu
@ 2024-07-02  6:34 ` Lu Baolu
  2024-07-02  6:34 ` [PATCH v8 04/10] iommu: Extend domain attach group with handle support Lu Baolu
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 46+ messages in thread
From: Lu Baolu @ 2024-07-02  6:34 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu, Jason Gunthorpe

Previously, the domain that a page fault targets was stored in an
iopf_group, which represents a minimal set of page faults. With the
introduction of the attach handle, replace the domain with the handle
so that the fault handler can obtain more information as needed
when handling the faults.

iommu_report_device_fault() is currently used for SVA page faults,
which it handles in an internal cycle. The domain is retrieved with
iommu_get_domain_for_dev_pasid() if the pasid in the fault message
is valid. This doesn't work in the IOMMUFD case, where if the pasid
table of a device is wholly managed by user space, there is no domain
attached to the PASID of the device, and all page faults are forwarded
through a NESTING domain attached to the RID.

Add a static flag in the iommu ops, which indicates whether the IOMMU
driver supports user-managed PASID tables. In the iopf delivery path,
if no attach handle is found for the iopf PASID, fall back to the RID
domain when the IOMMU driver supports this capability.

iommu_get_domain_for_dev_pasid() is no longer used and can be removed.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 include/linux/iommu.h      | 17 ++++-------
 drivers/iommu/io-pgfault.c | 61 +++++++++++++++++++++-----------------
 drivers/iommu/iommu-sva.c  |  3 +-
 drivers/iommu/iommu.c      | 39 ------------------------
 4 files changed, 42 insertions(+), 78 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 87ebcc29020e..910aec80886e 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -127,7 +127,7 @@ struct iopf_group {
 	/* list node for iommu_fault_param::faults */
 	struct list_head pending_node;
 	struct work_struct work;
-	struct iommu_domain *domain;
+	struct iommu_attach_handle *attach_handle;
 	/* The device's fault data parameter. */
 	struct iommu_fault_param *fault_param;
 };
@@ -547,6 +547,10 @@ static inline int __iommu_copy_struct_from_user_array(
  * @default_domain: If not NULL this will always be set as the default domain.
  *                  This should be an IDENTITY/BLOCKED/PLATFORM domain.
  *                  Do not use in new drivers.
+ * @user_pasid_table: IOMMU driver supports user-managed PASID table. There is
+ *                    no user domain for each PASID and the I/O page faults are
+ *                    forwarded through the user domain attached to the device
+ *                    RID.
  */
 struct iommu_ops {
 	bool (*capable)(struct device *dev, enum iommu_cap);
@@ -590,6 +594,7 @@ struct iommu_ops {
 	struct iommu_domain *blocked_domain;
 	struct iommu_domain *release_domain;
 	struct iommu_domain *default_domain;
+	u8 user_pasid_table:1;
 };
 
 /**
@@ -1064,9 +1069,6 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
 			      struct iommu_attach_handle *handle);
 void iommu_detach_device_pasid(struct iommu_domain *domain,
 			       struct device *dev, ioasid_t pasid);
-struct iommu_domain *
-iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid,
-			       unsigned int type);
 ioasid_t iommu_alloc_global_pasid(struct device *dev);
 void iommu_free_global_pasid(ioasid_t pasid);
 #else /* CONFIG_IOMMU_API */
@@ -1408,13 +1410,6 @@ static inline void iommu_detach_device_pasid(struct iommu_domain *domain,
 {
 }
 
-static inline struct iommu_domain *
-iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid,
-			       unsigned int type)
-{
-	return NULL;
-}
-
 static inline ioasid_t iommu_alloc_global_pasid(struct device *dev)
 {
 	return IOMMU_PASID_INVALID;
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 06d78fcc79fd..7c9011992d3f 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -59,30 +59,6 @@ void iopf_free_group(struct iopf_group *group)
 }
 EXPORT_SYMBOL_GPL(iopf_free_group);
 
-static struct iommu_domain *get_domain_for_iopf(struct device *dev,
-						struct iommu_fault *fault)
-{
-	struct iommu_domain *domain;
-
-	if (fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) {
-		domain = iommu_get_domain_for_dev_pasid(dev, fault->prm.pasid, 0);
-		if (IS_ERR(domain))
-			domain = NULL;
-	} else {
-		domain = iommu_get_domain_for_dev(dev);
-	}
-
-	if (!domain || !domain->iopf_handler) {
-		dev_warn_ratelimited(dev,
-			"iopf (pasid %d) without domain attached or handler installed\n",
-			 fault->prm.pasid);
-
-		return NULL;
-	}
-
-	return domain;
-}
-
 /* Non-last request of a group. Postpone until the last one. */
 static int report_partial_fault(struct iommu_fault_param *fault_param,
 				struct iommu_fault *fault)
@@ -206,20 +182,51 @@ void iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
 	if (group == &abort_group)
 		goto err_abort;
 
-	group->domain = get_domain_for_iopf(dev, fault);
-	if (!group->domain)
+	if (fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) {
+		group->attach_handle = iommu_attach_handle_get(dev->iommu_group,
+							       fault->prm.pasid,
+							       0);
+		if (IS_ERR(group->attach_handle)) {
+			const struct iommu_ops *ops = dev_iommu_ops(dev);
+
+			if (!ops->user_pasid_table)
+				goto err_abort;
+
+			/*
+			 * The iommu driver for this device supports user-
+			 * managed PASID table. Therefore page faults for
+			 * any PASID should go through the NESTING domain
+			 * attached to the device RID.
+			 */
+			group->attach_handle =
+				iommu_attach_handle_get(dev->iommu_group,
+							IOMMU_NO_PASID,
+							IOMMU_DOMAIN_NESTED);
+			if (IS_ERR(group->attach_handle))
+				goto err_abort;
+		}
+	} else {
+		group->attach_handle =
+			iommu_attach_handle_get(dev->iommu_group, IOMMU_NO_PASID, 0);
+		if (IS_ERR(group->attach_handle))
+			goto err_abort;
+	}
+
+	if (!group->attach_handle->domain->iopf_handler)
 		goto err_abort;
 
 	/*
 	 * On success iopf_handler must call iopf_group_response() and
 	 * iopf_free_group()
 	 */
-	if (group->domain->iopf_handler(group))
+	if (group->attach_handle->domain->iopf_handler(group))
 		goto err_abort;
 
 	return;
 
 err_abort:
+	dev_warn_ratelimited(dev, "iopf with pasid %d aborted\n",
+			     fault->prm.pasid);
 	iopf_group_response(group, IOMMU_PAGE_RESP_FAILURE);
 	if (group == &abort_group)
 		__iopf_free_group(group);
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 9b7f62517419..69cde094440e 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -272,7 +272,8 @@ static void iommu_sva_handle_iopf(struct work_struct *work)
 		if (status != IOMMU_PAGE_RESP_SUCCESS)
 			break;
 
-		status = iommu_sva_handle_mm(&iopf->fault, group->domain->mm);
+		status = iommu_sva_handle_mm(&iopf->fault,
+					     group->attach_handle->domain->mm);
 	}
 
 	iopf_group_response(group, status);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 7890bd21dff6..5a7e874abb36 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3421,45 +3421,6 @@ void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
 }
 EXPORT_SYMBOL_GPL(iommu_detach_device_pasid);
 
-/*
- * iommu_get_domain_for_dev_pasid() - Retrieve domain for @pasid of @dev
- * @dev: the queried device
- * @pasid: the pasid of the device
- * @type: matched domain type, 0 for any match
- *
- * This is a variant of iommu_get_domain_for_dev(). It returns the existing
- * domain attached to pasid of a device. Callers must hold a lock around this
- * function, and both iommu_attach/detach_dev_pasid() whenever a domain of
- * type is being manipulated. This API does not internally resolve races with
- * attach/detach.
- *
- * Return: attached domain on success, NULL otherwise.
- */
-struct iommu_domain *iommu_get_domain_for_dev_pasid(struct device *dev,
-						    ioasid_t pasid,
-						    unsigned int type)
-{
-	/* Caller must be a probed driver on dev */
-	struct iommu_group *group = dev->iommu_group;
-	struct iommu_attach_handle *handle;
-	struct iommu_domain *domain = NULL;
-
-	if (!group)
-		return NULL;
-
-	xa_lock(&group->pasid_array);
-	handle = xa_load(&group->pasid_array, pasid);
-	if (handle)
-		domain = handle->domain;
-
-	if (type && domain && domain->type != type)
-		domain = NULL;
-	xa_unlock(&group->pasid_array);
-
-	return domain;
-}
-EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev_pasid);
-
 ioasid_t iommu_alloc_global_pasid(struct device *dev)
 {
 	int ret;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v8 04/10] iommu: Extend domain attach group with handle support
  2024-07-02  6:34 [PATCH v8 00/10] IOMMUFD: Deliver IO page faults to user space Lu Baolu
                   ` (2 preceding siblings ...)
  2024-07-02  6:34 ` [PATCH v8 03/10] iommu: Add attach handle to struct iopf_group Lu Baolu
@ 2024-07-02  6:34 ` Lu Baolu
  2024-07-02  6:34 ` [PATCH v8 05/10] iommufd: Add fault and response message definitions Lu Baolu
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 46+ messages in thread
From: Lu Baolu @ 2024-07-02  6:34 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu

Unlike the SVA case, where each PASID of a device has an SVA domain
attached and the I/O page faults are handled by the fault handler of
that SVA domain, the I/O page faults for a user page table might be
handled by the domain attached to the RID or by the domain attached
to a PASID, depending on whether the PASID table is managed by user
space or by the kernel. As a result, the domain attach group
interfaces need attach handle support. The attach handle will be
forwarded to the fault handler of the user domain.

Add variants of the domain attach group interfaces that support an
attach handle, and export them for use by IOMMUFD.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 drivers/iommu/iommu-priv.h |   8 +++
 drivers/iommu/iommu.c      | 103 +++++++++++++++++++++++++++++++++++++
 2 files changed, 111 insertions(+)

diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
index f1536a5ebb0d..c37801c32f33 100644
--- a/drivers/iommu/iommu-priv.h
+++ b/drivers/iommu/iommu-priv.h
@@ -31,4 +31,12 @@ void iommu_device_unregister_bus(struct iommu_device *iommu,
 struct iommu_attach_handle *iommu_attach_handle_get(struct iommu_group *group,
 						    ioasid_t pasid,
 						    unsigned int type);
+int iommu_attach_group_handle(struct iommu_domain *domain,
+			      struct iommu_group *group,
+			      struct iommu_attach_handle *handle);
+void iommu_detach_group_handle(struct iommu_domain *domain,
+			       struct iommu_group *group);
+int iommu_replace_group_handle(struct iommu_group *group,
+			       struct iommu_domain *new_domain,
+			       struct iommu_attach_handle *handle);
 #endif /* __LINUX_IOMMU_PRIV_H */
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 5a7e874abb36..8484285fbaa8 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3478,3 +3478,106 @@ iommu_attach_handle_get(struct iommu_group *group, ioasid_t pasid, unsigned int
 	return handle;
 }
 EXPORT_SYMBOL_NS_GPL(iommu_attach_handle_get, IOMMUFD_INTERNAL);
+
+/**
+ * iommu_attach_group_handle - Attach an IOMMU domain to an IOMMU group
+ * @domain: IOMMU domain to attach
+ * @group: IOMMU group that will be attached
+ * @handle: attach handle
+ *
+ * Returns 0 on success and error code on failure.
+ *
+ * This is a variant of iommu_attach_group(). It allows the caller to provide
+ * an attach handle and use it when the domain is attached. This is currently
+ * used by IOMMUFD to deliver the I/O page faults.
+ */
+int iommu_attach_group_handle(struct iommu_domain *domain,
+			      struct iommu_group *group,
+			      struct iommu_attach_handle *handle)
+{
+	int ret;
+
+	if (handle)
+		handle->domain = domain;
+
+	mutex_lock(&group->mutex);
+	ret = xa_insert(&group->pasid_array, IOMMU_NO_PASID, handle, GFP_KERNEL);
+	if (ret)
+		goto err_unlock;
+
+	ret = __iommu_attach_group(domain, group);
+	if (ret)
+		goto err_erase;
+	mutex_unlock(&group->mutex);
+
+	return 0;
+err_erase:
+	xa_erase(&group->pasid_array, IOMMU_NO_PASID);
+err_unlock:
+	mutex_unlock(&group->mutex);
+	return ret;
+}
+EXPORT_SYMBOL_NS_GPL(iommu_attach_group_handle, IOMMUFD_INTERNAL);
+
+/**
+ * iommu_detach_group_handle - Detach an IOMMU domain from an IOMMU group
+ * @domain: IOMMU domain to detach
+ * @group: IOMMU group that the domain will be detached from
+ *
+ * Detach the specified IOMMU domain from the specified IOMMU group.
+ * It must be used in conjunction with iommu_attach_group_handle().
+ */
+void iommu_detach_group_handle(struct iommu_domain *domain,
+			       struct iommu_group *group)
+{
+	mutex_lock(&group->mutex);
+	__iommu_group_set_core_domain(group);
+	xa_erase(&group->pasid_array, IOMMU_NO_PASID);
+	mutex_unlock(&group->mutex);
+}
+EXPORT_SYMBOL_NS_GPL(iommu_detach_group_handle, IOMMUFD_INTERNAL);
+
+/**
+ * iommu_replace_group_handle - replace the domain that a group is attached to
+ * @group: IOMMU group that will be attached to the new domain
+ * @new_domain: new IOMMU domain to replace with
+ * @handle: attach handle
+ *
+ * This is a variant of iommu_group_replace_domain(). It allows the caller to
+ * provide an attach handle for the new domain and use it when the domain is
+ * attached.
+ */
+int iommu_replace_group_handle(struct iommu_group *group,
+			       struct iommu_domain *new_domain,
+			       struct iommu_attach_handle *handle)
+{
+	void *curr;
+	int ret;
+
+	if (!new_domain)
+		return -EINVAL;
+
+	mutex_lock(&group->mutex);
+	if (handle) {
+		ret = xa_reserve(&group->pasid_array, IOMMU_NO_PASID, GFP_KERNEL);
+		if (ret)
+			goto err_unlock;
+	}
+
+	ret = __iommu_group_set_domain(group, new_domain);
+	if (ret)
+		goto err_release;
+
+	curr = xa_store(&group->pasid_array, IOMMU_NO_PASID, handle, GFP_KERNEL);
+	WARN_ON(xa_is_err(curr));
+
+	mutex_unlock(&group->mutex);
+
+	return 0;
+err_release:
+	xa_release(&group->pasid_array, IOMMU_NO_PASID);
+err_unlock:
+	mutex_unlock(&group->mutex);
+	return ret;
+}
+EXPORT_SYMBOL_NS_GPL(iommu_replace_group_handle, IOMMUFD_INTERNAL);
-- 
2.34.1



* [PATCH v8 05/10] iommufd: Add fault and response message definitions
  2024-07-02  6:34 [PATCH v8 00/10] IOMMUFD: Deliver IO page faults to user space Lu Baolu
                   ` (3 preceding siblings ...)
  2024-07-02  6:34 ` [PATCH v8 04/10] iommu: Extend domain attach group with handle support Lu Baolu
@ 2024-07-02  6:34 ` Lu Baolu
  2024-07-02  6:34 ` [PATCH v8 06/10] iommufd: Add iommufd fault object Lu Baolu
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 46+ messages in thread
From: Lu Baolu @ 2024-07-02  6:34 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu

A struct iommu_hwpt_pgfault represents a fault message that user space
can retrieve. Multiple iommu_hwpt_pgfaults might be put in an iopf
group, with the IOMMU_PGFAULT_FLAGS_LAST_PAGE flag set only on the
last iommu_hwpt_pgfault of the group.

An iommu_hwpt_page_response is a response message that user space
should send to the kernel after finishing handling a group of fault
messages. The @dev_id, @pasid, and @grpid fields in the message
identify an outstanding iopf group for a device. The @cookie field,
which matches the cookie field of the last fault in the group, will
be used by the kernel to look up the pending message.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 include/uapi/linux/iommufd.h | 83 ++++++++++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)

diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 1dfeaa2e649e..4d89ed97b533 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -692,4 +692,87 @@ struct iommu_hwpt_invalidate {
 	__u32 __reserved;
 };
 #define IOMMU_HWPT_INVALIDATE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_INVALIDATE)
+
+/**
+ * enum iommu_hwpt_pgfault_flags - flags for struct iommu_hwpt_pgfault
+ * @IOMMU_PGFAULT_FLAGS_PASID_VALID: The pasid field of the fault data is
+ *                                   valid.
+ * @IOMMU_PGFAULT_FLAGS_LAST_PAGE: It's the last fault of a fault group.
+ */
+enum iommu_hwpt_pgfault_flags {
+	IOMMU_PGFAULT_FLAGS_PASID_VALID		= (1 << 0),
+	IOMMU_PGFAULT_FLAGS_LAST_PAGE		= (1 << 1),
+};
+
+/**
+ * enum iommu_hwpt_pgfault_perm - perm bits for struct iommu_hwpt_pgfault
+ * @IOMMU_PGFAULT_PERM_READ: request for read permission
+ * @IOMMU_PGFAULT_PERM_WRITE: request for write permission
+ * @IOMMU_PGFAULT_PERM_EXEC: (PCIE 10.4.1) request with a PASID that has the
+ *                           Execute Requested bit set in PASID TLP Prefix.
+ * @IOMMU_PGFAULT_PERM_PRIV: (PCIE 10.4.1) request with a PASID that has the
+ *                           Privileged Mode Requested bit set in PASID TLP
+ *                           Prefix.
+ */
+enum iommu_hwpt_pgfault_perm {
+	IOMMU_PGFAULT_PERM_READ			= (1 << 0),
+	IOMMU_PGFAULT_PERM_WRITE		= (1 << 1),
+	IOMMU_PGFAULT_PERM_EXEC			= (1 << 2),
+	IOMMU_PGFAULT_PERM_PRIV			= (1 << 3),
+};
+
+/**
+ * struct iommu_hwpt_pgfault - iommu page fault data
+ * @flags: Combination of enum iommu_hwpt_pgfault_flags
+ * @dev_id: id of the originated device
+ * @pasid: Process Address Space ID
+ * @grpid: Page Request Group Index
+ * @perm: Combination of enum iommu_hwpt_pgfault_perm
+ * @addr: Fault address
+ * @length: a hint of how much data the requestor is expecting to fetch. For
+ *          example, if the PRI initiator knows it is going to do a 10MB
+ *          transfer, it could fill in 10MB and the OS could pre-fault in
+ *          10MB of IOVA. It defaults to 0 if there is no such hint.
+ * @cookie: kernel-managed cookie identifying a group of fault messages. The
+ *          cookie number encoded in the last page fault of the group should
+ *          be echoed back in the response message.
+ */
+struct iommu_hwpt_pgfault {
+	__u32 flags;
+	__u32 dev_id;
+	__u32 pasid;
+	__u32 grpid;
+	__u32 perm;
+	__u64 addr;
+	__u32 length;
+	__u32 cookie;
+};
+
+/**
+ * enum iommufd_page_response_code - Return status of fault handlers
+ * @IOMMUFD_PAGE_RESP_SUCCESS: Fault has been handled and the page tables
+ *                             populated, retry the access. This is the
+ *                             "Success" defined in PCI 10.4.2.1.
+ * @IOMMUFD_PAGE_RESP_INVALID: Could not handle this fault, don't retry the
+ *                             access. This is the "Invalid Request" in PCI
+ *                             10.4.2.1.
+ * @IOMMUFD_PAGE_RESP_FAILURE: General error. Drop all subsequent faults from
+ *                             this device if possible. This is the "Response
+ *                             Failure" in PCI 10.4.2.1.
+ */
+enum iommufd_page_response_code {
+	IOMMUFD_PAGE_RESP_SUCCESS = 0,
+	IOMMUFD_PAGE_RESP_INVALID,
+	IOMMUFD_PAGE_RESP_FAILURE,
+};
+
+/**
+ * struct iommu_hwpt_page_response - IOMMU page fault response
+ * @cookie: The kernel-managed cookie reported in the fault message.
+ * @code: One of response code in enum iommufd_page_response_code.
+ */
+struct iommu_hwpt_page_response {
+	__u32 cookie;
+	__u32 code;
+};
 #endif
-- 
2.34.1



* [PATCH v8 06/10] iommufd: Add iommufd fault object
  2024-07-02  6:34 [PATCH v8 00/10] IOMMUFD: Deliver IO page faults to user space Lu Baolu
                   ` (4 preceding siblings ...)
  2024-07-02  6:34 ` [PATCH v8 05/10] iommufd: Add fault and response message definitions Lu Baolu
@ 2024-07-02  6:34 ` Lu Baolu
  2024-07-03 23:06   ` Nicolin Chen
  2024-07-02  6:34 ` [PATCH v8 07/10] iommufd: Fault-capable hwpt attach/detach/replace Lu Baolu
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 46+ messages in thread
From: Lu Baolu @ 2024-07-02  6:34 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu, Jason Gunthorpe

An iommufd fault object provides an interface for delivering I/O page
faults to user space. These objects are created and destroyed by user
space, and they can be associated with or dissociated from hardware page
table objects during page table allocation or destruction.

User space interacts with the fault object through a file interface. This
interface offers a straightforward and efficient way for user space to
handle page faults. It allows user space to read fault messages
sequentially and respond to them by writing to the same file. The file
interface supports reading messages in poll mode, so it's recommended that
user space applications use io_uring to enhance read and write efficiency.

A fault object can be associated with any iopf-capable iommufd_hw_pgtable
during the pgtable's allocation. All I/O page faults triggered by devices
when accessing the I/O addresses of an iommufd_hw_pgtable are routed
through the fault object to user space. Similarly, user space's responses
to these page faults are routed back to the iommu device driver through
the same fault object.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 include/linux/iommu.h                   |   4 +
 drivers/iommu/iommufd/iommufd_private.h |  30 ++++
 include/uapi/linux/iommufd.h            |  18 ++
 drivers/iommu/io-pgfault.c              |   2 +
 drivers/iommu/iommufd/fault.c           | 226 ++++++++++++++++++++++++
 drivers/iommu/iommufd/main.c            |   6 +
 drivers/iommu/iommufd/Makefile          |   1 +
 7 files changed, 287 insertions(+)
 create mode 100644 drivers/iommu/iommufd/fault.c

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 910aec80886e..73bc3aee95a1 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -124,12 +124,16 @@ struct iopf_fault {
 struct iopf_group {
 	struct iopf_fault last_fault;
 	struct list_head faults;
+	size_t fault_count;
 	/* list node for iommu_fault_param::faults */
 	struct list_head pending_node;
 	struct work_struct work;
 	struct iommu_attach_handle *attach_handle;
 	/* The device's fault data parameter. */
 	struct iommu_fault_param *fault_param;
+	/* Used by handler provider to hook the group on its own lists. */
+	struct list_head node;
+	u32 cookie;
 };
 
 /**
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 991f864d1f9b..c8a4519f1405 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -128,6 +128,7 @@ enum iommufd_object_type {
 	IOMMUFD_OBJ_HWPT_NESTED,
 	IOMMUFD_OBJ_IOAS,
 	IOMMUFD_OBJ_ACCESS,
+	IOMMUFD_OBJ_FAULT,
 #ifdef CONFIG_IOMMUFD_TEST
 	IOMMUFD_OBJ_SELFTEST,
 #endif
@@ -426,6 +427,35 @@ void iopt_remove_access(struct io_pagetable *iopt,
 			u32 iopt_access_list_id);
 void iommufd_access_destroy_object(struct iommufd_object *obj);
 
+/*
+ * An iommufd_fault object represents an interface to deliver I/O page faults
+ * to the user space. These objects are created/destroyed by the user space and
+ * associated with hardware page table objects during page-table allocation.
+ */
+struct iommufd_fault {
+	struct iommufd_object obj;
+	struct iommufd_ctx *ictx;
+	struct file *filep;
+
+	/* Lists of outstanding faults, protected by the mutex below. */
+	struct mutex mutex;
+	struct list_head deliver;
+	struct xarray response;
+
+	struct wait_queue_head wait_queue;
+};
+
+struct iommufd_attach_handle {
+	struct iommu_attach_handle handle;
+	struct iommufd_device *idev;
+};
+
+/* Convert an iommu attach handle to iommufd handle. */
+#define to_iommufd_handle(hdl)	container_of(hdl, struct iommufd_attach_handle, handle)
+
+int iommufd_fault_alloc(struct iommufd_ucmd *ucmd);
+void iommufd_fault_destroy(struct iommufd_object *obj);
+
 #ifdef CONFIG_IOMMUFD_TEST
 int iommufd_test(struct iommufd_ucmd *ucmd);
 void iommufd_selftest_destroy(struct iommufd_object *obj);
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 4d89ed97b533..70b8a38fcd46 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -50,6 +50,7 @@ enum {
 	IOMMUFD_CMD_HWPT_SET_DIRTY_TRACKING,
 	IOMMUFD_CMD_HWPT_GET_DIRTY_BITMAP,
 	IOMMUFD_CMD_HWPT_INVALIDATE,
+	IOMMUFD_CMD_FAULT_QUEUE_ALLOC,
 };
 
 /**
@@ -775,4 +776,21 @@ struct iommu_hwpt_page_response {
 	__u32 cookie;
 	__u32 code;
 };
+
+/**
+ * struct iommu_fault_alloc - ioctl(IOMMU_FAULT_QUEUE_ALLOC)
+ * @size: sizeof(struct iommu_fault_alloc)
+ * @flags: Must be 0
+ * @out_fault_id: The ID of the new FAULT
+ * @out_fault_fd: The fd of the new FAULT
+ *
+ * Explicitly allocate a fault handling object.
+ */
+struct iommu_fault_alloc {
+	__u32 size;
+	__u32 flags;
+	__u32 out_fault_id;
+	__u32 out_fault_fd;
+};
+#define IOMMU_FAULT_QUEUE_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_FAULT_QUEUE_ALLOC)
 #endif
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 7c9011992d3f..cd679c13752e 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -110,6 +110,8 @@ static struct iopf_group *iopf_group_alloc(struct iommu_fault_param *iopf_param,
 	list_add(&group->pending_node, &iopf_param->faults);
 	mutex_unlock(&iopf_param->lock);
 
+	group->fault_count = list_count_nodes(&group->faults);
+
 	return group;
 }
 
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
new file mode 100644
index 000000000000..68ff94671d48
--- /dev/null
+++ b/drivers/iommu/iommufd/fault.c
@@ -0,0 +1,226 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (C) 2024 Intel Corporation
+ */
+#define pr_fmt(fmt) "iommufd: " fmt
+
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/iommufd.h>
+#include <linux/poll.h>
+#include <linux/anon_inodes.h>
+#include <uapi/linux/iommufd.h>
+
+#include "../iommu-priv.h"
+#include "iommufd_private.h"
+
+void iommufd_fault_destroy(struct iommufd_object *obj)
+{
+	struct iommufd_fault *fault = container_of(obj, struct iommufd_fault, obj);
+	struct iopf_group *group, *next;
+
+	/*
+	 * The iommufd object's reference count is zero at this point.
+	 * We can be confident that no other threads are currently
+	 * accessing this pointer. Therefore, acquiring the mutex here
+	 * is unnecessary.
+	 */
+	list_for_each_entry_safe(group, next, &fault->deliver, node) {
+		list_del(&group->node);
+		iopf_group_response(group, IOMMU_PAGE_RESP_INVALID);
+		iopf_free_group(group);
+	}
+}
+
+static void iommufd_compose_fault_message(struct iommu_fault *fault,
+					  struct iommu_hwpt_pgfault *hwpt_fault,
+					  struct iommufd_device *idev,
+					  u32 cookie)
+{
+	hwpt_fault->flags = fault->prm.flags;
+	hwpt_fault->dev_id = idev->obj.id;
+	hwpt_fault->pasid = fault->prm.pasid;
+	hwpt_fault->grpid = fault->prm.grpid;
+	hwpt_fault->perm = fault->prm.perm;
+	hwpt_fault->addr = fault->prm.addr;
+	hwpt_fault->length = 0;
+	hwpt_fault->cookie = cookie;
+}
+
+static ssize_t iommufd_fault_fops_read(struct file *filep, char __user *buf,
+				       size_t count, loff_t *ppos)
+{
+	size_t fault_size = sizeof(struct iommu_hwpt_pgfault);
+	struct iommufd_fault *fault = filep->private_data;
+	struct iommu_hwpt_pgfault data;
+	struct iommufd_device *idev;
+	struct iopf_group *group;
+	struct iopf_fault *iopf;
+	size_t done = 0;
+	int rc = 0;
+
+	if (*ppos || count % fault_size)
+		return -ESPIPE;
+
+	mutex_lock(&fault->mutex);
+	while (!list_empty(&fault->deliver) && count > done) {
+		group = list_first_entry(&fault->deliver,
+					 struct iopf_group, node);
+
+		if (group->fault_count * fault_size > count - done)
+			break;
+
+		rc = xa_alloc(&fault->response, &group->cookie, group,
+			      xa_limit_32b, GFP_KERNEL);
+		if (rc)
+			break;
+
+		idev = to_iommufd_handle(group->attach_handle)->idev;
+		list_for_each_entry(iopf, &group->faults, list) {
+			iommufd_compose_fault_message(&iopf->fault,
+						      &data, idev,
+						      group->cookie);
+			if (copy_to_user(buf + done, &data, fault_size)) {
+				xa_erase(&fault->response, group->cookie);
+				rc = -EFAULT;
+				break;
+			}
+			done += fault_size;
+		}
+
+		list_del(&group->node);
+	}
+	mutex_unlock(&fault->mutex);
+
+	return done == 0 ? rc : done;
+}
+
+static ssize_t iommufd_fault_fops_write(struct file *filep, const char __user *buf,
+					size_t count, loff_t *ppos)
+{
+	size_t response_size = sizeof(struct iommu_hwpt_page_response);
+	struct iommufd_fault *fault = filep->private_data;
+	struct iommu_hwpt_page_response response;
+	struct iopf_group *group;
+	size_t done = 0;
+	int rc = 0;
+
+	if (*ppos || count % response_size)
+		return -ESPIPE;
+
+	mutex_lock(&fault->mutex);
+	while (count > done) {
+		rc = copy_from_user(&response, buf + done, response_size);
+		if (rc)
+			break;
+
+		group = xa_erase(&fault->response, response.cookie);
+		if (!group) {
+			rc = -EINVAL;
+			break;
+		}
+
+		iopf_group_response(group, response.code);
+		iopf_free_group(group);
+		done += response_size;
+	}
+	mutex_unlock(&fault->mutex);
+
+	return done == 0 ? rc : done;
+}
+
+static __poll_t iommufd_fault_fops_poll(struct file *filep,
+					struct poll_table_struct *wait)
+{
+	struct iommufd_fault *fault = filep->private_data;
+	__poll_t pollflags = EPOLLOUT;
+
+	poll_wait(filep, &fault->wait_queue, wait);
+	mutex_lock(&fault->mutex);
+	if (!list_empty(&fault->deliver))
+		pollflags |= EPOLLIN | EPOLLRDNORM;
+	mutex_unlock(&fault->mutex);
+
+	return pollflags;
+}
+
+static int iommufd_fault_fops_release(struct inode *inode, struct file *filep)
+{
+	struct iommufd_fault *fault = filep->private_data;
+
+	refcount_dec(&fault->obj.users);
+	iommufd_ctx_put(fault->ictx);
+	return 0;
+}
+
+static const struct file_operations iommufd_fault_fops = {
+	.owner		= THIS_MODULE,
+	.open		= nonseekable_open,
+	.read		= iommufd_fault_fops_read,
+	.write		= iommufd_fault_fops_write,
+	.poll		= iommufd_fault_fops_poll,
+	.release	= iommufd_fault_fops_release,
+	.llseek		= no_llseek,
+};
+
+int iommufd_fault_alloc(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_fault_alloc *cmd = ucmd->cmd;
+	struct iommufd_fault *fault;
+	struct file *filep;
+	int fdno;
+	int rc;
+
+	if (cmd->flags)
+		return -EOPNOTSUPP;
+
+	fault = iommufd_object_alloc(ucmd->ictx, fault, IOMMUFD_OBJ_FAULT);
+	if (IS_ERR(fault))
+		return PTR_ERR(fault);
+
+	fault->ictx = ucmd->ictx;
+	INIT_LIST_HEAD(&fault->deliver);
+	xa_init_flags(&fault->response, XA_FLAGS_ALLOC1);
+	mutex_init(&fault->mutex);
+	init_waitqueue_head(&fault->wait_queue);
+
+	filep = anon_inode_getfile("[iommufd-pgfault]", &iommufd_fault_fops,
+				   fault, O_RDWR);
+	if (IS_ERR(filep)) {
+		rc = PTR_ERR(filep);
+		goto out_abort;
+	}
+
+	refcount_inc(&fault->obj.users);
+	iommufd_ctx_get(fault->ictx);
+	fault->filep = filep;
+
+	fdno = get_unused_fd_flags(O_CLOEXEC);
+	if (fdno < 0) {
+		rc = fdno;
+		goto out_fput;
+	}
+
+	cmd->out_fault_id = fault->obj.id;
+	cmd->out_fault_fd = fdno;
+
+	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+	if (rc)
+		goto out_put_fdno;
+	iommufd_object_finalize(ucmd->ictx, &fault->obj);
+
+	fd_install(fdno, fault->filep);
+
+	return 0;
+out_put_fdno:
+	put_unused_fd(fdno);
+out_fput:
+	fput(filep);
+	refcount_dec(&fault->obj.users);
+	iommufd_ctx_put(fault->ictx);
+out_abort:
+	iommufd_object_abort_and_destroy(ucmd->ictx, &fault->obj);
+
+	return rc;
+}
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 39b32932c61e..83bbd7c5d160 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -319,6 +319,7 @@ static int iommufd_option(struct iommufd_ucmd *ucmd)
 
 union ucmd_buffer {
 	struct iommu_destroy destroy;
+	struct iommu_fault_alloc fault;
 	struct iommu_hw_info info;
 	struct iommu_hwpt_alloc hwpt;
 	struct iommu_hwpt_get_dirty_bitmap get_dirty_bitmap;
@@ -355,6 +356,8 @@ struct iommufd_ioctl_op {
 	}
 static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 	IOCTL_OP(IOMMU_DESTROY, iommufd_destroy, struct iommu_destroy, id),
+	IOCTL_OP(IOMMU_FAULT_QUEUE_ALLOC, iommufd_fault_alloc, struct iommu_fault_alloc,
+		 out_fault_fd),
 	IOCTL_OP(IOMMU_GET_HW_INFO, iommufd_get_hw_info, struct iommu_hw_info,
 		 __reserved),
 	IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct iommu_hwpt_alloc,
@@ -513,6 +516,9 @@ static const struct iommufd_object_ops iommufd_object_ops[] = {
 		.destroy = iommufd_hwpt_nested_destroy,
 		.abort = iommufd_hwpt_nested_abort,
 	},
+	[IOMMUFD_OBJ_FAULT] = {
+		.destroy = iommufd_fault_destroy,
+	},
 #ifdef CONFIG_IOMMUFD_TEST
 	[IOMMUFD_OBJ_SELFTEST] = {
 		.destroy = iommufd_selftest_destroy,
diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
index 34b446146961..cf4605962bea 100644
--- a/drivers/iommu/iommufd/Makefile
+++ b/drivers/iommu/iommufd/Makefile
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 iommufd-y := \
 	device.o \
+	fault.o \
 	hw_pagetable.o \
 	io_pagetable.o \
 	ioas.o \
-- 
2.34.1



* [PATCH v8 07/10] iommufd: Fault-capable hwpt attach/detach/replace
  2024-07-02  6:34 [PATCH v8 00/10] IOMMUFD: Deliver IO page faults to user space Lu Baolu
                   ` (5 preceding siblings ...)
  2024-07-02  6:34 ` [PATCH v8 06/10] iommufd: Add iommufd fault object Lu Baolu
@ 2024-07-02  6:34 ` Lu Baolu
  2024-10-15  3:19   ` Zhangfei Gao
  2024-07-02  6:34 ` [PATCH v8 08/10] iommufd: Associate fault object with iommufd_hw_pgtable Lu Baolu
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 46+ messages in thread
From: Lu Baolu @ 2024-07-02  6:34 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu

Add iopf-capable hw page table attach/detach/replace helpers. The pointer
to the iommufd_device is stored in the domain attachment handle, so that
it can be echoed back in the iopf_group.

The iopf-capable hw page tables can only be attached to devices that
support the IOMMU_DEV_FEAT_IOPF feature. On the first attachment of an
iopf-capable hw_pagetable to the device, the IOPF feature is enabled on
the device. Similarly, after the last iopf-capable hwpt is detached from
the device, the IOPF feature is disabled on the device.

The current implementation allows a replacement between iopf-capable and
non-iopf-capable hw page tables. This matches the nested translation use
case, where a parent domain is attached by default and can then be
replaced with a nested user domain with iopf support.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 drivers/iommu/iommufd/iommufd_private.h |  41 +++++
 drivers/iommu/iommufd/device.c          |   7 +-
 drivers/iommu/iommufd/fault.c           | 190 ++++++++++++++++++++++++
 3 files changed, 235 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index c8a4519f1405..aa4c26c87cb9 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -11,6 +11,7 @@
 #include <linux/iommu.h>
 #include <linux/iova_bitmap.h>
 #include <uapi/linux/iommufd.h>
+#include "../iommu-priv.h"
 
 struct iommu_domain;
 struct iommu_group;
@@ -293,6 +294,7 @@ int iommufd_check_iova_range(struct io_pagetable *iopt,
 struct iommufd_hw_pagetable {
 	struct iommufd_object obj;
 	struct iommu_domain *domain;
+	struct iommufd_fault *fault;
 };
 
 struct iommufd_hwpt_paging {
@@ -396,6 +398,9 @@ struct iommufd_device {
 	/* always the physical device */
 	struct device *dev;
 	bool enforce_cache_coherency;
+	/* protect iopf_enabled counter */
+	struct mutex iopf_lock;
+	unsigned int iopf_enabled;
 };
 
 static inline struct iommufd_device *
@@ -456,6 +461,42 @@ struct iommufd_attach_handle {
 int iommufd_fault_alloc(struct iommufd_ucmd *ucmd);
 void iommufd_fault_destroy(struct iommufd_object *obj);
 
+int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt,
+				    struct iommufd_device *idev);
+void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt,
+				     struct iommufd_device *idev);
+int iommufd_fault_domain_replace_dev(struct iommufd_device *idev,
+				     struct iommufd_hw_pagetable *hwpt,
+				     struct iommufd_hw_pagetable *old);
+
+static inline int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt,
+					     struct iommufd_device *idev)
+{
+	if (hwpt->fault)
+		return iommufd_fault_domain_attach_dev(hwpt, idev);
+
+	return iommu_attach_group(hwpt->domain, idev->igroup->group);
+}
+
+static inline void iommufd_hwpt_detach_device(struct iommufd_hw_pagetable *hwpt,
+					      struct iommufd_device *idev)
+{
+	if (hwpt->fault)
+		iommufd_fault_domain_detach_dev(hwpt, idev);
+
+	iommu_detach_group(hwpt->domain, idev->igroup->group);
+}
+
+static inline int iommufd_hwpt_replace_device(struct iommufd_device *idev,
+					      struct iommufd_hw_pagetable *hwpt,
+					      struct iommufd_hw_pagetable *old)
+{
+	if (old->fault || hwpt->fault)
+		return iommufd_fault_domain_replace_dev(idev, hwpt, old);
+
+	return iommu_group_replace_domain(idev->igroup->group, hwpt->domain);
+}
+
 #ifdef CONFIG_IOMMUFD_TEST
 int iommufd_test(struct iommufd_ucmd *ucmd);
 void iommufd_selftest_destroy(struct iommufd_object *obj);
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 873630c111c1..9a7ec5997c61 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -215,6 +215,7 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 	refcount_inc(&idev->obj.users);
 	/* igroup refcount moves into iommufd_device */
 	idev->igroup = igroup;
+	mutex_init(&idev->iopf_lock);
 
 	/*
 	 * If the caller fails after this success it must call
@@ -376,7 +377,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 	 * attachment.
 	 */
 	if (list_empty(&idev->igroup->device_list)) {
-		rc = iommu_attach_group(hwpt->domain, idev->igroup->group);
+		rc = iommufd_hwpt_attach_device(hwpt, idev);
 		if (rc)
 			goto err_unresv;
 		idev->igroup->hwpt = hwpt;
@@ -402,7 +403,7 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev)
 	mutex_lock(&idev->igroup->lock);
 	list_del(&idev->group_item);
 	if (list_empty(&idev->igroup->device_list)) {
-		iommu_detach_group(hwpt->domain, idev->igroup->group);
+		iommufd_hwpt_detach_device(hwpt, idev);
 		idev->igroup->hwpt = NULL;
 	}
 	if (hwpt_is_paging(hwpt))
@@ -497,7 +498,7 @@ iommufd_device_do_replace(struct iommufd_device *idev,
 			goto err_unlock;
 	}
 
-	rc = iommu_group_replace_domain(igroup->group, hwpt->domain);
+	rc = iommufd_hwpt_replace_device(idev, hwpt, old_hwpt);
 	if (rc)
 		goto err_unresv;
 
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
index 68ff94671d48..4934ae572638 100644
--- a/drivers/iommu/iommufd/fault.c
+++ b/drivers/iommu/iommufd/fault.c
@@ -8,6 +8,7 @@
 #include <linux/module.h>
 #include <linux/mutex.h>
 #include <linux/iommufd.h>
+#include <linux/pci.h>
 #include <linux/poll.h>
 #include <linux/anon_inodes.h>
 #include <uapi/linux/iommufd.h>
@@ -15,6 +16,195 @@
 #include "../iommu-priv.h"
 #include "iommufd_private.h"
 
+static int iommufd_fault_iopf_enable(struct iommufd_device *idev)
+{
+	struct device *dev = idev->dev;
+	int ret;
+
+	/*
+	 * Once we turn on PCI/PRI support for VF, the response failure code
+	 * should not be forwarded to the hardware due to PRI being a shared
+	 * resource between PF and VFs. There is no coordination for this
+	 * shared capability. This waits for a vPRI reset to recover.
+	 */
+	if (dev_is_pci(dev) && to_pci_dev(dev)->is_virtfn)
+		return -EINVAL;
+
+	mutex_lock(&idev->iopf_lock);
+	/* Device iopf has already been enabled. */
+	if (++idev->iopf_enabled > 1) {
+		mutex_unlock(&idev->iopf_lock);
+		return 0;
+	}
+
+	ret = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_IOPF);
+	if (ret)
+		--idev->iopf_enabled;
+	mutex_unlock(&idev->iopf_lock);
+
+	return ret;
+}
+
+static void iommufd_fault_iopf_disable(struct iommufd_device *idev)
+{
+	mutex_lock(&idev->iopf_lock);
+	if (!WARN_ON(idev->iopf_enabled == 0)) {
+		if (--idev->iopf_enabled == 0)
+			iommu_dev_disable_feature(idev->dev, IOMMU_DEV_FEAT_IOPF);
+	}
+	mutex_unlock(&idev->iopf_lock);
+}
+
+static int __fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt,
+				     struct iommufd_device *idev)
+{
+	struct iommufd_attach_handle *handle;
+	int ret;
+
+	handle = kzalloc(sizeof(*handle), GFP_KERNEL);
+	if (!handle)
+		return -ENOMEM;
+
+	handle->idev = idev;
+	ret = iommu_attach_group_handle(hwpt->domain, idev->igroup->group,
+					&handle->handle);
+	if (ret)
+		kfree(handle);
+
+	return ret;
+}
+
+int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt,
+				    struct iommufd_device *idev)
+{
+	int ret;
+
+	if (!hwpt->fault)
+		return -EINVAL;
+
+	ret = iommufd_fault_iopf_enable(idev);
+	if (ret)
+		return ret;
+
+	ret = __fault_domain_attach_dev(hwpt, idev);
+	if (ret)
+		iommufd_fault_iopf_disable(idev);
+
+	return ret;
+}
+
+static void iommufd_auto_response_faults(struct iommufd_hw_pagetable *hwpt,
+					 struct iommufd_attach_handle *handle)
+{
+	struct iommufd_fault *fault = hwpt->fault;
+	struct iopf_group *group, *next;
+	unsigned long index;
+
+	if (!fault)
+		return;
+
+	mutex_lock(&fault->mutex);
+	list_for_each_entry_safe(group, next, &fault->deliver, node) {
+		if (group->attach_handle != &handle->handle)
+			continue;
+		list_del(&group->node);
+		iopf_group_response(group, IOMMU_PAGE_RESP_INVALID);
+		iopf_free_group(group);
+	}
+
+	xa_for_each(&fault->response, index, group) {
+		if (group->attach_handle != &handle->handle)
+			continue;
+		xa_erase(&fault->response, index);
+		iopf_group_response(group, IOMMU_PAGE_RESP_INVALID);
+		iopf_free_group(group);
+	}
+	mutex_unlock(&fault->mutex);
+}
+
+static struct iommufd_attach_handle *
+iommufd_device_get_attach_handle(struct iommufd_device *idev)
+{
+	struct iommu_attach_handle *handle;
+
+	handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID, 0);
+	if (!handle)
+		return NULL;
+
+	return to_iommufd_handle(handle);
+}
+
+void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt,
+				     struct iommufd_device *idev)
+{
+	struct iommufd_attach_handle *handle;
+
+	handle = iommufd_device_get_attach_handle(idev);
+	iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
+	iommufd_auto_response_faults(hwpt, handle);
+	iommufd_fault_iopf_disable(idev);
+	kfree(handle);
+}
+
+static int __fault_domain_replace_dev(struct iommufd_device *idev,
+				      struct iommufd_hw_pagetable *hwpt,
+				      struct iommufd_hw_pagetable *old)
+{
+	struct iommufd_attach_handle *handle, *curr = NULL;
+	int ret;
+
+	if (old->fault)
+		curr = iommufd_device_get_attach_handle(idev);
+
+	if (hwpt->fault) {
+		handle = kzalloc(sizeof(*handle), GFP_KERNEL);
+		if (!handle)
+			return -ENOMEM;
+
+		handle->handle.domain = hwpt->domain;
+		handle->idev = idev;
+		ret = iommu_replace_group_handle(idev->igroup->group,
+						 hwpt->domain, &handle->handle);
+	} else {
+		ret = iommu_replace_group_handle(idev->igroup->group,
+						 hwpt->domain, NULL);
+	}
+
+	if (!ret && curr) {
+		iommufd_auto_response_faults(old, curr);
+		kfree(curr);
+	}
+
+	return ret;
+}
+
+int iommufd_fault_domain_replace_dev(struct iommufd_device *idev,
+				     struct iommufd_hw_pagetable *hwpt,
+				     struct iommufd_hw_pagetable *old)
+{
+	bool iopf_off = !hwpt->fault && old->fault;
+	bool iopf_on = hwpt->fault && !old->fault;
+	int ret;
+
+	if (iopf_on) {
+		ret = iommufd_fault_iopf_enable(idev);
+		if (ret)
+			return ret;
+	}
+
+	ret = __fault_domain_replace_dev(idev, hwpt, old);
+	if (ret) {
+		if (iopf_on)
+			iommufd_fault_iopf_disable(idev);
+		return ret;
+	}
+
+	if (iopf_off)
+		iommufd_fault_iopf_disable(idev);
+
+	return 0;
+}
+
 void iommufd_fault_destroy(struct iommufd_object *obj)
 {
 	struct iommufd_fault *fault = container_of(obj, struct iommufd_fault, obj);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v8 08/10] iommufd: Associate fault object with iommufd_hw_pgtable
  2024-07-02  6:34 [PATCH v8 00/10] IOMMUFD: Deliver IO page faults to user space Lu Baolu
                   ` (6 preceding siblings ...)
  2024-07-02  6:34 ` [PATCH v8 07/10] iommufd: Fault-capable hwpt attach/detach/replace Lu Baolu
@ 2024-07-02  6:34 ` Lu Baolu
  2024-07-02  6:34 ` [PATCH v8 09/10] iommufd/selftest: Add IOPF support for mock device Lu Baolu
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 46+ messages in thread
From: Lu Baolu @ 2024-07-02  6:34 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu

When allocating a user iommufd_hw_pagetable, user space is allowed to
associate a fault object with the hw_pagetable by specifying the fault
object ID in the page table allocation data and setting the
IOMMU_HWPT_FAULT_ID_VALID flag bit.

On a successful return from hwpt allocation, the user can retrieve and
respond to page faults by reading from and writing to the file interface
of the fault object.

Once a fault object has been associated with a hwpt, the hwpt becomes
iopf-capable, indicated by hwpt->fault being non-NULL. Attaching,
detaching, or replacing an iopf-capable hwpt on an RID or PASID differs
from the same operations on a hwpt that is not iopf-capable.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 drivers/iommu/iommufd/iommufd_private.h |  9 ++++++
 include/uapi/linux/iommufd.h            |  8 ++++++
 drivers/iommu/iommufd/fault.c           | 17 +++++++++++
 drivers/iommu/iommufd/hw_pagetable.c    | 38 +++++++++++++++++++------
 4 files changed, 64 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index aa4c26c87cb9..92efe30a8f0d 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -458,8 +458,17 @@ struct iommufd_attach_handle {
 /* Convert an iommu attach handle to iommufd handle. */
 #define to_iommufd_handle(hdl)	container_of(hdl, struct iommufd_attach_handle, handle)
 
+static inline struct iommufd_fault *
+iommufd_get_fault(struct iommufd_ucmd *ucmd, u32 id)
+{
+	return container_of(iommufd_get_object(ucmd->ictx, id,
+					       IOMMUFD_OBJ_FAULT),
+			    struct iommufd_fault, obj);
+}
+
 int iommufd_fault_alloc(struct iommufd_ucmd *ucmd);
 void iommufd_fault_destroy(struct iommufd_object *obj);
+int iommufd_fault_iopf_handler(struct iopf_group *group);
 
 int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt,
 				    struct iommufd_device *idev);
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 70b8a38fcd46..ede2b464a761 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -357,10 +357,13 @@ struct iommu_vfio_ioas {
  *                                the parent HWPT in a nesting configuration.
  * @IOMMU_HWPT_ALLOC_DIRTY_TRACKING: Dirty tracking support for device IOMMU is
  *                                   enforced on device attachment
+ * @IOMMU_HWPT_FAULT_ID_VALID: The fault_id field of hwpt allocation data is
+ *                             valid.
  */
 enum iommufd_hwpt_alloc_flags {
 	IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0,
 	IOMMU_HWPT_ALLOC_DIRTY_TRACKING = 1 << 1,
+	IOMMU_HWPT_FAULT_ID_VALID = 1 << 2,
 };
 
 /**
@@ -412,6 +415,9 @@ enum iommu_hwpt_data_type {
  * @data_type: One of enum iommu_hwpt_data_type
  * @data_len: Length of the type specific data
  * @data_uptr: User pointer to the type specific data
+ * @fault_id: The ID of IOMMUFD_FAULT object. Valid only if the
+ *            IOMMU_HWPT_FAULT_ID_VALID bit is set in the flags field.
+ * @__reserved2: Padding to 64-bit alignment. Must be 0.
  *
  * Explicitly allocate a hardware page table object. This is the same object
  * type that is returned by iommufd_device_attach() and represents the
@@ -442,6 +448,8 @@ struct iommu_hwpt_alloc {
 	__u32 data_type;
 	__u32 data_len;
 	__aligned_u64 data_uptr;
+	__u32 fault_id;
+	__u32 __reserved2;
 };
 #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC)
 
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
index 4934ae572638..54d6cd20a673 100644
--- a/drivers/iommu/iommufd/fault.c
+++ b/drivers/iommu/iommufd/fault.c
@@ -414,3 +414,20 @@ int iommufd_fault_alloc(struct iommufd_ucmd *ucmd)
 
 	return rc;
 }
+
+int iommufd_fault_iopf_handler(struct iopf_group *group)
+{
+	struct iommufd_hw_pagetable *hwpt;
+	struct iommufd_fault *fault;
+
+	hwpt = group->attach_handle->domain->fault_data;
+	fault = hwpt->fault;
+
+	mutex_lock(&fault->mutex);
+	list_add_tail(&group->node, &fault->deliver);
+	mutex_unlock(&fault->mutex);
+
+	wake_up_interruptible(&fault->wait_queue);
+
+	return 0;
+}
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 33d142f8057d..e63f80f087d1 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -8,6 +8,15 @@
 #include "../iommu-priv.h"
 #include "iommufd_private.h"
 
+static void __iommufd_hwpt_destroy(struct iommufd_hw_pagetable *hwpt)
+{
+	if (hwpt->domain)
+		iommu_domain_free(hwpt->domain);
+
+	if (hwpt->fault)
+		refcount_dec(&hwpt->fault->obj.users);
+}
+
 void iommufd_hwpt_paging_destroy(struct iommufd_object *obj)
 {
 	struct iommufd_hwpt_paging *hwpt_paging =
@@ -22,9 +31,7 @@ void iommufd_hwpt_paging_destroy(struct iommufd_object *obj)
 					 hwpt_paging->common.domain);
 	}
 
-	if (hwpt_paging->common.domain)
-		iommu_domain_free(hwpt_paging->common.domain);
-
+	__iommufd_hwpt_destroy(&hwpt_paging->common);
 	refcount_dec(&hwpt_paging->ioas->obj.users);
 }
 
@@ -49,9 +56,7 @@ void iommufd_hwpt_nested_destroy(struct iommufd_object *obj)
 	struct iommufd_hwpt_nested *hwpt_nested =
 		container_of(obj, struct iommufd_hwpt_nested, common.obj);
 
-	if (hwpt_nested->common.domain)
-		iommu_domain_free(hwpt_nested->common.domain);
-
+	__iommufd_hwpt_destroy(&hwpt_nested->common);
 	refcount_dec(&hwpt_nested->parent->common.obj.users);
 }
 
@@ -213,7 +218,8 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
 	struct iommufd_hw_pagetable *hwpt;
 	int rc;
 
-	if (flags || !user_data->len || !ops->domain_alloc_user)
+	if ((flags & ~IOMMU_HWPT_FAULT_ID_VALID) ||
+	    !user_data->len || !ops->domain_alloc_user)
 		return ERR_PTR(-EOPNOTSUPP);
 	if (parent->auto_domain || !parent->nest_parent)
 		return ERR_PTR(-EINVAL);
@@ -227,7 +233,8 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
 	refcount_inc(&parent->common.obj.users);
 	hwpt_nested->parent = parent;
 
-	hwpt->domain = ops->domain_alloc_user(idev->dev, flags,
+	hwpt->domain = ops->domain_alloc_user(idev->dev,
+					      flags & ~IOMMU_HWPT_FAULT_ID_VALID,
 					      parent->common.domain, user_data);
 	if (IS_ERR(hwpt->domain)) {
 		rc = PTR_ERR(hwpt->domain);
@@ -308,6 +315,21 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 		goto out_put_pt;
 	}
 
+	if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) {
+		struct iommufd_fault *fault;
+
+		fault = iommufd_get_fault(ucmd, cmd->fault_id);
+		if (IS_ERR(fault)) {
+			rc = PTR_ERR(fault);
+			goto out_hwpt;
+		}
+		hwpt->fault = fault;
+		hwpt->domain->iopf_handler = iommufd_fault_iopf_handler;
+		hwpt->domain->fault_data = hwpt;
+		refcount_inc(&fault->obj.users);
+		iommufd_put_object(ucmd->ictx, &fault->obj);
+	}
+
 	cmd->out_hwpt_id = hwpt->obj.id;
 	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
 	if (rc)
-- 
2.34.1



* [PATCH v8 09/10] iommufd/selftest: Add IOPF support for mock device
  2024-07-02  6:34 [PATCH v8 00/10] IOMMUFD: Deliver IO page faults to user space Lu Baolu
                   ` (7 preceding siblings ...)
  2024-07-02  6:34 ` [PATCH v8 08/10] iommufd: Associate fault object with iommufd_hw_pgtable Lu Baolu
@ 2024-07-02  6:34 ` Lu Baolu
  2024-07-02  6:34 ` [PATCH v8 10/10] iommufd/selftest: Add coverage for IOPF test Lu Baolu
  2024-07-04 14:18 ` [PATCH v8 00/10] IOMMUFD: Deliver IO page faults to user space Will Deacon
  10 siblings, 0 replies; 46+ messages in thread
From: Lu Baolu @ 2024-07-02  6:34 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu

Extend the selftest mock device to support generating and responding to
an IOPF. Also add an ioctl interface that allows userspace applications
to trigger an IOPF on the mock device. This lets userspace applications
test IOMMUFD's handling of IOPFs without relying on any real hardware.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 drivers/iommu/iommufd/iommufd_test.h |  8 ++++
 drivers/iommu/iommufd/selftest.c     | 64 ++++++++++++++++++++++++++++
 2 files changed, 72 insertions(+)

diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index e854d3f67205..acbbba1c6671 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -22,6 +22,7 @@ enum {
 	IOMMU_TEST_OP_MOCK_DOMAIN_FLAGS,
 	IOMMU_TEST_OP_DIRTY,
 	IOMMU_TEST_OP_MD_CHECK_IOTLB,
+	IOMMU_TEST_OP_TRIGGER_IOPF,
 };
 
 enum {
@@ -127,6 +128,13 @@ struct iommu_test_cmd {
 			__u32 id;
 			__u32 iotlb;
 		} check_iotlb;
+		struct {
+			__u32 dev_id;
+			__u32 pasid;
+			__u32 grpid;
+			__u32 perm;
+			__u64 addr;
+		} trigger_iopf;
 	};
 	__u32 last;
 };
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 7a2199470f31..1133f1b2362f 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -504,6 +504,8 @@ static bool mock_domain_capable(struct device *dev, enum iommu_cap cap)
 	return false;
 }
 
+static struct iopf_queue *mock_iommu_iopf_queue;
+
 static struct iommu_device mock_iommu_device = {
 };
 
@@ -514,6 +516,29 @@ static struct iommu_device *mock_probe_device(struct device *dev)
 	return &mock_iommu_device;
 }
 
+static void mock_domain_page_response(struct device *dev, struct iopf_fault *evt,
+				      struct iommu_page_response *msg)
+{
+}
+
+static int mock_dev_enable_feat(struct device *dev, enum iommu_dev_features feat)
+{
+	if (feat != IOMMU_DEV_FEAT_IOPF || !mock_iommu_iopf_queue)
+		return -ENODEV;
+
+	return iopf_queue_add_device(mock_iommu_iopf_queue, dev);
+}
+
+static int mock_dev_disable_feat(struct device *dev, enum iommu_dev_features feat)
+{
+	if (feat != IOMMU_DEV_FEAT_IOPF || !mock_iommu_iopf_queue)
+		return -ENODEV;
+
+	iopf_queue_remove_device(mock_iommu_iopf_queue, dev);
+
+	return 0;
+}
+
 static const struct iommu_ops mock_ops = {
 	/*
 	 * IOMMU_DOMAIN_BLOCKED cannot be returned from def_domain_type()
@@ -529,6 +554,10 @@ static const struct iommu_ops mock_ops = {
 	.capable = mock_domain_capable,
 	.device_group = generic_device_group,
 	.probe_device = mock_probe_device,
+	.page_response = mock_domain_page_response,
+	.dev_enable_feat = mock_dev_enable_feat,
+	.dev_disable_feat = mock_dev_disable_feat,
+	.user_pasid_table = true,
 	.default_domain_ops =
 		&(struct iommu_domain_ops){
 			.free = mock_domain_free,
@@ -1375,6 +1404,31 @@ static int iommufd_test_dirty(struct iommufd_ucmd *ucmd, unsigned int mockpt_id,
 	return rc;
 }
 
+static int iommufd_test_trigger_iopf(struct iommufd_ucmd *ucmd,
+				     struct iommu_test_cmd *cmd)
+{
+	struct iopf_fault event = { };
+	struct iommufd_device *idev;
+
+	idev = iommufd_get_device(ucmd, cmd->trigger_iopf.dev_id);
+	if (IS_ERR(idev))
+		return PTR_ERR(idev);
+
+	event.fault.prm.flags = IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
+	if (cmd->trigger_iopf.pasid != IOMMU_NO_PASID)
+		event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
+	event.fault.type = IOMMU_FAULT_PAGE_REQ;
+	event.fault.prm.addr = cmd->trigger_iopf.addr;
+	event.fault.prm.pasid = cmd->trigger_iopf.pasid;
+	event.fault.prm.grpid = cmd->trigger_iopf.grpid;
+	event.fault.prm.perm = cmd->trigger_iopf.perm;
+
+	iommu_report_device_fault(idev->dev, &event);
+	iommufd_put_object(ucmd->ictx, &idev->obj);
+
+	return 0;
+}
+
 void iommufd_selftest_destroy(struct iommufd_object *obj)
 {
 	struct selftest_obj *sobj = container_of(obj, struct selftest_obj, obj);
@@ -1450,6 +1504,8 @@ int iommufd_test(struct iommufd_ucmd *ucmd)
 					  cmd->dirty.page_size,
 					  u64_to_user_ptr(cmd->dirty.uptr),
 					  cmd->dirty.flags);
+	case IOMMU_TEST_OP_TRIGGER_IOPF:
+		return iommufd_test_trigger_iopf(ucmd, cmd);
 	default:
 		return -EOPNOTSUPP;
 	}
@@ -1491,6 +1547,9 @@ int __init iommufd_test_init(void)
 				  &iommufd_mock_bus_type.nb);
 	if (rc)
 		goto err_sysfs;
+
+	mock_iommu_iopf_queue = iopf_queue_alloc("mock-iopfq");
+
 	return 0;
 
 err_sysfs:
@@ -1506,6 +1565,11 @@ int __init iommufd_test_init(void)
 
 void iommufd_test_exit(void)
 {
+	if (mock_iommu_iopf_queue) {
+		iopf_queue_free(mock_iommu_iopf_queue);
+		mock_iommu_iopf_queue = NULL;
+	}
+
 	iommu_device_sysfs_remove(&mock_iommu_device);
 	iommu_device_unregister_bus(&mock_iommu_device,
 				    &iommufd_mock_bus_type.bus,
-- 
2.34.1



* [PATCH v8 10/10] iommufd/selftest: Add coverage for IOPF test
  2024-07-02  6:34 [PATCH v8 00/10] IOMMUFD: Deliver IO page faults to user space Lu Baolu
                   ` (8 preceding siblings ...)
  2024-07-02  6:34 ` [PATCH v8 09/10] iommufd/selftest: Add IOPF support for mock device Lu Baolu
@ 2024-07-02  6:34 ` Lu Baolu
  2024-07-04 14:18 ` [PATCH v8 00/10] IOMMUFD: Deliver IO page faults to user space Will Deacon
  10 siblings, 0 replies; 46+ messages in thread
From: Lu Baolu @ 2024-07-02  6:34 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu

Extend the selftest tool to add coverage for IOPF handling. This
includes the following tests:

- Allocating and destroying an iommufd fault object.
- Allocating and destroying an IOPF-capable HWPT.
- Attaching/detaching/replacing an IOPF-capable HWPT on a device.
- Triggering an IOPF on the mock device.
- Retrieving and responding to the IOPF through the file interface.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 tools/testing/selftests/iommu/iommufd_utils.h | 86 +++++++++++++++++--
 tools/testing/selftests/iommu/iommufd.c       | 22 +++++
 .../selftests/iommu/iommufd_fail_nth.c        |  2 +-
 3 files changed, 104 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 8d2b46b2114d..a70e35a78584 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -153,7 +153,7 @@ static int _test_cmd_mock_domain_replace(int fd, __u32 stdev_id, __u32 pt_id,
 	EXPECT_ERRNO(_errno, _test_cmd_mock_domain_replace(self->fd, stdev_id, \
 							   pt_id, NULL))
 
-static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
+static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, __u32 ft_id,
 				__u32 flags, __u32 *hwpt_id, __u32 data_type,
 				void *data, size_t data_len)
 {
@@ -165,6 +165,7 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
 		.data_type = data_type,
 		.data_len = data_len,
 		.data_uptr = (uint64_t)data,
+		.fault_id = ft_id,
 	};
 	int ret;
 
@@ -177,24 +178,36 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
 }
 
 #define test_cmd_hwpt_alloc(device_id, pt_id, flags, hwpt_id)                  \
-	ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags,   \
+	ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, 0, flags,   \
 					  hwpt_id, IOMMU_HWPT_DATA_NONE, NULL, \
 					  0))
 #define test_err_hwpt_alloc(_errno, device_id, pt_id, flags, hwpt_id)   \
 	EXPECT_ERRNO(_errno, _test_cmd_hwpt_alloc(                      \
-				     self->fd, device_id, pt_id, flags, \
+				     self->fd, device_id, pt_id, 0, flags, \
 				     hwpt_id, IOMMU_HWPT_DATA_NONE, NULL, 0))
 
 #define test_cmd_hwpt_alloc_nested(device_id, pt_id, flags, hwpt_id,         \
 				   data_type, data, data_len)                \
-	ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \
+	ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, 0, flags, \
 					  hwpt_id, data_type, data, data_len))
 #define test_err_hwpt_alloc_nested(_errno, device_id, pt_id, flags, hwpt_id, \
 				   data_type, data, data_len)                \
 	EXPECT_ERRNO(_errno,                                                 \
-		     _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \
+		     _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, 0, flags, \
 					  hwpt_id, data_type, data, data_len))
 
+#define test_cmd_hwpt_alloc_iopf(device_id, pt_id, fault_id, flags, hwpt_id,    \
+				   data_type, data, data_len)                   \
+	ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, fault_id, \
+					  flags, hwpt_id, data_type, data,      \
+					  data_len))
+#define test_err_hwpt_alloc_iopf(_errno, device_id, pt_id, fault_id, flags,     \
+				 hwpt_id, data_type, data, data_len)            \
+	EXPECT_ERRNO(_errno,                                                    \
+		     _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, fault_id, \
+					  flags, hwpt_id, data_type, data,      \
+					  data_len))
+
 #define test_cmd_hwpt_check_iotlb(hwpt_id, iotlb_id, expected)                 \
 	({                                                                     \
 		struct iommu_test_cmd test_cmd = {                             \
@@ -684,3 +697,66 @@ static int _test_cmd_get_hw_info(int fd, __u32 device_id, void *data,
 
 #define test_cmd_get_hw_capabilities(device_id, caps, mask) \
 	ASSERT_EQ(0, _test_cmd_get_hw_info(self->fd, device_id, NULL, 0, &caps))
+
+static int _test_ioctl_fault_alloc(int fd, __u32 *fault_id, __u32 *fault_fd)
+{
+	struct iommu_fault_alloc cmd = {
+		.size = sizeof(cmd),
+	};
+	int ret;
+
+	ret = ioctl(fd, IOMMU_FAULT_QUEUE_ALLOC, &cmd);
+	if (ret)
+		return ret;
+	*fault_id = cmd.out_fault_id;
+	*fault_fd = cmd.out_fault_fd;
+	return 0;
+}
+
+#define test_ioctl_fault_alloc(fault_id, fault_fd)                       \
+	({                                                               \
+		ASSERT_EQ(0, _test_ioctl_fault_alloc(self->fd, fault_id, \
+						     fault_fd));         \
+		ASSERT_NE(0, *(fault_id));                               \
+		ASSERT_NE(0, *(fault_fd));                               \
+	})
+
+static int _test_cmd_trigger_iopf(int fd, __u32 device_id, __u32 fault_fd)
+{
+	struct iommu_test_cmd trigger_iopf_cmd = {
+		.size = sizeof(trigger_iopf_cmd),
+		.op = IOMMU_TEST_OP_TRIGGER_IOPF,
+		.trigger_iopf = {
+			.dev_id = device_id,
+			.pasid = 0x1,
+			.grpid = 0x2,
+			.perm = IOMMU_PGFAULT_PERM_READ | IOMMU_PGFAULT_PERM_WRITE,
+			.addr = 0xdeadbeaf,
+		},
+	};
+	struct iommu_hwpt_page_response response = {
+		.code = IOMMUFD_PAGE_RESP_SUCCESS,
+	};
+	struct iommu_hwpt_pgfault fault = {};
+	ssize_t bytes;
+	int ret;
+
+	ret = ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_TRIGGER_IOPF), &trigger_iopf_cmd);
+	if (ret)
+		return ret;
+
+	bytes = read(fault_fd, &fault, sizeof(fault));
+	if (bytes <= 0)
+		return -EIO;
+
+	response.cookie = fault.cookie;
+
+	bytes = write(fault_fd, &response, sizeof(response));
+	if (bytes <= 0)
+		return -EIO;
+
+	return 0;
+}
+
+#define test_cmd_trigger_iopf(device_id, fault_fd) \
+	ASSERT_EQ(0, _test_cmd_trigger_iopf(self->fd, device_id, fault_fd))
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index edf1c99c9936..93634e53e95e 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -279,6 +279,9 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
 	uint32_t parent_hwpt_id = 0;
 	uint32_t parent_hwpt_id_not_work = 0;
 	uint32_t test_hwpt_id = 0;
+	uint32_t iopf_hwpt_id;
+	uint32_t fault_id;
+	uint32_t fault_fd;
 
 	if (self->device_id) {
 		/* Negative tests */
@@ -326,6 +329,7 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
 					   sizeof(data));
 
 		/* Allocate two nested hwpts sharing one common parent hwpt */
+		test_ioctl_fault_alloc(&fault_id, &fault_fd);
 		test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id, 0,
 					   &nested_hwpt_id[0],
 					   IOMMU_HWPT_DATA_SELFTEST, &data,
@@ -334,6 +338,14 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
 					   &nested_hwpt_id[1],
 					   IOMMU_HWPT_DATA_SELFTEST, &data,
 					   sizeof(data));
+		test_err_hwpt_alloc_iopf(ENOENT, self->device_id, parent_hwpt_id,
+					 UINT32_MAX, IOMMU_HWPT_FAULT_ID_VALID,
+					 &iopf_hwpt_id, IOMMU_HWPT_DATA_SELFTEST,
+					 &data, sizeof(data));
+		test_cmd_hwpt_alloc_iopf(self->device_id, parent_hwpt_id, fault_id,
+					 IOMMU_HWPT_FAULT_ID_VALID, &iopf_hwpt_id,
+					 IOMMU_HWPT_DATA_SELFTEST, &data,
+					 sizeof(data));
 		test_cmd_hwpt_check_iotlb_all(nested_hwpt_id[0],
 					      IOMMU_TEST_IOTLB_DEFAULT);
 		test_cmd_hwpt_check_iotlb_all(nested_hwpt_id[1],
@@ -504,14 +516,24 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
 			     _test_ioctl_destroy(self->fd, nested_hwpt_id[1]));
 		test_ioctl_destroy(nested_hwpt_id[0]);
 
+		/* Switch from nested_hwpt_id[1] to iopf_hwpt_id */
+		test_cmd_mock_domain_replace(self->stdev_id, iopf_hwpt_id);
+		EXPECT_ERRNO(EBUSY,
+			     _test_ioctl_destroy(self->fd, iopf_hwpt_id));
+		/* Trigger an IOPF on the device */
+		test_cmd_trigger_iopf(self->device_id, fault_fd);
+
 		/* Detach from nested_hwpt_id[1] and destroy it */
 		test_cmd_mock_domain_replace(self->stdev_id, parent_hwpt_id);
 		test_ioctl_destroy(nested_hwpt_id[1]);
+		test_ioctl_destroy(iopf_hwpt_id);
 
 		/* Detach from the parent hw_pagetable and destroy it */
 		test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id);
 		test_ioctl_destroy(parent_hwpt_id);
 		test_ioctl_destroy(parent_hwpt_id_not_work);
+		close(fault_fd);
+		test_ioctl_destroy(fault_id);
 	} else {
 		test_err_hwpt_alloc(ENOENT, self->device_id, self->ioas_id, 0,
 				    &parent_hwpt_id);
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c
index f590417cd67a..c5d5e69452b0 100644
--- a/tools/testing/selftests/iommu/iommufd_fail_nth.c
+++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c
@@ -615,7 +615,7 @@ TEST_FAIL_NTH(basic_fail_nth, device)
 	if (_test_cmd_get_hw_info(self->fd, idev_id, &info, sizeof(info), NULL))
 		return -1;
 
-	if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0, &hwpt_id,
+	if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0, 0, &hwpt_id,
 				 IOMMU_HWPT_DATA_NONE, 0, 0))
 		return -1;
 
-- 
2.34.1



* Re: [PATCH v8 06/10] iommufd: Add iommufd fault object
  2024-07-02  6:34 ` [PATCH v8 06/10] iommufd: Add iommufd fault object Lu Baolu
@ 2024-07-03 23:06   ` Nicolin Chen
  2024-07-04  2:59     ` Baolu Lu
  2024-07-08 16:29     ` Jason Gunthorpe
  0 siblings, 2 replies; 46+ messages in thread
From: Nicolin Chen @ 2024-07-03 23:06 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel,
	Jason Gunthorpe

Hi Baolu,

On Tue, Jul 02, 2024 at 02:34:40PM +0800, Lu Baolu wrote:

> An iommufd fault object provides an interface for delivering I/O page
> faults to user space. These objects are created and destroyed by user
> space, and they can be associated with or dissociated from hardware page
> table objects during page table allocation or destruction.
> 
> User space interacts with the fault object through a file interface. This
> interface offers a straightforward and efficient way for user space to
> handle page faults. It allows user space to read fault messages
> sequentially and respond to them by writing to the same file. The file
> interface supports reading messages in poll mode, so it's recommended that
> user space applications use io_uring to enhance read and write efficiency.
> 
> A fault object can be associated with any iopf-capable iommufd_hw_pgtable
> during the pgtable's allocation. All I/O page faults triggered by devices
> when accessing the I/O addresses of an iommufd_hw_pgtable are routed
> through the fault object to user space. Similarly, user space's responses
> to these page faults are routed back to the iommu device driver through
> the same fault object.

There is a need for the VIOMMU object to report HW faults to the VMM.
For example, a HW-accelerated VCMDQ may encounter HW errors. HW will
raise an IRQ to the host kernel and the host kernel will forward
it to the guest. I think we can have a viommu->fault, similar to
the hwpt->fault introduced by this series. This viommu->fault can
also benefit nested IOMMU for reporting translation errors.

I learned that this hwpt->fault is exclusively for IOPF/PRI. And
Jason suggested that I add a different one for VIOMMU. Yet, after
taking a closer look, I found the fault object in this series is
seemingly quite generic at the uAPI level: its naming/structure,
and the way how it's allocated and passed to hwpt, despite being
highly correlated with IOPF in its fops code. So, I feel that we
might have a chance of reusing it for different fault types:

+enum iommu_fault_type {
+	IOMMU_FAULT_TYPE_HWPT_IOPF,
+	IOMMU_FAULT_TYPE_VIOMMU_IRQ,
+};

 struct iommu_fault_alloc {
 	__u32 size;
 	__u32 flags;
+	__u32 type;  /* enum iommu_fault_type */
 	__u32 out_fault_id;
 	__u32 out_fault_fd;
 };

I understand that this is already v8. So, maybe we can, for now,
apply the small diff above with an IOMMU_FAULT_TYPE_HWPT_IOPF type
check in the ioctl handler. And a decoupling for the iopf fops in
the ioctl handler can come later in the viommu series:
	switch (type) {
	case IOMMU_FAULT_TYPE_HWPT_IOPF:
		filep = anon_inode_getfile("[iommufd-pgfault]",
					   &iommufd_fault_fops_iopf);
	case IOMMU_FAULT_TYPE_VIOMMU_IRQ:
		filep = anon_inode_getfile("[iommufd-viommu-irq]",
					   &iommufd_fault_fops_viommu);
	default:
		return -EOPNOSUPP;
	}

Since you are the designer here, I think you have a better 10000
foot view -- maybe I am missing something here implying that the
fault object can't be really reused by viommu.

Would you mind sharing some thoughts here?

Thanks
Nic


* Re: [PATCH v8 06/10] iommufd: Add iommufd fault object
  2024-07-03 23:06   ` Nicolin Chen
@ 2024-07-04  2:59     ` Baolu Lu
  2024-07-04  5:36       ` Nicolin Chen
  2024-07-08 16:29     ` Jason Gunthorpe
  1 sibling, 1 reply; 46+ messages in thread
From: Baolu Lu @ 2024-07-04  2:59 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: baolu.lu, Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel,
	Jason Gunthorpe

On 7/4/24 7:06 AM, Nicolin Chen wrote:
> Hi Baolu,

Hi Nicolin,

> On Tue, Jul 02, 2024 at 02:34:40PM +0800, Lu Baolu wrote:
> 
>> An iommufd fault object provides an interface for delivering I/O page
>> faults to user space. These objects are created and destroyed by user
>> space, and they can be associated with or dissociated from hardware page
>> table objects during page table allocation or destruction.
>>
>> User space interacts with the fault object through a file interface. This
>> interface offers a straightforward and efficient way for user space to
>> handle page faults. It allows user space to read fault messages
>> sequentially and respond to them by writing to the same file. The file
>> interface supports reading messages in poll mode, so it's recommended that
>> user space applications use io_uring to enhance read and write efficiency.
>>
>> A fault object can be associated with any iopf-capable iommufd_hw_pgtable
>> during the pgtable's allocation. All I/O page faults triggered by devices
>> when accessing the I/O addresses of an iommufd_hw_pgtable are routed
>> through the fault object to user space. Similarly, user space's responses
>> to these page faults are routed back to the iommu device driver through
>> the same fault object.
> There is a need for VIOMMU object to report HW fault to VMM. For
> example, a HW-accelerated VCMDQ may encounter HW errors. HW will
> raise an IRQ to the host kernel and the host kernel will forward
> it to the guest. I think we can have a viommu->fault, similar to
> the hwpt->fault introduced by this series. This viommu->fault can
> also benefit nested IOMMU for reporting translation error.
> 
> I learned that this hwpt->fault is exclusively for IOPF/PRI. And
> Jason suggested me to add a different one for VIOMMU. Yet, after
> taking a closer look, I found the fault object in this series is
> seemingly quite generic at the uAPI level: its naming/structure,
> and the way how it's allocated and passed to hwpt, despite being
> highly correlated with IOPF in its fops code. So, I feel that we
> might have a chance of reusing it for different fault types:
> 
> +enum iommu_fault_type {
> +	IOMMU_FAULT_TYPE_HWPT_IOPF,
> +	IOMMU_FAULT_TYPE_VIOMMU_IRQ,
> +};
> 
>   struct iommu_fault_alloc {
>   	__u32 size;
>   	__u32 flags;
> +	__u32 type;  /* enum iommu_fault_type */
>   	__u32 out_fault_id;
>   	__u32 out_fault_fd;
>   };
> 
> I understand that this is already v8. So, maybe we can, for now,
> apply the small diff above with an IOMMU_FAULT_TYPE_HWPT_IOPF type
> check in the ioctl handler. And a decoupling for the iopf fops in
> the ioctl handler can come later in the viommu series:
> 	switch (type) {
> 	case IOMMU_FAULT_TYPE_HWPT_IOPF:
> 		filep = anon_inode_getfile("[iommufd-pgfault]",
> 					   &iommufd_fault_fops_iopf);
> 	case IOMMU_FAULT_TYPE_VIOMMU_IRQ:
> 		filep = anon_inode_getfile("[iommufd-viommu-irq]",
> 					   &iommufd_fault_fops_viommu);
> 	default:
> 		return -EOPNOSUPP;
> 	}
> 
> Since you are the designer here, I think you have a better 10000
> foot view -- maybe I am missing something here implying that the
> fault object can't be really reused by viommu.
> 
> Would you mind sharing some thoughts here?

I think this is a choice between "two different objects" vs. "same
object with different FD interfaces". If I understand it correctly, your
proposal of unrecoverable fault delivery is not limited to vcmdq, but
is generic to all unrecoverable events that userspace should be aware of
when the passed-through device is affected.

 From a hardware architecture perspective, the interfaces for
unrecoverable events don't always match the page faults. For example,
VT-d architecture defines a PR queue for page faults, but uses a
register set to report unrecoverable events. The 'reason', 'request id'
and 'pasid' fields of the register set indicate what happened on the
hardware. New unrecoverable events will not be reported until the
previous one has been fetched.

With the above being said, I have no strong opinions between these two
choices. Jason and Kevin should have more insights.

Thanks,
baolu

* Re: [PATCH v8 06/10] iommufd: Add iommufd fault object
  2024-07-04  2:59     ` Baolu Lu
@ 2024-07-04  5:36       ` Nicolin Chen
  2024-07-04  6:37         ` Tian, Kevin
  0 siblings, 1 reply; 46+ messages in thread
From: Nicolin Chen @ 2024-07-04  5:36 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel,
	Jason Gunthorpe

On Thu, Jul 04, 2024 at 10:59:45AM +0800, Baolu Lu wrote:
> > On Tue, Jul 02, 2024 at 02:34:40PM +0800, Lu Baolu wrote:
> > 
> > > An iommufd fault object provides an interface for delivering I/O page
> > > faults to user space. These objects are created and destroyed by user
> > > space, and they can be associated with or dissociated from hardware page
> > > table objects during page table allocation or destruction.
> > > 
> > > User space interacts with the fault object through a file interface. This
> > > interface offers a straightforward and efficient way for user space to
> > > handle page faults. It allows user space to read fault messages
> > > sequentially and respond to them by writing to the same file. The file
> > > interface supports reading messages in poll mode, so it's recommended that
> > > user space applications use io_uring to enhance read and write efficiency.
> > > 
> > > A fault object can be associated with any iopf-capable iommufd_hw_pgtable
> > > during the pgtable's allocation. All I/O page faults triggered by devices
> > > when accessing the I/O addresses of an iommufd_hw_pgtable are routed
> > > through the fault object to user space. Similarly, user space's responses
> > > to these page faults are routed back to the iommu device driver through
> > > the same fault object.
> > There is a need for VIOMMU object to report HW fault to VMM. For
> > example, a HW-accelerated VCMDQ may encounter HW errors. HW will
> > raise an IRQ to the host kernel and the host kernel will forward
> > it to the guest. I think we can have a viommu->fault, similar to
> > the hwpt->fault introduced by this series. This viommu->fault can
> > also benefit nested IOMMU for reporting translation error.
> > 
> > I learned that this hwpt->fault is exclusively for IOPF/PRI. And
> > Jason suggested me to add a different one for VIOMMU. Yet, after
> > taking a closer look, I found the fault object in this series is
> > seemingly quite generic at the uAPI level: its naming/structure,
> > and the way how it's allocated and passed to hwpt, despite being
> > highly correlated with IOPF in its fops code. So, I feel that we
> > might have a chance of reusing it for different fault types:
> > 
> > +enum iommu_fault_type {
> > +     IOMMU_FAULT_TYPE_HWPT_IOPF,
> > +     IOMMU_FAULT_TYPE_VIOMMU_IRQ,
> > +};
> > 
> >   struct iommu_fault_alloc {
> >       __u32 size;
> >       __u32 flags;
> > +     __u32 type;  /* enum iommu_fault_type */
> >       __u32 out_fault_id;
> >       __u32 out_fault_fd;
> >   };
> > 
> > I understand that this is already v8. So, maybe we can, for now,
> > apply the small diff above with an IOMMU_FAULT_TYPE_HWPT_IOPF type
> > check in the ioctl handler. And a decoupling for the iopf fops in
> > the ioctl handler can come later in the viommu series:
> >       switch (type) {
> >       case IOMMU_FAULT_TYPE_HWPT_IOPF:
> >               filep = anon_inode_getfile("[iommufd-pgfault]",
> >                                          &iommufd_fault_fops_iopf);
> >       case IOMMU_FAULT_TYPE_VIOMMU_IRQ:
> >               filep = anon_inode_getfile("[iommufd-viommu-irq]",
> >                                          &iommufd_fault_fops_viommu);
> >       default:
> >               return -EOPNOSUPP;
> >       }
> > 
> > Since you are the designer here, I think you have a better 10000
> > foot view -- maybe I am missing something here implying that the
> > fault object can't be really reused by viommu.
> > 
> > Would you mind sharing some thoughts here?
> 
> I think this is a choice between "two different objects" vs. "same
> object with different FD interfaces". If I understand it correctly, your
> proposal of unrecoverable fault delivery is not limited to vcmdq, but
> generic to all unrecoverable events that userspace should be aware of
> when the passed-through device is affected.

It's basically IRQ forwarding, not confined to unrecoverable
faults. For example, a VCMDQ used by the guest kernel would
raise an HW IRQ if the guest kernel issues an illegal command
to the HW Queue assigned to it. The host kernel will receive
the IRQ, so it needs a way to forward it to the VM for the guest
kernel to recover the HW queue.

The way that we define the structure can follow what we have
for hwpt_alloc/invalidate uAPIs, i.e. driver data/event. And
such an event can carry unrecoverable translation faults too.
SMMU at least reports DMA translation faults using an eventQ
in its own native language.

> From a hardware architecture perspective, the interfaces for
> unrecoverable events don't always match the page faults. For example,
> VT-d architecture defines a PR queue for page faults, but uses a
> register set to report unrecoverable events. The 'reason', 'request id'
> and 'pasid' fields of the register set indicate what happened on the
> hardware. New unrecoverable events will not be reported until the
> previous one has been fetched.

Understood. I don't think we can share most of the pieces in
fault.c. It's just that the "IOMMU_FAULT_QUEUE_ALLOC" ioctl itself
looks too general to be limited to page-fault usage only.
So, I feel we can share, for example:
    IOMMU_FAULT_QUEUE_ALLOC (type=hwpt_iopf) -> fault_id=1
    IOMMU_HWPT_ALLOC (fault_id=1) -> hwpt_id=2
    IOMMU_FAULT_QUEUE_ALLOC (type=viommu_irq) -> fault_id=3
    IOMMU_VIOMMU_ALLOC (fault_id=3) -> viommu_id=4
The handler will direct to different fops as I drafted in my
previous mail.

> With the above being said, I have no strong opinions between these two
> choices. Jason and Kevin should have more insights.

Thanks. Jason is out of office this week, so hopefully Kevin
may shed some light. I personally feel that we don't need to
largely update this series until we add VIOMMU. Yet, it would
be convenient if we add a "type" in the uAPI with this series.

Thank you
Nic

* RE: [PATCH v8 06/10] iommufd: Add iommufd fault object
  2024-07-04  5:36       ` Nicolin Chen
@ 2024-07-04  6:37         ` Tian, Kevin
  2024-07-04  7:32           ` Baolu Lu
  0 siblings, 1 reply; 46+ messages in thread
From: Tian, Kevin @ 2024-07-04  6:37 UTC (permalink / raw)
  To: Nicolin Chen, Baolu Lu
  Cc: Jason Gunthorpe, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Liu, Yi L, Jacob Pan, Joel Granados,
	iommu@lists.linux.dev, virtualization@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org, Jason Gunthorpe

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Thursday, July 4, 2024 1:36 PM
> 
> On Thu, Jul 04, 2024 at 10:59:45AM +0800, Baolu Lu wrote:
> > > On Tue, Jul 02, 2024 at 02:34:40PM +0800, Lu Baolu wrote:
> > >
> > > +enum iommu_fault_type {
> > > +     IOMMU_FAULT_TYPE_HWPT_IOPF,
> > > +     IOMMU_FAULT_TYPE_VIOMMU_IRQ,
> > > +};
> > >
> > >   struct iommu_fault_alloc {
> > >       __u32 size;
> > >       __u32 flags;
> > > +     __u32 type;  /* enum iommu_fault_type */
> > >       __u32 out_fault_id;
> > >       __u32 out_fault_fd;

and need a new reserved field for alignment.

> > >   };
> > >
> > > I understand that this is already v8. So, maybe we can, for now,
> > > apply the small diff above with an IOMMU_FAULT_TYPE_HWPT_IOPF
> type
> > > check in the ioctl handler. And a decoupling for the iopf fops in
> > > the ioctl handler can come later in the viommu series:
> > >       switch (type) {
> > >       case IOMMU_FAULT_TYPE_HWPT_IOPF:
> > >               filep = anon_inode_getfile("[iommufd-pgfault]",
> > >                                          &iommufd_fault_fops_iopf);
> > >       case IOMMU_FAULT_TYPE_VIOMMU_IRQ:
> > >               filep = anon_inode_getfile("[iommufd-viommu-irq]",
> > >                                          &iommufd_fault_fops_viommu);
> > >       default:
> > >               return -EOPNOSUPP;
> > >       }
> > >
> > > Since you are the designer here, I think you have a better 10000
> > > foot view -- maybe I am missing something here implying that the
> > > fault object can't be really reused by viommu.
> > >
> > > Would you mind sharing some thoughts here?
> >
> > I think this is a choice between "two different objects" vs. "same
> > object with different FD interfaces". If I understand it correctly, your
> > proposal of unrecoverable fault delivery is not limited to vcmdq, but
> > generic to all unrecoverable events that userspace should be aware of
> > when the passed-through device is affected.
> 
> It's basically IRQ forwarding, not confined to unrecoverable
> faults. For example, a VCMDQ used by the guest kernel would
> raise an HW IRQ if the guest kernel issues an illegal command
> to the HW Queue assigned to it. The host kernel will receive
> the IRQ, so it needs a way to forward it to the VM for guest
> kernel to recover the HW queue.
> 
> The way that we define the structure can follow what we have
> for hwpt_alloc/invalidate uAPIs, i.e. driver data/event. And
> such an event can carry unrecoverable translation faults too.
> SMMU at least reports DMA translation faults using an eventQ
> in its own native language.
> 
> > From a hardware architecture perspective, the interfaces for
> > unrecoverable events don't always match the page faults. For example,
> > VT-d architecture defines a PR queue for page faults, but uses a
> > register set to report unrecoverable events. The 'reason', 'request id'
> > and 'pasid' fields of the register set indicate what happened on the
> > hardware. New unrecoverable events will not be reported until the
> > previous one has been fetched.
> 
> Understood. I don't think we can share the majority pieces in
> the fault.c. Just the "IOMMU_FAULT_QUEUE_ALLOC" ioctl itself
> looks way too general to be limited to page-fault usage only.
> So, I feel we can share, for example:
>     IOMMU_FAULT_QUEUE_ALLOC (type=hwpt_iopf) -> fault_id=1
>     IOMMU_HWPT_ALLOC (fault_id=1) -> hwpt_id=2
>     IOMMU_FAULT_QUEUE_ALLOC (type=viommu_irq) -> fault_id=3
>     IOMMU_VIOMMU_ALLOC (fault_id=2) -> viommu_id=4
> The handler will direct to different fops as I drafted in my
> previous mail.
> 
> > With the above being said, I have no strong opinions between these two
> > choices. Jason and Kevin should have more insights.
> 
> Thanks. Jason is out of office this week, so hopefully Kevin
> may shed some light. I personally feel that we don't need to
> largely update this series until we add VIOMMU. Yet, it would
> be convenient if we add a "type" in the uAPI with this series.
> 

This is ok to me.

* Re: [PATCH v8 06/10] iommufd: Add iommufd fault object
  2024-07-04  6:37         ` Tian, Kevin
@ 2024-07-04  7:32           ` Baolu Lu
  2024-07-04 23:18             ` Nicolin Chen
  0 siblings, 1 reply; 46+ messages in thread
From: Baolu Lu @ 2024-07-04  7:32 UTC (permalink / raw)
  To: Tian, Kevin, Nicolin Chen
  Cc: baolu.lu, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Liu, Yi L, Jacob Pan,
	Joel Granados, iommu@lists.linux.dev,
	virtualization@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org, Jason Gunthorpe

On 2024/7/4 14:37, Tian, Kevin wrote:
>> From: Nicolin Chen<nicolinc@nvidia.com>
>> Sent: Thursday, July 4, 2024 1:36 PM
>>
>> On Thu, Jul 04, 2024 at 10:59:45AM +0800, Baolu Lu wrote:
>>>> On Tue, Jul 02, 2024 at 02:34:40PM +0800, Lu Baolu wrote:
>>>>
>>>> +enum iommu_fault_type {
>>>> +     IOMMU_FAULT_TYPE_HWPT_IOPF,
>>>> +     IOMMU_FAULT_TYPE_VIOMMU_IRQ,
>>>> +};
>>>>
>>>>    struct iommu_fault_alloc {
>>>>        __u32 size;
>>>>        __u32 flags;
>>>> +     __u32 type;  /* enum iommu_fault_type */
>>>>        __u32 out_fault_id;
>>>>        __u32 out_fault_fd;
> and need a new reserved field for alignment.
> 
>>>>    };
>>>>
>>>> I understand that this is already v8. So, maybe we can, for now,
>>>> apply the small diff above with an IOMMU_FAULT_TYPE_HWPT_IOPF
>> type
>>>> check in the ioctl handler. And a decoupling for the iopf fops in
>>>> the ioctl handler can come later in the viommu series:
>>>>        switch (type) {
>>>>        case IOMMU_FAULT_TYPE_HWPT_IOPF:
>>>>                filep = anon_inode_getfile("[iommufd-pgfault]",
>>>>                                           &iommufd_fault_fops_iopf);
>>>>        case IOMMU_FAULT_TYPE_VIOMMU_IRQ:
>>>>                filep = anon_inode_getfile("[iommufd-viommu-irq]",
>>>>                                           &iommufd_fault_fops_viommu);
>>>>        default:
>>>>                return -EOPNOSUPP;
>>>>        }
>>>>
>>>> Since you are the designer here, I think you have a better 10000
>>>> foot view -- maybe I am missing something here implying that the
>>>> fault object can't be really reused by viommu.
>>>>
>>>> Would you mind sharing some thoughts here?
>>> I think this is a choice between "two different objects" vs. "same
>>> object with different FD interfaces". If I understand it correctly, your
>>> proposal of unrecoverable fault delivery is not limited to vcmdq, but
>>> generic to all unrecoverable events that userspace should be aware of
>>> when the passed-through device is affected.
>> It's basically IRQ forwarding, not confined to unrecoverable
>> faults. For example, a VCMDQ used by the guest kernel would
>> raise an HW IRQ if the guest kernel issues an illegal command
>> to the HW Queue assigned to it. The host kernel will receive
>> the IRQ, so it needs a way to forward it to the VM for guest
>> kernel to recover the HW queue.
>>
>> The way that we define the structure can follow what we have
>> for hwpt_alloc/invalidate uAPIs, i.e. driver data/event. And
>> such an event can carry unrecoverable translation faults too.
>> SMMU at least reports DMA translation faults using an eventQ
>> in its own native language.
>>
>>>  From a hardware architecture perspective, the interfaces for
>>> unrecoverable events don't always match the page faults. For example,
>>> VT-d architecture defines a PR queue for page faults, but uses a
>>> register set to report unrecoverable events. The 'reason', 'request id'
>>> and 'pasid' fields of the register set indicate what happened on the
>>> hardware. New unrecoverable events will not be reported until the
>>> previous one has been fetched.
>> Understood. I don't think we can share the majority pieces in
>> the fault.c. Just the "IOMMU_FAULT_QUEUE_ALLOC" ioctl itself
>> looks way too general to be limited to page-fault usage only.
>> So, I feel we can share, for example:
>>      IOMMU_FAULT_QUEUE_ALLOC (type=hwpt_iopf) -> fault_id=1
>>      IOMMU_HWPT_ALLOC (fault_id=1) -> hwpt_id=2
>>      IOMMU_FAULT_QUEUE_ALLOC (type=viommu_irq) -> fault_id=3
>>      IOMMU_VIOMMU_ALLOC (fault_id=2) -> viommu_id=4
>> The handler will direct to different fops as I drafted in my
>> previous mail.
>>
>>> With the above being said, I have no strong opinions between these two
>>> choices. Jason and Kevin should have more insights.
>> Thanks. Jason is out of office this week, so hopefully Kevin
>> may shed some light. I personally feel that we don't need to
>> largely update this series until we add VIOMMU. Yet, it would
>> be convenient if we add a "type" in the uAPI with this series.
>>
> This is ok to me.

So, Nicolin, can you please cook an additional patch on top of this
series and post it for further review?

Thanks,
baolu

* Re: [PATCH v8 00/10] IOMMUFD: Deliver IO page faults to user space
  2024-07-02  6:34 [PATCH v8 00/10] IOMMUFD: Deliver IO page faults to user space Lu Baolu
                   ` (9 preceding siblings ...)
  2024-07-02  6:34 ` [PATCH v8 10/10] iommufd/selftest: Add coverage for IOPF test Lu Baolu
@ 2024-07-04 14:18 ` Will Deacon
  2024-07-09 17:23   ` Jason Gunthorpe
  10 siblings, 1 reply; 46+ messages in thread
From: Will Deacon @ 2024-07-04 14:18 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, Lu Baolu
  Cc: catalin.marinas, kernel-team, Will Deacon, iommu, virtualization,
	linux-kernel

On Tue, 02 Jul 2024 14:34:34 +0800, Lu Baolu wrote:
> This series implements the functionality of delivering IO page faults to
> user space through the IOMMUFD framework. One feasible use case is the
> nested translation. Nested translation is a hardware feature that
> supports two-stage translation tables for IOMMU. The second-stage
> translation table is managed by the host VMM, while the first-stage
> translation table is owned by user space. This allows user space to
> control the IOMMU mappings for its devices.
> 
> [...]

Applied first four to iommu (iommufd/attach-handles), thanks!

[01/10] iommu: Introduce domain attachment handle
        https://git.kernel.org/iommu/c/14678219cf40
[02/10] iommu: Remove sva handle list
        https://git.kernel.org/iommu/c/3e7f57d1ef3f
[03/10] iommu: Add attach handle to struct iopf_group
        https://git.kernel.org/iommu/c/06cdcc32d657
[04/10] iommu: Extend domain attach group with handle support
        https://git.kernel.org/iommu/c/8519e689834a

Jason -- feel free to use this branch as a base on which you can take
the iommufd parts.

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev

* Re: [PATCH v8 06/10] iommufd: Add iommufd fault object
  2024-07-04  7:32           ` Baolu Lu
@ 2024-07-04 23:18             ` Nicolin Chen
  2024-07-05  0:49               ` Tian, Kevin
  0 siblings, 1 reply; 46+ messages in thread
From: Nicolin Chen @ 2024-07-04 23:18 UTC (permalink / raw)
  To: Tian, Kevin, Baolu Lu
  Cc: Jason Gunthorpe, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Liu, Yi L, Jacob Pan, Joel Granados,
	iommu@lists.linux.dev, virtualization@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org, Jason Gunthorpe

On Thu, Jul 04, 2024 at 03:32:32PM +0800, Baolu Lu wrote:
> On 2024/7/4 14:37, Tian, Kevin wrote:
> > > From: Nicolin Chen<nicolinc@nvidia.com>
> > > Sent: Thursday, July 4, 2024 1:36 PM
> > > 
> > > On Thu, Jul 04, 2024 at 10:59:45AM +0800, Baolu Lu wrote:
> > > > > On Tue, Jul 02, 2024 at 02:34:40PM +0800, Lu Baolu wrote:
> > > > > 
> > > > > +enum iommu_fault_type {
> > > > > +     IOMMU_FAULT_TYPE_HWPT_IOPF,
> > > > > +     IOMMU_FAULT_TYPE_VIOMMU_IRQ,
> > > > > +};
> > > > > 
> > > > >    struct iommu_fault_alloc {
> > > > >        __u32 size;
> > > > >        __u32 flags;
> > > > > +     __u32 type;  /* enum iommu_fault_type */
> > > > >        __u32 out_fault_id;
> > > > >        __u32 out_fault_fd;
> > and need a new reserved field for alignment.

Hmm, what's the reason for enforcing 64-bit alignment on an
all-u32 struct though? I thought a reserved field is only needed
for padding. struct iommu_ioas_alloc has three u32 members,
for example.

> > > > >    };
> > > > > 
> > > > > I understand that this is already v8. So, maybe we can, for now,
> > > > > apply the small diff above with an IOMMU_FAULT_TYPE_HWPT_IOPF
> > > type
> > > > > check in the ioctl handler. And a decoupling for the iopf fops in
> > > > > the ioctl handler can come later in the viommu series:
> > > > >        switch (type) {
> > > > >        case IOMMU_FAULT_TYPE_HWPT_IOPF:
> > > > >                filep = anon_inode_getfile("[iommufd-pgfault]",
> > > > >                                           &iommufd_fault_fops_iopf);
> > > > >        case IOMMU_FAULT_TYPE_VIOMMU_IRQ:
> > > > >                filep = anon_inode_getfile("[iommufd-viommu-irq]",
> > > > >                                           &iommufd_fault_fops_viommu);
> > > > >        default:
> > > > >                return -EOPNOSUPP;
> > > > >        }
> > > > > 
> > > > > Since you are the designer here, I think you have a better 10000
> > > > > foot view -- maybe I am missing something here implying that the
> > > > > fault object can't be really reused by viommu.
> > > > > 
> > > > > Would you mind sharing some thoughts here?
> > > > I think this is a choice between "two different objects" vs. "same
> > > > object with different FD interfaces". If I understand it correctly, your
> > > > proposal of unrecoverable fault delivery is not limited to vcmdq, but
> > > > generic to all unrecoverable events that userspace should be aware of
> > > > when the passed-through device is affected.
> > > It's basically IRQ forwarding, not confined to unrecoverable
> > > faults. For example, a VCMDQ used by the guest kernel would
> > > raise an HW IRQ if the guest kernel issues an illegal command
> > > to the HW Queue assigned to it. The host kernel will receive
> > > the IRQ, so it needs a way to forward it to the VM for guest
> > > kernel to recover the HW queue.
> > > 
> > > The way that we define the structure can follow what we have
> > > for hwpt_alloc/invalidate uAPIs, i.e. driver data/event. And
> > > such an event can carry unrecoverable translation faults too.
> > > SMMU at least reports DMA translation faults using an eventQ
> > > in its own native language.
> > > 
> > > >  From a hardware architecture perspective, the interfaces for
> > > > unrecoverable events don't always match the page faults. For example,
> > > > VT-d architecture defines a PR queue for page faults, but uses a
> > > > register set to report unrecoverable events. The 'reason', 'request id'
> > > > and 'pasid' fields of the register set indicate what happened on the
> > > > hardware. New unrecoverable events will not be reported until the
> > > > previous one has been fetched.
> > > Understood. I don't think we can share the majority pieces in
> > > the fault.c. Just the "IOMMU_FAULT_QUEUE_ALLOC" ioctl itself
> > > looks way too general to be limited to page-fault usage only.
> > > So, I feel we can share, for example:
> > >      IOMMU_FAULT_QUEUE_ALLOC (type=hwpt_iopf) -> fault_id=1
> > >      IOMMU_HWPT_ALLOC (fault_id=1) -> hwpt_id=2
> > >      IOMMU_FAULT_QUEUE_ALLOC (type=viommu_irq) -> fault_id=3
> > >      IOMMU_VIOMMU_ALLOC (fault_id=2) -> viommu_id=4
> > > The handler will direct to different fops as I drafted in my
> > > previous mail.
> > > 
> > > > With the above being said, I have no strong opinions between these two
> > > > choices. Jason and Kevin should have more insights.
> > > Thanks. Jason is out of office this week, so hopefully Kevin
> > > may shed some light. I personally feel that we don't need to
> > > largely update this series until we add VIOMMU. Yet, it would
> > > be convenient if we add a "type" in the uAPI with this series.
> > > 
> > This is ok to me.
> 
> So
> 
> Nicolin, perhaps can you please cook an additional patch on top of this
> series and post it for further review?

Thank you both for the inputs. Yeah, as long as we merge them
in the same cycle, there won't be a uAPI breakage. I will draft
an incremental patch, and Jason can make the final call.

Nicolin

* RE: [PATCH v8 06/10] iommufd: Add iommufd fault object
  2024-07-04 23:18             ` Nicolin Chen
@ 2024-07-05  0:49               ` Tian, Kevin
  2024-07-08 16:22                 ` Jason Gunthorpe
  0 siblings, 1 reply; 46+ messages in thread
From: Tian, Kevin @ 2024-07-05  0:49 UTC (permalink / raw)
  To: Nicolin Chen, Baolu Lu
  Cc: Jason Gunthorpe, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Liu, Yi L, Jacob Pan, Joel Granados,
	iommu@lists.linux.dev, virtualization@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org, Jason Gunthorpe

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Friday, July 5, 2024 7:19 AM
> 
> On Thu, Jul 04, 2024 at 03:32:32PM +0800, Baolu Lu wrote:
> > On 2024/7/4 14:37, Tian, Kevin wrote:
> > > > From: Nicolin Chen<nicolinc@nvidia.com>
> > > > Sent: Thursday, July 4, 2024 1:36 PM
> > > >
> > > > On Thu, Jul 04, 2024 at 10:59:45AM +0800, Baolu Lu wrote:
> > > > > > On Tue, Jul 02, 2024 at 02:34:40PM +0800, Lu Baolu wrote:
> > > > > >
> > > > > > +enum iommu_fault_type {
> > > > > > +     IOMMU_FAULT_TYPE_HWPT_IOPF,
> > > > > > +     IOMMU_FAULT_TYPE_VIOMMU_IRQ,
> > > > > > +};
> > > > > >
> > > > > >    struct iommu_fault_alloc {
> > > > > >        __u32 size;
> > > > > >        __u32 flags;
> > > > > > +     __u32 type;  /* enum iommu_fault_type */
> > > > > >        __u32 out_fault_id;
> > > > > >        __u32 out_fault_fd;
> > > and need a new reserved field for alignment.
> 
> Hmm, what's the reason for enforcing a 64-bit alignment to an
> all-u32 struct though? I thought we need a reserved field only
> for padding. The struct iommu_ioas_alloc has three u32 members
> for example?
> 

yeah please ignore this comment.

* Re: [PATCH v8 06/10] iommufd: Add iommufd fault object
  2024-07-05  0:49               ` Tian, Kevin
@ 2024-07-08 16:22                 ` Jason Gunthorpe
  0 siblings, 0 replies; 46+ messages in thread
From: Jason Gunthorpe @ 2024-07-08 16:22 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Nicolin Chen, Baolu Lu, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Liu, Yi L, Jacob Pan, Joel Granados,
	iommu@lists.linux.dev, virtualization@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org

On Fri, Jul 05, 2024 at 12:49:10AM +0000, Tian, Kevin wrote:

> > > > > > > +enum iommu_fault_type {
> > > > > > > +     IOMMU_FAULT_TYPE_HWPT_IOPF,
> > > > > > > +     IOMMU_FAULT_TYPE_VIOMMU_IRQ,
> > > > > > > +};
> > > > > > >
> > > > > > >    struct iommu_fault_alloc {
> > > > > > >        __u32 size;
> > > > > > >        __u32 flags;
> > > > > > > +     __u32 type;  /* enum iommu_fault_type */
> > > > > > >        __u32 out_fault_id;
> > > > > > >        __u32 out_fault_fd;
> > > > and need a new reserved field for alignment.
> > 
> > Hmm, what's the reason for enforcing a 64-bit alignment to an
> > all-u32 struct though? I thought we need a reserved field only
> > for padding. The struct iommu_ioas_alloc has three u32 members
> > for example?
> 
> yeah please ignore this comment.

Sometimes I encourage it so that people notice if the structure is
changed later. Almost all structs here are 8-byte aligned. It is OK
like this too.

Jason

* Re: [PATCH v8 06/10] iommufd: Add iommufd fault object
  2024-07-03 23:06   ` Nicolin Chen
  2024-07-04  2:59     ` Baolu Lu
@ 2024-07-08 16:29     ` Jason Gunthorpe
  2024-07-08 18:36       ` Nicolin Chen
  1 sibling, 1 reply; 46+ messages in thread
From: Jason Gunthorpe @ 2024-07-08 16:29 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Lu Baolu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Yi Liu, Jacob Pan, Joel Granados, iommu,
	virtualization, linux-kernel

On Wed, Jul 03, 2024 at 04:06:15PM -0700, Nicolin Chen wrote:

> I learned that this hwpt->fault is exclusively for IOPF/PRI. And
> Jason suggested me to add a different one for VIOMMU. Yet, after
> taking a closer look, I found the fault object in this series is
> seemingly quite generic at the uAPI level: its naming/structure,
> and the way how it's allocated and passed to hwpt, despite being
> highly correlated with IOPF in its fops code. So, I feel that we
> might have a chance of reusing it for different fault types:
> 
> +enum iommu_fault_type {
> +	IOMMU_FAULT_TYPE_HWPT_IOPF,
> +	IOMMU_FAULT_TYPE_VIOMMU_IRQ,
> +};
> 
>  struct iommu_fault_alloc {
>  	__u32 size;
>  	__u32 flags;
> +	__u32 type;  /* enum iommu_fault_type */
>  	__u32 out_fault_id;
>  	__u32 out_fault_fd;
>  };

I think I would just add the type at the end of the struct and rely on
our existing "0 is backwards compatible" mechanism: 0 means HWPT_IOPF,
i.e. no need to do anything now.
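A hedged sketch of what that compat rule looks like from the allocation
path's point of view (the field names mirror the proposal above but are
otherwise hypothetical): older userspace zero-fills the command, and a
trailing type of 0 decodes as the pre-existing HWPT_IOPF behavior.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the proposed uAPI, with "type" appended at
 * the END of the struct so that older callers, which zero the tail,
 * implicitly request the original behavior. */
enum iommufd_fault_type {
	IOMMU_FAULT_TYPE_HWPT_IOPF = 0,	/* 0 == backwards-compat default */
	IOMMU_FAULT_TYPE_VIOMMU_IRQ = 1,
};

struct iommu_fault_alloc {
	uint32_t size;
	uint32_t flags;
	uint32_t out_fault_id;
	uint32_t out_fault_fd;
	uint32_t type;	/* enum iommufd_fault_type, appended field */
};

/* Kernel-side decode: a zeroed type selects the IOPF behavior, so old
 * binaries keep working without any explicit compat code. */
static enum iommufd_fault_type
fault_alloc_type(const struct iommu_fault_alloc *cmd)
{
	return (enum iommufd_fault_type)cmd->type;
}
```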

It would make some sense to call this a "report" object rather than a
"fault" object if we are going to use it for different things. We could
probably rename it without much trouble. There is also no significant
issue with having two alloc commands for FDs.

I'd also think VIOMMU_IRQ is probably not the right abstraction;
it likely makes more sense to push driver-specific event messages,
sort of like IOPF, where one message can indicate an arm-smmu-v3
VCMDQ interrupt and other messages could indicate BAD_CD and
similar sorts of events we might want to capture and forward.

So, I'm inclined to just take this series as-is.

Jason

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v8 06/10] iommufd: Add iommufd fault object
  2024-07-08 16:29     ` Jason Gunthorpe
@ 2024-07-08 18:36       ` Nicolin Chen
  2024-07-09 17:00         ` Jason Gunthorpe
  0 siblings, 1 reply; 46+ messages in thread
From: Nicolin Chen @ 2024-07-08 18:36 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Lu Baolu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Yi Liu, Jacob Pan, Joel Granados, iommu,
	virtualization, linux-kernel

On Mon, Jul 08, 2024 at 01:29:57PM -0300, Jason Gunthorpe wrote:
> On Wed, Jul 03, 2024 at 04:06:15PM -0700, Nicolin Chen wrote:
> 
> > I learned that this hwpt->fault is exclusively for IOPF/PRI. And
> > Jason suggested me to add a different one for VIOMMU. Yet, after
> > taking a closer look, I found the fault object in this series is
> > seemingly quite generic at the uAPI level: its naming/structure,
> > and the way how it's allocated and passed to hwpt, despite being
> > highly correlated with IOPF in its fops code. So, I feel that we
> > might have a chance of reusing it for different fault types:
> >
> > +enum iommu_fault_type {
> > +     IOMMU_FAULT_TYPE_HWPT_IOPF,
> > +     IOMMU_FAULT_TYPE_VIOMMU_IRQ,
> > +};
> >
> >  struct iommu_fault_alloc {
> >       __u32 size;
> >       __u32 flags;
> > +     __u32 type;  /* enum iommu_fault_type */
> >       __u32 out_fault_id;
> >       __u32 out_fault_fd;
> >  };
> 
> I think I would just add the type at the end of the struct and rely on
> our existing 0 is backwards compat mechanism. 0 means HWPT_IOPF. ie no
> need to do anything now.

Yea, I figured that it would work too, so let's add one in the
VIOMMU series (if we eventually decide to reuse the same ioctl).

> It would make some sense to call this a "report" object than "fault"
> if we are going to use it for different things. We could probably
> rename it without much trouble. There is also not a significant issue
> with having two alloc commands for FDs.

Ack.

> I'd also think VIOMMU_IRQ is probably not that right abstraction,
> likely it makes more sense to push driver-specific event messages sort
> of like IOPF and one of the messages can indicate a arm-smmu-v3 VCDMQ
> interrupt, other messages could indicate BAD_CD and similar sorts of
> events we might want to capture and forward.

Maybe something like this?

struct iommu_viommu_event_arm_smmuv3 {
	u64 evt[4];
};

struct iommu_viommu_event_tegra241_cmdqv {
	u64 vcmdq_err_map[2];
};

enum iommu_event_type {
	IOMMU_HWPT_EVENT_TYPE_IOPF,
	IOMMU_VIOMMU_EVENT_TYPE_SMMUv3,
	IOMMU_VIOMMU_EVENT_TYPE_TEGRA241_CMDQV,
};

struct iommu_event_alloc {
	__u32 size;
	__u32 flags;
	__u32 out_event_id;
	__u32 out_event_fd;
	__u32 type;
	__u32 _reserved;
};

It can be "report" if you prefer.
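None of the wire format below is defined by this thread; it is a sketch
of how typed messages on a shared event FD might be demultiplexed by
userspace, assuming each message carries a (type, length) header. All
names besides the enum values proposed above are invented for
illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire format: each message read from a shared event FD
 * starts with a small header naming its type and payload length, so
 * several event sources can be demultiplexed on one FD. */
enum iommu_event_type {
	IOMMU_HWPT_EVENT_TYPE_IOPF = 0,
	IOMMU_VIOMMU_EVENT_TYPE_SMMUV3 = 1,
	IOMMU_VIOMMU_EVENT_TYPE_TEGRA241_CMDQV = 2,
};

struct iommu_event_hdr {
	uint32_t type;	/* enum iommu_event_type */
	uint32_t len;	/* payload bytes following this header */
};

/* Walk a buffer of packed messages, tallying one count per type;
 * returns how many complete messages were consumed. Stops at an
 * unknown type or a truncated payload. */
static size_t demux_events(const uint8_t *buf, size_t n, size_t counts[3])
{
	struct iommu_event_hdr hdr;
	size_t off = 0, nevents = 0;

	while (n - off >= sizeof(hdr)) {
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.type > IOMMU_VIOMMU_EVENT_TYPE_TEGRA241_CMDQV ||
		    n - off - sizeof(hdr) < hdr.len)
			break;
		counts[hdr.type]++;
		off += sizeof(hdr) + hdr.len;
		nevents++;
	}
	return nevents;
}
```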

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v8 06/10] iommufd: Add iommufd fault object
  2024-07-08 18:36       ` Nicolin Chen
@ 2024-07-09 17:00         ` Jason Gunthorpe
  2024-07-09 17:33           ` Nicolin Chen
  0 siblings, 1 reply; 46+ messages in thread
From: Jason Gunthorpe @ 2024-07-09 17:00 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Lu Baolu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Yi Liu, Jacob Pan, Joel Granados, iommu,
	virtualization, linux-kernel

On Mon, Jul 08, 2024 at 11:36:57AM -0700, Nicolin Chen wrote:
> Maybe something like this?
> 
> struct iommu_viommu_event_arm_smmuv3 {
> 	u64 evt[4];
> };
> 
> struct iommu_viommu_event_tegra241_cmdqv {
> 	u64 vcmdq_err_map[2];
> };
> 
> enum iommu_event_type {
> 	IOMMM_HWPT_EVENT_TYPE_IOPF,
> 	IOMMU_VIOMMU_EVENT_TYPE_SMMUv3,
> 	IOMMU_VIOMMU_EVENT_TYPE_TEGRA241_CMDQV,
> };
> 
> struct iommu_event_alloc {
> 	__u32 size;
> 	__u32 flags;
> 	__u32 out_event_id;
> 	__u32 out_event_fd;
> 	__u32 type;
> 	__u32 _reserved;
> };
> 
> It can be "report" if you prefer.

Yeah, something like that makes sense to me. The other question is if
you want to multiplex the SMMUv3 and CMDQV on the same FD?

Or multiplex physical smmus on the same FD.

We are potentially talking about 5-10 physical smmus and 2-3 FDs per
physical? Does that scare anyone?

Jason

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v8 00/10] IOMMUFD: Deliver IO page faults to user space
  2024-07-04 14:18 ` [PATCH v8 00/10] IOMMUFD: Deliver IO page faults to user space Will Deacon
@ 2024-07-09 17:23   ` Jason Gunthorpe
  0 siblings, 0 replies; 46+ messages in thread
From: Jason Gunthorpe @ 2024-07-09 17:23 UTC (permalink / raw)
  To: Will Deacon
  Cc: Kevin Tian, Joerg Roedel, Robin Murphy, Jean-Philippe Brucker,
	Nicolin Chen, Yi Liu, Jacob Pan, Joel Granados, Lu Baolu,
	catalin.marinas, kernel-team, iommu, virtualization, linux-kernel

On Thu, Jul 04, 2024 at 03:18:57PM +0100, Will Deacon wrote:
> On Tue, 02 Jul 2024 14:34:34 +0800, Lu Baolu wrote:
> > This series implements the functionality of delivering IO page faults to
> > user space through the IOMMUFD framework. One feasible use case is the
> > nested translation. Nested translation is a hardware feature that
> > supports two-stage translation tables for IOMMU. The second-stage
> > translation table is managed by the host VMM, while the first-stage
> > translation table is owned by user space. This allows user space to
> > control the IOMMU mappings for its devices.
> 
> Applied first four to iommu (iommufd/attach-handles), thanks!
> 
> [01/10] iommu: Introduce domain attachment handle
>         https://git.kernel.org/iommu/c/14678219cf40
> [02/10] iommu: Remove sva handle list
>         https://git.kernel.org/iommu/c/3e7f57d1ef3f
> [03/10] iommu: Add attach handle to struct iopf_group
>         https://git.kernel.org/iommu/c/06cdcc32d657
> [04/10] iommu: Extend domain attach group with handle support
>         https://git.kernel.org/iommu/c/8519e689834a
> 
> Jason -- feel free to use this branch as a base on which you can take
> the iommufd parts.

Done, the whole series is in iommufd for-next

Thanks,
Jason

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v8 06/10] iommufd: Add iommufd fault object
  2024-07-09 17:00         ` Jason Gunthorpe
@ 2024-07-09 17:33           ` Nicolin Chen
  2024-07-12 13:00             ` Jason Gunthorpe
  0 siblings, 1 reply; 46+ messages in thread
From: Nicolin Chen @ 2024-07-09 17:33 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Lu Baolu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Yi Liu, Jacob Pan, Joel Granados, iommu,
	virtualization, linux-kernel

On Tue, Jul 09, 2024 at 02:00:38PM -0300, Jason Gunthorpe wrote:
> On Mon, Jul 08, 2024 at 11:36:57AM -0700, Nicolin Chen wrote:
> > Maybe something like this?
> >
> > struct iommu_viommu_event_arm_smmuv3 {
> >       u64 evt[4];
> > };
> >
> > struct iommu_viommu_event_tegra241_cmdqv {
> >       u64 vcmdq_err_map[2];
> > };
> >
> > enum iommu_event_type {
> >       IOMMM_HWPT_EVENT_TYPE_IOPF,
> >       IOMMU_VIOMMU_EVENT_TYPE_SMMUv3,
> >       IOMMU_VIOMMU_EVENT_TYPE_TEGRA241_CMDQV,
> > };
> >
> > struct iommu_event_alloc {
> >       __u32 size;
> >       __u32 flags;
> >       __u32 out_event_id;
> >       __u32 out_event_fd;
> >       __u32 type;
> >       __u32 _reserved;
> > };
> >
> > It can be "report" if you prefer.
> 
> Yeah, something like that makes sense to me. The other question is if
> you want to multiplex the SMMUv3 and CMDQV on the same FD?

I think at least SMMUv3 and CMDQV could be the same FD. IMHO,
a TEGRA241_CMDQV type VIOMMU should include all the features
of SMMUv3 type... otherwise, we would have two VIOMMU objects
per pSMMU on Grace, which doesn't seem to make sense either.

> Or multiplex physical smmus on the same FD.
> 
> We are potentially talking about 5-10 physical smmus and 2-3 FDs per
> physical? Does that scare anyone?

I think we can share the same FD by adding a viommu_id somewhere
to indicate what the data/event belongs to. Yet, it seemed that
you had a counter-argument that a shared FD (queue) might have a
security concern as well?
https://lore.kernel.org/linux-iommu/20240522232833.GH20229@nvidia.com/

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v8 06/10] iommufd: Add iommufd fault object
  2024-07-09 17:33           ` Nicolin Chen
@ 2024-07-12 13:00             ` Jason Gunthorpe
  0 siblings, 0 replies; 46+ messages in thread
From: Jason Gunthorpe @ 2024-07-12 13:00 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Lu Baolu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Yi Liu, Jacob Pan, Joel Granados, iommu,
	virtualization, linux-kernel

On Tue, Jul 09, 2024 at 10:33:42AM -0700, Nicolin Chen wrote:

> > We are potentially talking about 5-10 physical smmus and 2-3 FDs per
> > physical? Does that scare anyone?
> 
> I think we can share the same FD by adding a viommu_id somewhere
> to indicate what the data/event belongs to. Yet, it seemed that
> you had a counter-argument that a shared FD (queue) might have a
> security concern as well?
> https://lore.kernel.org/linux-iommu/20240522232833.GH20229@nvidia.com/

That was for the physical HW queue not so much the FD.

We need to be mindful that these events can't DoS the hypervisor; that
constrains how we track pending events in the kernel, not how they get
marshaled to FDs for delivery to user space.

Thinking more, it makes sense that an FD would tie 1:1 with a queue in
the VM.

That way backpressure on one queue will not cause head-of-line blocking
of other queues, as it would if they all multiplexed onto a single FD.

Jason

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v8 07/10] iommufd: Fault-capable hwpt attach/detach/replace
  2024-07-02  6:34 ` [PATCH v8 07/10] iommufd: Fault-capable hwpt attach/detach/replace Lu Baolu
@ 2024-10-15  3:19   ` Zhangfei Gao
  2024-10-15 12:54     ` Jason Gunthorpe
  0 siblings, 1 reply; 46+ messages in thread
From: Zhangfei Gao @ 2024-10-15  3:19 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados, iommu, virtualization, linux-kernel,
	Shameerali Kolothum Thodi

Hi, Baolu

On Tue, 2 Jul 2024 at 14:39, Lu Baolu <baolu.lu@linux.intel.com> wrote:
>
> Add iopf-capable hw page table attach/detach/replace helpers. The pointer
> to iommufd_device is stored in the domain attachment handle, so that it
> can be echo'ed back in the iopf_group.
>
> The iopf-capable hw page tables can only be attached to devices that
> support the IOMMU_DEV_FEAT_IOPF feature. On the first attachment of an
> iopf-capable hw_pagetable to the device, the IOPF feature is enabled on
> the device. Similarly, after the last iopf-capable hwpt is detached from
> the device, the IOPF feature is disabled on the device.
>
> The current implementation allows a replacement between iopf-capable and
> non-iopf-capable hw page tables. This matches the nested translation use
> case, where a parent domain is attached by default and can then be
> replaced with a nested user domain with iopf support.
>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
>  drivers/iommu/iommufd/iommufd_private.h |  41 +++++
>  drivers/iommu/iommufd/device.c          |   7 +-
>  drivers/iommu/iommufd/fault.c           | 190 ++++++++++++++++++++++++
>  3 files changed, 235 insertions(+), 3 deletions(-)
>

> diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
> index 68ff94671d48..4934ae572638 100644
> --- a/drivers/iommu/iommufd/fault.c
> +++ b/drivers/iommu/iommufd/fault.c
> @@ -8,6 +8,7 @@
>  #include <linux/module.h>
>  #include <linux/mutex.h>
>  #include <linux/iommufd.h>
> +#include <linux/pci.h>
>  #include <linux/poll.h>
>  #include <linux/anon_inodes.h>
>  #include <uapi/linux/iommufd.h>
> @@ -15,6 +16,195 @@
>  #include "../iommu-priv.h"
>  #include "iommufd_private.h"
>
> +static int iommufd_fault_iopf_enable(struct iommufd_device *idev)
> +{
> +       struct device *dev = idev->dev;
> +       int ret;
> +
> +       /*
> +        * Once we turn on PCI/PRI support for VF, the response failure code
> +        * should not be forwarded to the hardware due to PRI being a shared
> +        * resource between PF and VFs. There is no coordination for this
> +        * shared capability. This waits for a vPRI reset to recover.
> +        */
> +       if (dev_is_pci(dev) && to_pci_dev(dev)->is_virtfn)
> +               return -EINVAL;

I am using the SMMUv3 stall feature and need to forward this to
hardware, so for now I am hacking around it by commenting out this
check. Any suggestions?

> +
> +       mutex_lock(&idev->iopf_lock);
> +       /* Device iopf has already been on. */
> +       if (++idev->iopf_enabled > 1) {
> +               mutex_unlock(&idev->iopf_lock);
> +               return 0;
> +       }
> +
> +       ret = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_IOPF);
> +       if (ret)
> +               --idev->iopf_enabled;
> +       mutex_unlock(&idev->iopf_lock);

Also, iommu_dev_enable_feature(idev->dev, IOMMU_DEV_FEAT_SVA); is
required. I am thinking about how to add it properly.
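A userspace model of one way the enable path could grow a second
feature while keeping the existing enable-once/disable-on-last-user
refcount shape quoted above. hw_enable_*()/hw_disable_*() are fake
stand-ins for iommu_dev_enable_feature()/iommu_dev_disable_feature(),
locking is omitted, and everything here is hypothetical, not the
actual kernel code.

```c
#include <stdbool.h>

/* Model of the refcounted enable path in iommufd_fault_iopf_enable(),
 * extended to also turn on a second per-device feature (standing in
 * for IOMMU_DEV_FEAT_SVA) in the same place. */
struct fake_idev {
	int iopf_enabled;	/* refcount: features on while > 0 */
	bool hw_iopf_on;
	bool hw_sva_on;
};

static int hw_enable_iopf(struct fake_idev *d) { d->hw_iopf_on = true; return 0; }
static int hw_enable_sva(struct fake_idev *d) { d->hw_sva_on = true; return 0; }
static void hw_disable_iopf(struct fake_idev *d) { d->hw_iopf_on = false; }
static void hw_disable_sva(struct fake_idev *d) { d->hw_sva_on = false; }

static int fault_iopf_enable(struct fake_idev *d)
{
	int ret;

	if (++d->iopf_enabled > 1)
		return 0;	/* already on for another attachment */

	ret = hw_enable_iopf(d);
	if (ret)
		goto err_ref;

	ret = hw_enable_sva(d);	/* second feature, enabled together */
	if (ret)
		goto err_iopf;
	return 0;

err_iopf:
	hw_disable_iopf(d);
err_ref:
	--d->iopf_enabled;
	return ret;
}

static void fault_iopf_disable(struct fake_idev *d)
{
	if (--d->iopf_enabled > 0)
		return;		/* still in use elsewhere */
	hw_disable_sva(d);
	hw_disable_iopf(d);
}
```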

Thanks

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v8 07/10] iommufd: Fault-capable hwpt attach/detach/replace
  2024-10-15  3:19   ` Zhangfei Gao
@ 2024-10-15 12:54     ` Jason Gunthorpe
  2024-10-16  1:58       ` Zhangfei Gao
  0 siblings, 1 reply; 46+ messages in thread
From: Jason Gunthorpe @ 2024-10-15 12:54 UTC (permalink / raw)
  To: Zhangfei Gao
  Cc: Lu Baolu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel,
	Shameerali Kolothum Thodi

On Tue, Oct 15, 2024 at 11:19:33AM +0800, Zhangfei Gao wrote:
> > +static int iommufd_fault_iopf_enable(struct iommufd_device *idev)
> > +{
> > +       struct device *dev = idev->dev;
> > +       int ret;
> > +
> > +       /*
> > +        * Once we turn on PCI/PRI support for VF, the response failure code
> > +        * should not be forwarded to the hardware due to PRI being a shared
> > +        * resource between PF and VFs. There is no coordination for this
> > +        * shared capability. This waits for a vPRI reset to recover.
> > +        */
> > +       if (dev_is_pci(dev) && to_pci_dev(dev)->is_virtfn)
> > +               return -EINVAL;
> 
> I am using the SMMUv3 stall feature, and need to forward this to hardware,
> And now I am hacking to comment this check.
> Any suggestions?

Are you using PCI SRIOV and stall together?

> > +       mutex_lock(&idev->iopf_lock);
> > +       /* Device iopf has already been on. */
> > +       if (++idev->iopf_enabled > 1) {
> > +               mutex_unlock(&idev->iopf_lock);
> > +               return 0;
> > +       }
> > +
> > +       ret = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_IOPF);
> > +       if (ret)
> > +               --idev->iopf_enabled;
> > +       mutex_unlock(&idev->iopf_lock);
> 
> Also iommu_dev_enable_feature(idev->dev, IOMMU_DEV_FEAT_SVA); is required
> In thinking how to add it properly.

FEAT_SVA needs to be deleted, not added too.

smmu-v3 needs some more fixing to move that
arm_smmu_master_enable_sva() logic into domain attachment.

Jason

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v8 07/10] iommufd: Fault-capable hwpt attach/detach/replace
  2024-10-15 12:54     ` Jason Gunthorpe
@ 2024-10-16  1:58       ` Zhangfei Gao
  2024-10-16 15:25         ` Jason Gunthorpe
  0 siblings, 1 reply; 46+ messages in thread
From: Zhangfei Gao @ 2024-10-16  1:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Lu Baolu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel,
	Shameerali Kolothum Thodi

On Tue, 15 Oct 2024 at 20:54, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Oct 15, 2024 at 11:19:33AM +0800, Zhangfei Gao wrote:
> > > +static int iommufd_fault_iopf_enable(struct iommufd_device *idev)
> > > +{
> > > +       struct device *dev = idev->dev;
> > > +       int ret;
> > > +
> > > +       /*
> > > +        * Once we turn on PCI/PRI support for VF, the response failure code
> > > +        * should not be forwarded to the hardware due to PRI being a shared
> > > +        * resource between PF and VFs. There is no coordination for this
> > > +        * shared capability. This waits for a vPRI reset to recover.
> > > +        */
> > > +       if (dev_is_pci(dev) && to_pci_dev(dev)->is_virtfn)
> > > +               return -EINVAL;
> >
> > I am using the SMMUv3 stall feature, and need to forward this to hardware,
> > And now I am hacking to comment this check.
> > Any suggestions?
>
> Are you using PCI SRIOV and stall together?

I only use the smmuv3 stall feature.

>
> > > +       mutex_lock(&idev->iopf_lock);
> > > +       /* Device iopf has already been on. */
> > > +       if (++idev->iopf_enabled > 1) {
> > > +               mutex_unlock(&idev->iopf_lock);
> > > +               return 0;
> > > +       }
> > > +
> > > +       ret = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_IOPF);
> > > +       if (ret)
> > > +               --idev->iopf_enabled;
> > > +       mutex_unlock(&idev->iopf_lock);
> >
> > Also iommu_dev_enable_feature(idev->dev, IOMMU_DEV_FEAT_SVA); is required
> > In thinking how to add it properly.
>
> FEAT_SVA needs to be deleted, not added too.
>
> smmu-v3 needs some more fixing to move that
> arm_smmu_master_enable_sva() logic into domain attachment.

Will think about this. Thanks, Jason

>
> Jason

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v8 07/10] iommufd: Fault-capable hwpt attach/detach/replace
  2024-10-16  1:58       ` Zhangfei Gao
@ 2024-10-16 15:25         ` Jason Gunthorpe
  2024-10-17  1:44           ` Zhangfei Gao
  2024-10-18 13:53           ` Jason Gunthorpe
  0 siblings, 2 replies; 46+ messages in thread
From: Jason Gunthorpe @ 2024-10-16 15:25 UTC (permalink / raw)
  To: Zhangfei Gao
  Cc: Lu Baolu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel,
	Shameerali Kolothum Thodi

On Wed, Oct 16, 2024 at 09:58:36AM +0800, Zhangfei Gao wrote:
> On Tue, 15 Oct 2024 at 20:54, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Tue, Oct 15, 2024 at 11:19:33AM +0800, Zhangfei Gao wrote:
> > > > +static int iommufd_fault_iopf_enable(struct iommufd_device *idev)
> > > > +{
> > > > +       struct device *dev = idev->dev;
> > > > +       int ret;
> > > > +
> > > > +       /*
> > > > +        * Once we turn on PCI/PRI support for VF, the response failure code
> > > > +        * should not be forwarded to the hardware due to PRI being a shared
> > > > +        * resource between PF and VFs. There is no coordination for this
> > > > +        * shared capability. This waits for a vPRI reset to recover.
> > > > +        */
> > > > +       if (dev_is_pci(dev) && to_pci_dev(dev)->is_virtfn)
> > > > +               return -EINVAL;
> > >
> > > I am using the SMMUv3 stall feature, and need to forward this to hardware,
> > > And now I am hacking to comment this check.
> > > Any suggestions?
> >
> > Are you using PCI SRIOV and stall together?
> 
> Only use smmuv3 stall feature.

Then isn't to_pci_dev(dev)->is_virtfn == false?

That should only be true with SRIOV

> > FEAT_SVA needs to be deleted, not added too.
> >
> > smmu-v3 needs some more fixing to move that
> > arm_smmu_master_enable_sva() logic into domain attachment.
> 
> Will think about this, Thanks Jason

Can you test it if a patch is made?

Thanks,
Jason

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v8 07/10] iommufd: Fault-capable hwpt attach/detach/replace
  2024-10-16 15:25         ` Jason Gunthorpe
@ 2024-10-17  1:44           ` Zhangfei Gao
  2024-10-17 12:05             ` Jason Gunthorpe
  2024-10-18 13:53           ` Jason Gunthorpe
  1 sibling, 1 reply; 46+ messages in thread
From: Zhangfei Gao @ 2024-10-17  1:44 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Lu Baolu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel,
	Shameerali Kolothum Thodi

On Wed, 16 Oct 2024 at 23:25, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Oct 16, 2024 at 09:58:36AM +0800, Zhangfei Gao wrote:
> > On Tue, 15 Oct 2024 at 20:54, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Tue, Oct 15, 2024 at 11:19:33AM +0800, Zhangfei Gao wrote:
> > > > > +static int iommufd_fault_iopf_enable(struct iommufd_device *idev)
> > > > > +{
> > > > > +       struct device *dev = idev->dev;
> > > > > +       int ret;
> > > > > +
> > > > > +       /*
> > > > > +        * Once we turn on PCI/PRI support for VF, the response failure code
> > > > > +        * should not be forwarded to the hardware due to PRI being a shared
> > > > > +        * resource between PF and VFs. There is no coordination for this
> > > > > +        * shared capability. This waits for a vPRI reset to recover.
> > > > > +        */
> > > > > +       if (dev_is_pci(dev) && to_pci_dev(dev)->is_virtfn)
> > > > > +               return -EINVAL;
> > > >
> > > > I am using the SMMUv3 stall feature, and need to forward this to hardware,
> > > > And now I am hacking to comment this check.
> > > > Any suggestions?
> > >
> > > Are you using PCI SRIOV and stall together?
> >
> > Only use smmuv3 stall feature.
>
> Then isn't to_pci_dev(dev)->is_virtfn == false?
>
> That should only be true with SRIOV

Do you mean
if (dev_is_pci(dev) && to_pci_dev(dev)->is_virtfn == false)
    return -EINVAL;

This is fine

>
> > > FEAT_SVA needs to be deleted, not added too.
> > >
> > > smmu-v3 needs some more fixing to move that
> > > arm_smmu_master_enable_sva() logic into domain attachment.
> >
> > Will think about this, Thanks Jason
>
> Can you test it if a patch is made?

Yes, sure.

Thanks

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v8 07/10] iommufd: Fault-capable hwpt attach/detach/replace
  2024-10-17  1:44           ` Zhangfei Gao
@ 2024-10-17 12:05             ` Jason Gunthorpe
  2024-10-17 12:35               ` Zhangfei Gao
  0 siblings, 1 reply; 46+ messages in thread
From: Jason Gunthorpe @ 2024-10-17 12:05 UTC (permalink / raw)
  To: Zhangfei Gao
  Cc: Lu Baolu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel,
	Shameerali Kolothum Thodi

On Thu, Oct 17, 2024 at 09:44:18AM +0800, Zhangfei Gao wrote:
> On Wed, 16 Oct 2024 at 23:25, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Wed, Oct 16, 2024 at 09:58:36AM +0800, Zhangfei Gao wrote:
> > > On Tue, 15 Oct 2024 at 20:54, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > >
> > > > On Tue, Oct 15, 2024 at 11:19:33AM +0800, Zhangfei Gao wrote:
> > > > > > +static int iommufd_fault_iopf_enable(struct iommufd_device *idev)
> > > > > > +{
> > > > > > +       struct device *dev = idev->dev;
> > > > > > +       int ret;
> > > > > > +
> > > > > > +       /*
> > > > > > +        * Once we turn on PCI/PRI support for VF, the response failure code
> > > > > > +        * should not be forwarded to the hardware due to PRI being a shared
> > > > > > +        * resource between PF and VFs. There is no coordination for this
> > > > > > +        * shared capability. This waits for a vPRI reset to recover.
> > > > > > +        */
> > > > > > +       if (dev_is_pci(dev) && to_pci_dev(dev)->is_virtfn)
> > > > > > +               return -EINVAL;
> > > > >
> > > > > I am using the SMMUv3 stall feature, and need to forward this to hardware,
> > > > > And now I am hacking to comment this check.
> > > > > Any suggestions?
> > > >
> > > > Are you using PCI SRIOV and stall together?
> > >
> > > Only use smmuv3 stall feature.
> >
> > Then isn't to_pci_dev(dev)->is_virtfn == false?
> >
> > That should only be true with SRIOV
> 
> Do you mean
> if (dev_is_pci(dev) && to_pci_dev(dev)->is_virtfn == false)
>     return -EINVAL;
>
> This is fine

No, I mean that on your test system you are not using SRIOV, so all
your PCI devices will have is_virtfn == false and the above check
shouldn't be a problem. is_virtfn indicates the PCI device is an SRIOV
VF.

Your explanation of your problem doesn't really make sense, or there
is something wrong someplace else producing a bogus is_virtfn.

If you are doing SRIOV with stall, then that is understandable.

Jason

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v8 07/10] iommufd: Fault-capable hwpt attach/detach/replace
  2024-10-17 12:05             ` Jason Gunthorpe
@ 2024-10-17 12:35               ` Zhangfei Gao
  2024-10-17 12:58                 ` Shameerali Kolothum Thodi
  2024-10-17 13:08                 ` Jason Gunthorpe
  0 siblings, 2 replies; 46+ messages in thread
From: Zhangfei Gao @ 2024-10-17 12:35 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Lu Baolu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel,
	Shameerali Kolothum Thodi

On Thu, 17 Oct 2024 at 20:05, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Thu, Oct 17, 2024 at 09:44:18AM +0800, Zhangfei Gao wrote:
> > On Wed, 16 Oct 2024 at 23:25, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Wed, Oct 16, 2024 at 09:58:36AM +0800, Zhangfei Gao wrote:
> > > > On Tue, 15 Oct 2024 at 20:54, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > > >
> > > > > On Tue, Oct 15, 2024 at 11:19:33AM +0800, Zhangfei Gao wrote:
> > > > > > > +static int iommufd_fault_iopf_enable(struct iommufd_device *idev)
> > > > > > > +{
> > > > > > > +       struct device *dev = idev->dev;
> > > > > > > +       int ret;
> > > > > > > +
> > > > > > > +       /*
> > > > > > > +        * Once we turn on PCI/PRI support for VF, the response failure code
> > > > > > > +        * should not be forwarded to the hardware due to PRI being a shared
> > > > > > > +        * resource between PF and VFs. There is no coordination for this
> > > > > > > +        * shared capability. This waits for a vPRI reset to recover.
> > > > > > > +        */
> > > > > > > +       if (dev_is_pci(dev) && to_pci_dev(dev)->is_virtfn)
> > > > > > > +               return -EINVAL;
> > > > > >
> > > > > > I am using the SMMUv3 stall feature, and need to forward this to hardware,
> > > > > > And now I am hacking to comment this check.
> > > > > > Any suggestions?
> > > > >
> > > > > Are you using PCI SRIOV and stall together?
> > > >
> > > > Only use smmuv3 stall feature.
Sorry, this is not correct

> > >
> > > Then isn't to_pci_dev(dev)->is_virtfn == false?
> > >
> > > That should only be true with SRIOV
> >
> > Do you mean
> > if (dev_is_pci(dev) && to_pci_dev(dev)->is_virtfn == false)
> >     return -EINVAL;
> >
> > This is fine
>
> No, I mean on your test system you are not using SRIOV so all your PCI
> devices will have is_virtfn == false and the above if shouldn't be a
> problem. is_virtfn indicates the PCI device is a SRIOV VF.
>
> Your explanation for your problem doesn't really make sense, or there
> is something wrong someplace else to get a bogus is_virtfn..
>
> If you are doing SRIOV with stall, then that is understandable.

Yes, you are right.
I am using an SRIOV VF and the stall feature, so is_virtfn == true.

Our ACC devices are fake PCI endpoint devices which support stall,
and they also support SRIOV.

So I have to ignore the limitation.

Thanks

^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH v8 07/10] iommufd: Fault-capable hwpt attach/detach/replace
  2024-10-17 12:35               ` Zhangfei Gao
@ 2024-10-17 12:58                 ` Shameerali Kolothum Thodi
  2024-10-17 13:08                 ` Jason Gunthorpe
  1 sibling, 0 replies; 46+ messages in thread
From: Shameerali Kolothum Thodi @ 2024-10-17 12:58 UTC (permalink / raw)
  To: Zhangfei Gao, Jason Gunthorpe
  Cc: Lu Baolu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu@lists.linux.dev,
	virtualization@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org



> On Thu, 17 Oct 2024 at 20:05, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Thu, Oct 17, 2024 at 09:44:18AM +0800, Zhangfei Gao wrote:
> > > On Wed, 16 Oct 2024 at 23:25, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > >
> > > > On Wed, Oct 16, 2024 at 09:58:36AM +0800, Zhangfei Gao wrote:
> > > > > On Tue, 15 Oct 2024 at 20:54, Jason Gunthorpe <jgg@ziepe.ca>
> wrote:
> > > > > >
> > > > > > On Tue, Oct 15, 2024 at 11:19:33AM +0800, Zhangfei Gao wrote:
> > > > > > > > +static int iommufd_fault_iopf_enable(struct
> > > > > > > > +iommufd_device *idev) {
> > > > > > > > +       struct device *dev = idev->dev;
> > > > > > > > +       int ret;
> > > > > > > > +
> > > > > > > > +       /*
> > > > > > > > +        * Once we turn on PCI/PRI support for VF, the response
> failure code
> > > > > > > > +        * should not be forwarded to the hardware due to PRI
> being a shared
> > > > > > > > +        * resource between PF and VFs. There is no coordination
> for this
> > > > > > > > +        * shared capability. This waits for a vPRI reset to recover.
> > > > > > > > +        */
> > > > > > > > +       if (dev_is_pci(dev) && to_pci_dev(dev)->is_virtfn)
> > > > > > > > +               return -EINVAL;
> > > > > > >
> > > > > > > I am using the SMMUv3 stall feature, and need to forward
> > > > > > > this to hardware, And now I am hacking to comment this check.
> > > > > > > Any suggestions?
> > > > > >
> > > > > > Are you using PCI SRIOV and stall together?
> > > > >
> > > > > Only use smmuv3 stall feature.
> Sorry, this is not correct
> 
> > > >
> > > > Then isn't to_pci_dev(dev)->is_virtfn == false?
> > > >
> > > > That should only be true with SRIOV
> > >
> > > Do you mean
> > > if (dev_is_pci(dev) && to_pci_dev(dev)->is_virtfn == false)
> > >     return -EINVAL;
> > >
> > > This is fine
> >
> > No, I mean on your test system you are not using SRIOV so all your PCI
> > devices will have is_virtfn == false and the above if shouldn't be a
> > problem. is_virtfn indicates the PCI device is a SRIOV VF.
> >
> > Your explanation for your problem doesn't really make sense, or there
> > is something wrong someplace else to get a bogus is_virtfn..
> >
> > If you are doing SRIOV with stall, then that is understandable.
> 
> Yes, you are right
>  I am using SRIOV vf and stall feature, so is_virtfn == true
> 
> Our ACC devices are fake pci endpoint devices which supports stall, And
> they also supports sriov

May be this will help to get the background:
https://lore.kernel.org/all/1626144876-11352-1-git-send-email-zhangfei.gao@linaro.org/

Shameer


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v8 07/10] iommufd: Fault-capable hwpt attach/detach/replace
  2024-10-17 12:35               ` Zhangfei Gao
  2024-10-17 12:58                 ` Shameerali Kolothum Thodi
@ 2024-10-17 13:08                 ` Jason Gunthorpe
  2024-10-18  1:58                   ` Baolu Lu
  1 sibling, 1 reply; 46+ messages in thread
From: Jason Gunthorpe @ 2024-10-17 13:08 UTC (permalink / raw)
  To: Zhangfei Gao, Lu Baolu
  Cc: Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel,
	Shameerali Kolothum Thodi

On Thu, Oct 17, 2024 at 08:35:24PM +0800, Zhangfei Gao wrote:

> Yes, you are right
>  I am using SRIOV vf and stall feature, so is_virtfn == true
> 
> Our ACC devices are fake pci endpoint devices which supports stall,
> And they also supports sriov
> 
> So I have to ignore the limitation.

I see, so that is more complicated. 

Lu, what do you think about also checking if the PCI function has PRI
? If not PRI assume the fault is special and doesn't follow PRI rules?

Or maybe we can have the iommu driver tag the event as a PRI/not-PRI
fault?

Jason


* Re: [PATCH v8 07/10] iommufd: Fault-capable hwpt attach/detach/replace
  2024-10-17 13:08                 ` Jason Gunthorpe
@ 2024-10-18  1:58                   ` Baolu Lu
  2024-10-18  2:45                     ` Zhangfei Gao
  2024-10-18 14:33                     ` Jason Gunthorpe
  0 siblings, 2 replies; 46+ messages in thread
From: Baolu Lu @ 2024-10-18  1:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Zhangfei Gao
  Cc: baolu.lu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel,
	Shameerali Kolothum Thodi

On 2024/10/17 21:08, Jason Gunthorpe wrote:
> On Thu, Oct 17, 2024 at 08:35:24PM +0800, Zhangfei Gao wrote:
> 
>> Yes, you are right
>>   I am using SRIOV vf and stall feature, so is_virtfn == true
>>
>> Our ACC devices are fake pci endpoint devices which supports stall,
>> And they also supports sriov
>>
>> So I have to ignore the limitation.
> I see, so that is more complicated.
> 
> Lu, what do you think about also checking if the PCI function has PRI
> ? If not PRI assume the fault is special and doesn't follow PRI rules?
> 
> Or maybe we can have the iommu driver tag the event as a PRI/not-PRI
> fault?

This limitation applies to PRI on PCI/SRIOV VFs because PRI might be a
shared resource, and the current iommu subsystem is not ready to support
enabling/disabling PRI on a VF without impacting others.

In my understanding, it's fine to lift this limitation for the non-PRI
use case on SRIOV VFs. Perhaps something like below?

	if (dev_is_pci(dev)) {
		struct pci_dev *pdev = to_pci_dev(dev);
		if (pdev->is_virtfn && pci_pri_supported(pdev))
			return -EINVAL;
	}

Thanks,
baolu


* Re: [PATCH v8 07/10] iommufd: Fault-capable hwpt attach/detach/replace
  2024-10-18  1:58                   ` Baolu Lu
@ 2024-10-18  2:45                     ` Zhangfei Gao
  2024-10-27 14:12                       ` Zhangfei Gao
  2024-10-18 14:33                     ` Jason Gunthorpe
  1 sibling, 1 reply; 46+ messages in thread
From: Zhangfei Gao @ 2024-10-18  2:45 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados, iommu, virtualization, linux-kernel,
	Shameerali Kolothum Thodi

Hi, Baolu

On Fri, 18 Oct 2024 at 09:58, Baolu Lu <baolu.lu@linux.intel.com> wrote:
>
> On 2024/10/17 21:08, Jason Gunthorpe wrote:
> > On Thu, Oct 17, 2024 at 08:35:24PM +0800, Zhangfei Gao wrote:
> >
> >> Yes, you are right
> >>   I am using SRIOV vf and stall feature, so is_virtfn == true
> >>
> >> Our ACC devices are fake pci endpoint devices which supports stall,
> >> And they also supports sriov
> >>
> >> So I have to ignore the limitation.
> > I see, so that is more complicated.
> >
> > Lu, what do you think about also checking if the PCI function has PRI
> > ? If not PRI assume the fault is special and doesn't follow PRI rules?
> >
> > Or maybe we can have the iommu driver tag the event as a PRI/not-PRI
> > fault?
>
> This limitation applies to PRI on PCI/SRIOV VFs because the PRI might be
> a shared resource and current iommu subsystem is not ready to support
> enabling/disabling PRI on a VF without any impact on others.
>
> In my understanding, it's fine to remove this limitation from the use
> case of non-PRI on SRIOV VFs. Perhaps something like below?
>
#include <linux/pci-ats.h>
>         if (dev_is_pci(dev)) {
>                 struct pci_dev *pdev = to_pci_dev(dev);
>                 if (pdev->is_virtfn && pci_pri_supported(pdev))
>                         return -EINVAL;
>         }

Yes, this works on our platform.

Thanks Baolu.


* Re: [PATCH v8 07/10] iommufd: Fault-capable hwpt attach/detach/replace
  2024-10-16 15:25         ` Jason Gunthorpe
  2024-10-17  1:44           ` Zhangfei Gao
@ 2024-10-18 13:53           ` Jason Gunthorpe
  2024-10-22 14:30             ` Zhangfei Gao
  1 sibling, 1 reply; 46+ messages in thread
From: Jason Gunthorpe @ 2024-10-18 13:53 UTC (permalink / raw)
  To: Zhangfei Gao
  Cc: Lu Baolu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel,
	Shameerali Kolothum Thodi

On Wed, Oct 16, 2024 at 12:25:03PM -0300, Jason Gunthorpe wrote:
> > > smmu-v3 needs some more fixing to move that
> > > arm_smmu_master_enable_sva() logic into domain attachment.
> > 
> > Will think about this, Thanks Jason
> 
> Can you test it if a patch is made?

Here it is:

https://github.com/jgunthorpe/linux/commits/smmuv3_nesting/

fa1528253d2210 iommu: Remove IOMMU_DEV_FEAT_SVA
5675560a272cf5 iommu/vt-d: Check if SVA is supported when attaching the SVA domain
94bc2b9525b508 iommu/arm-smmu-v3: Put iopf enablement in the domain attach path

Let me know..

Jason


* Re: [PATCH v8 07/10] iommufd: Fault-capable hwpt attach/detach/replace
  2024-10-18  1:58                   ` Baolu Lu
  2024-10-18  2:45                     ` Zhangfei Gao
@ 2024-10-18 14:33                     ` Jason Gunthorpe
  1 sibling, 0 replies; 46+ messages in thread
From: Jason Gunthorpe @ 2024-10-18 14:33 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Zhangfei Gao, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel,
	Shameerali Kolothum Thodi

On Fri, Oct 18, 2024 at 09:58:37AM +0800, Baolu Lu wrote:
> On 2024/10/17 21:08, Jason Gunthorpe wrote:
> > On Thu, Oct 17, 2024 at 08:35:24PM +0800, Zhangfei Gao wrote:
> > 
> > > Yes, you are right
> > >   I am using SRIOV vf and stall feature, so is_virtfn == true
> > > 
> > > Our ACC devices are fake pci endpoint devices which supports stall,
> > > And they also supports sriov
> > > 
> > > So I have to ignore the limitation.
> > I see, so that is more complicated.
> > 
> > Lu, what do you think about also checking if the PCI function has PRI
> > ? If not PRI assume the fault is special and doesn't follow PRI rules?
> > 
> > Or maybe we can have the iommu driver tag the event as a PRI/not-PRI
> > fault?
> 
> This limitation applies to PRI on PCI/SRIOV VFs because the PRI might be
> a shared resource and current iommu subsystem is not ready to support
> enabling/disabling PRI on a VF without any impact on others.
> 
> In my understanding, it's fine to remove this limitation from the use
> case of non-PRI on SRIOV VFs. Perhaps something like below?
> 
> 	if (dev_is_pci(dev)) {
> 		struct pci_dev *pdev = to_pci_dev(dev);
> 		if (pdev->is_virtfn && pci_pri_supported(pdev))
> 			return -EINVAL;
> 	}

Makes sense to me

Thanks,
Jason


* Re: [PATCH v8 07/10] iommufd: Fault-capable hwpt attach/detach/replace
  2024-10-18 13:53           ` Jason Gunthorpe
@ 2024-10-22 14:30             ` Zhangfei Gao
  2024-10-22 14:52               ` Jason Gunthorpe
  0 siblings, 1 reply; 46+ messages in thread
From: Zhangfei Gao @ 2024-10-22 14:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Lu Baolu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel,
	Shameerali Kolothum Thodi

On Fri, 18 Oct 2024 at 21:53, Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Wed, Oct 16, 2024 at 12:25:03PM -0300, Jason Gunthorpe wrote:
> > > > smmu-v3 needs some more fixing to move that
> > > > arm_smmu_master_enable_sva() logic into domain attachment.
> > >
> > > Will think about this, Thanks Jason
> >
> > Can you test it if a patch is made?
>
> Here it is:
>
> https://github.com/jgunthorpe/linux/commits/smmuv3_nesting/
>
> fa1528253d2210 iommu: Remove IOMMU_DEV_FEAT_SVA
> 5675560a272cf5 iommu/vt-d: Check if SVA is supported when attaching the SVA domain
> 94bc2b9525b508 iommu/arm-smmu-v3: Put iopf enablement in the domain attach path
>
> Let me know..

With these patches, do we still need ops->user_pasid_table?

if (fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) {
                attach_handle = iommu_attach_handle_get(dev->iommu_group,
                                fault->prm.pasid, 0);

// is attach_handle the expected value here?
                if (IS_ERR(attach_handle)) {
                        const struct iommu_ops *ops = dev_iommu_ops(dev);

                        if (!ops->user_pasid_table)
                                return NULL;
                        /*
                         * The iommu driver for this device supports user-
                         * managed PASID table. Therefore page faults for
                         * any PASID should go through the NESTING domain
                         * attached to the device RID.
                         */
                        attach_handle = iommu_attach_handle_get(
                                        dev->iommu_group, IOMMU_NO_PASID,
                                        IOMMU_DOMAIN_NESTED);

Now I still need to set ops->user_pasid_table, since a valid
attach_handle is not returned by the first iommu_attach_handle_get()
with fault->prm.pasid = 1, but only by the second call with
IOMMU_NO_PASID. I suppose that is not expected?

Thanks


* Re: [PATCH v8 07/10] iommufd: Fault-capable hwpt attach/detach/replace
  2024-10-22 14:30             ` Zhangfei Gao
@ 2024-10-22 14:52               ` Jason Gunthorpe
  2024-10-23 10:22                 ` Zhangfei Gao
  0 siblings, 1 reply; 46+ messages in thread
From: Jason Gunthorpe @ 2024-10-22 14:52 UTC (permalink / raw)
  To: Zhangfei Gao
  Cc: Lu Baolu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel,
	Shameerali Kolothum Thodi

On Tue, Oct 22, 2024 at 10:30:10PM +0800, Zhangfei Gao wrote:
> On Fri, 18 Oct 2024 at 21:53, Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > On Wed, Oct 16, 2024 at 12:25:03PM -0300, Jason Gunthorpe wrote:
> > > > > smmu-v3 needs some more fixing to move that
> > > > > arm_smmu_master_enable_sva() logic into domain attachment.
> > > >
> > > > Will think about this, Thanks Jason
> > >
> > > Can you test it if a patch is made?
> >
> > Here it is:
> >
> > https://github.com/jgunthorpe/linux/commits/smmuv3_nesting/
> >
> > fa1528253d2210 iommu: Remove IOMMU_DEV_FEAT_SVA
> > 5675560a272cf5 iommu/vt-d: Check if SVA is supported when attaching the SVA domain
> > 94bc2b9525b508 iommu/arm-smmu-v3: Put iopf enablement in the domain attach path
> >
> > Let me know..
> 
> With these patches, do we still need ops->user_pasid_table?

It makes no change - you need user_pasid_table to make
IOMMU_DOMAIN_NESTED work.

If you aren't using IOMMU_DOMAIN_NESTED you shouldn't need it.

> if (fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) {
>                 attach_handle = iommu_attach_handle_get(dev->iommu_group,
>                                 fault->prm.pasid, 0);
> 
> // is attach_handle expected effect value here?
>                 if (IS_ERR(attach_handle)) {
>                         const struct iommu_ops *ops = dev_iommu_ops(dev);
>
>                         if (!ops->user_pasid_table)
>                                 return NULL;
>                         /*
>                          * The iommu driver for this device supports user-
>                          * managed PASID table. Therefore page faults for
>                          * any PASID should go through the NESTING domain
>                          * attached to the device RID.
>                          */
>                         attach_handle = iommu_attach_handle_get(
>                                         dev->iommu_group, IOMMU_NO_PASID,
>                                         IOMMU_DOMAIN_NESTED);
> 
> Now I still need set ops->user_pasid_table, since attach_handle can not
> return from the first iommu_attach_handle_get with fault->prm.pasid = 1,
> but the second iommu_attach_handle_get with  IOMMU_NO_PASID,
> suppose it is not expected?

The second handle_get will always fail unless you are using
IOMMU_DOMAIN_NESTED in userspace with iommufd.

What testing are you doing exactly?

Jason


* Re: [PATCH v8 07/10] iommufd: Fault-capable hwpt attach/detach/replace
  2024-10-22 14:52               ` Jason Gunthorpe
@ 2024-10-23 10:22                 ` Zhangfei Gao
  0 siblings, 0 replies; 46+ messages in thread
From: Zhangfei Gao @ 2024-10-23 10:22 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Lu Baolu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel,
	Shameerali Kolothum Thodi

On Tue, 22 Oct 2024 at 22:53, Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Tue, Oct 22, 2024 at 10:30:10PM +0800, Zhangfei Gao wrote:
> > On Fri, 18 Oct 2024 at 21:53, Jason Gunthorpe <jgg@nvidia.com> wrote:
> > >
> > > On Wed, Oct 16, 2024 at 12:25:03PM -0300, Jason Gunthorpe wrote:
> > > > > > smmu-v3 needs some more fixing to move that
> > > > > > arm_smmu_master_enable_sva() logic into domain attachment.
> > > > >
> > > > > Will think about this, Thanks Jason
> > > >
> > > > Can you test it if a patch is made?
> > >
> > > Here it is:
> > >
> > > https://github.com/jgunthorpe/linux/commits/smmuv3_nesting/
> > >
> > > fa1528253d2210 iommu: Remove IOMMU_DEV_FEAT_SVA
> > > 5675560a272cf5 iommu/vt-d: Check if SVA is supported when attaching the SVA domain
> > > 94bc2b9525b508 iommu/arm-smmu-v3: Put iopf enablement in the domain attach path
> > >
> > > Let me know..
> >
> > With these patches, do we still need ops->user_pasid_table?
>
> It makes no change - you need user_pasid_table to make
> IOMMU_DOMAIN_NESTED work.
>
> If you aren't using IOMMU_DOMAIN_NESTED you shouldn't need it.

OK, I misunderstood.

Then, with user_pasid_table set, user page fault handling works both
with and without these patches.

>
> > if (fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) {
> >                 attach_handle = iommu_attach_handle_get(dev->iommu_group,
> >                                 fault->prm.pasid, 0);
> >
> > // is attach_handle expected effect value here?
> >                 if (IS_ERR(attach_handle)) {
> >                         const struct iommu_ops *ops = dev_iommu_ops(dev);
> >
> >                         if (!ops->user_pasid_table)
> >                                 return NULL;
> >                         /*
> >                          * The iommu driver for this device supports user-
> >                          * managed PASID table. Therefore page faults for
> >                          * any PASID should go through the NESTING domain
> >                          * attached to the device RID.
> >                          */
> >                         attach_handle = iommu_attach_handle_get(
> >                                         dev->iommu_group, IOMMU_NO_PASID,
> >                                         IOMMU_DOMAIN_NESTED);
> >
> > Now I still need set ops->user_pasid_table, since attach_handle can not
> > return from the first iommu_attach_handle_get with fault->prm.pasid = 1,
> > but the second iommu_attach_handle_get with  IOMMU_NO_PASID,
> > suppose it is not expected?
>
> The second handle_get will always fail unless you are using
> IOMMU_DOMAIN_NESTED in userspace with iommufd.
>
> What testing are you doing exactly?

I am testing vsva based on Nico's iommufd_viommu_p2-v3 branch,
which requires IOMMU_DOMAIN_NESTED in user space with iommufd.

Regarding the three patches:
1. Tested host SVA: OK.
2. Briefly tested vSVA on guests: OK; will do more tests.

Thanks


* Re: [PATCH v8 07/10] iommufd: Fault-capable hwpt attach/detach/replace
  2024-10-18  2:45                     ` Zhangfei Gao
@ 2024-10-27 14:12                       ` Zhangfei Gao
  2024-10-27 14:26                         ` Baolu Lu
  0 siblings, 1 reply; 46+ messages in thread
From: Zhangfei Gao @ 2024-10-27 14:12 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados, iommu, virtualization, linux-kernel,
	Shameerali Kolothum Thodi

Hi, Baolu

On Fri, 18 Oct 2024 at 10:45, Zhangfei Gao <zhangfei.gao@linaro.org> wrote:
>
> Hi, Baolu
>
> On Fri, 18 Oct 2024 at 09:58, Baolu Lu <baolu.lu@linux.intel.com> wrote:
> >
> > On 2024/10/17 21:08, Jason Gunthorpe wrote:
> > > On Thu, Oct 17, 2024 at 08:35:24PM +0800, Zhangfei Gao wrote:
> > >
> > >> Yes, you are right
> > >>   I am using SRIOV vf and stall feature, so is_virtfn == true
> > >>
> > >> Our ACC devices are fake pci endpoint devices which supports stall,
> > >> And they also supports sriov
> > >>
> > >> So I have to ignore the limitation.
> > > I see, so that is more complicated.
> > >
> > > Lu, what do you think about also checking if the PCI function has PRI
> > > ? If not PRI assume the fault is special and doesn't follow PRI rules?
> > >
> > > Or maybe we can have the iommu driver tag the event as a PRI/not-PRI
> > > fault?
> >
> > This limitation applies to PRI on PCI/SRIOV VFs because the PRI might be
> > a shared resource and current iommu subsystem is not ready to support
> > enabling/disabling PRI on a VF without any impact on others.
> >
> > In my understanding, it's fine to remove this limitation from the use
> > case of non-PRI on SRIOV VFs. Perhaps something like below?
> >
> #include <linux/pci-ats.h>
> >         if (dev_is_pci(dev)) {
> >                 struct pci_dev *pdev = to_pci_dev(dev);
> >                 if (pdev->is_virtfn && pci_pri_supported(pdev))
> >                         return -EINVAL;
> >         }
>
> Yes, this works on our platform.

Will you send this patch?

Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>

Thanks


* Re: [PATCH v8 07/10] iommufd: Fault-capable hwpt attach/detach/replace
  2024-10-27 14:12                       ` Zhangfei Gao
@ 2024-10-27 14:26                         ` Baolu Lu
  2024-10-28  9:56                           ` Zhangfei Gao
  0 siblings, 1 reply; 46+ messages in thread
From: Baolu Lu @ 2024-10-27 14:26 UTC (permalink / raw)
  To: Zhangfei Gao
  Cc: baolu.lu, Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados, iommu, virtualization, linux-kernel,
	Shameerali Kolothum Thodi

On 2024/10/27 22:12, Zhangfei Gao wrote:
> On Fri, 18 Oct 2024 at 10:45, Zhangfei Gao<zhangfei.gao@linaro.org> wrote:
>> Hi, Baolu
>>
>> On Fri, 18 Oct 2024 at 09:58, Baolu Lu<baolu.lu@linux.intel.com> wrote:
>>> On 2024/10/17 21:08, Jason Gunthorpe wrote:
>>>> On Thu, Oct 17, 2024 at 08:35:24PM +0800, Zhangfei Gao wrote:
>>>>
>>>>> Yes, you are right
>>>>>    I am using SRIOV vf and stall feature, so is_virtfn == true
>>>>>
>>>>> Our ACC devices are fake pci endpoint devices which supports stall,
>>>>> And they also supports sriov
>>>>>
>>>>> So I have to ignore the limitation.
>>>> I see, so that is more complicated.
>>>>
>>>> Lu, what do you think about also checking if the PCI function has PRI
>>>> ? If not PRI assume the fault is special and doesn't follow PRI rules?
>>>>
>>>> Or maybe we can have the iommu driver tag the event as a PRI/not-PRI
>>>> fault?
>>> This limitation applies to PRI on PCI/SRIOV VFs because the PRI might be
>>> a shared resource and current iommu subsystem is not ready to support
>>> enabling/disabling PRI on a VF without any impact on others.
>>>
>>> In my understanding, it's fine to remove this limitation from the use
>>> case of non-PRI on SRIOV VFs. Perhaps something like below?
>>>
>> #include <linux/pci-ats.h>
>>>          if (dev_is_pci(dev)) {
>>>                  struct pci_dev *pdev = to_pci_dev(dev);
>>>                  if (pdev->is_virtfn && pci_pri_supported(pdev))
>>>                          return -EINVAL;
>>>          }
>> Yes, this works on our platform.
> Will you send this patch?
> 
> Tested-by: Zhangfei Gao<zhangfei.gao@linaro.org>

Can you please turn this change into a formal patch yourself? As I
don't have the hardware at hand, I'm not confident I can accurately
describe the requirement or verify new versions during the upstream
process.

Thanks,
baolu


* Re: [PATCH v8 07/10] iommufd: Fault-capable hwpt attach/detach/replace
  2024-10-27 14:26                         ` Baolu Lu
@ 2024-10-28  9:56                           ` Zhangfei Gao
  2024-10-28 11:17                             ` Baolu Lu
  0 siblings, 1 reply; 46+ messages in thread
From: Zhangfei Gao @ 2024-10-28  9:56 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados, iommu, virtualization, linux-kernel,
	Shameerali Kolothum Thodi

On Sun, 27 Oct 2024 at 22:26, Baolu Lu <baolu.lu@linux.intel.com> wrote:

>
> Can you please make this change a formal patch by yourself? As I don't
> have hardware in hand, I'm not confident to accurately describe the
> requirement or verify the new version during the upstream process.
>

OK, how about this one

Subject: [PATCH] iommufd: Relax the iommufd_fault_iopf_enable() limitation

iommufd_fault_iopf_enable() currently rejects PRI on PCI/SRIOV VFs
because PRI might be a shared resource between PF and VFs, and the
current iommu subsystem is not ready to support enabling/disabling
PRI on a VF without impacting the others.

However, we have devices that appear as PCI but actually sit on the
AMBA bus. These fake PCI devices have the PASID capability and support
stall as well as SRIOV, so relax the check to reject only VFs that
actually support PRI.

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/iommufd/fault.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
index bca956d496bd..8b3e34250dae 100644
--- a/drivers/iommu/iommufd/fault.c
+++ b/drivers/iommu/iommufd/fault.c
@@ -10,6 +10,7 @@
 #include <linux/module.h>
 #include <linux/mutex.h>
 #include <linux/pci.h>
+#include <linux/pci-ats.h>
 #include <linux/poll.h>
 #include <uapi/linux/iommufd.h>

@@ -27,8 +28,12 @@ static int iommufd_fault_iopf_enable(struct iommufd_device *idev)
         * resource between PF and VFs. There is no coordination for this
         * shared capability. This waits for a vPRI reset to recover.
         */
-       if (dev_is_pci(dev) && to_pci_dev(dev)->is_virtfn)
-               return -EINVAL;
+       if (dev_is_pci(dev)) {
+               struct pci_dev *pdev = to_pci_dev(dev);
+
+               if (pdev->is_virtfn && pci_pri_supported(pdev))
+                       return -EINVAL;
+       }

        mutex_lock(&idev->iopf_lock);
        /* Device iopf has already been on. */
--

2.25.1
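[Editorial note] The gating logic in the patch above can be modeled as a
tiny standalone sketch. The field and function names below are
hypothetical stand-ins; the real kernel code operates on struct pci_dev
and uses pci_pri_supported() rather than plain booleans:

```python
from dataclasses import dataclass

@dataclass
class Dev:
    """Hypothetical stand-in for the device seen by iommufd_fault_iopf_enable()."""
    is_pci: bool
    is_virtfn: bool = False      # SRIOV VF?
    pri_supported: bool = False  # PCI PRI capability present?

def iopf_enable_allowed(dev: Dev) -> bool:
    """Reject only PRI-capable SRIOV VFs; stall-only fake-PCI VFs pass."""
    if dev.is_pci and dev.is_virtfn and dev.pri_supported:
        return False  # corresponds to returning -EINVAL in the kernel
    return True

# A PRI-capable VF is still rejected, since PRI is shared with the PF...
assert not iopf_enable_allowed(Dev(is_pci=True, is_virtfn=True, pri_supported=True))
# ...but a stall-based fake-PCI VF without PRI is now accepted.
assert iopf_enable_allowed(Dev(is_pci=True, is_virtfn=True, pri_supported=False))
```

This makes the behavioral change easy to see: the old check rejected
every VF, while the new one only rejects VFs that actually expose PRI.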


* Re: [PATCH v8 07/10] iommufd: Fault-capable hwpt attach/detach/replace
  2024-10-28  9:56                           ` Zhangfei Gao
@ 2024-10-28 11:17                             ` Baolu Lu
  0 siblings, 0 replies; 46+ messages in thread
From: Baolu Lu @ 2024-10-28 11:17 UTC (permalink / raw)
  To: Zhangfei Gao
  Cc: baolu.lu, Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados, iommu, virtualization, linux-kernel,
	Shameerali Kolothum Thodi

On 2024/10/28 17:56, Zhangfei Gao wrote:
> On Sun, 27 Oct 2024 at 22:26, Baolu Lu <baolu.lu@linux.intel.com> wrote:
> 
>>
>> Can you please make this change a formal patch by yourself? As I don't
>> have hardware in hand, I'm not confident to accurately describe the
>> requirement or verify the new version during the upstream process.
>>
> 
> OK, how about this one
> 
> Subject: [PATCH] iommufd: modify iommufd_fault_iopf_enable limitation
> 
> iommufd_fault_iopf_enable has limitation to PRI on PCI/SRIOV VFs
> because the PRI might be a shared resource and current iommu
> subsystem is not ready to support enabling/disabling PRI on a VF
> without any impact on others.
> 
> However, we have devices that appear as PCI but are actually on the
> AMBA bus. These fake PCI devices have PASID capability, support
> stall as well as SRIOV, so remove the limitation for these devices.
> 
> Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>   drivers/iommu/iommufd/fault.c | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
> index bca956d496bd..8b3e34250dae 100644
> --- a/drivers/iommu/iommufd/fault.c
> +++ b/drivers/iommu/iommufd/fault.c
> @@ -10,6 +10,7 @@
>   #include <linux/module.h>
>   #include <linux/mutex.h>
>   #include <linux/pci.h>
> +#include <linux/pci-ats.h>
>   #include <linux/poll.h>
>   #include <uapi/linux/iommufd.h>
> 
> @@ -27,8 +28,12 @@ static int iommufd_fault_iopf_enable(struct iommufd_device *idev)
>           * resource between PF and VFs. There is no coordination for this
>           * shared capability. This waits for a vPRI reset to recover.
>           */
> -       if (dev_is_pci(dev) && to_pci_dev(dev)->is_virtfn)
> -               return -EINVAL;
> +       if (dev_is_pci(dev)) {
> +               struct pci_dev *pdev = to_pci_dev(dev);
> +
> +               if (pdev->is_virtfn && pci_pri_supported(pdev))
> +                       return -EINVAL;
> +       }
> 
>          mutex_lock(&idev->iopf_lock);
>          /* Device iopf has already been on. */

Looks fine to me, but you need to use the "git send-email" command to
post the patch. Otherwise, the maintainer has no easy way to pick up
your patch with tools like b4.



Thread overview: 46+ messages
2024-07-02  6:34 [PATCH v8 00/10] IOMMUFD: Deliver IO page faults to user space Lu Baolu
2024-07-02  6:34 ` [PATCH v8 01/10] iommu: Introduce domain attachment handle Lu Baolu
2024-07-02  6:34 ` [PATCH v8 02/10] iommu: Remove sva handle list Lu Baolu
2024-07-02  6:34 ` [PATCH v8 03/10] iommu: Add attach handle to struct iopf_group Lu Baolu
2024-07-02  6:34 ` [PATCH v8 04/10] iommu: Extend domain attach group with handle support Lu Baolu
2024-07-02  6:34 ` [PATCH v8 05/10] iommufd: Add fault and response message definitions Lu Baolu
2024-07-02  6:34 ` [PATCH v8 06/10] iommufd: Add iommufd fault object Lu Baolu
2024-07-03 23:06   ` Nicolin Chen
2024-07-04  2:59     ` Baolu Lu
2024-07-04  5:36       ` Nicolin Chen
2024-07-04  6:37         ` Tian, Kevin
2024-07-04  7:32           ` Baolu Lu
2024-07-04 23:18             ` Nicolin Chen
2024-07-05  0:49               ` Tian, Kevin
2024-07-08 16:22                 ` Jason Gunthorpe
2024-07-08 16:29     ` Jason Gunthorpe
2024-07-08 18:36       ` Nicolin Chen
2024-07-09 17:00         ` Jason Gunthorpe
2024-07-09 17:33           ` Nicolin Chen
2024-07-12 13:00             ` Jason Gunthorpe
2024-07-02  6:34 ` [PATCH v8 07/10] iommufd: Fault-capable hwpt attach/detach/replace Lu Baolu
2024-10-15  3:19   ` Zhangfei Gao
2024-10-15 12:54     ` Jason Gunthorpe
2024-10-16  1:58       ` Zhangfei Gao
2024-10-16 15:25         ` Jason Gunthorpe
2024-10-17  1:44           ` Zhangfei Gao
2024-10-17 12:05             ` Jason Gunthorpe
2024-10-17 12:35               ` Zhangfei Gao
2024-10-17 12:58                 ` Shameerali Kolothum Thodi
2024-10-17 13:08                 ` Jason Gunthorpe
2024-10-18  1:58                   ` Baolu Lu
2024-10-18  2:45                     ` Zhangfei Gao
2024-10-27 14:12                       ` Zhangfei Gao
2024-10-27 14:26                         ` Baolu Lu
2024-10-28  9:56                           ` Zhangfei Gao
2024-10-28 11:17                             ` Baolu Lu
2024-10-18 14:33                     ` Jason Gunthorpe
2024-10-18 13:53           ` Jason Gunthorpe
2024-10-22 14:30             ` Zhangfei Gao
2024-10-22 14:52               ` Jason Gunthorpe
2024-10-23 10:22                 ` Zhangfei Gao
2024-07-02  6:34 ` [PATCH v8 08/10] iommufd: Associate fault object with iommufd_hw_pgtable Lu Baolu
2024-07-02  6:34 ` [PATCH v8 09/10] iommufd/selftest: Add IOPF support for mock device Lu Baolu
2024-07-02  6:34 ` [PATCH v8 10/10] iommufd/selftest: Add coverage for IOPF test Lu Baolu
2024-07-04 14:18 ` [PATCH v8 00/10] IOMMUFD: Deliver IO page faults to user space Will Deacon
2024-07-09 17:23   ` Jason Gunthorpe
