* [PATCH v6 00/14] iommufd support pasid attach/replace
@ 2024-12-19 13:27 Yi Liu
  2024-12-19 13:27 ` [PATCH v6 01/14] iommu: Introduce a replace API for device pasid Yi Liu
                   ` (13 more replies)
  0 siblings, 14 replies; 71+ messages in thread
From: Yi Liu @ 2024-12-19 13:27 UTC (permalink / raw)
  To: joro, jgg, kevin.tian, baolu.lu
  Cc: eric.auger, nicolinc, chao.p.peng, yi.l.liu, iommu, vasant.hegde,
	will

PASID (Process Address Space ID) is a PCIe extension that tags the DMA
transactions issued by a physical device, and most modern IOMMU hardware
supports PASID-granular address translation. So a PASID-capable
device can be attached to multiple hwpts (a.k.a. domains), and each
attachment is tagged with a pasid.

This series is based on the preparation series [1] [2]. It first adds a
missing iommu API to replace the domain for a pasid. Based on the iommu
pasid attach/replace/detach APIs, this series adds iommufd APIs for device
drivers to attach/replace/detach a pasid to/from a hwpt per userspace's request,
adds PASID-compat domain enforcement, adds PASID-compat hwpt allocation in
iommufd, and adds selftests to validate the iommufd APIs.

The complete code can be found at the link below [3]. Heads up! The existing
iommufd selftest is broken; there is a temporary fix patch at the top of the
branch [3]. If you want to run the iommufd selftest, please apply that fix. Sorry
for the inconvenience.

[1] https://lore.kernel.org/linux-iommu/20241104131842.13303-1-yi.l.liu@intel.com/ # done
[2] https://lore.kernel.org/linux-iommu/20241204122928.11987-1-yi.l.liu@intel.com/ # done
[3] https://github.com/yiliu1765/iommufd/tree/iommufd_pasid

Change log:

v6:
 - Add kdoc to iommufd_device_get_attach_handle() to note the returned handle
   should be used with care. (Baolu)
 - Reworked the patch 07 and 08 of v5 to avoid domain allocation failure on VT-d
   after applying patch 07 of v5.
     1) Split the intel iommu driver IOMMU_HWPT_ALLOC_PASID support out of
        patch 08
     2) Rework the PASID-compatible domain enforcement by checking the RID domain
        and idev->pasid_hwpts under the idev->igroup->lock.
 - iommufd_device_pasid_do_attach() returns -EINVAL if there is an old hwpt and it
   is not the same as the new hwpt. This aligns with how iommufd_device_do_attach()
   handles it. Otherwise, attaching the same pasid to the same ioas would fail
   before the auto_domain loop reaches the correct hwpt, which is not reasonable.
 - Enhanced the pasid selftest to have non-pasid-capable device and pasid-capable
   device.
 - The order of the series is tweaked to: prepare iommufd for pasid attach,
   add pasid attach, add PASID-compat domain enforcement, and then add the PASID-compat
   hwpt allocation.
 - Rebased on top of 6.13-rc3 and some already applied patches.

v5: https://lore.kernel.org/linux-iommu/20241104132513.15890-1-yi.l.liu@intel.com/
 - Fix a mistake in patch 02 of v4 (Kevin)
 - Move the iommufd_handle helpers to device.c
 - Add IOMMU_HWPT_ALLOC_PASID check to enforce pasid-compatible domain for pasid
   capable device in iommufd
 - Update the iommufd selftest to use IOMMU_HWPT_ALLOC_PASID

v4: https://lore.kernel.org/linux-iommu/20240912131255.13305-1-yi.l.liu@intel.com/
 - Replace remove_dev_pasid() by supporting set_dev_pasid() for blocking domain (Kevin)
	- This is done by the preparation series "Support attaching PASID to the blocked_domain"
 - Misc tweaks to foil the merging of the iommufd iopf series. Three new patches are added:
	- iommufd: Always pass iommu_attach_handle to iommu core
	- iommufd: Move the iommufd_handle helpers to iommufd_private.h
	- iommufd: Refactor __fault_domain_replace_dev() to be a wrapper of iommu_replace_group_handle()
 - Rename patch 03 of v3 to be "iommufd: Support pasid attach/replace"
 - Add test case for attaching/replacing iopf-capable hwpt to pasid

v3: https://lore.kernel.org/kvm/20240628090557.50898-1-yi.l.liu@intel.com/
 - Split the set_dev_pasid op enhancements for domain replacement to be a
   separate series "Make set_dev_pasid op supportting domain replacement" [1].
   The below changes are made in the separate series.
   *) set_dev_pasid() callback should keep the old config if failed to attach to
      a domain. This simplifies the caller a lot as caller does not need to attach
      it back to old domain explicitly. This also avoids some corner cases in which
      the core may do duplicated domain attachment as described in below link (Jason)
      https://lore.kernel.org/linux-iommu/BN9PR11MB52768C98314A95AFCD2FA6478C0F2@BN9PR11MB5276.namprd11.prod.outlook.com/
   *) Drop patch 10 of v2 as it's a bug fix and can be submitted separately (Kevin)
   *) Rebase on top of Baolu's domain_alloc_paging refactor series (Jason)
 - Drop the attach_data which includes attach_fn and pasid; instead, pass the
   pasid through the device attach path. (Jason)
 - Add a pasid-num-bits property to mock dev to make pasid selftest work (Kevin)

v2: https://lore.kernel.org/linux-iommu/20240412081516.31168-1-yi.l.liu@intel.com/
 - Domain replace for pasid should be handled in set_dev_pasid() callbacks
   instead of calling remove_dev_pasid and then set_dev_pasid afterward in the
   iommu layer (Jason)
 - Make xarray operations more self-contained in iommufd pasid attach/replace/detach
   (Jason)
 - Tweak the dev_iommu_get_max_pasids() to allow iommu driver to populate the
   max_pasids. This makes the iommufd selftest simpler to meet the max_pasids
   check in iommu_attach_device_pasid()  (Jason)

v1: https://lore.kernel.org/kvm/20231127063428.127436-1-yi.l.liu@intel.com/#r
 - Implement iommu_replace_device_pasid() to fall back to the original domain
   if this replacement failed (Kevin)
 - Add check in do_attach() to check the corresponding attach_fn per the pasid value.

rfc: https://lore.kernel.org/linux-iommu/20230926092651.17041-1-yi.l.liu@intel.com/

Regards,
	Yi Liu

Yi Liu (14):
  iommu: Introduce a replace API for device pasid
  iommufd: Refactor __fault_domain_replace_dev() to be a wrapper of
    iommu_replace_group_handle()
  iommufd: Move the iommufd_handle helpers to device.c
  iommufd: Always pass iommu_attach_handle to iommu core
  iommufd: Pass pasid through the device attach/replace path
  iommufd: Mark PASID-compatible domain
  iommufd: Support pasid attach/replace
  iommufd: Enforce PASID-compatible domain for RID
  iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  iommufd: Allow allocating PASID-compatible domain
  iommufd/selftest: Add set_dev_pasid in mock iommu
  iommufd/selftest: Add a helper to get test device
  iommufd/selftest: Add test ops to test pasid attach/detach
  iommufd/selftest: Add coverage for iommufd pasid attach/detach

 drivers/iommu/intel/iommu.c                   |   3 +-
 drivers/iommu/intel/nested.c                  |   2 +-
 drivers/iommu/iommu-priv.h                    |   4 +
 drivers/iommu/iommu.c                         |  90 ++++-
 drivers/iommu/iommufd/Makefile                |   1 +
 drivers/iommu/iommufd/device.c                | 123 +++++--
 drivers/iommu/iommufd/fault.c                 |  88 +----
 drivers/iommu/iommufd/hw_pagetable.c          |  15 +-
 drivers/iommu/iommufd/iommufd_private.h       |  95 ++++-
 drivers/iommu/iommufd/iommufd_test.h          |  31 ++
 drivers/iommu/iommufd/pasid.c                 | 173 +++++++++
 drivers/iommu/iommufd/selftest.c              | 231 +++++++++++-
 include/linux/iommufd.h                       |   7 +
 tools/testing/selftests/iommu/iommufd.c       | 348 ++++++++++++++++++
 .../selftests/iommu/iommufd_fail_nth.c        |  39 +-
 tools/testing/selftests/iommu/iommufd_utils.h | 102 +++++
 16 files changed, 1216 insertions(+), 136 deletions(-)
 create mode 100644 drivers/iommu/iommufd/pasid.c

-- 
2.34.1



* [PATCH v6 01/14] iommu: Introduce a replace API for device pasid
  2024-12-19 13:27 [PATCH v6 00/14] iommufd support pasid attach/replace Yi Liu
@ 2024-12-19 13:27 ` Yi Liu
  2024-12-20  2:47   ` Baolu Lu
                     ` (3 more replies)
  2024-12-19 13:27 ` [PATCH v6 02/14] iommufd: Refactor __fault_domain_replace_dev() to be a wrapper of iommu_replace_group_handle() Yi Liu
                   ` (12 subsequent siblings)
  13 siblings, 4 replies; 71+ messages in thread
From: Yi Liu @ 2024-12-19 13:27 UTC (permalink / raw)
  To: joro, jgg, kevin.tian, baolu.lu
  Cc: eric.auger, nicolinc, chao.p.peng, yi.l.liu, iommu, vasant.hegde,
	will

Provide a high-level API to allow replacing one domain with another
for a specific pasid of a device. This is similar to
iommu_group_replace_domain() and it is expected to be used only by
IOMMUFD.

Co-developed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommu-priv.h |  4 ++
 drivers/iommu/iommu.c      | 90 ++++++++++++++++++++++++++++++++++++--
 2 files changed, 90 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
index de5b54eaa8bf..90b367de267e 100644
--- a/drivers/iommu/iommu-priv.h
+++ b/drivers/iommu/iommu-priv.h
@@ -27,6 +27,10 @@ static inline const struct iommu_ops *iommu_fwspec_ops(struct iommu_fwspec *fwsp
 int iommu_group_replace_domain(struct iommu_group *group,
 			       struct iommu_domain *new_domain);
 
+int iommu_replace_device_pasid(struct iommu_domain *domain,
+			       struct device *dev, ioasid_t pasid,
+			       struct iommu_attach_handle *handle);
+
 int iommu_device_register_bus(struct iommu_device *iommu,
 			      const struct iommu_ops *ops,
 			      const struct bus_type *bus,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 28ffd836592b..3ea62b9c7b2f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3323,14 +3323,15 @@ static void iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid,
 }
 
 static int __iommu_set_group_pasid(struct iommu_domain *domain,
-				   struct iommu_group *group, ioasid_t pasid)
+				   struct iommu_group *group, ioasid_t pasid,
+				   struct iommu_domain *old)
 {
 	struct group_device *device, *last_gdev;
 	int ret;
 
 	for_each_group_device(group, device) {
 		ret = domain->ops->set_dev_pasid(domain, device->dev,
-						 pasid, NULL);
+						 pasid, old);
 		if (ret)
 			goto err_revert;
 	}
@@ -3342,7 +3343,20 @@ static int __iommu_set_group_pasid(struct iommu_domain *domain,
 	for_each_group_device(group, device) {
 		if (device == last_gdev)
 			break;
-		iommu_remove_dev_pasid(device->dev, pasid, domain);
+		/* If no old domain, undo the succeeded devices/pasid */
+		if (!old) {
+			iommu_remove_dev_pasid(device->dev, pasid, domain);
+			continue;
+		}
+
+		/*
+		 * Rollback the succeeded devices/pasid to the old domain.
+		 * And it is a driver bug to fail attaching with a previously
+		 * good domain.
+		 */
+		if (WARN_ON(old->ops->set_dev_pasid(old, device->dev,
+						    pasid, domain)))
+			iommu_remove_dev_pasid(device->dev, pasid, domain);
 	}
 	return ret;
 }
@@ -3404,7 +3418,7 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
 	if (ret)
 		goto out_unlock;
 
-	ret = __iommu_set_group_pasid(domain, group, pasid);
+	ret = __iommu_set_group_pasid(domain, group, pasid, NULL);
 	if (ret)
 		xa_erase(&group->pasid_array, pasid);
 out_unlock:
@@ -3413,6 +3427,74 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
 }
 EXPORT_SYMBOL_GPL(iommu_attach_device_pasid);
 
+/**
+ * iommu_replace_device_pasid - Replace the domain that a pasid is attached to
+ * @domain: the new iommu domain
+ * @dev: the attached device.
+ * @pasid: the pasid of the device.
+ * @handle: the attach handle.
+ *
+ * This API allows the pasid to switch domains. Return 0 on success, or an
+ * error. The pasid will keep the old configuration if replacement failed.
+ * This is supposed to be used by iommufd, and iommufd can guarantee that
+ * both iommu_attach_device_pasid() and iommu_replace_device_pasid() would
+ * pass in a valid @handle.
+ */
+int iommu_replace_device_pasid(struct iommu_domain *domain,
+			       struct device *dev, ioasid_t pasid,
+			       struct iommu_attach_handle *handle)
+{
+	/* Caller must be a probed driver on dev */
+	struct iommu_group *group = dev->iommu_group;
+	struct iommu_attach_handle *curr;
+	int ret;
+
+	if (!group)
+		return -ENODEV;
+
+	if (!domain->ops->set_dev_pasid)
+		return -EOPNOTSUPP;
+
+	if (dev_iommu_ops(dev) != domain->owner ||
+	    pasid == IOMMU_NO_PASID || !handle)
+		return -EINVAL;
+
+	handle->domain = domain;
+
+	mutex_lock(&group->mutex);
+	/*
+	 * The iommu_attach_handle of the pasid becomes inconsistent with the
+	 * actual handle per the below operation. The concurrent PRI path will
+	 * deliver the PRQs per the new handle, this does not have a functional
+	 * impact. The PRI path would eventually become consistent when the
+	 * replacement is done.
+	 */
+	curr = (struct iommu_attach_handle *)xa_store(&group->pasid_array,
+						      pasid, handle,
+						      GFP_KERNEL);
+	if (!curr) {
+		xa_erase(&group->pasid_array, pasid);
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	ret = xa_err(curr);
+	if (ret)
+		goto out_unlock;
+
+	if (curr->domain == domain)
+		goto out_unlock;
+
+	ret = __iommu_set_group_pasid(domain, group, pasid, curr->domain);
+	if (ret)
+		WARN_ON(handle != xa_store(&group->pasid_array, pasid,
+					   curr, GFP_KERNEL));
+out_unlock:
+	mutex_unlock(&group->mutex);
+	return ret;
+}
+EXPORT_SYMBOL_NS_GPL(iommu_replace_device_pasid, "IOMMUFD_INTERNAL");
+
 /*
  * iommu_detach_device_pasid() - Detach the domain from pasid of device
  * @domain: the iommu domain.
-- 
2.34.1
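
The replace-with-rollback behaviour described in the commit message above
can be modelled in userspace. The following is only a minimal sketch under
stated assumptions, not the kernel implementation: every mock_* name, the
fixed-size tables and the -EIO/-EBUSY error codes are hypothetical, and the
model looks up the current attachment before storing the new one, whereas
the patch stores the new handle first and then inspects the old one.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NDEV      3	/* devices in the mock group (hypothetical) */
#define MAX_PASID 8	/* size of the mock pasid table (hypothetical) */

struct mock_domain {
	int fail_on_dev;	/* device index that rejects this domain, or -1 */
};

struct mock_group {
	/* domain installed per device and pasid; models the hardware state */
	struct mock_domain *hw[NDEV][MAX_PASID];
	/* domain attached per pasid; stands in for group->pasid_array */
	struct mock_domain *attached[MAX_PASID];
};

/* Stand-in for a driver's set_dev_pasid(): old config is kept on failure. */
int mock_set_dev_pasid(struct mock_group *g, struct mock_domain *d,
		       int dev, unsigned int pasid)
{
	if (d->fail_on_dev == dev)
		return -EIO;
	g->hw[dev][pasid] = d;
	return 0;
}

/* Models __iommu_set_group_pasid() taking the old domain for rollback. */
int mock_set_group_pasid(struct mock_group *g, struct mock_domain *d,
			 unsigned int pasid, struct mock_domain *old)
{
	int dev, last, ret = 0;

	for (dev = 0; dev < NDEV; dev++) {
		ret = mock_set_dev_pasid(g, d, dev, pasid);
		if (ret)
			break;
	}
	if (!ret)
		return 0;

	/* Undo only the devices that succeeded before the failing one. */
	last = dev;
	for (dev = 0; dev < last; dev++) {
		/* No old domain, or rollback failed: remove the pasid entry. */
		if (!old || mock_set_dev_pasid(g, old, dev, pasid))
			g->hw[dev][pasid] = NULL;
	}
	return ret;
}

int mock_attach_pasid(struct mock_group *g, struct mock_domain *d,
		      unsigned int pasid)
{
	int ret;

	if (pasid == 0 || pasid >= MAX_PASID)
		return -EINVAL;
	if (g->attached[pasid])
		return -EBUSY;
	g->attached[pasid] = d;
	ret = mock_set_group_pasid(g, d, pasid, NULL);
	if (ret)
		g->attached[pasid] = NULL;
	return ret;
}

int mock_replace_pasid(struct mock_group *g, struct mock_domain *d,
		       unsigned int pasid)
{
	struct mock_domain *curr;
	int ret;

	if (pasid == 0 || pasid >= MAX_PASID)
		return -EINVAL;
	curr = g->attached[pasid];
	if (!curr)
		return -EINVAL;	/* nothing attached, so nothing to replace */
	if (curr == d)
		return 0;	/* already attached to the requested domain */

	g->attached[pasid] = d;
	ret = mock_set_group_pasid(g, d, pasid, curr);
	if (ret)
		g->attached[pasid] = curr;	/* restore the old attachment */
	return ret;
}
```

In this model, a replacement that fails on the last device of the group
leaves every device, and the per-pasid attachment record, pointing at the
old domain, which is the property the kdoc of iommu_replace_device_pasid()
states.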



* [PATCH v6 02/14] iommufd: Refactor __fault_domain_replace_dev() to be a wrapper of iommu_replace_group_handle()
  2024-12-19 13:27 [PATCH v6 00/14] iommufd support pasid attach/replace Yi Liu
  2024-12-19 13:27 ` [PATCH v6 01/14] iommu: Introduce a replace API for device pasid Yi Liu
@ 2024-12-19 13:27 ` Yi Liu
  2024-12-19 13:27 ` [PATCH v6 03/14] iommufd: Move the iommufd_handle helpers to device.c Yi Liu
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 71+ messages in thread
From: Yi Liu @ 2024-12-19 13:27 UTC (permalink / raw)
  To: joro, jgg, kevin.tian, baolu.lu
  Cc: eric.auger, nicolinc, chao.p.peng, yi.l.liu, iommu, vasant.hegde,
	will

There is already a wrapper of iommu_attach_group_handle(), so add a wrapper
for iommu_replace_group_handle() as well to ease further code refactoring.
No functional change intended.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/fault.c | 50 ++++++++++++++++++-----------------
 1 file changed, 26 insertions(+), 24 deletions(-)

diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
index 1fe804e28a86..d09f4594c67a 100644
--- a/drivers/iommu/iommufd/fault.c
+++ b/drivers/iommu/iommufd/fault.c
@@ -151,33 +151,23 @@ void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt,
 	kfree(handle);
 }
 
-static int __fault_domain_replace_dev(struct iommufd_device *idev,
-				      struct iommufd_hw_pagetable *hwpt,
-				      struct iommufd_hw_pagetable *old)
+static int
+__fault_domain_replace_dev(struct iommufd_device *idev,
+			   struct iommufd_hw_pagetable *hwpt,
+			   struct iommufd_hw_pagetable *old)
 {
-	struct iommufd_attach_handle *handle, *curr = NULL;
+	struct iommufd_attach_handle *handle;
 	int ret;
 
-	if (old->fault)
-		curr = iommufd_device_get_attach_handle(idev);
-
-	if (hwpt->fault) {
-		handle = kzalloc(sizeof(*handle), GFP_KERNEL);
-		if (!handle)
-			return -ENOMEM;
-
-		handle->idev = idev;
-		ret = iommu_replace_group_handle(idev->igroup->group,
-						 hwpt->domain, &handle->handle);
-	} else {
-		ret = iommu_replace_group_handle(idev->igroup->group,
-						 hwpt->domain, NULL);
-	}
+	handle = kzalloc(sizeof(*handle), GFP_KERNEL);
+	if (!handle)
+		return -ENOMEM;
 
-	if (!ret && curr) {
-		iommufd_auto_response_faults(old, curr);
-		kfree(curr);
-	}
+	handle->idev = idev;
+	ret = iommu_replace_group_handle(idev->igroup->group,
+					 hwpt->domain, &handle->handle);
+	if (ret)
+		kfree(handle);
 
 	return ret;
 }
@@ -188,6 +178,7 @@ int iommufd_fault_domain_replace_dev(struct iommufd_device *idev,
 {
 	bool iopf_off = !hwpt->fault && old->fault;
 	bool iopf_on = hwpt->fault && !old->fault;
+	struct iommufd_attach_handle *curr;
 	int ret;
 
 	if (iopf_on) {
@@ -196,13 +187,24 @@ int iommufd_fault_domain_replace_dev(struct iommufd_device *idev,
 			return ret;
 	}
 
-	ret = __fault_domain_replace_dev(idev, hwpt, old);
+	curr = iommufd_device_get_attach_handle(idev);
+
+	if (hwpt->fault)
+		ret = __fault_domain_replace_dev(idev, hwpt, old);
+	else
+		ret = iommu_replace_group_handle(idev->igroup->group,
+						 hwpt->domain, NULL);
 	if (ret) {
 		if (iopf_on)
 			iommufd_fault_iopf_disable(idev);
 		return ret;
 	}
 
+	if (curr) {
+		iommufd_auto_response_faults(old, curr);
+		kfree(curr);
+	}
+
 	if (iopf_off)
 		iommufd_fault_iopf_disable(idev);
 
-- 
2.34.1



* [PATCH v6 03/14] iommufd: Move the iommufd_handle helpers to device.c
  2024-12-19 13:27 [PATCH v6 00/14] iommufd support pasid attach/replace Yi Liu
  2024-12-19 13:27 ` [PATCH v6 01/14] iommu: Introduce a replace API for device pasid Yi Liu
  2024-12-19 13:27 ` [PATCH v6 02/14] iommufd: Refactor __fault_domain_replace_dev() to be a wrapper of iommu_replace_group_handle() Yi Liu
@ 2024-12-19 13:27 ` Yi Liu
  2024-12-20  3:31   ` Baolu Lu
  2024-12-19 13:27 ` [PATCH v6 04/14] iommufd: Always pass iommu_attach_handle to iommu core Yi Liu
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 71+ messages in thread
From: Yi Liu @ 2024-12-19 13:27 UTC (permalink / raw)
  To: joro, jgg, kevin.tian, baolu.lu
  Cc: eric.auger, nicolinc, chao.p.peng, yi.l.liu, iommu, vasant.hegde,
	will

The iommu_attach_handle is currently only passed when attaching an
iopf-capable domain, which makes it inconvenient for the iommu core to
track the attached domain of pasids. To address this, the
iommu_attach_handle will be passed to the iommu core for non-fault-able
domains as well. Hence the iommufd_handle related helpers are no longer
fault specific, and it makes more sense to move them out of fault.c.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/device.c          | 62 +++++++++++++++++++++++++
 drivers/iommu/iommufd/fault.c           | 56 +---------------------
 drivers/iommu/iommufd/iommufd_private.h |  8 ++++
 3 files changed, 72 insertions(+), 54 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index dfd0898fb6c1..0e1baf84e887 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -293,6 +293,68 @@ u32 iommufd_device_to_id(struct iommufd_device *idev)
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_to_id, "IOMMUFD");
 
+/**
+ * iommufd_device_get_attach_handle - Return the attach handle for the RID
+ *
+ * @idev: The device to get attach_handle
+ *
+ * Currently there is no locking to synchronize threads that access the
+ * returned handle with those attaching or replacing the domain which might
+ * change the handle. It's caller's duty to guarantee no use-after-free.
+ *
+ * Return valid attach_handle if there is, otherwise NULL.
+ */
+struct iommufd_attach_handle *
+iommufd_device_get_attach_handle(struct iommufd_device *idev)
+{
+	struct iommu_attach_handle *handle;
+
+	handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID, 0);
+	if (IS_ERR(handle))
+		return NULL;
+
+	return to_iommufd_handle(handle);
+}
+
+int iommufd_dev_attach_handle(struct iommufd_hw_pagetable *hwpt,
+			      struct iommufd_device *idev)
+{
+	struct iommufd_attach_handle *handle;
+	int ret;
+
+	handle = kzalloc(sizeof(*handle), GFP_KERNEL);
+	if (!handle)
+		return -ENOMEM;
+
+	handle->idev = idev;
+	ret = iommu_attach_group_handle(hwpt->domain, idev->igroup->group,
+					&handle->handle);
+	if (ret)
+		kfree(handle);
+
+	return ret;
+}
+
+int iommufd_dev_replace_handle(struct iommufd_device *idev,
+			       struct iommufd_hw_pagetable *hwpt,
+			       struct iommufd_hw_pagetable *old)
+{
+	struct iommufd_attach_handle *handle;
+	int ret;
+
+	handle = kzalloc(sizeof(*handle), GFP_KERNEL);
+	if (!handle)
+		return -ENOMEM;
+
+	handle->idev = idev;
+	ret = iommu_replace_group_handle(idev->igroup->group,
+					 hwpt->domain, &handle->handle);
+	if (ret)
+		kfree(handle);
+
+	return ret;
+}
+
 static int iommufd_group_setup_msi(struct iommufd_group *igroup,
 				   struct iommufd_hwpt_paging *hwpt_paging)
 {
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
index d09f4594c67a..d06893226070 100644
--- a/drivers/iommu/iommufd/fault.c
+++ b/drivers/iommu/iommufd/fault.c
@@ -60,25 +60,6 @@ static void iommufd_fault_iopf_disable(struct iommufd_device *idev)
 	mutex_unlock(&idev->iopf_lock);
 }
 
-static int __fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt,
-				     struct iommufd_device *idev)
-{
-	struct iommufd_attach_handle *handle;
-	int ret;
-
-	handle = kzalloc(sizeof(*handle), GFP_KERNEL);
-	if (!handle)
-		return -ENOMEM;
-
-	handle->idev = idev;
-	ret = iommu_attach_group_handle(hwpt->domain, idev->igroup->group,
-					&handle->handle);
-	if (ret)
-		kfree(handle);
-
-	return ret;
-}
-
 int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt,
 				    struct iommufd_device *idev)
 {
@@ -91,7 +72,7 @@ int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt,
 	if (ret)
 		return ret;
 
-	ret = __fault_domain_attach_dev(hwpt, idev);
+	ret = iommufd_dev_attach_handle(hwpt, idev);
 	if (ret)
 		iommufd_fault_iopf_disable(idev);
 
@@ -127,18 +108,6 @@ static void iommufd_auto_response_faults(struct iommufd_hw_pagetable *hwpt,
 	mutex_unlock(&fault->mutex);
 }
 
-static struct iommufd_attach_handle *
-iommufd_device_get_attach_handle(struct iommufd_device *idev)
-{
-	struct iommu_attach_handle *handle;
-
-	handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID, 0);
-	if (IS_ERR(handle))
-		return NULL;
-
-	return to_iommufd_handle(handle);
-}
-
 void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt,
 				     struct iommufd_device *idev)
 {
@@ -151,27 +120,6 @@ void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt,
 	kfree(handle);
 }
 
-static int
-__fault_domain_replace_dev(struct iommufd_device *idev,
-			   struct iommufd_hw_pagetable *hwpt,
-			   struct iommufd_hw_pagetable *old)
-{
-	struct iommufd_attach_handle *handle;
-	int ret;
-
-	handle = kzalloc(sizeof(*handle), GFP_KERNEL);
-	if (!handle)
-		return -ENOMEM;
-
-	handle->idev = idev;
-	ret = iommu_replace_group_handle(idev->igroup->group,
-					 hwpt->domain, &handle->handle);
-	if (ret)
-		kfree(handle);
-
-	return ret;
-}
-
 int iommufd_fault_domain_replace_dev(struct iommufd_device *idev,
 				     struct iommufd_hw_pagetable *hwpt,
 				     struct iommufd_hw_pagetable *old)
@@ -190,7 +138,7 @@ int iommufd_fault_domain_replace_dev(struct iommufd_device *idev,
 	curr = iommufd_device_get_attach_handle(idev);
 
 	if (hwpt->fault)
-		ret = __fault_domain_replace_dev(idev, hwpt, old);
+		ret = iommufd_dev_replace_handle(idev, hwpt, old);
 	else
 		ret = iommu_replace_group_handle(idev->igroup->group,
 						 hwpt->domain, NULL);
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index b6d706cf2c66..d5c83f10d83e 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -459,6 +459,14 @@ struct iommufd_attach_handle {
 /* Convert an iommu attach handle to iommufd handle. */
 #define to_iommufd_handle(hdl)	container_of(hdl, struct iommufd_attach_handle, handle)
 
+struct iommufd_attach_handle *
+iommufd_device_get_attach_handle(struct iommufd_device *idev);
+int iommufd_dev_attach_handle(struct iommufd_hw_pagetable *hwpt,
+			      struct iommufd_device *idev);
+int iommufd_dev_replace_handle(struct iommufd_device *idev,
+			       struct iommufd_hw_pagetable *hwpt,
+			       struct iommufd_hw_pagetable *old);
+
 static inline struct iommufd_fault *
 iommufd_get_fault(struct iommufd_ucmd *ucmd, u32 id)
 {
-- 
2.34.1
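
The helpers moved by this patch rely on the iommufd handle embedding the
core handle, with to_iommufd_handle() recovering the outer structure via
container_of(). A minimal userspace sketch of that pattern follows. All
mock_* names are hypothetical stand-ins, the single-slot "core" replaces the
real per-group storage, and -EBUSY is an arbitrary error chosen for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct mock_domain { int id; };
struct mock_device { int id; };

/* Stand-in for struct iommu_attach_handle, owned by the "core". */
struct mock_attach_handle {
	struct mock_domain *domain;
};

/* Stand-in for struct iommufd_attach_handle: embeds the core handle. */
struct mock_iommufd_handle {
	struct mock_attach_handle handle;
	struct mock_device *idev;
};

#define mock_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Convert a core handle back to the structure that embeds it. */
#define to_mock_iommufd_handle(hdl) \
	mock_container_of(hdl, struct mock_iommufd_handle, handle)

/* A single-slot "core" that remembers one attach handle. */
struct mock_core {
	struct mock_attach_handle *slot;
	int fail;	/* non-zero makes the next attach fail */
};

int mock_core_attach(struct mock_core *core, struct mock_domain *d,
		     struct mock_attach_handle *h)
{
	if (core->fail)
		return -EBUSY;
	h->domain = d;
	core->slot = h;
	return 0;
}

/* Mirrors the shape of iommufd_dev_attach_handle(): free on failure. */
int mock_dev_attach_handle(struct mock_core *core, struct mock_domain *d,
			   struct mock_device *idev)
{
	struct mock_iommufd_handle *handle;
	int ret;

	handle = calloc(1, sizeof(*handle));
	if (!handle)
		return -ENOMEM;

	handle->idev = idev;
	ret = mock_core_attach(core, d, &handle->handle);
	if (ret)
		free(handle);
	return ret;
}

/* Mirrors the shape of iommufd_device_get_attach_handle(). */
struct mock_iommufd_handle *mock_get_attach_handle(struct mock_core *core)
{
	if (!core->slot)
		return NULL;
	return to_mock_iommufd_handle(core->slot);
}
```

The core only ever sees the embedded handle, so the caller that allocated
the outer structure remains responsible for freeing it once the attachment
is gone.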



* [PATCH v6 04/14] iommufd: Always pass iommu_attach_handle to iommu core
  2024-12-19 13:27 [PATCH v6 00/14] iommufd support pasid attach/replace Yi Liu
                   ` (2 preceding siblings ...)
  2024-12-19 13:27 ` [PATCH v6 03/14] iommufd: Move the iommufd_handle helpers to device.c Yi Liu
@ 2024-12-19 13:27 ` Yi Liu
  2024-12-20  4:35   ` Nicolin Chen
  2025-01-09  7:44   ` Tian, Kevin
  2024-12-19 13:27 ` [PATCH v6 05/14] iommufd: Pass pasid through the device attach/replace path Yi Liu
                   ` (9 subsequent siblings)
  13 siblings, 2 replies; 71+ messages in thread
From: Yi Liu @ 2024-12-19 13:27 UTC (permalink / raw)
  To: joro, jgg, kevin.tian, baolu.lu
  Cc: eric.auger, nicolinc, chao.p.peng, yi.l.liu, iommu, vasant.hegde,
	will

The iommu_attach_handle is optional in the RID attach/replace API and the
PASID attach APIs. But it is a mandatory argument for the PASID replace API.
Without it, the PASID replace path cannot get the old domain. Hence, the
PASID path (attach/replace) requires the attach handle. As iommufd is the
major user of the RID attach/replace with iommu_attach_handle, this also
makes iommufd always pass the attach handle for the RID path.
This keeps the RID and PASID paths aligned.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/fault.c           | 12 ++++--------
 drivers/iommu/iommufd/iommufd_private.h | 20 +++++++++++++++++---
 2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
index d06893226070..19f6e2b84274 100644
--- a/drivers/iommu/iommufd/fault.c
+++ b/drivers/iommu/iommufd/fault.c
@@ -137,21 +137,17 @@ int iommufd_fault_domain_replace_dev(struct iommufd_device *idev,
 
 	curr = iommufd_device_get_attach_handle(idev);
 
-	if (hwpt->fault)
-		ret = iommufd_dev_replace_handle(idev, hwpt, old);
-	else
-		ret = iommu_replace_group_handle(idev->igroup->group,
-						 hwpt->domain, NULL);
+	ret = iommufd_dev_replace_handle(idev, hwpt, old);
 	if (ret) {
 		if (iopf_on)
 			iommufd_fault_iopf_disable(idev);
 		return ret;
 	}
 
-	if (curr) {
+	if (old->fault)
 		iommufd_auto_response_faults(old, curr);
-		kfree(curr);
-	}
+
+	kfree(curr);
 
 	if (iopf_off)
 		iommufd_fault_iopf_disable(idev);
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index d5c83f10d83e..6cf9c1f10e85 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -493,28 +493,42 @@ static inline int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt,
 	if (hwpt->fault)
 		return iommufd_fault_domain_attach_dev(hwpt, idev);
 
-	return iommu_attach_group(hwpt->domain, idev->igroup->group);
+	return iommufd_dev_attach_handle(hwpt, idev);
 }
 
 static inline void iommufd_hwpt_detach_device(struct iommufd_hw_pagetable *hwpt,
 					      struct iommufd_device *idev)
 {
+	struct iommufd_attach_handle *handle;
+
 	if (hwpt->fault) {
 		iommufd_fault_domain_detach_dev(hwpt, idev);
 		return;
 	}
 
-	iommu_detach_group(hwpt->domain, idev->igroup->group);
+	handle = iommufd_device_get_attach_handle(idev);
+	iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
+	kfree(handle);
 }
 
 static inline int iommufd_hwpt_replace_device(struct iommufd_device *idev,
 					      struct iommufd_hw_pagetable *hwpt,
 					      struct iommufd_hw_pagetable *old)
 {
+	struct iommufd_attach_handle *curr;
+	int ret;
+
 	if (old->fault || hwpt->fault)
 		return iommufd_fault_domain_replace_dev(idev, hwpt, old);
 
-	return iommu_group_replace_domain(idev->igroup->group, hwpt->domain);
+	curr = iommufd_device_get_attach_handle(idev);
+
+	ret = iommufd_dev_replace_handle(idev, hwpt, old);
+	if (ret)
+		return ret;
+
+	kfree(curr);
+	return 0;
 }
 
 static inline struct iommufd_viommu *
-- 
2.34.1
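
With a handle always passed, every attach, replace and detach now owns one
allocation. The sketch below models that ownership rule in userspace: the
current handle is looked up before the replacement, freed only if the
replacement succeeds, and the new handle is freed if it fails. The mock_*
names, the live_handles counter and the error codes are assumptions made
for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct mock_handle {
	int domain_id;		/* domain ids are positive in this model */
};

struct mock_group {
	struct mock_handle *curr;	/* handle currently held by the "core" */
	int reject_domain_id;		/* domain the core refuses, 0 = none */
	int live_handles;		/* allocation count, for leak checks */
};

struct mock_handle *mock_handle_alloc(struct mock_group *g, int domain_id)
{
	struct mock_handle *h = calloc(1, sizeof(*h));

	if (h) {
		h->domain_id = domain_id;
		g->live_handles++;
	}
	return h;
}

void mock_handle_free(struct mock_group *g, struct mock_handle *h)
{
	if (!h)
		return;
	g->live_handles--;
	free(h);
}

/* Stand-in for the core attach/replace: keeps the old handle on failure. */
int mock_core_store(struct mock_group *g, struct mock_handle *h)
{
	if (g->reject_domain_id && h->domain_id == g->reject_domain_id)
		return -EINVAL;
	g->curr = h;
	return 0;
}

int mock_attach(struct mock_group *g, int domain_id)
{
	struct mock_handle *h;
	int ret;

	if (g->curr)
		return -EBUSY;
	h = mock_handle_alloc(g, domain_id);
	if (!h)
		return -ENOMEM;
	ret = mock_core_store(g, h);
	if (ret)
		mock_handle_free(g, h);
	return ret;
}

int mock_replace(struct mock_group *g, int domain_id)
{
	/* Snapshot the old handle before the core swaps it out. */
	struct mock_handle *curr = g->curr;
	struct mock_handle *h;
	int ret;

	h = mock_handle_alloc(g, domain_id);
	if (!h)
		return -ENOMEM;
	ret = mock_core_store(g, h);
	if (ret) {
		mock_handle_free(g, h);	/* old handle stays installed */
		return ret;
	}
	mock_handle_free(g, curr);	/* old handle is no longer referenced */
	return 0;
}

void mock_detach(struct mock_group *g)
{
	struct mock_handle *h = g->curr;

	g->curr = NULL;
	mock_handle_free(g, h);
}
```

Whatever sequence of calls is made, at most one handle per attachment is
live in this model, and a failed replace leaves the previous handle both
installed and allocated.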



* [PATCH v6 05/14] iommufd: Pass pasid through the device attach/replace path
  2024-12-19 13:27 [PATCH v6 00/14] iommufd support pasid attach/replace Yi Liu
                   ` (3 preceding siblings ...)
  2024-12-19 13:27 ` [PATCH v6 04/14] iommufd: Always pass iommu_attach_handle to iommu core Yi Liu
@ 2024-12-19 13:27 ` Yi Liu
  2025-01-09  7:53   ` Tian, Kevin
  2024-12-19 13:27 ` [PATCH v6 06/14] iommufd: Mark PASID-compatible domain Yi Liu
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 71+ messages in thread
From: Yi Liu @ 2024-12-19 13:27 UTC (permalink / raw)
  To: joro, jgg, kevin.tian, baolu.lu
  Cc: eric.auger, nicolinc, chao.p.peng, yi.l.liu, iommu, vasant.hegde,
	will

Most of the core logic before conducting the actual device
attach/replace operation can be shared with pasid attach/replace. So pass
pasid through the device attach/replace helpers to prepare for adding
pasid attach/replace.

So far the @pasid should only be IOMMU_NO_PASID. No functional change.

Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/device.c          | 57 ++++++++++++++-----------
 drivers/iommu/iommufd/fault.c           | 16 ++++---
 drivers/iommu/iommufd/hw_pagetable.c    |  5 ++-
 drivers/iommu/iommufd/iommufd_private.h | 41 +++++++++++-------
 4 files changed, 72 insertions(+), 47 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 0e1baf84e887..c22ef4077348 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -294,9 +294,10 @@ u32 iommufd_device_to_id(struct iommufd_device *idev)
 EXPORT_SYMBOL_NS_GPL(iommufd_device_to_id, "IOMMUFD");
 
 /**
- * iommufd_device_get_attach_handle - Return the attach handle for the RID
+ * iommufd_device_get_attach_handle - Return the attach handle for the RID/PASID
  *
  * @idev: The device to get attach_handle
+ * @pasid: The pasid of the device to get attach_handle
  *
  * Currently there is no locking to synchronize threads that access the
  * returned handle with those attaching or replacing the domain which might
@@ -305,11 +306,11 @@ EXPORT_SYMBOL_NS_GPL(iommufd_device_to_id, "IOMMUFD");
  * Return valid attach_handle if there is, otherwise NULL.
  */
 struct iommufd_attach_handle *
-iommufd_device_get_attach_handle(struct iommufd_device *idev)
+iommufd_device_get_attach_handle(struct iommufd_device *idev, ioasid_t pasid)
 {
 	struct iommu_attach_handle *handle;
 
-	handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID, 0);
+	handle = iommu_attach_handle_get(idev->igroup->group, pasid, 0);
 	if (IS_ERR(handle))
 		return NULL;
 
@@ -317,7 +318,8 @@ iommufd_device_get_attach_handle(struct iommufd_device *idev)
 }
 
 int iommufd_dev_attach_handle(struct iommufd_hw_pagetable *hwpt,
-			      struct iommufd_device *idev)
+			      struct iommufd_device *idev,
+			      ioasid_t pasid)
 {
 	struct iommufd_attach_handle *handle;
 	int ret;
@@ -327,6 +329,7 @@ int iommufd_dev_attach_handle(struct iommufd_hw_pagetable *hwpt,
 		return -ENOMEM;
 
 	handle->idev = idev;
+	WARN_ON(pasid != IOMMU_NO_PASID);
 	ret = iommu_attach_group_handle(hwpt->domain, idev->igroup->group,
 					&handle->handle);
 	if (ret)
@@ -336,6 +339,7 @@ int iommufd_dev_attach_handle(struct iommufd_hw_pagetable *hwpt,
 }
 
 int iommufd_dev_replace_handle(struct iommufd_device *idev,
+			       ioasid_t pasid,
 			       struct iommufd_hw_pagetable *hwpt,
 			       struct iommufd_hw_pagetable *old)
 {
@@ -347,6 +351,7 @@ int iommufd_dev_replace_handle(struct iommufd_device *idev,
 		return -ENOMEM;
 
 	handle->idev = idev;
+	WARN_ON(pasid != IOMMU_NO_PASID);
 	ret = iommu_replace_group_handle(idev->igroup->group,
 					 hwpt->domain, &handle->handle);
 	if (ret)
@@ -415,7 +420,8 @@ iommufd_device_attach_reserved_iova(struct iommufd_device *idev,
 }
 
 int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
-				struct iommufd_device *idev)
+				struct iommufd_device *idev,
+				ioasid_t pasid)
 {
 	struct iommufd_hwpt_paging *hwpt_paging = find_hwpt_paging(hwpt);
 	int rc;
@@ -441,7 +447,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 	 * attachment.
 	 */
 	if (list_empty(&idev->igroup->device_list)) {
-		rc = iommufd_hwpt_attach_device(hwpt, idev);
+		rc = iommufd_hwpt_attach_device(hwpt, idev, pasid);
 		if (rc)
 			goto err_unresv;
 		idev->igroup->hwpt = hwpt;
@@ -459,7 +465,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 }
 
 struct iommufd_hw_pagetable *
-iommufd_hw_pagetable_detach(struct iommufd_device *idev)
+iommufd_hw_pagetable_detach(struct iommufd_device *idev, ioasid_t pasid)
 {
 	struct iommufd_hw_pagetable *hwpt = idev->igroup->hwpt;
 	struct iommufd_hwpt_paging *hwpt_paging = find_hwpt_paging(hwpt);
@@ -467,7 +473,7 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev)
 	mutex_lock(&idev->igroup->lock);
 	list_del(&idev->group_item);
 	if (list_empty(&idev->igroup->device_list)) {
-		iommufd_hwpt_detach_device(hwpt, idev);
+		iommufd_hwpt_detach_device(hwpt, idev, pasid);
 		idev->igroup->hwpt = NULL;
 	}
 	if (hwpt_paging)
@@ -479,12 +485,12 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev)
 }
 
 static struct iommufd_hw_pagetable *
-iommufd_device_do_attach(struct iommufd_device *idev,
+iommufd_device_do_attach(struct iommufd_device *idev, ioasid_t pasid,
 			 struct iommufd_hw_pagetable *hwpt)
 {
 	int rc;
 
-	rc = iommufd_hw_pagetable_attach(hwpt, idev);
+	rc = iommufd_hw_pagetable_attach(hwpt, idev, pasid);
 	if (rc)
 		return ERR_PTR(rc);
 	return NULL;
@@ -533,7 +539,7 @@ iommufd_group_do_replace_reserved_iova(struct iommufd_group *igroup,
 }
 
 static struct iommufd_hw_pagetable *
-iommufd_device_do_replace(struct iommufd_device *idev,
+iommufd_device_do_replace(struct iommufd_device *idev, ioasid_t pasid,
 			  struct iommufd_hw_pagetable *hwpt)
 {
 	struct iommufd_hwpt_paging *hwpt_paging = find_hwpt_paging(hwpt);
@@ -562,7 +568,7 @@ iommufd_device_do_replace(struct iommufd_device *idev,
 			goto err_unlock;
 	}
 
-	rc = iommufd_hwpt_replace_device(idev, hwpt, old_hwpt);
+	rc = iommufd_hwpt_replace_device(idev, pasid, hwpt, old_hwpt);
 	if (rc)
 		goto err_unresv;
 
@@ -595,7 +601,8 @@ iommufd_device_do_replace(struct iommufd_device *idev,
 }
 
 typedef struct iommufd_hw_pagetable *(*attach_fn)(
-	struct iommufd_device *idev, struct iommufd_hw_pagetable *hwpt);
+			struct iommufd_device *idev, ioasid_t pasid,
+			struct iommufd_hw_pagetable *hwpt);
 
 /*
  * When automatically managing the domains we search for a compatible domain in
@@ -603,7 +610,7 @@ typedef struct iommufd_hw_pagetable *(*attach_fn)(
  * Automatic domain selection will never pick a manually created domain.
  */
 static struct iommufd_hw_pagetable *
-iommufd_device_auto_get_domain(struct iommufd_device *idev,
+iommufd_device_auto_get_domain(struct iommufd_device *idev, ioasid_t pasid,
 			       struct iommufd_ioas *ioas, u32 *pt_id,
 			       attach_fn do_attach)
 {
@@ -632,7 +639,7 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev,
 		hwpt = &hwpt_paging->common;
 		if (!iommufd_lock_obj(&hwpt->obj))
 			continue;
-		destroy_hwpt = (*do_attach)(idev, hwpt);
+		destroy_hwpt = (*do_attach)(idev, pasid, hwpt);
 		if (IS_ERR(destroy_hwpt)) {
 			iommufd_put_object(idev->ictx, &hwpt->obj);
 			/*
@@ -659,7 +666,7 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev,
 	hwpt = &hwpt_paging->common;
 
 	if (!immediate_attach) {
-		destroy_hwpt = (*do_attach)(idev, hwpt);
+		destroy_hwpt = (*do_attach)(idev, pasid, hwpt);
 		if (IS_ERR(destroy_hwpt))
 			goto out_abort;
 	} else {
@@ -680,8 +687,9 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev,
 	return destroy_hwpt;
 }
 
-static int iommufd_device_change_pt(struct iommufd_device *idev, u32 *pt_id,
-				    attach_fn do_attach)
+static int iommufd_device_change_pt(struct iommufd_device *idev,
+				    ioasid_t pasid,
+				    u32 *pt_id, attach_fn do_attach)
 {
 	struct iommufd_hw_pagetable *destroy_hwpt;
 	struct iommufd_object *pt_obj;
@@ -696,7 +704,7 @@ static int iommufd_device_change_pt(struct iommufd_device *idev, u32 *pt_id,
 		struct iommufd_hw_pagetable *hwpt =
 			container_of(pt_obj, struct iommufd_hw_pagetable, obj);
 
-		destroy_hwpt = (*do_attach)(idev, hwpt);
+		destroy_hwpt = (*do_attach)(idev, pasid, hwpt);
 		if (IS_ERR(destroy_hwpt))
 			goto out_put_pt_obj;
 		break;
@@ -705,8 +713,8 @@ static int iommufd_device_change_pt(struct iommufd_device *idev, u32 *pt_id,
 		struct iommufd_ioas *ioas =
 			container_of(pt_obj, struct iommufd_ioas, obj);
 
-		destroy_hwpt = iommufd_device_auto_get_domain(idev, ioas, pt_id,
-							      do_attach);
+		destroy_hwpt = iommufd_device_auto_get_domain(idev, pasid, ioas,
+							      pt_id, do_attach);
 		if (IS_ERR(destroy_hwpt))
 			goto out_put_pt_obj;
 		break;
@@ -743,7 +751,8 @@ int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
 {
 	int rc;
 
-	rc = iommufd_device_change_pt(idev, pt_id, &iommufd_device_do_attach);
+	rc = iommufd_device_change_pt(idev, IOMMU_NO_PASID, pt_id,
+				      &iommufd_device_do_attach);
 	if (rc)
 		return rc;
 
@@ -773,7 +782,7 @@ EXPORT_SYMBOL_NS_GPL(iommufd_device_attach, "IOMMUFD");
  */
 int iommufd_device_replace(struct iommufd_device *idev, u32 *pt_id)
 {
-	return iommufd_device_change_pt(idev, pt_id,
+	return iommufd_device_change_pt(idev, IOMMU_NO_PASID, pt_id,
 					&iommufd_device_do_replace);
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_replace, "IOMMUFD");
@@ -789,7 +798,7 @@ void iommufd_device_detach(struct iommufd_device *idev)
 {
 	struct iommufd_hw_pagetable *hwpt;
 
-	hwpt = iommufd_hw_pagetable_detach(idev);
+	hwpt = iommufd_hw_pagetable_detach(idev, IOMMU_NO_PASID);
 	iommufd_hw_pagetable_put(idev->ictx, hwpt);
 	refcount_dec(&idev->obj.users);
 }
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
index 19f6e2b84274..99d58df8f6db 100644
--- a/drivers/iommu/iommufd/fault.c
+++ b/drivers/iommu/iommufd/fault.c
@@ -61,7 +61,8 @@ static void iommufd_fault_iopf_disable(struct iommufd_device *idev)
 }
 
 int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt,
-				    struct iommufd_device *idev)
+				    struct iommufd_device *idev,
+				    ioasid_t pasid)
 {
 	int ret;
 
@@ -72,7 +73,7 @@ int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt,
 	if (ret)
 		return ret;
 
-	ret = iommufd_dev_attach_handle(hwpt, idev);
+	ret = iommufd_dev_attach_handle(hwpt, idev, pasid);
 	if (ret)
 		iommufd_fault_iopf_disable(idev);
 
@@ -109,11 +110,13 @@ static void iommufd_auto_response_faults(struct iommufd_hw_pagetable *hwpt,
 }
 
 void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt,
-				     struct iommufd_device *idev)
+				     struct iommufd_device *idev,
+				     ioasid_t pasid)
 {
 	struct iommufd_attach_handle *handle;
 
-	handle = iommufd_device_get_attach_handle(idev);
+	handle = iommufd_device_get_attach_handle(idev, pasid);
+	WARN_ON(pasid != IOMMU_NO_PASID);
 	iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
 	iommufd_auto_response_faults(hwpt, handle);
 	iommufd_fault_iopf_disable(idev);
@@ -121,6 +124,7 @@ void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt,
 }
 
 int iommufd_fault_domain_replace_dev(struct iommufd_device *idev,
+				     ioasid_t pasid,
 				     struct iommufd_hw_pagetable *hwpt,
 				     struct iommufd_hw_pagetable *old)
 {
@@ -135,9 +139,9 @@ int iommufd_fault_domain_replace_dev(struct iommufd_device *idev,
 			return ret;
 	}
 
-	curr = iommufd_device_get_attach_handle(idev);
+	curr = iommufd_device_get_attach_handle(idev, pasid);
 
-	ret = iommufd_dev_replace_handle(idev, hwpt, old);
+	ret = iommufd_dev_replace_handle(idev, pasid, hwpt, old);
 	if (ret) {
 		if (iopf_on)
 			iommufd_fault_iopf_disable(idev);
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 598be26a14e2..af2b72647d5a 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -184,7 +184,8 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 	 * sequence. Once those drivers are fixed this should be removed.
 	 */
 	if (immediate_attach) {
-		rc = iommufd_hw_pagetable_attach(hwpt, idev);
+		/* Since this is just a trick, passing IOMMU_NO_PASID is enough */
+		rc = iommufd_hw_pagetable_attach(hwpt, idev, IOMMU_NO_PASID);
 		if (rc)
 			goto out_abort;
 	}
@@ -197,7 +198,7 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 
 out_detach:
 	if (immediate_attach)
-		iommufd_hw_pagetable_detach(idev);
+		iommufd_hw_pagetable_detach(idev, IOMMU_NO_PASID);
 out_abort:
 	iommufd_object_abort_and_destroy(ictx, &hwpt->obj);
 	return ERR_PTR(rc);
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 6cf9c1f10e85..6d4e743ea8fe 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -350,9 +350,10 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 			  bool immediate_attach,
 			  const struct iommu_user_data *user_data);
 int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
-				struct iommufd_device *idev);
+				struct iommufd_device *idev,
+				ioasid_t pasid);
 struct iommufd_hw_pagetable *
-iommufd_hw_pagetable_detach(struct iommufd_device *idev);
+iommufd_hw_pagetable_detach(struct iommufd_device *idev, ioasid_t pasid);
 void iommufd_hwpt_paging_destroy(struct iommufd_object *obj);
 void iommufd_hwpt_paging_abort(struct iommufd_object *obj);
 void iommufd_hwpt_nested_destroy(struct iommufd_object *obj);
@@ -460,10 +461,12 @@ struct iommufd_attach_handle {
 #define to_iommufd_handle(hdl)	container_of(hdl, struct iommufd_attach_handle, handle)
 
 struct iommufd_attach_handle *
-iommufd_device_get_attach_handle(struct iommufd_device *idev);
+iommufd_device_get_attach_handle(struct iommufd_device *idev, ioasid_t pasid);
 int iommufd_dev_attach_handle(struct iommufd_hw_pagetable *hwpt,
-			      struct iommufd_device *idev);
+			      struct iommufd_device *idev,
+			      ioasid_t pasid);
 int iommufd_dev_replace_handle(struct iommufd_device *idev,
+			       ioasid_t pasid,
 			       struct iommufd_hw_pagetable *hwpt,
 			       struct iommufd_hw_pagetable *old);
 
@@ -480,38 +483,45 @@ void iommufd_fault_destroy(struct iommufd_object *obj);
 int iommufd_fault_iopf_handler(struct iopf_group *group);
 
 int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt,
-				    struct iommufd_device *idev);
+				    struct iommufd_device *idev,
+				    ioasid_t pasid);
 void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt,
-				     struct iommufd_device *idev);
+				     struct iommufd_device *idev,
+				     ioasid_t pasid);
 int iommufd_fault_domain_replace_dev(struct iommufd_device *idev,
+				     ioasid_t pasid,
 				     struct iommufd_hw_pagetable *hwpt,
 				     struct iommufd_hw_pagetable *old);
 
 static inline int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt,
-					     struct iommufd_device *idev)
+					     struct iommufd_device *idev,
+					     ioasid_t pasid)
 {
 	if (hwpt->fault)
-		return iommufd_fault_domain_attach_dev(hwpt, idev);
+		return iommufd_fault_domain_attach_dev(hwpt, idev, pasid);
 
-	return iommufd_dev_attach_handle(hwpt, idev);
+	return iommufd_dev_attach_handle(hwpt, idev, pasid);
 }
 
 static inline void iommufd_hwpt_detach_device(struct iommufd_hw_pagetable *hwpt,
-					      struct iommufd_device *idev)
+					      struct iommufd_device *idev,
+					      ioasid_t pasid)
 {
 	struct iommufd_attach_handle *handle;
 
 	if (hwpt->fault) {
-		iommufd_fault_domain_detach_dev(hwpt, idev);
+		iommufd_fault_domain_detach_dev(hwpt, idev, pasid);
 		return;
 	}
 
-	handle = iommufd_device_get_attach_handle(idev);
+	handle = iommufd_device_get_attach_handle(idev, pasid);
+	WARN_ON(pasid != IOMMU_NO_PASID);
 	iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
 	kfree(handle);
 }
 
 static inline int iommufd_hwpt_replace_device(struct iommufd_device *idev,
+					      ioasid_t pasid,
 					      struct iommufd_hw_pagetable *hwpt,
 					      struct iommufd_hw_pagetable *old)
 {
@@ -519,11 +529,12 @@ static inline int iommufd_hwpt_replace_device(struct iommufd_device *idev,
 	int ret;
 
 	if (old->fault || hwpt->fault)
-		return iommufd_fault_domain_replace_dev(idev, hwpt, old);
+		return iommufd_fault_domain_replace_dev(idev, pasid,
+							hwpt, old);
 
-	curr = iommufd_device_get_attach_handle(idev);
+	curr = iommufd_device_get_attach_handle(idev, pasid);
 
-	ret = iommufd_dev_replace_handle(idev, hwpt, old);
+	ret = iommufd_dev_replace_handle(idev, pasid, hwpt, old);
 	if (ret)
 		return ret;
 
-- 
2.34.1



* [PATCH v6 06/14] iommufd: Mark PASID-compatible domain
  2024-12-19 13:27 [PATCH v6 00/14] iommufd support pasid attach/replace Yi Liu
                   ` (4 preceding siblings ...)
  2024-12-19 13:27 ` [PATCH v6 05/14] iommufd: Pass pasid through the device attach/replace path Yi Liu
@ 2024-12-19 13:27 ` Yi Liu
  2025-01-09  7:56   ` Tian, Kevin
  2025-01-09 14:54   ` Jason Gunthorpe
  2024-12-19 13:27 ` [PATCH v6 07/14] iommufd: Support pasid attach/replace Yi Liu
                   ` (7 subsequent siblings)
  13 siblings, 2 replies; 71+ messages in thread
From: Yi Liu @ 2024-12-19 13:27 UTC (permalink / raw)
  To: joro, jgg, kevin.tian, baolu.lu
  Cc: eric.auger, nicolinc, chao.p.peng, yi.l.liu, iommu, vasant.hegde,
	will

AMD IOMMU requires attaching PASID-compatible domains to PASID-capable
devices. This includes the domains attached to the RID and to PASIDs.
Related discussions are in links [1] and [2]. ARM has a similar
requirement but does not need an extra hint from iommufd. Intel does not
have this requirement but can live with it. Hence, iommufd is going to
enforce this requirement as a general one. Mark the PASID-compatible
domains to prepare for adding this enforcement when PASID support is added.

[1] https://lore.kernel.org/linux-iommu/20240709182303.GK14050@ziepe.ca/
[2] https://lore.kernel.org/linux-iommu/20240822124433.GD3468552@ziepe.ca/
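
As a rough userspace illustration (not kernel code), the marking step can be
modeled as below. struct hwpt_model and hwpt_model_mark() are hypothetical
names, and the flag value is an assumption made for this sketch. It shows that
storing a masked flag into a one-bit bool field yields 1, not a truncated 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value for illustration; the real flag lives in the iommufd uAPI. */
#define IOMMU_HWPT_ALLOC_PASID (1u << 3)

/* Hypothetical stand-in for struct iommufd_hw_pagetable. */
struct hwpt_model {
	bool pasid_compat : 1;
};

/* Models the assignment this patch adds to each hwpt allocation path. */
void hwpt_model_mark(struct hwpt_model *hwpt, uint32_t flags)
{
	assert(hwpt);
	/*
	 * The masked value (8 with the assumed flag) is converted to bool
	 * before it is stored, so the one-bit field holds 1.
	 */
	hwpt->pasid_compat = flags & IOMMU_HWPT_ALLOC_PASID;
}
```

A domain allocated without the flag keeps pasid_compat at 0 in this model, so
a later attach check can reject it on the PASID path.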

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/hw_pagetable.c    | 3 +++
 drivers/iommu/iommufd/iommufd_private.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index af2b72647d5a..6fc848d3ef47 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -132,6 +132,7 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 	if (IS_ERR(hwpt_paging))
 		return ERR_CAST(hwpt_paging);
 	hwpt = &hwpt_paging->common;
+	hwpt->pasid_compat = flags & IOMMU_HWPT_ALLOC_PASID;
 
 	INIT_LIST_HEAD(&hwpt_paging->hwpt_item);
 	/* Pairs with iommufd_hw_pagetable_destroy() */
@@ -239,6 +240,7 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
 	if (IS_ERR(hwpt_nested))
 		return ERR_CAST(hwpt_nested);
 	hwpt = &hwpt_nested->common;
+	hwpt->pasid_compat = flags & IOMMU_HWPT_ALLOC_PASID;
 
 	refcount_inc(&parent->common.obj.users);
 	hwpt_nested->parent = parent;
@@ -293,6 +295,7 @@ iommufd_viommu_alloc_hwpt_nested(struct iommufd_viommu *viommu, u32 flags,
 	if (IS_ERR(hwpt_nested))
 		return ERR_CAST(hwpt_nested);
 	hwpt = &hwpt_nested->common;
+	hwpt->pasid_compat = flags & IOMMU_HWPT_ALLOC_PASID;
 
 	hwpt_nested->viommu = viommu;
 	refcount_inc(&viommu->obj.users);
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 6d4e743ea8fe..21d33a784193 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -276,6 +276,7 @@ struct iommufd_hw_pagetable {
 	struct iommufd_object obj;
 	struct iommu_domain *domain;
 	struct iommufd_fault *fault;
+	bool pasid_compat : 1;
 };
 
 struct iommufd_hwpt_paging {
-- 
2.34.1



* [PATCH v6 07/14] iommufd: Support pasid attach/replace
  2024-12-19 13:27 [PATCH v6 00/14] iommufd support pasid attach/replace Yi Liu
                   ` (5 preceding siblings ...)
  2024-12-19 13:27 ` [PATCH v6 06/14] iommufd: Mark PASID-compatible domain Yi Liu
@ 2024-12-19 13:27 ` Yi Liu
  2025-01-09  8:25   ` Tian, Kevin
  2024-12-19 13:27 ` [PATCH v6 08/14] iommufd: Enforce PASID-compatible domain for RID Yi Liu
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 71+ messages in thread
From: Yi Liu @ 2024-12-19 13:27 UTC (permalink / raw)
  To: joro, jgg, kevin.tian, baolu.lu
  Cc: eric.auger, nicolinc, chao.p.peng, yi.l.liu, iommu, vasant.hegde,
	will

This introduces three APIs for device drivers to manage pasid attach/
replace/detach.

    int iommufd_device_pasid_attach(struct iommufd_device *idev,
				    ioasid_t pasid, u32 *pt_id);
    int iommufd_device_pasid_replace(struct iommufd_device *idev,
				     ioasid_t pasid, u32 *pt_id);
    void iommufd_device_pasid_detach(struct iommufd_device *idev,
				     ioasid_t pasid);

pasid operations have different implications when compared to device
operations:

 - no connection to iommufd_group, since pasid is a device capability
   and can be enabled only in a singleton group;

 - no reserved region per pasid, otherwise the SVA architecture would
   already be broken (the CPU address space doesn't account for device
   reserved regions);

 - accordingly no sw_msi trick;

 - immediate_attach is not supported, expecting that the arm-smmu driver
   will have removed that requirement before supporting this pasid
   operation. This avoids unnecessary changes in iommufd_hw_pagetable_alloc()
   to carry the pasid from device.c.

Given the above differences, this puts all pasid-related logic into a new
pasid.c file.

Cache coherency enforcement is still applied to pasid operations since
it is about memory accesses after the page table walk (no matter whether
the walk is per RID or per PASID).

Since the attach is per PASID, this introduces a pasid_hwpts xarray to
track the per-pasid attach data.

AMD requires using PASID-compatible domains for PASIDs, hence the hwpts
attached on the PASID path are required to be flagged with
IOMMU_HWPT_ALLOC_PASID.
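
The per-pasid bookkeeping described above can be sketched in userspace as
follows. This is a simplified model under stated assumptions, not the kernel
implementation: a fixed array stands in for the pasid_hwpts xarray, a plain
counter stands in for the object refcount, and all model_* names are
hypothetical. In the patch itself the old hwpt is handed back to the caller on
replace; the model drops that reference directly.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_PASID 8	/* hypothetical bound, for this sketch only */

struct hwpt_model {
	bool pasid_compat;
	int users;		/* stands in for hwpt->obj.users */
};

struct idev_model {
	/* stands in for the pasid_hwpts xarray: one slot per pasid */
	struct hwpt_model *pasid_hwpts[MODEL_MAX_PASID];
};

/* Attach: the slot must be empty and the hwpt must be PASID-compatible. */
int model_pasid_attach(struct idev_model *idev, unsigned int pasid,
		       struct hwpt_model *hwpt)
{
	/* pasid 0 is left to the RID path in this model */
	assert(pasid > 0 && pasid < MODEL_MAX_PASID);
	if (idev->pasid_hwpts[pasid] == hwpt)
		return 0;		/* already attached to this hwpt */
	if (idev->pasid_hwpts[pasid])
		return -EINVAL;		/* a different hwpt holds the slot */
	if (!hwpt->pasid_compat)
		return -EINVAL;
	idev->pasid_hwpts[pasid] = hwpt;
	hwpt->users++;
	return 0;
}

/* Replace: only valid after a successful attach; no change on failure. */
int model_pasid_replace(struct idev_model *idev, unsigned int pasid,
			struct hwpt_model *hwpt)
{
	struct hwpt_model *old;

	assert(pasid > 0 && pasid < MODEL_MAX_PASID);
	old = idev->pasid_hwpts[pasid];
	if (!old)
		return -EINVAL;		/* nothing to replace */
	if (old == hwpt)
		return 0;
	if (!hwpt->pasid_compat)
		return -EINVAL;		/* slot keeps the old hwpt */
	idev->pasid_hwpts[pasid] = hwpt;
	hwpt->users++;
	old->users--;			/* done by the caller in the patch */
	return 0;
}

/* Detach: clear the slot and drop the reference taken at attach. */
void model_pasid_detach(struct idev_model *idev, unsigned int pasid)
{
	struct hwpt_model *hwpt;

	assert(pasid > 0 && pasid < MODEL_MAX_PASID);
	hwpt = idev->pasid_hwpts[pasid];
	if (!hwpt)
		return;			/* the patch WARNs in this case */
	idev->pasid_hwpts[pasid] = NULL;
	hwpt->users--;
}
```

The model does no locking, matching the rule that callers serialize
attach, replace and detach for a given device and pasid.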

Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/Makefile          |   1 +
 drivers/iommu/iommufd/device.c          |  30 +++--
 drivers/iommu/iommufd/fault.c           |   6 +-
 drivers/iommu/iommufd/iommufd_private.h |  27 +++-
 drivers/iommu/iommufd/pasid.c           | 161 ++++++++++++++++++++++++
 include/linux/iommufd.h                 |   7 ++
 6 files changed, 215 insertions(+), 17 deletions(-)
 create mode 100644 drivers/iommu/iommufd/pasid.c

diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
index cb784da6cddc..a64a67b502ae 100644
--- a/drivers/iommu/iommufd/Makefile
+++ b/drivers/iommu/iommufd/Makefile
@@ -7,6 +7,7 @@ iommufd-y := \
 	ioas.o \
 	main.o \
 	pages.o \
+	pasid.o \
 	vfio_compat.o \
 	viommu.o
 
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index c22ef4077348..c1ff3dbe109c 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -136,6 +136,7 @@ void iommufd_device_destroy(struct iommufd_object *obj)
 	struct iommufd_device *idev =
 		container_of(obj, struct iommufd_device, obj);
 
+	WARN_ON(!xa_empty(&idev->pasid_hwpts));
 	iommu_device_release_dma_owner(idev->dev);
 	iommufd_put_group(idev->igroup);
 	if (!iommufd_selftest_is_mock_dev(idev->dev))
@@ -217,6 +218,8 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 	idev->igroup = igroup;
 	mutex_init(&idev->iopf_lock);
 
+	xa_init(&idev->pasid_hwpts);
+
 	/*
 	 * If the caller fails after this success it must call
 	 * iommufd_unbind_device() which is safe since we hold this refcount.
@@ -329,9 +332,12 @@ int iommufd_dev_attach_handle(struct iommufd_hw_pagetable *hwpt,
 		return -ENOMEM;
 
 	handle->idev = idev;
-	WARN_ON(pasid != IOMMU_NO_PASID);
-	ret = iommu_attach_group_handle(hwpt->domain, idev->igroup->group,
-					&handle->handle);
+	if (pasid == IOMMU_NO_PASID)
+		ret = iommu_attach_group_handle(hwpt->domain, idev->igroup->group,
+						&handle->handle);
+	else
+		ret = iommu_attach_device_pasid(hwpt->domain, idev->dev, pasid,
+						&handle->handle);
 	if (ret)
 		kfree(handle);
 
@@ -351,9 +357,12 @@ int iommufd_dev_replace_handle(struct iommufd_device *idev,
 		return -ENOMEM;
 
 	handle->idev = idev;
-	WARN_ON(pasid != IOMMU_NO_PASID);
-	ret = iommu_replace_group_handle(idev->igroup->group,
-					 hwpt->domain, &handle->handle);
+	if (pasid == IOMMU_NO_PASID)
+		ret = iommu_replace_group_handle(idev->igroup->group,
+						 hwpt->domain, &handle->handle);
+	else
+		ret = iommu_replace_device_pasid(hwpt->domain, idev->dev,
+						 pasid, &handle->handle);
 	if (ret)
 		kfree(handle);
 
@@ -600,10 +609,6 @@ iommufd_device_do_replace(struct iommufd_device *idev, ioasid_t pasid,
 	return ERR_PTR(rc);
 }
 
-typedef struct iommufd_hw_pagetable *(*attach_fn)(
-			struct iommufd_device *idev, ioasid_t pasid,
-			struct iommufd_hw_pagetable *hwpt);
-
 /*
  * When automatically managing the domains we search for a compatible domain in
  * the iopt and if one is found use it, otherwise create a new domain.
@@ -687,9 +692,8 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev, ioasid_t pasid,
 	return destroy_hwpt;
 }
 
-static int iommufd_device_change_pt(struct iommufd_device *idev,
-				    ioasid_t pasid,
-				    u32 *pt_id, attach_fn do_attach)
+int iommufd_device_change_pt(struct iommufd_device *idev, ioasid_t pasid,
+			     u32 *pt_id, attach_fn do_attach)
 {
 	struct iommufd_hw_pagetable *destroy_hwpt;
 	struct iommufd_object *pt_obj;
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
index 99d58df8f6db..96ca28ff31f7 100644
--- a/drivers/iommu/iommufd/fault.c
+++ b/drivers/iommu/iommufd/fault.c
@@ -116,8 +116,10 @@ void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt,
 	struct iommufd_attach_handle *handle;
 
 	handle = iommufd_device_get_attach_handle(idev, pasid);
-	WARN_ON(pasid != IOMMU_NO_PASID);
-	iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
+	if (pasid == IOMMU_NO_PASID)
+		iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
+	else
+		iommu_detach_device_pasid(hwpt->domain, idev->dev, pasid);
 	iommufd_auto_response_faults(hwpt, handle);
 	iommufd_fault_iopf_disable(idev);
 	kfree(handle);
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 21d33a784193..e9d6bd8b44bc 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -400,6 +400,7 @@ struct iommufd_device {
 	struct list_head group_item;
 	/* always the physical device */
 	struct device *dev;
+	struct xarray pasid_hwpts;
 	bool enforce_cache_coherency;
 	/* protect iopf_enabled counter */
 	struct mutex iopf_lock;
@@ -417,6 +418,20 @@ iommufd_get_device(struct iommufd_ucmd *ucmd, u32 id)
 void iommufd_device_destroy(struct iommufd_object *obj);
 int iommufd_get_hw_info(struct iommufd_ucmd *ucmd);
 
+typedef struct iommufd_hw_pagetable *(*attach_fn)(
+			struct iommufd_device *idev, ioasid_t pasid,
+			struct iommufd_hw_pagetable *hwpt);
+
+int iommufd_device_change_pt(struct iommufd_device *idev, ioasid_t pasid,
+			     u32 *pt_id, attach_fn do_attach);
+
+struct iommufd_hw_pagetable *
+iommufd_device_pasid_do_attach(struct iommufd_device *idev, ioasid_t pasid,
+			       struct iommufd_hw_pagetable *hwpt);
+struct iommufd_hw_pagetable *
+iommufd_device_pasid_do_replace(struct iommufd_device *idev, ioasid_t pasid,
+				struct iommufd_hw_pagetable *hwpt);
+
 struct iommufd_access {
 	struct iommufd_object obj;
 	struct iommufd_ctx *ictx;
@@ -498,6 +513,9 @@ static inline int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt,
 					     struct iommufd_device *idev,
 					     ioasid_t pasid)
 {
+	if (pasid != IOMMU_NO_PASID && !hwpt->pasid_compat)
+		return -EINVAL;
+
 	if (hwpt->fault)
 		return iommufd_fault_domain_attach_dev(hwpt, idev, pasid);
 
@@ -516,8 +534,10 @@ static inline void iommufd_hwpt_detach_device(struct iommufd_hw_pagetable *hwpt,
 	}
 
 	handle = iommufd_device_get_attach_handle(idev, pasid);
-	WARN_ON(pasid != IOMMU_NO_PASID);
-	iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
+	if (pasid == IOMMU_NO_PASID)
+		iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
+	else
+		iommu_detach_device_pasid(hwpt->domain, idev->dev, pasid);
 	kfree(handle);
 }
 
@@ -529,6 +549,9 @@ static inline int iommufd_hwpt_replace_device(struct iommufd_device *idev,
 	struct iommufd_attach_handle *curr;
 	int ret;
 
+	if (pasid != IOMMU_NO_PASID && !hwpt->pasid_compat)
+		return -EINVAL;
+
 	if (old->fault || hwpt->fault)
 		return iommufd_fault_domain_replace_dev(idev, pasid,
 							hwpt, old);
diff --git a/drivers/iommu/iommufd/pasid.c b/drivers/iommu/iommufd/pasid.c
new file mode 100644
index 000000000000..fcdfbc01dcbb
--- /dev/null
+++ b/drivers/iommu/iommufd/pasid.c
@@ -0,0 +1,161 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2024, Intel Corporation
+ */
+#include <linux/iommufd.h>
+#include <linux/iommu.h>
+#include "../iommu-priv.h"
+
+#include "iommufd_private.h"
+
+struct iommufd_hw_pagetable *
+iommufd_device_pasid_do_attach(struct iommufd_device *idev, ioasid_t pasid,
+			       struct iommufd_hw_pagetable *hwpt)
+{
+	void *curr;
+	int rc;
+
+	refcount_inc(&hwpt->obj.users);
+	curr = xa_cmpxchg(&idev->pasid_hwpts, pasid, NULL, hwpt, GFP_KERNEL);
+	if (curr) {
+		if (curr == hwpt)
+			rc = 0;
+		else
+			rc = xa_err(curr) ? : -EINVAL;
+		goto err_put_hwpt;
+	}
+
+	rc = iommufd_hwpt_attach_device(hwpt, idev, pasid);
+	if (rc) {
+		xa_erase(&idev->pasid_hwpts, pasid);
+		goto err_put_hwpt;
+	}
+
+	return NULL;
+
+err_put_hwpt:
+	refcount_dec(&hwpt->obj.users);
+	return rc ? ERR_PTR(rc) : NULL;
+}
+
+struct iommufd_hw_pagetable *
+iommufd_device_pasid_do_replace(struct iommufd_device *idev, ioasid_t pasid,
+				struct iommufd_hw_pagetable *hwpt)
+{
+	void *curr;
+	int rc;
+
+	refcount_inc(&hwpt->obj.users);
+	curr = xa_store(&idev->pasid_hwpts, pasid, hwpt, GFP_KERNEL);
+	rc = xa_err(curr);
+	if (rc)
+		goto out_put_hwpt;
+
+	if (!curr) {
+		xa_erase(&idev->pasid_hwpts, pasid);
+		rc = -EINVAL;
+		goto out_put_hwpt;
+	}
+
+	if (curr == hwpt)
+		goto out_put_hwpt;
+
+	/*
+	 * After replacement, the reference on the old hwpt is retained
+	 * in this thread as caller would free it.
+	 */
+	rc = iommufd_hwpt_replace_device(idev, pasid, hwpt, curr);
+	if (rc) {
+		WARN_ON(xa_err(xa_store(&idev->pasid_hwpts, pasid,
+					curr, GFP_KERNEL)));
+		goto out_put_hwpt;
+	}
+
+	/* Caller must destroy old_hwpt */
+	return curr;
+
+out_put_hwpt:
+	refcount_dec(&hwpt->obj.users);
+	return rc ? ERR_PTR(rc) : NULL;
+}
+
+/**
+ * iommufd_device_pasid_attach - Connect a {device, pasid} to an iommu_domain
+ * @idev: device to attach
+ * @pasid: pasid to attach
+ * @pt_id: Input an IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HW_PAGETABLE
+ *         Output the IOMMUFD_OBJ_HW_PAGETABLE ID
+ *
+ * This connects a pasid of the device to an iommu_domain. Once this
+ * completes the device could do DMA with the pasid.
+ *
+ * This function is undone by calling iommufd_device_pasid_detach().
+ *
+ * iommufd does not handle races between iommufd_device_pasid_attach(),
+ * iommufd_device_pasid_replace() and iommufd_device_pasid_detach().
+ * Callers must guarantee that there are no concurrent calls on the same
+ * device and pasid.
+ *
+ * Return 0 for success, otherwise errno.
+ */
+int iommufd_device_pasid_attach(struct iommufd_device *idev,
+				ioasid_t pasid, u32 *pt_id)
+{
+	return iommufd_device_change_pt(idev, pasid, pt_id,
+					&iommufd_device_pasid_do_attach);
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_device_pasid_attach, "IOMMUFD");
+
+/**
+ * iommufd_device_pasid_replace - Change the {device, pasid}'s iommu_domain
+ * @idev: device to change
+ * @pasid: pasid to change
+ * @pt_id: Input an IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HW_PAGETABLE
+ *         Output the IOMMUFD_OBJ_HW_PAGETABLE ID
+ *
+ * This is the same as
+ *   iommufd_device_pasid_detach();
+ *   iommufd_device_pasid_attach();
+ *
+ * If it fails then no change is made to the attachment. The iommu driver may
+ * implement this so there is no disruption in translation. This can only be
+ * called if iommufd_device_pasid_attach() has already succeeded.
+ *
+ * iommufd does not handle races between iommufd_device_pasid_replace(),
+ * iommufd_device_pasid_attach() and iommufd_device_pasid_detach().
+ * Callers must guarantee that there are no concurrent calls on the same
+ * device and pasid.
+ *
+ * Return 0 for success, otherwise errno.
+ */
+int iommufd_device_pasid_replace(struct iommufd_device *idev,
+				 ioasid_t pasid, u32 *pt_id)
+{
+	return iommufd_device_change_pt(idev, pasid, pt_id,
+					&iommufd_device_pasid_do_replace);
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_device_pasid_replace, "IOMMUFD");
+
+/**
+ * iommufd_device_pasid_detach - Disconnect a {device, pasid} from an iommu_domain
+ * @idev: device to detach
+ * @pasid: pasid to detach
+ *
+ * Undo iommufd_device_pasid_attach(). This disconnects the idev/pasid from
+ * the previously attached pt_id.
+ *
+ * iommufd does not handle races between iommufd_device_pasid_detach(),
+ * iommufd_device_pasid_attach() and iommufd_device_pasid_replace().
+ * Callers must guarantee that there are no concurrent calls on the same
+ * device and pasid.
+ */
+void iommufd_device_pasid_detach(struct iommufd_device *idev, ioasid_t pasid)
+{
+	struct iommufd_hw_pagetable *hwpt;
+
+	hwpt = xa_erase(&idev->pasid_hwpts, pasid);
+	if (WARN_ON(!hwpt))
+		return;
+	iommufd_hwpt_detach_device(hwpt, idev, pasid);
+	iommufd_hw_pagetable_put(idev->ictx, hwpt);
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_device_pasid_detach, "IOMMUFD");
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 11110c749200..af7e5a4bfcf2 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -8,6 +8,7 @@
 
 #include <linux/err.h>
 #include <linux/errno.h>
+#include <linux/iommu.h>
 #include <linux/refcount.h>
 #include <linux/types.h>
 #include <linux/xarray.h>
@@ -56,6 +57,12 @@ int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id);
 int iommufd_device_replace(struct iommufd_device *idev, u32 *pt_id);
 void iommufd_device_detach(struct iommufd_device *idev);
 
+int iommufd_device_pasid_attach(struct iommufd_device *idev,
+				ioasid_t pasid, u32 *pt_id);
+int iommufd_device_pasid_replace(struct iommufd_device *idev,
+				 ioasid_t pasid, u32 *pt_id);
+void iommufd_device_pasid_detach(struct iommufd_device *idev, ioasid_t pasid);
+
 struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev);
 u32 iommufd_device_to_id(struct iommufd_device *idev);
 
-- 
2.34.1



* [PATCH v6 08/14] iommufd: Enforce PASID-compatible domain for RID
  2024-12-19 13:27 [PATCH v6 00/14] iommufd support pasid attach/replace Yi Liu
                   ` (6 preceding siblings ...)
  2024-12-19 13:27 ` [PATCH v6 07/14] iommufd: Support pasid attach/replace Yi Liu
@ 2024-12-19 13:27 ` Yi Liu
  2025-01-09  8:31   ` Tian, Kevin
  2024-12-19 13:27 ` [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support Yi Liu
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 71+ messages in thread
From: Yi Liu @ 2024-12-19 13:27 UTC (permalink / raw)
  To: joro, jgg, kevin.tian, baolu.lu
  Cc: eric.auger, nicolinc, chao.p.peng, yi.l.liu, iommu, vasant.hegde,
	will

Per the definition of IOMMU_HWPT_ALLOC_PASID, iommufd needs to enforce
that the RID uses a PASID-compatible domain once any PASID has been
attached, and that a PASID can be attached only while the RID domain is
PASID-compatible.

This enforcement requires a lock that covers both the RID and PASID attach
paths. Use idev->igroup->lock for this synchronization.
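
For illustration only, the rule above can be modelled in plain userspace C.
This is a sketch under assumptions, not the patch itself: struct attach_state,
check_pasid_compat() and NO_PASID are hypothetical stand-ins for the iommufd
state and IOMMU_NO_PASID, and the locking is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NO_PASID 0u /* hypothetical stand-in for IOMMU_NO_PASID */

/* Simplified model of the state the check consults. */
struct attach_state {
	bool rid_hwpt_pasid_compat; /* models idev->igroup->hwpt->pasid_compat */
	bool has_pasid_attachments; /* models !xa_empty(&idev->pasid_hwpts) */
};

/* Returns 0 if attaching the new hwpt is allowed, -EINVAL otherwise. */
static int check_pasid_compat(const struct attach_state *st,
			      unsigned int pasid, bool new_hwpt_pasid_compat)
{
	assert(st);

	/* RID path: once any PASID is attached, the RID domain must be compat */
	if (pasid == NO_PASID && st->has_pasid_attachments &&
	    !new_hwpt_pasid_compat)
		return -EINVAL;

	/* PASID path: both the RID domain and the new domain must be compat */
	if (pasid != NO_PASID &&
	    (!st->rid_hwpt_pasid_compat || !new_hwpt_pasid_compat))
		return -EINVAL;

	return 0;
}
```

In the real code both checks run with idev->igroup->lock held, so the RID
and PASID paths cannot observe each other's state half-updated.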

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/iommufd_private.h | 18 ++++++++++++++++--
 drivers/iommu/iommufd/pasid.c           | 14 +++++++++++++-
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index e9d6bd8b44bc..158d8e6d5a9a 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -513,7 +513,14 @@ static inline int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt,
 					     struct iommufd_device *idev,
 					     ioasid_t pasid)
 {
-	if (pasid != IOMMU_NO_PASID && !hwpt->pasid_compat)
+	lockdep_assert_held(&idev->igroup->lock);
+
+	if (pasid == IOMMU_NO_PASID &&
+	    !xa_empty(&idev->pasid_hwpts) && !hwpt->pasid_compat)
+		return -EINVAL;
+
+	if (pasid != IOMMU_NO_PASID &&
+	    (!idev->igroup->hwpt->pasid_compat || !hwpt->pasid_compat))
 		return -EINVAL;
 
 	if (hwpt->fault)
@@ -549,7 +556,14 @@ static inline int iommufd_hwpt_replace_device(struct iommufd_device *idev,
 	struct iommufd_attach_handle *curr;
 	int ret;
 
-	if (pasid != IOMMU_NO_PASID && !hwpt->pasid_compat)
+	lockdep_assert_held(&idev->igroup->lock);
+
+	if (pasid == IOMMU_NO_PASID &&
+	    !xa_empty(&idev->pasid_hwpts) && !hwpt->pasid_compat)
+		return -EINVAL;
+
+	if (pasid != IOMMU_NO_PASID &&
+	    (!idev->igroup->hwpt->pasid_compat || !hwpt->pasid_compat))
 		return -EINVAL;
 
 	if (old->fault || hwpt->fault)
diff --git a/drivers/iommu/iommufd/pasid.c b/drivers/iommu/iommufd/pasid.c
index fcdfbc01dcbb..fdf97f1d71ae 100644
--- a/drivers/iommu/iommufd/pasid.c
+++ b/drivers/iommu/iommufd/pasid.c
@@ -15,6 +15,8 @@ iommufd_device_pasid_do_attach(struct iommufd_device *idev, ioasid_t pasid,
 	int rc;
 
 	refcount_inc(&hwpt->obj.users);
+
+	mutex_lock(&idev->igroup->lock);
 	curr = xa_cmpxchg(&idev->pasid_hwpts, pasid, NULL, hwpt, GFP_KERNEL);
 	if (curr) {
 		if (curr == hwpt)
@@ -30,9 +32,11 @@ iommufd_device_pasid_do_attach(struct iommufd_device *idev, ioasid_t pasid,
 		goto err_put_hwpt;
 	}
 
+	mutex_unlock(&idev->igroup->lock);
 	return NULL;
 
 err_put_hwpt:
+	mutex_unlock(&idev->igroup->lock);
 	refcount_dec(&hwpt->obj.users);
 	return rc ? ERR_PTR(rc) : NULL;
 }
@@ -45,6 +49,8 @@ iommufd_device_pasid_do_replace(struct iommufd_device *idev, ioasid_t pasid,
 	int rc;
 
 	refcount_inc(&hwpt->obj.users);
+
+	mutex_lock(&idev->igroup->lock);
 	curr = xa_store(&idev->pasid_hwpts, pasid, hwpt, GFP_KERNEL);
 	rc = xa_err(curr);
 	if (rc)
@@ -70,10 +76,12 @@ iommufd_device_pasid_do_replace(struct iommufd_device *idev, ioasid_t pasid,
 		goto out_put_hwpt;
 	}
 
+	mutex_unlock(&idev->igroup->lock);
 	/* Caller must destroy old_hwpt */
 	return curr;
 
 out_put_hwpt:
+	mutex_unlock(&idev->igroup->lock);
 	refcount_dec(&hwpt->obj.users);
 	return rc ? ERR_PTR(rc) : NULL;
 }
@@ -152,10 +160,14 @@ void iommufd_device_pasid_detach(struct iommufd_device *idev, ioasid_t pasid)
 {
 	struct iommufd_hw_pagetable *hwpt;
 
+	mutex_lock(&idev->igroup->lock);
 	hwpt = xa_erase(&idev->pasid_hwpts, pasid);
-	if (WARN_ON(!hwpt))
+	if (WARN_ON(!hwpt)) {
+		mutex_unlock(&idev->igroup->lock);
 		return;
+	}
 	iommufd_hwpt_detach_device(hwpt, idev, pasid);
+	mutex_unlock(&idev->igroup->lock);
 	iommufd_hw_pagetable_put(idev->ictx, hwpt);
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_pasid_detach, "IOMMUFD");
-- 
2.34.1



* [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  2024-12-19 13:27 [PATCH v6 00/14] iommufd support pasid attach/replace Yi Liu
                   ` (7 preceding siblings ...)
  2024-12-19 13:27 ` [PATCH v6 08/14] iommufd: Enforce PASID-compatible domain for RID Yi Liu
@ 2024-12-19 13:27 ` Yi Liu
  2024-12-23  2:51   ` Baolu Lu
                     ` (2 more replies)
  2024-12-19 13:27 ` [PATCH v6 10/14] iommufd: Allow allocating PASID-compatible domain Yi Liu
                   ` (4 subsequent siblings)
  13 siblings, 3 replies; 71+ messages in thread
From: Yi Liu @ 2024-12-19 13:27 UTC (permalink / raw)
  To: joro, jgg, kevin.tian, baolu.lu
  Cc: eric.auger, nicolinc, chao.p.peng, yi.l.liu, iommu, vasant.hegde,
	will

The Intel iommu driver treats IOMMU_HWPT_ALLOC_PASID as a nop, since Intel
VT-d has no special requirement on domains attached to either the PASID or
the RID of a PASID-capable device.
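
As a rough sketch of what "nop" means here: the flag is only added to the
set of accepted allocation flags and changes nothing else. The snippet below
models the two mask checks in userspace C. The HWPT_* bit values and the
function names are placeholders chosen for illustration (the real
definitions live in the iommufd uapi header), and the nested_supported()
check is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Placeholder bit values for illustration only. */
enum {
	HWPT_ALLOC_NEST_PARENT    = 1 << 0,
	HWPT_ALLOC_DIRTY_TRACKING = 1 << 1,
	HWPT_FAULT_ID_VALID       = 1 << 2,
	HWPT_ALLOC_PASID          = 1 << 3,
};

/* Paging-domain allocation: PASID is accepted alongside the existing flags. */
static int paging_flags_check(uint32_t flags)
{
	const uint32_t allowed = HWPT_ALLOC_NEST_PARENT |
				 HWPT_ALLOC_DIRTY_TRACKING |
				 HWPT_ALLOC_PASID;

	return (flags & ~allowed) ? -EOPNOTSUPP : 0;
}

/* Nested-domain allocation: PASID is the only flag accepted. */
static int nested_flags_check(uint32_t flags)
{
	return (flags & ~(uint32_t)HWPT_ALLOC_PASID) ? -EOPNOTSUPP : 0;
}

/* Small self-check of the two masks above. */
static void flags_check_demo(void)
{
	assert(paging_flags_check(HWPT_ALLOC_PASID) == 0);
	assert(nested_flags_check(HWPT_ALLOC_PASID) == 0);
	assert(nested_flags_check(HWPT_ALLOC_NEST_PARENT) == -EOPNOTSUPP);
}
```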

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/iommu.c  | 3 ++-
 drivers/iommu/intel/nested.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index cd5e339fd5bb..0a622a89d876 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3347,7 +3347,8 @@ intel_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
 	bool first_stage;
 
 	if (flags &
-	    (~(IOMMU_HWPT_ALLOC_NEST_PARENT | IOMMU_HWPT_ALLOC_DIRTY_TRACKING)))
+	    (~(IOMMU_HWPT_ALLOC_NEST_PARENT | IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
+	       IOMMU_HWPT_ALLOC_PASID)))
 		return ERR_PTR(-EOPNOTSUPP);
 	if (nested_parent && !nested_supported(iommu))
 		return ERR_PTR(-EOPNOTSUPP);
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
index aba92c00b427..6ac5c534bef4 100644
--- a/drivers/iommu/intel/nested.c
+++ b/drivers/iommu/intel/nested.c
@@ -198,7 +198,7 @@ intel_iommu_domain_alloc_nested(struct device *dev, struct iommu_domain *parent,
 	struct dmar_domain *domain;
 	int ret;
 
-	if (!nested_supported(iommu) || flags)
+	if (!nested_supported(iommu) || flags & ~IOMMU_HWPT_ALLOC_PASID)
 		return ERR_PTR(-EOPNOTSUPP);
 
 	/* Must be nested domain */
-- 
2.34.1



* [PATCH v6 10/14] iommufd: Allow allocating PASID-compatible domain
  2024-12-19 13:27 [PATCH v6 00/14] iommufd support pasid attach/replace Yi Liu
                   ` (8 preceding siblings ...)
  2024-12-19 13:27 ` [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support Yi Liu
@ 2024-12-19 13:27 ` Yi Liu
  2024-12-19 13:27 ` [PATCH v6 11/14] iommufd/selftest: Add set_dev_pasid in mock iommu Yi Liu
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 71+ messages in thread
From: Yi Liu @ 2024-12-19 13:27 UTC (permalink / raw)
  To: joro, jgg, kevin.tian, baolu.lu
  Cc: eric.auger, nicolinc, chao.p.peng, yi.l.liu, iommu, vasant.hegde,
	will

The underlying framework now supports PASID attach and the related
enforcement required by the IOMMU_HWPT_ALLOC_PASID flag. Extend iommufd
to accept a PASID-compatible domain requested by userspace, and to
allocate a PASID-compatible domain in the auto_domain path when the
attach targets a PASID.
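
The auto_domain part boils down to choosing the allocation flags from the
pasid being attached. A minimal sketch, assuming NO_PASID stands in for
IOMMU_NO_PASID and using a placeholder bit value for the flag
(auto_domain_alloc_flags() is a hypothetical name, not a function in the
patch):

```c
#include <assert.h>
#include <stdint.h>

#define NO_PASID         0u        /* stand-in for IOMMU_NO_PASID */
#define HWPT_ALLOC_PASID (1u << 3) /* placeholder for IOMMU_HWPT_ALLOC_PASID */

/* Flags used when the auto_domain path has to allocate a new paging hwpt. */
static uint32_t auto_domain_alloc_flags(uint32_t pasid)
{
	return pasid != NO_PASID ? HWPT_ALLOC_PASID : 0;
}

/* RID attach keeps the old behaviour; PASID attach asks for a compat domain. */
static void auto_domain_demo(void)
{
	assert(auto_domain_alloc_flags(NO_PASID) == 0);
	assert(auto_domain_alloc_flags(100) == HWPT_ALLOC_PASID);
}
```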

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/device.c       | 4 +++-
 drivers/iommu/iommufd/hw_pagetable.c | 7 ++++---
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index c1ff3dbe109c..768eae6c3275 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -662,7 +662,9 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev, ioasid_t pasid,
 		goto out_unlock;
 	}
 
-	hwpt_paging = iommufd_hwpt_paging_alloc(idev->ictx, ioas, idev, 0,
+	hwpt_paging = iommufd_hwpt_paging_alloc(idev->ictx, ioas, idev,
+						pasid != IOMMU_NO_PASID ?
+						IOMMU_HWPT_ALLOC_PASID : 0,
 						immediate_attach, NULL);
 	if (IS_ERR(hwpt_paging)) {
 		destroy_hwpt = ERR_CAST(hwpt_paging);
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 6fc848d3ef47..7787d0931761 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -111,7 +111,8 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 {
 	const u32 valid_flags = IOMMU_HWPT_ALLOC_NEST_PARENT |
 				IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
-				IOMMU_HWPT_FAULT_ID_VALID;
+				IOMMU_HWPT_FAULT_ID_VALID |
+				IOMMU_HWPT_ALLOC_PASID;
 	const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
 	struct iommufd_hwpt_paging *hwpt_paging;
 	struct iommufd_hw_pagetable *hwpt;
@@ -228,7 +229,7 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
 	struct iommufd_hw_pagetable *hwpt;
 	int rc;
 
-	if ((flags & ~IOMMU_HWPT_FAULT_ID_VALID) ||
+	if ((flags & ~(IOMMU_HWPT_FAULT_ID_VALID | IOMMU_HWPT_ALLOC_PASID)) ||
 	    !user_data->len || !ops->domain_alloc_nested)
 		return ERR_PTR(-EOPNOTSUPP);
 	if (parent->auto_domain || !parent->nest_parent ||
@@ -283,7 +284,7 @@ iommufd_viommu_alloc_hwpt_nested(struct iommufd_viommu *viommu, u32 flags,
 	struct iommufd_hw_pagetable *hwpt;
 	int rc;
 
-	if (flags & ~IOMMU_HWPT_FAULT_ID_VALID)
+	if (flags & ~(IOMMU_HWPT_FAULT_ID_VALID | IOMMU_HWPT_ALLOC_PASID))
 		return ERR_PTR(-EOPNOTSUPP);
 	if (!user_data->len)
 		return ERR_PTR(-EOPNOTSUPP);
-- 
2.34.1



* [PATCH v6 11/14] iommufd/selftest: Add set_dev_pasid in mock iommu
  2024-12-19 13:27 [PATCH v6 00/14] iommufd support pasid attach/replace Yi Liu
                   ` (9 preceding siblings ...)
  2024-12-19 13:27 ` [PATCH v6 10/14] iommufd: Allow allocating PASID-compatible domain Yi Liu
@ 2024-12-19 13:27 ` Yi Liu
  2024-12-19 13:27 ` [PATCH v6 12/14] iommufd/selftest: Add a helper to get test device Yi Liu
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 71+ messages in thread
From: Yi Liu @ 2024-12-19 13:27 UTC (permalink / raw)
  To: joro, jgg, kevin.tian, baolu.lu
  Cc: eric.auger, nicolinc, chao.p.peng, yi.l.liu, iommu, vasant.hegde,
	will

The callback is needed to complete the pasid attach/detach path for the
mock device. A nop is enough for set_dev_pasid.

MOCK_FLAGS_DEVICE_PASID is added to mark a pasid-capable mock device for
the pasid test cases. Other test cases still create a non-pasid mock
device, while the mock iommu always pretends to be pasid-capable.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/iommufd_test.h |  1 +
 drivers/iommu/iommufd/selftest.c     | 45 +++++++++++++++++++++++++---
 2 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index a6b7a163f636..bdc979557272 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -48,6 +48,7 @@ enum {
 enum {
 	MOCK_FLAGS_DEVICE_NO_DIRTY = 1 << 0,
 	MOCK_FLAGS_DEVICE_HUGE_IOVA = 1 << 1,
+	MOCK_FLAGS_DEVICE_PASID = 1 << 2,
 };
 
 enum {
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index d40deb0a4f06..3f49ef72ba91 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -200,8 +200,16 @@ static int mock_domain_nop_attach(struct iommu_domain *domain,
 	return 0;
 }
 
+static int mock_domain_set_dev_pasid_nop(struct iommu_domain *domain,
+					 struct device *dev, ioasid_t pasid,
+					 struct iommu_domain *old)
+{
+	return 0;
+}
+
 static const struct iommu_domain_ops mock_blocking_ops = {
 	.attach_dev = mock_domain_nop_attach,
+	.set_dev_pasid = mock_domain_set_dev_pasid_nop
 };
 
 static struct iommu_domain mock_blocking_domain = {
@@ -343,11 +351,15 @@ mock_domain_alloc_nested(struct device *dev, struct iommu_domain *parent,
 	struct mock_iommu_domain_nested *mock_nested;
 	struct mock_iommu_domain *mock_parent;
 
-	if (flags)
+	if (flags & ~IOMMU_HWPT_ALLOC_PASID)
 		return ERR_PTR(-EOPNOTSUPP);
 	if (!parent || parent->ops != mock_ops.default_domain_ops)
 		return ERR_PTR(-EINVAL);
 
+	if ((flags & IOMMU_HWPT_ALLOC_PASID) &&
+	    !dev->iommu->iommu_dev->max_pasids)
+		return ERR_PTR(-EOPNOTSUPP);
+
 	mock_parent = to_mock_domain(parent);
 	if (!mock_parent)
 		return ERR_PTR(-EINVAL);
@@ -365,7 +377,8 @@ mock_domain_alloc_paging_flags(struct device *dev, u32 flags,
 {
 	bool has_dirty_flag = flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
 	const u32 PAGING_FLAGS = IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
-				 IOMMU_HWPT_ALLOC_NEST_PARENT;
+				 IOMMU_HWPT_ALLOC_NEST_PARENT |
+				 IOMMU_HWPT_ALLOC_PASID;
 	struct mock_dev *mdev = to_mock_dev(dev);
 	bool no_dirty_ops = mdev->flags & MOCK_FLAGS_DEVICE_NO_DIRTY;
 	struct mock_iommu_domain *mock;
@@ -375,6 +388,10 @@ mock_domain_alloc_paging_flags(struct device *dev, u32 flags,
 	if ((flags & ~PAGING_FLAGS) || (has_dirty_flag && no_dirty_ops))
 		return ERR_PTR(-EOPNOTSUPP);
 
+	if ((flags & IOMMU_HWPT_ALLOC_PASID) &&
+	    !dev->iommu->iommu_dev->max_pasids)
+		return ERR_PTR(-EOPNOTSUPP);
+
 	mock = kzalloc(sizeof(*mock), GFP_KERNEL);
 	if (!mock)
 		return ERR_PTR(-ENOMEM);
@@ -585,7 +602,11 @@ mock_viommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
 	struct mock_viommu *mock_viommu = to_mock_viommu(viommu);
 	struct mock_iommu_domain_nested *mock_nested;
 
-	if (flags)
+	if (flags & ~IOMMU_HWPT_ALLOC_PASID)
+		return ERR_PTR(-EOPNOTSUPP);
+
+	if ((flags & IOMMU_HWPT_ALLOC_PASID) &&
+	    !mock_viommu->core.iommu_dev->max_pasids)
 		return ERR_PTR(-EOPNOTSUPP);
 
 	mock_nested = __mock_domain_alloc_nested(user_data);
@@ -720,6 +741,7 @@ static const struct iommu_ops mock_ops = {
 			.map_pages = mock_domain_map_pages,
 			.unmap_pages = mock_domain_unmap_pages,
 			.iova_to_phys = mock_domain_iova_to_phys,
+			.set_dev_pasid = mock_domain_set_dev_pasid_nop,
 		},
 };
 
@@ -780,6 +802,7 @@ static struct iommu_domain_ops domain_nested_ops = {
 	.free = mock_domain_free_nested,
 	.attach_dev = mock_domain_nop_attach,
 	.cache_invalidate_user = mock_domain_cache_invalidate_user,
+	.set_dev_pasid = mock_domain_set_dev_pasid_nop,
 };
 
 static inline struct iommufd_hw_pagetable *
@@ -839,11 +862,16 @@ static void mock_dev_release(struct device *dev)
 
 static struct mock_dev *mock_dev_create(unsigned long dev_flags)
 {
+	struct property_entry prop[] = {
+		PROPERTY_ENTRY_U32("pasid-num-bits", 20),
+		{},
+	};
 	struct mock_dev *mdev;
 	int rc, i;
 
 	if (dev_flags &
-	    ~(MOCK_FLAGS_DEVICE_NO_DIRTY | MOCK_FLAGS_DEVICE_HUGE_IOVA))
+	    ~(MOCK_FLAGS_DEVICE_NO_DIRTY |
+		    MOCK_FLAGS_DEVICE_HUGE_IOVA | MOCK_FLAGS_DEVICE_PASID))
 		return ERR_PTR(-EINVAL);
 
 	mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
@@ -866,6 +894,14 @@ static struct mock_dev *mock_dev_create(unsigned long dev_flags)
 	if (rc)
 		goto err_put;
 
+	if (dev_flags & MOCK_FLAGS_DEVICE_PASID) {
+		rc = device_create_managed_software_node(&mdev->dev, prop, NULL);
+		if (rc) {
+			dev_err(&mdev->dev, "add pasid-num-bits property failed, rc: %d", rc);
+			goto err_put;
+		}
+	}
+
 	rc = device_add(&mdev->dev);
 	if (rc)
 		goto err_put;
@@ -1724,6 +1760,7 @@ int __init iommufd_test_init(void)
 	init_completion(&mock_iommu.complete);
 
 	mock_iommu_iopf_queue = iopf_queue_alloc("mock-iopfq");
+	mock_iommu.iommu_dev.max_pasids = (1 << 20);
 
 	return 0;
 
-- 
2.34.1



* [PATCH v6 12/14] iommufd/selftest: Add a helper to get test device
  2024-12-19 13:27 [PATCH v6 00/14] iommufd support pasid attach/replace Yi Liu
                   ` (10 preceding siblings ...)
  2024-12-19 13:27 ` [PATCH v6 11/14] iommufd/selftest: Add set_dev_pasid in mock iommu Yi Liu
@ 2024-12-19 13:27 ` Yi Liu
  2024-12-19 13:27 ` [PATCH v6 13/14] iommufd/selftest: Add test ops to test pasid attach/detach Yi Liu
  2024-12-19 13:27 ` [PATCH v6 14/14] iommufd/selftest: Add coverage for iommufd " Yi Liu
  13 siblings, 0 replies; 71+ messages in thread
From: Yi Liu @ 2024-12-19 13:27 UTC (permalink / raw)
  To: joro, jgg, kevin.tian, baolu.lu
  Cc: eric.auger, nicolinc, chao.p.peng, yi.l.liu, iommu, vasant.hegde,
	will

The selftest device (sobj->type == TYPE_IDEV) needs to be looked up in
multiple places, so add a helper for it.
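
The helper follows the usual "return the object or an encoded error
pointer" pattern. A userspace sketch of that pattern is below. Everything in
it is hypothetical: err_ptr()/ptr_err()/is_err() are local stand-ins for the
kernel's ERR_PTR()/PTR_ERR()/IS_ERR(), the array lookup replaces
iommufd_get_object(), and the -ENOENT for a missing id is an assumption made
for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }
static int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

enum obj_type { TYPE_IDEV, TYPE_OTHER };
struct test_obj { enum obj_type type; };

/* Look up an object by index and insist that it is a test device. */
static struct test_obj *get_test_device(struct test_obj *objs, size_t n,
					size_t id)
{
	assert(objs);

	if (id >= n)
		return err_ptr(-ENOENT);
	if (objs[id].type != TYPE_IDEV)
		return err_ptr(-EINVAL);
	return &objs[id];
}
```

Callers then need only one is_err() test instead of repeating the lookup and
the type check at every call site.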

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/selftest.c | 32 +++++++++++++++++++++-----------
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 3f49ef72ba91..515435640e35 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -982,29 +982,39 @@ static int iommufd_test_mock_domain(struct iommufd_ucmd *ucmd,
 	return rc;
 }
 
-/* Replace the mock domain with a manually allocated hw_pagetable */
-static int iommufd_test_mock_domain_replace(struct iommufd_ucmd *ucmd,
-					    unsigned int device_id, u32 pt_id,
-					    struct iommu_test_cmd *cmd)
+static struct selftest_obj *
+iommufd_test_get_self_test_device(struct iommufd_ctx *ictx, u32 id)
 {
 	struct iommufd_object *dev_obj;
 	struct selftest_obj *sobj;
-	int rc;
 
 	/*
 	 * Prefer to use the OBJ_SELFTEST because the destroy_rwsem will ensure
 	 * it doesn't race with detach, which is not allowed.
 	 */
-	dev_obj =
-		iommufd_get_object(ucmd->ictx, device_id, IOMMUFD_OBJ_SELFTEST);
+	dev_obj = iommufd_get_object(ictx, id, IOMMUFD_OBJ_SELFTEST);
 	if (IS_ERR(dev_obj))
-		return PTR_ERR(dev_obj);
+		return ERR_CAST(dev_obj);
 
 	sobj = to_selftest_obj(dev_obj);
 	if (sobj->type != TYPE_IDEV) {
-		rc = -EINVAL;
-		goto out_dev_obj;
+		iommufd_put_object(ictx, dev_obj);
+		return ERR_PTR(-EINVAL);
 	}
+	return sobj;
+}
+
+/* Replace the mock domain with a manually allocated hw_pagetable */
+static int iommufd_test_mock_domain_replace(struct iommufd_ucmd *ucmd,
+					    unsigned int device_id, u32 pt_id,
+					    struct iommu_test_cmd *cmd)
+{
+	struct selftest_obj *sobj;
+	int rc;
+
+	sobj = iommufd_test_get_self_test_device(ucmd->ictx, device_id);
+	if (IS_ERR(sobj))
+		return PTR_ERR(sobj);
 
 	rc = iommufd_device_replace(sobj->idev.idev, &pt_id);
 	if (rc)
@@ -1014,7 +1024,7 @@ static int iommufd_test_mock_domain_replace(struct iommufd_ucmd *ucmd,
 	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
 
 out_dev_obj:
-	iommufd_put_object(ucmd->ictx, dev_obj);
+	iommufd_put_object(ucmd->ictx, &sobj->obj);
 	return rc;
 }
 
-- 
2.34.1



* [PATCH v6 13/14] iommufd/selftest: Add test ops to test pasid attach/detach
  2024-12-19 13:27 [PATCH v6 00/14] iommufd support pasid attach/replace Yi Liu
                   ` (11 preceding siblings ...)
  2024-12-19 13:27 ` [PATCH v6 12/14] iommufd/selftest: Add a helper to get test device Yi Liu
@ 2024-12-19 13:27 ` Yi Liu
  2024-12-19 13:27 ` [PATCH v6 14/14] iommufd/selftest: Add coverage for iommufd " Yi Liu
  13 siblings, 0 replies; 71+ messages in thread
From: Yi Liu @ 2024-12-19 13:27 UTC (permalink / raw)
  To: joro, jgg, kevin.tian, baolu.lu
  Cc: eric.auger, nicolinc, chao.p.peng, yi.l.liu, iommu, vasant.hegde,
	will

This adds 4 test ops for pasid attach/replace/detach testing: ops to
attach, replace and detach a pasid, and an op to check the domain
attached to a pasid.
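
The mock set_dev_pasid callback in this patch also treats pasid 1024
specially so that a driver failure can be provoked on purpose. The sketch
below models that small state machine in userspace C. The function signature
is simplified and hypothetical: the "blocking" argument stands in for the
domain->type == IOMMU_DOMAIN_BLOCKED test.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SPECIAL_PASID 1024u

/* Tracks whether the special pasid currently has a non-blocking domain. */
static bool pasid_1024_attached;

/*
 * Model of the mock callback: "blocking" is true when the target is the
 * blocking domain, false for any other domain.
 */
static int mock_set_dev_pasid(unsigned int pasid, bool blocking)
{
	if (pasid != SPECIAL_PASID)
		return 0;

	if (blocking) {
		pasid_1024_attached = false;
	} else if (pasid_1024_attached) {
		/* Second attach in a row: fake a driver failure */
		pasid_1024_attached = false;
		return -ENOMEM;
	} else {
		pasid_1024_attached = true;
	}
	return 0;
}

/* First attach succeeds, the next one fails, and the flag is cleared. */
static void mock_pasid_demo(void)
{
	pasid_1024_attached = false;
	assert(mock_set_dev_pasid(SPECIAL_PASID, false) == 0);
	assert(mock_set_dev_pasid(SPECIAL_PASID, false) == -ENOMEM);
	assert(!pasid_1024_attached);
}
```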

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/iommufd_test.h |  30 ++++++
 drivers/iommu/iommufd/selftest.c     | 154 +++++++++++++++++++++++++++
 2 files changed, 184 insertions(+)

diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index bdc979557272..b3af7db97bc0 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -24,6 +24,10 @@ enum {
 	IOMMU_TEST_OP_MD_CHECK_IOTLB,
 	IOMMU_TEST_OP_TRIGGER_IOPF,
 	IOMMU_TEST_OP_DEV_CHECK_CACHE,
+	IOMMU_TEST_OP_PASID_ATTACH,
+	IOMMU_TEST_OP_PASID_REPLACE,
+	IOMMU_TEST_OP_PASID_DETACH,
+	IOMMU_TEST_OP_PASID_CHECK_DOMAIN,
 };
 
 enum {
@@ -146,6 +150,32 @@ struct iommu_test_cmd {
 			__u32 id;
 			__u32 cache;
 		} check_dev_cache;
+		struct {
+			__u32 pasid;
+			__u32 pt_id;
+			/* @id is stdev_id for IOMMU_TEST_OP_PASID_ATTACH.
+			 * pasid#1024 is reserved for a special test; avoid
+			 * using it in the normal case.
+			 */
+		} pasid_attach;
+		struct {
+			__u32 pasid;
+			__u32 pt_id;
+			/* @id is stdev_id for IOMMU_TEST_OP_PASID_REPLACE.
+			 * pasid#1024 is reserved for a special test; avoid
+			 * using it in the normal case.
+			 */
+		} pasid_replace;
+		struct {
+			__u32 pasid;
+			/* @id is stdev_id for IOMMU_TEST_OP_PASID_DETACH */
+		} pasid_detach;
+		struct {
+			__u32 pasid;
+			__u32 hwpt_id;
+			__u64 out_result_ptr;
+			/* @id is stdev_id for IOMMU_TEST_OP_PASID_CHECK_DOMAIN */
+		} pasid_check;
 	};
 	__u32 last;
 };
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 515435640e35..289221c7b3b5 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -200,10 +200,29 @@ static int mock_domain_nop_attach(struct iommu_domain *domain,
 	return 0;
 }
 
+static bool pasid_1024_attached;
+
 static int mock_domain_set_dev_pasid_nop(struct iommu_domain *domain,
 					 struct device *dev, ioasid_t pasid,
 					 struct iommu_domain *old)
 {
+	/*
+	 * The first attach with pasid 1024 succeeds; a second attach fails.
+	 * This helps test the case in which the iommu core needs to roll
+	 * back to the old domain due to a driver failure.
+	 */
+	if (pasid == 1024) {
+		if (domain->type == IOMMU_DOMAIN_BLOCKED) {
+			pasid_1024_attached = false;
+		} else if (pasid_1024_attached) {
+			pasid_1024_attached = false;
+			// Fake an error to fail the replacement
+			return -ENOMEM;
+		} else {
+			pasid_1024_attached = true;
+		}
+	}
+
 	return 0;
 }
 
@@ -1643,6 +1662,132 @@ static int iommufd_test_trigger_iopf(struct iommufd_ucmd *ucmd,
 	return 0;
 }
 
+static int iommufd_test_pasid_attach(struct iommufd_ucmd *ucmd,
+				     struct iommu_test_cmd *cmd)
+{
+	struct selftest_obj *sobj;
+	int rc;
+
+	sobj = iommufd_test_get_self_test_device(ucmd->ictx, cmd->id);
+	if (IS_ERR(sobj))
+		return PTR_ERR(sobj);
+
+	rc = iommufd_device_pasid_attach(sobj->idev.idev,
+					 cmd->pasid_attach.pasid,
+					 &cmd->pasid_attach.pt_id);
+	if (rc)
+		goto out_dev_obj;
+
+	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+	if (rc)
+		iommufd_device_pasid_detach(sobj->idev.idev,
+					    cmd->pasid_attach.pasid);
+
+out_dev_obj:
+	iommufd_put_object(ucmd->ictx, &sobj->obj);
+	return rc;
+}
+
+static int iommufd_test_pasid_replace(struct iommufd_ucmd *ucmd,
+				      struct iommu_test_cmd *cmd)
+{
+	struct selftest_obj *sobj;
+	int rc;
+
+	sobj = iommufd_test_get_self_test_device(ucmd->ictx, cmd->id);
+	if (IS_ERR(sobj))
+		return PTR_ERR(sobj);
+
+	rc = iommufd_device_pasid_replace(sobj->idev.idev,
+					  cmd->pasid_replace.pasid,
+					  &cmd->pasid_replace.pt_id);
+	if (rc)
+		goto out_dev_obj;
+
+	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+
+out_dev_obj:
+	iommufd_put_object(ucmd->ictx, &sobj->obj);
+	return rc;
+}
+
+static int iommufd_test_pasid_detach(struct iommufd_ucmd *ucmd,
+				     struct iommu_test_cmd *cmd)
+{
+	struct selftest_obj *sobj;
+
+	sobj = iommufd_test_get_self_test_device(ucmd->ictx, cmd->id);
+	if (IS_ERR(sobj))
+		return PTR_ERR(sobj);
+
+	iommufd_device_pasid_detach(sobj->idev.idev,
+				    cmd->pasid_detach.pasid);
+	iommufd_put_object(ucmd->ictx, &sobj->obj);
+	return 0;
+}
+
+static inline struct iommufd_hw_pagetable *
+iommufd_get_hwpt(struct iommufd_ucmd *ucmd, u32 id)
+{
+	struct iommufd_object *pt_obj;
+
+	pt_obj = iommufd_get_object(ucmd->ictx, id, IOMMUFD_OBJ_ANY);
+	if (IS_ERR(pt_obj))
+		return ERR_CAST(pt_obj);
+
+	if (pt_obj->type != IOMMUFD_OBJ_HWPT_NESTED &&
+	    pt_obj->type != IOMMUFD_OBJ_HWPT_PAGING) {
+		iommufd_put_object(ucmd->ictx, pt_obj);
+		return ERR_PTR(-EINVAL);
+	}
+
+	return container_of(pt_obj, struct iommufd_hw_pagetable, obj);
+}
+
+static int iommufd_test_pasid_check_domain(struct iommufd_ucmd *ucmd,
+					   struct iommu_test_cmd *cmd)
+{
+	struct iommu_domain *attached_domain, *expect_domain = NULL;
+	struct iommufd_hw_pagetable *hwpt = NULL;
+	struct iommu_attach_handle *handle;
+	struct selftest_obj *sobj;
+	struct mock_dev *mdev;
+	bool result;
+	int rc = 0;
+
+	sobj = iommufd_test_get_self_test_device(ucmd->ictx, cmd->id);
+	if (IS_ERR(sobj))
+		return PTR_ERR(sobj);
+
+	mdev = sobj->idev.mock_dev;
+
+	handle = iommu_attach_handle_get(mdev->dev.iommu_group,
+					 cmd->pasid_check.pasid, 0);
+	if (IS_ERR(handle))
+		attached_domain = NULL;
+	else
+		attached_domain = handle->domain;
+
+	if (cmd->pasid_check.hwpt_id) {
+		hwpt = iommufd_get_hwpt(ucmd, cmd->pasid_check.hwpt_id);
+		if (IS_ERR(hwpt)) {
+			rc = PTR_ERR(hwpt);
+			goto out_put_dev;
+		}
+		expect_domain = hwpt->domain;
+	}
+
+	result = (attached_domain == expect_domain) ? 1 : 0;
+	if (copy_to_user(u64_to_user_ptr(cmd->pasid_check.out_result_ptr),
+			 &result, sizeof(result)))
+		rc = -EFAULT;
+	if (hwpt)
+		iommufd_put_object(ucmd->ictx, &hwpt->obj);
+out_put_dev:
+	iommufd_put_object(ucmd->ictx, &sobj->obj);
+	return rc;
+}
+
 void iommufd_selftest_destroy(struct iommufd_object *obj)
 {
 	struct selftest_obj *sobj = to_selftest_obj(obj);
@@ -1724,6 +1869,14 @@ int iommufd_test(struct iommufd_ucmd *ucmd)
 					  cmd->dirty.flags);
 	case IOMMU_TEST_OP_TRIGGER_IOPF:
 		return iommufd_test_trigger_iopf(ucmd, cmd);
+	case IOMMU_TEST_OP_PASID_ATTACH:
+		return iommufd_test_pasid_attach(ucmd, cmd);
+	case IOMMU_TEST_OP_PASID_REPLACE:
+		return iommufd_test_pasid_replace(ucmd, cmd);
+	case IOMMU_TEST_OP_PASID_DETACH:
+		return iommufd_test_pasid_detach(ucmd, cmd);
+	case IOMMU_TEST_OP_PASID_CHECK_DOMAIN:
+		return iommufd_test_pasid_check_domain(ucmd, cmd);
 	default:
 		return -EOPNOTSUPP;
 	}
@@ -1771,6 +1924,7 @@ int __init iommufd_test_init(void)
 
 	mock_iommu_iopf_queue = iopf_queue_alloc("mock-iopfq");
 	mock_iommu.iommu_dev.max_pasids = (1 << 20);
+	pasid_1024_attached = false;
 
 	return 0;
 
-- 
2.34.1



* [PATCH v6 14/14] iommufd/selftest: Add coverage for iommufd pasid attach/detach
  2024-12-19 13:27 [PATCH v6 00/14] iommufd support pasid attach/replace Yi Liu
                   ` (12 preceding siblings ...)
  2024-12-19 13:27 ` [PATCH v6 13/14] iommufd/selftest: Add test ops to test pasid attach/detach Yi Liu
@ 2024-12-19 13:27 ` Yi Liu
  13 siblings, 0 replies; 71+ messages in thread
From: Yi Liu @ 2024-12-19 13:27 UTC (permalink / raw)
  To: joro, jgg, kevin.tian, baolu.lu
  Cc: eric.auger, nicolinc, chao.p.peng, yi.l.liu, iommu, vasant.hegde,
	will

This tests iommufd pasid attach/replace/detach.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 tools/testing/selftests/iommu/iommufd.c       | 348 ++++++++++++++++++
 .../selftests/iommu/iommufd_fail_nth.c        |  39 +-
 tools/testing/selftests/iommu/iommufd_utils.h | 102 +++++
 3 files changed, 482 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index a1b2b657999d..575a9289dfdb 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -2956,4 +2956,352 @@ TEST_F(iommufd_viommu, vdevice_cache)
 	}
 }
 
+FIXTURE(iommufd_device_pasid)
+{
+	int fd;
+	uint32_t ioas_id;
+	uint32_t hwpt_id;
+	uint32_t stdev_id;
+	uint32_t device_id;
+	uint32_t no_pasid_stdev_id;
+	uint32_t no_pasid_device_id;
+};
+
+FIXTURE_VARIANT(iommufd_device_pasid)
+{
+	bool pasid_capable;
+};
+
+FIXTURE_SETUP(iommufd_device_pasid)
+{
+	self->fd = open("/dev/iommu", O_RDWR);
+	ASSERT_NE(-1, self->fd);
+	test_ioctl_ioas_alloc(&self->ioas_id);
+
+	test_cmd_mock_domain_flags(self->ioas_id,
+				   MOCK_FLAGS_DEVICE_PASID,
+				   &self->stdev_id, &self->hwpt_id,
+				   &self->device_id);
+	if (!variant->pasid_capable)
+		test_cmd_mock_domain_flags(self->ioas_id, 0,
+					   &self->no_pasid_stdev_id, NULL,
+					   &self->no_pasid_device_id);
+}
+
+FIXTURE_TEARDOWN(iommufd_device_pasid)
+{
+	teardown_iommufd(self->fd, _metadata);
+}
+
+FIXTURE_VARIANT_ADD(iommufd_device_pasid, no_pasid)
+{
+	.pasid_capable = false,
+};
+
+FIXTURE_VARIANT_ADD(iommufd_device_pasid, has_pasid)
+{
+	.pasid_capable = true,
+};
+
+TEST_F(iommufd_device_pasid, pasid_attach)
+{
+	struct iommu_hwpt_selftest data = {
+		.iotlb =  IOMMU_TEST_IOTLB_DEFAULT,
+	};
+	uint32_t nested_hwpt_id[3] = {};
+	uint32_t parent_hwpt_id = 0;
+	uint32_t fault_id, fault_fd;
+	uint32_t s2_hwpt_id = 0;
+	uint32_t iopf_hwpt_id;
+	uint32_t pasid = 100;
+	uint32_t auto_hwpt;
+	uint32_t viommu_id;
+	bool result;
+
+	/* Allocate two nested hwpts sharing one common parent hwpt */
+	test_cmd_hwpt_alloc(self->device_id, self->ioas_id,
+			    IOMMU_HWPT_ALLOC_NEST_PARENT,
+			    &parent_hwpt_id);
+	test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id,
+				   IOMMU_HWPT_ALLOC_PASID,
+				   &nested_hwpt_id[0],
+				   IOMMU_HWPT_DATA_SELFTEST,
+				   &data, sizeof(data));
+	test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id,
+				   IOMMU_HWPT_ALLOC_PASID,
+				   &nested_hwpt_id[1],
+				   IOMMU_HWPT_DATA_SELFTEST,
+				   &data, sizeof(data));
+
+	/* Fault related preparation */
+	test_ioctl_fault_alloc(&fault_id, &fault_fd);
+	test_cmd_hwpt_alloc_iopf(self->device_id, parent_hwpt_id, fault_id,
+				 IOMMU_HWPT_FAULT_ID_VALID | IOMMU_HWPT_ALLOC_PASID,
+				 &iopf_hwpt_id,
+				 IOMMU_HWPT_DATA_SELFTEST, &data,
+				 sizeof(data));
+
+	/* Allocate a regular nested hwpt based on viommu */
+	test_cmd_viommu_alloc(self->device_id, parent_hwpt_id,
+			      IOMMU_VIOMMU_TYPE_SELFTEST,
+			      &viommu_id);
+	test_cmd_hwpt_alloc_nested(self->device_id, viommu_id,
+				   IOMMU_HWPT_ALLOC_PASID,
+				   &nested_hwpt_id[2],
+				   IOMMU_HWPT_DATA_SELFTEST, &data,
+				   sizeof(data));
+
+	test_cmd_hwpt_alloc(self->device_id, self->ioas_id,
+			    IOMMU_HWPT_ALLOC_PASID,
+			    &s2_hwpt_id);
+
+	/* Attach RID to non-pasid compat domain, */
+	test_cmd_mock_domain_replace(self->stdev_id, parent_hwpt_id);
+	/* then attach to pasid should fail */
+	test_err_pasid_attach(EINVAL, pasid, s2_hwpt_id, NULL);
+
+	/* Attach RID to pasid compat domain, */
+	test_cmd_mock_domain_replace(self->stdev_id, s2_hwpt_id);
+	/* then attach to pasid should succeed, */
+	test_cmd_pasid_attach(pasid, nested_hwpt_id[0], NULL);
+	/* but attach RID to non-pasid compat domain should fail now. */
+	test_err_mock_domain_replace(EINVAL, self->stdev_id, parent_hwpt_id);
+	test_cmd_pasid_detach(pasid);
+
+	if (!variant->pasid_capable) {
+		/*
+		 * PASID-compatible domain can be used by non-PASID-capable
+		 * device.
+		 */
+		test_cmd_mock_domain_replace(self->no_pasid_stdev_id, nested_hwpt_id[0]);
+		test_cmd_mock_domain_replace(self->no_pasid_stdev_id, self->ioas_id);
+		/*
+		 * Attach hwpt to pasid#100 of non-PASID-capable device,
+		 * should fail, whether or not the domain is pasid-compat.
+		 */
+		EXPECT_ERRNO(EINVAL,
+			     _test_cmd_pasid_attach(self->fd, self->no_pasid_stdev_id,
+						    pasid, parent_hwpt_id, NULL));
+		EXPECT_ERRNO(EINVAL,
+			     _test_cmd_pasid_attach(self->fd, self->no_pasid_stdev_id,
+						    pasid, s2_hwpt_id, NULL));
+	}
+
+	/*
+	 * Attach non pasid compat hwpt to pasid-capable device, should
+	 * fail, and have null domain.
+	 */
+	test_err_pasid_attach(EINVAL, pasid, parent_hwpt_id, NULL);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, 0, &result));
+	EXPECT_EQ(1, result);
+
+	/*
+	 * Attach ioas to pasid 100, should succeed, domain should
+	 * be valid.
+	 */
+	test_cmd_pasid_attach(pasid, self->ioas_id, &auto_hwpt);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, auto_hwpt, &result));
+	EXPECT_EQ(1, result);
+
+	/*
+	 * Attach same ioas to pasid 100, should succeed.
+	 */
+	test_cmd_pasid_attach(pasid, self->ioas_id, &auto_hwpt);
+
+	/*
+	 * Attaching pasid 100 to another hwpt should fail because
+	 * attach does not allow overwrite; use replace instead.
+	 */
+	test_err_pasid_attach(EINVAL, pasid, nested_hwpt_id[0], NULL);
+
+	/*
+	 * Detach hwpt from pasid 100, and check if the pasid 100
+	 * has null domain. Should be done before the next attach.
+	 */
+	test_cmd_pasid_detach(pasid);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, 0, &result));
+	EXPECT_EQ(1, result);
+
+	/*
+	 * Attach nested hwpt to pasid 100, should succeed, domain
+	 * should be valid.
+	 */
+	test_cmd_pasid_attach(pasid, nested_hwpt_id[0], NULL);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, nested_hwpt_id[0],
+					      &result));
+	EXPECT_EQ(1, result);
+
+	/*
+	 * Try attach pasid 100 to same nested_hwpt_id[0], should succeed.
+	 */
+	test_cmd_pasid_attach(pasid, nested_hwpt_id[0], NULL);
+
+	/*
+	 * Detach hwpt from pasid 100, and check if the pasid 100
+	 * has null domain
+	 */
+	test_cmd_pasid_detach(pasid);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, 0, &result));
+	EXPECT_EQ(1, result);
+
+	/* Replace tests */
+
+	pasid = 200;
+	/*
+	 * Replace pasid 200 without attaching it first, should
+	 * fail with -EINVAL.
+	 */
+	test_err_cmd_pasid_replace(EINVAL, pasid, s2_hwpt_id, NULL);
+
+	/*
+	 * Attach a s2 hwpt to pasid 200, should succeed, domain should
+	 * be valid.
+	 */
+	test_cmd_pasid_attach(pasid, s2_hwpt_id, NULL);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, s2_hwpt_id,
+					      &result));
+	EXPECT_EQ(1, result);
+
+	/*
+	 * Replace pasid 200 with self->ioas_id, should succeed,
+	 * and have valid domain.
+	 */
+	test_cmd_pasid_replace(pasid, self->ioas_id, &auto_hwpt);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, auto_hwpt,
+					      &result));
+	EXPECT_EQ(1, result);
+
+	/*
+	 * Replace a nested hwpt for pasid 200, should succeed,
+	 * and have valid domain.
+	 */
+	test_cmd_pasid_replace(pasid, nested_hwpt_id[0], NULL);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, nested_hwpt_id[0],
+					      &result));
+	EXPECT_EQ(1, result);
+
+	/*
+	 * Replace with another nested hwpt for pasid 200, should
+	 * succeed, and have valid domain.
+	 */
+	test_cmd_pasid_replace(pasid, nested_hwpt_id[1], NULL);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, nested_hwpt_id[1],
+					      &result));
+	EXPECT_EQ(1, result);
+
+	/*
+	 * Detach hwpt from pasid 200, and check if the pasid 200
+	 * has null domain.
+	 */
+	test_cmd_pasid_detach(pasid);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, 0, &result));
+	EXPECT_EQ(1, result);
+
+	/* Negative Tests for pasid replace, use pasid 1024 */
+
+	/*
+	 * Attach a s2 hwpt to pasid 1024, should succeed, domain should
+	 * be valid.
+	 */
+	pasid = 1024;
+	test_cmd_pasid_attach(pasid, s2_hwpt_id, NULL);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, s2_hwpt_id,
+					      &result));
+	EXPECT_EQ(1, result);
+
+	/*
+	 * Replace pasid 1024 with self->ioas_id, should fail,
+	 * but have the old valid domain. This is a designed
+	 * negative case, normally replace with self->ioas_id
+	 * could succeed.
+	 */
+	test_err_cmd_pasid_replace(ENOMEM, pasid, self->ioas_id, NULL);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, s2_hwpt_id,
+					      &result));
+	EXPECT_EQ(1, result);
+
+	/*
+	 * Detach hwpt from pasid 1024, and check if the pasid 1024
+	 * has null domain.
+	 */
+	test_cmd_pasid_detach(pasid);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, 0, &result));
+	EXPECT_EQ(1, result);
+
+	/* Attach to iopf-capable hwpt */
+
+	/*
+	 * Attach an iopf hwpt to pasid 2048, should succeed, domain should
+	 * be valid.
+	 */
+	pasid = 2048;
+	test_cmd_pasid_attach(pasid, iopf_hwpt_id, NULL);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, iopf_hwpt_id,
+					      &result));
+	EXPECT_EQ(1, result);
+
+	/*
+	 * Replace with s2_hwpt_id for pasid 2048, should
+	 * succeed, and have valid domain.
+	 */
+	test_cmd_pasid_replace(pasid, s2_hwpt_id, NULL);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, s2_hwpt_id,
+					      &result));
+	EXPECT_EQ(1, result);
+
+	/*
+	 * Detach hwpt from pasid 2048, and check if the pasid 2048
+	 * has null domain.
+	 */
+	test_cmd_pasid_detach(pasid);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, 0, &result));
+	EXPECT_EQ(1, result);
+
+	test_ioctl_destroy(iopf_hwpt_id);
+	close(fault_fd);
+	test_ioctl_destroy(fault_id);
+
+	/* Detach the s2_hwpt_id from RID */
+	test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id);
+
+	test_ioctl_destroy(nested_hwpt_id[0]);
+	test_ioctl_destroy(nested_hwpt_id[1]);
+	test_ioctl_destroy(nested_hwpt_id[2]);
+	test_ioctl_destroy(viommu_id);
+	test_ioctl_destroy(parent_hwpt_id);
+	test_ioctl_destroy(s2_hwpt_id);
+}
+
 TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c
index 64b1f8e1b0cf..c9580724ca9c 100644
--- a/tools/testing/selftests/iommu/iommufd_fail_nth.c
+++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c
@@ -209,12 +209,16 @@ FIXTURE(basic_fail_nth)
 {
 	int fd;
 	uint32_t access_id;
+	uint32_t stdev_id;
+	uint32_t pasid;
 };
 
 FIXTURE_SETUP(basic_fail_nth)
 {
 	self->fd = -1;
 	self->access_id = 0;
+	self->stdev_id = 0;
+	self->pasid = 0; /* tests should use a non-zero value */
 }
 
 FIXTURE_TEARDOWN(basic_fail_nth)
@@ -226,6 +230,8 @@ FIXTURE_TEARDOWN(basic_fail_nth)
 		rc = _test_cmd_destroy_access(self->access_id);
 		assert(rc == 0);
 	}
+	if (self->pasid && self->stdev_id)
+		_test_cmd_pasid_detach(self->fd, self->stdev_id, self->pasid);
 	teardown_iommufd(self->fd, _metadata);
 }
 
@@ -623,7 +629,6 @@ TEST_FAIL_NTH(basic_fail_nth, device)
 	uint32_t fault_hwpt_id;
 	uint32_t ioas_id;
 	uint32_t ioas_id2;
-	uint32_t stdev_id;
 	uint32_t idev_id;
 	uint32_t hwpt_id;
 	uint32_t viommu_id;
@@ -654,25 +659,29 @@ TEST_FAIL_NTH(basic_fail_nth, device)
 
 	fail_nth_enable();
 
-	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, NULL,
-				  &idev_id))
+	if (_test_cmd_mock_domain_flags(self->fd, ioas_id,
+					MOCK_FLAGS_DEVICE_PASID,
+					&self->stdev_id, NULL, &idev_id))
 		return -1;
 
 	if (_test_cmd_get_hw_info(self->fd, idev_id, &info, sizeof(info), NULL))
 		return -1;
 
-	if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0, 0, &hwpt_id,
+	if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0,
+				 IOMMU_HWPT_ALLOC_PASID, &hwpt_id,
 				 IOMMU_HWPT_DATA_NONE, 0, 0))
 		return -1;
 
-	if (_test_cmd_mock_domain_replace(self->fd, stdev_id, ioas_id2, NULL))
+	if (_test_cmd_mock_domain_replace(self->fd, self->stdev_id, ioas_id2, NULL))
 		return -1;
 
-	if (_test_cmd_mock_domain_replace(self->fd, stdev_id, hwpt_id, NULL))
+	if (_test_cmd_mock_domain_replace(self->fd, self->stdev_id, hwpt_id, NULL))
 		return -1;
 
 	if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0,
-				 IOMMU_HWPT_ALLOC_NEST_PARENT, &hwpt_id,
+				 IOMMU_HWPT_ALLOC_NEST_PARENT |
+						IOMMU_HWPT_ALLOC_PASID,
+				 &hwpt_id,
 				 IOMMU_HWPT_DATA_NONE, 0, 0))
 		return -1;
 
@@ -692,6 +701,22 @@ TEST_FAIL_NTH(basic_fail_nth, device)
 				 IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data)))
 		return -1;
 
+	self->pasid = 200;
+
+	/* Tests for pasid attach/replace/detach */
+	if (_test_cmd_pasid_attach(self->fd, self->stdev_id,
+				   self->pasid, ioas_id, NULL)) {
+		self->pasid = 0;
+		return -1;
+	}
+
+	_test_cmd_pasid_replace(self->fd, self->stdev_id, self->pasid, ioas_id2, NULL);
+
+	if (_test_cmd_pasid_detach(self->fd, self->stdev_id, self->pasid))
+		return -1;
+
+	self->pasid = 0;
+
 	return 0;
 }
 
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index d979f5b0efe8..523ff28e4bc9 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -936,3 +936,105 @@ static int _test_cmd_vdevice_alloc(int fd, __u32 viommu_id, __u32 idev_id,
 	EXPECT_ERRNO(_errno,                                                 \
 		     _test_cmd_vdevice_alloc(self->fd, viommu_id, idev_id,   \
 					     virt_id, vdev_id))
+
+static int _test_cmd_pasid_attach(int fd, __u32 stdev_id, __u32 pasid,
+				  __u32 pt_id, __u32 *out_pt_id)
+{
+	struct iommu_test_cmd test_attach = {
+		.size = sizeof(test_attach),
+		.op = IOMMU_TEST_OP_PASID_ATTACH,
+		.id = stdev_id,
+		.pasid_attach = {
+			.pasid = pasid,
+			.pt_id = pt_id,
+		},
+	};
+	int ret;
+
+	ret = ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_PASID_ATTACH),
+		    &test_attach);
+	if (ret)
+		return ret;
+
+	if (out_pt_id)
+		*out_pt_id = test_attach.pasid_attach.pt_id;
+	return 0;
+}
+
+#define test_cmd_pasid_attach(pasid, hwpt_id, out_pt_id) \
+	ASSERT_EQ(0, _test_cmd_pasid_attach(self->fd, self->stdev_id, \
+					    pasid, hwpt_id, out_pt_id))
+
+#define test_err_pasid_attach(_errno, pasid, hwpt_id, out_pt_id) \
+	EXPECT_ERRNO(_errno, \
+		     _test_cmd_pasid_attach(self->fd, self->stdev_id, \
+					    pasid, hwpt_id, out_pt_id))
+
+static int _test_cmd_pasid_replace(int fd, __u32 stdev_id, __u32 pasid,
+				   __u32 pt_id, __u32 *out_pt_id)
+{
+	struct iommu_test_cmd test_replace = {
+		.size = sizeof(test_replace),
+		.op = IOMMU_TEST_OP_PASID_REPLACE,
+		.id = stdev_id,
+		.pasid_replace = {
+			.pasid = pasid,
+			.pt_id = pt_id,
+		},
+	};
+	int ret;
+
+	ret = ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_PASID_REPLACE),
+		    &test_replace);
+	if (ret)
+		return ret;
+
+	if (out_pt_id)
+		*out_pt_id = test_replace.pasid_replace.pt_id;
+	return 0;
+}
+
+#define test_cmd_pasid_replace(pasid, hwpt_id, out_pt_id) \
+	ASSERT_EQ(0, _test_cmd_pasid_replace(self->fd, self->stdev_id, \
+					     pasid, hwpt_id, out_pt_id))
+
+#define test_err_cmd_pasid_replace(_errno, pasid, hwpt_id, out_pt_id) \
+	EXPECT_ERRNO(_errno, \
+		     _test_cmd_pasid_replace(self->fd, self->stdev_id, \
+					     pasid, hwpt_id, out_pt_id))
+
+static int _test_cmd_pasid_detach(int fd, __u32 stdev_id, __u32 pasid)
+{
+	struct iommu_test_cmd test_detach = {
+		.size = sizeof(test_detach),
+		.op = IOMMU_TEST_OP_PASID_DETACH,
+		.id = stdev_id,
+		.pasid_detach = {
+			.pasid = pasid,
+		},
+	};
+
+	return ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_PASID_DETACH),
+		     &test_detach);
+}
+
+#define test_cmd_pasid_detach(pasid) \
+	ASSERT_EQ(0, _test_cmd_pasid_detach(self->fd, self->stdev_id, pasid))
+
+static int test_cmd_pasid_check_domain(int fd, __u32 stdev_id, __u32 pasid,
+				       __u32 hwpt_id, bool *result)
+{
+	struct iommu_test_cmd test_pasid_check = {
+		.size = sizeof(test_pasid_check),
+		.op = IOMMU_TEST_OP_PASID_CHECK_DOMAIN,
+		.id = stdev_id,
+		.pasid_check = {
+			.pasid = pasid,
+			.hwpt_id = hwpt_id,
+			.out_result_ptr = (__u64)result,
+		},
+	};
+
+	return ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_PASID_CHECK_DOMAIN),
+		     &test_pasid_check);
+}
-- 
2.34.1
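
For readers following the selftest above, the attach/replace/detach rules it
checks can be modeled in plain userspace C. This is only a toy sketch, not the
iommufd implementation: toy_pasid_slot, toy_attach(), toy_replace(),
toy_detach() and toy_walkthrough() are hypothetical names, hwpt IDs are plain
integers with 0 meaning "no domain", and the error codes simply mirror the
ones the selftest expects. PASID-compat enforcement is deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model: one PASID slot holding the attached hwpt ID (0 = none). */
struct toy_pasid_slot {
	uint32_t hwpt_id;
};

/* Attach succeeds on an empty slot or when re-attaching the same hwpt. */
static int toy_attach(struct toy_pasid_slot *slot, uint32_t hwpt_id)
{
	if (!hwpt_id)
		return -EINVAL;
	if (slot->hwpt_id && slot->hwpt_id != hwpt_id)
		return -EINVAL;	/* attach never overwrites; use replace */
	slot->hwpt_id = hwpt_id;
	return 0;
}

/* Replace requires a prior attach; on failure the old hwpt stays in place. */
static int toy_replace(struct toy_pasid_slot *slot, uint32_t hwpt_id)
{
	if (!hwpt_id || !slot->hwpt_id)
		return -EINVAL;
	slot->hwpt_id = hwpt_id;
	return 0;
}

/* Detach leaves the slot with no domain. */
static void toy_detach(struct toy_pasid_slot *slot)
{
	slot->hwpt_id = 0;
}

/* Walks a sequence similar to the selftest's attach and replace sections. */
static void toy_walkthrough(void)
{
	struct toy_pasid_slot slot = { 0 };

	assert(toy_replace(&slot, 5) == -EINVAL);	/* nothing attached yet */
	assert(toy_attach(&slot, 5) == 0);
	assert(toy_attach(&slot, 5) == 0);		/* same hwpt again */
	assert(toy_attach(&slot, 7) == -EINVAL);	/* no overwrite */
	assert(toy_replace(&slot, 7) == 0 && slot.hwpt_id == 7);
	toy_detach(&slot);
	assert(slot.hwpt_id == 0);
}
```

Calling toy_walkthrough() from a main() exercises the whole sequence; each
assert corresponds to one expectation stated in the selftest comments.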



* Re: [PATCH v6 01/14] iommu: Introduce a replace API for device pasid
  2024-12-19 13:27 ` [PATCH v6 01/14] iommu: Introduce a replace API for device pasid Yi Liu
@ 2024-12-20  2:47   ` Baolu Lu
  2025-01-09  7:08   ` Tian, Kevin
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 71+ messages in thread
From: Baolu Lu @ 2024-12-20  2:47 UTC (permalink / raw)
  To: Yi Liu, joro, jgg, kevin.tian
  Cc: eric.auger, nicolinc, chao.p.peng, iommu, vasant.hegde, will

On 12/19/24 21:27, Yi Liu wrote:
> +/**
> + * iommu_replace_device_pasid - Replace the domain that a pasid is attached to
> + * @domain: the new iommu domain
> + * @dev: the attached device.
> + * @pasid: the pasid of the device.
> + * @handle: the attach handle.
> + *
> + * This API allows the pasid to switch domains. Return 0 on success, or an
> + * error. The pasid will keep the old configuration if replacement failed.
> + * This is supposed to be used by iommufd, and iommufd can guarantee that
> + * both iommu_attach_device_pasid() and iommu_replace_device_pasid() would
> + * pass in a valid @handle.
> + */
> +int iommu_replace_device_pasid(struct iommu_domain *domain,
> +			       struct device *dev, ioasid_t pasid,
> +			       struct iommu_attach_handle *handle)
> +{
> +	/* Caller must be a probed driver on dev */
> +	struct iommu_group *group = dev->iommu_group;
> +	struct iommu_attach_handle *curr;
> +	int ret;
> +
> +	if (!group)
> +		return -ENODEV;
> +
> +	if (!domain->ops->set_dev_pasid)
> +		return -EOPNOTSUPP;
> +
> +	if (dev_iommu_ops(dev) != domain->owner ||
> +	    pasid == IOMMU_NO_PASID || !handle)
> +		return -EINVAL;
> +
> +	handle->domain = domain;
> +
> +	mutex_lock(&group->mutex);
> +	/*
> +	 * The iommu_attach_handle of the pasid becomes inconsistent with the
> +	 * actual handle per the below operation. The concurrent PRI path will
> +	 * deliver the PRQs per the new handle, this does not have a functional
> +	 * impact. The PRI path would eventually become consistent when the
> +	 * replacement is done.
> +	 */
> +	curr = (struct iommu_attach_handle *)xa_store(&group->pasid_array,
> +						      pasid, handle,
> +						      GFP_KERNEL);

The pointer type cast seems unnecessary.

> +	if (!curr) {
> +		xa_erase(&group->pasid_array, pasid);
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +
> +	ret = xa_err(curr);
> +	if (ret)
> +		goto out_unlock;

xa_store() returns either the old pointer or an error code. It's better
to handle the error code first, as shown below:

	/* Failed to store the new pointer: */
	if (xa_is_err(curr)) {
		ret = xa_err(curr);
		goto out_unlock;
	}

	/* Not a replace case: */
	if (!curr) {
		xa_erase(&group->pasid_array, pasid);
		ret = -EINVAL;
		goto out_unlock;
	}

Your code is already functional; this is just a suggestion to make it
more readable.
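
The ordering argument can be tried out in userspace with a small mock. This is
a hedged sketch only: mock_err_entry(), mock_xa_is_err(), mock_xa_err() and
classify_store_result() are hypothetical stand-ins, and the pointer encoding
used here is a simplified ERR_PTR-style one, not the real XArray encoding. It
shows that an error-carrying pointer is non-NULL, so checking for the error
first and for NULL second keeps the three outcomes clearly separated.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_MAX_ERRNO 4095

/* Stand-in for an error-carrying pointer; NOT the real XArray encoding. */
static void *mock_err_entry(int err)
{
	assert(err < 0 && err >= -MOCK_MAX_ERRNO);
	return (void *)(intptr_t)err;
}

/* True when the pointer value falls in the reserved error range. */
static int mock_xa_is_err(const void *entry)
{
	return (uintptr_t)entry >= (uintptr_t)-MOCK_MAX_ERRNO;
}

/* Returns the negative errno carried by the entry, or 0 if it is not one. */
static int mock_xa_err(const void *entry)
{
	return mock_xa_is_err(entry) ? (int)(intptr_t)entry : 0;
}

/*
 * Classify what a store returned, checking the error case first:
 * a negative errno for a failed store, -EINVAL when there was no
 * previous entry (not a replace), and 0 when an old entry came back.
 */
static int classify_store_result(const void *curr)
{
	if (mock_xa_is_err(curr))
		return mock_xa_err(curr);
	if (!curr)
		return -EINVAL;
	return 0;
}
```

In the "no previous entry" case the real caller would also have to erase the
entry it just stored, as the suggested snippet above does with xa_erase().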

Thanks,
baolu


* Re: [PATCH v6 03/14] iommufd: Move the iommufd_handle helpers to device.c
  2024-12-19 13:27 ` [PATCH v6 03/14] iommufd: Move the iommufd_handle helpers to device.c Yi Liu
@ 2024-12-20  3:31   ` Baolu Lu
  2024-12-20  6:34     ` Yi Liu
  0 siblings, 1 reply; 71+ messages in thread
From: Baolu Lu @ 2024-12-20  3:31 UTC (permalink / raw)
  To: Yi Liu, joro, jgg, kevin.tian
  Cc: eric.auger, nicolinc, chao.p.peng, iommu, vasant.hegde, will

On 12/19/24 21:27, Yi Liu wrote:
> The iommu_attach_handle is now only passed when attaching iopf-capable
> domain, while it is not convenient for the iommu core to track the
> attached domain of pasids. To address it, the iommu_attach_handle will
> be passed to iommu core for non-fault-able domain as well. Hence the
> iommufd_handle related helpers are no longer fault specific, it makes
> more sense to move it out of fault.c.
> 
> Signed-off-by: Yi Liu<yi.l.liu@intel.com>
> ---
>   drivers/iommu/iommufd/device.c          | 62 +++++++++++++++++++++++++
>   drivers/iommu/iommufd/fault.c           | 56 +---------------------
>   drivers/iommu/iommufd/iommufd_private.h |  8 ++++
>   3 files changed, 72 insertions(+), 54 deletions(-)
> 
> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> index dfd0898fb6c1..0e1baf84e887 100644
> --- a/drivers/iommu/iommufd/device.c
> +++ b/drivers/iommu/iommufd/device.c
> @@ -293,6 +293,68 @@ u32 iommufd_device_to_id(struct iommufd_device *idev)
>   }
>   EXPORT_SYMBOL_NS_GPL(iommufd_device_to_id, "IOMMUFD");
>   
> +/**
> + * iommufd_device_get_attach_handle - Return the attach handle for the RID
> + *
> + * @idev: The device to get attach_handle
> + *
> + * Currently there is no locking to synchronize threads that access the
> + * returned handle with those attaching or replacing the domain which might
> + * change the handle. It's caller's duty to guarantee no use-after-free.

It's better to make "It's caller's duty to guarantee no use-after-free"
more specific. Something like, the caller is responsible for ensuring
that the returned pointer is not used after the domain is removed from
the device's RID.

> + *
> + * Return valid attach_handle if there is, otherwise NULL.
> + */
> +struct iommufd_attach_handle *
> +iommufd_device_get_attach_handle(struct iommufd_device *idev)
> +{
> +	struct iommu_attach_handle *handle;
> +
> +	handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID, 0);
> +	if (IS_ERR(handle))
> +		return NULL;
> +
> +	return to_iommufd_handle(handle);
> +}
> +
> +int iommufd_dev_attach_handle(struct iommufd_hw_pagetable *hwpt,
> +			      struct iommufd_device *idev)
> +{
> +	struct iommufd_attach_handle *handle;
> +	int ret;
> +
> +	handle = kzalloc(sizeof(*handle), GFP_KERNEL);
> +	if (!handle)
> +		return -ENOMEM;
> +
> +	handle->idev = idev;
> +	ret = iommu_attach_group_handle(hwpt->domain, idev->igroup->group,
> +					&handle->handle);
> +	if (ret)
> +		kfree(handle);
> +
> +	return ret;
> +}
> +
> +int iommufd_dev_replace_handle(struct iommufd_device *idev,
> +			       struct iommufd_hw_pagetable *hwpt,
> +			       struct iommufd_hw_pagetable *old)
> +{
> +	struct iommufd_attach_handle *handle;
> +	int ret;
> +
> +	handle = kzalloc(sizeof(*handle), GFP_KERNEL);
> +	if (!handle)
> +		return -ENOMEM;
> +
> +	handle->idev = idev;
> +	ret = iommu_replace_group_handle(idev->igroup->group,
> +					 hwpt->domain, &handle->handle);
> +	if (ret)
> +		kfree(handle);
> +
> +	return ret;
> +}

Where will the old handle be freed? It seems unreasonable to allocate
the handle in these helper functions, only to have it freed by callers
in other files.
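
One possible arrangement that keeps allocation and release together is sketched
below in userspace C. It is an illustration of the concern only, not what the
patch does: toy_handle, toy_device and toy_replace_handle() are hypothetical
names, and the simulate_failure parameter stands in for an error returned by
the lower layer.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct toy_handle {
	int domain_id;
};

struct toy_device {
	struct toy_handle *handle;	/* currently installed handle, or NULL */
};

/*
 * Install a new handle for @domain_id. On success the previous handle is
 * freed here, so allocation and release stay in one place. On failure the
 * new allocation is dropped and the old handle remains installed.
 */
static int toy_replace_handle(struct toy_device *dev, int domain_id,
			      int simulate_failure)
{
	struct toy_handle *old;
	struct toy_handle *new_handle;

	assert(dev != NULL);
	old = dev->handle;

	new_handle = calloc(1, sizeof(*new_handle));
	if (!new_handle)
		return -ENOMEM;
	new_handle->domain_id = domain_id;

	if (simulate_failure) {
		free(new_handle);
		return -EINVAL;
	}

	dev->handle = new_handle;
	free(old);	/* free(NULL) is a no-op */
	return 0;
}
```

With this shape a caller never has to look up and free the old handle itself;
whether that fits the fault path, which still needs the old handle for
auto-responding to faults, is a separate question.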

Thanks,
baolu


* Re: [PATCH v6 04/14] iommufd: Always pass iommu_attach_handle to iommu core
  2024-12-19 13:27 ` [PATCH v6 04/14] iommufd: Always pass iommu_attach_handle to iommu core Yi Liu
@ 2024-12-20  4:35   ` Nicolin Chen
  2024-12-20  6:40     ` Yi Liu
  2025-01-09  7:44   ` Tian, Kevin
  1 sibling, 1 reply; 71+ messages in thread
From: Nicolin Chen @ 2024-12-20  4:35 UTC (permalink / raw)
  To: Yi Liu
  Cc: joro, jgg, kevin.tian, baolu.lu, eric.auger, chao.p.peng, iommu,
	vasant.hegde, will

Hi Yi,

On Thu, Dec 19, 2024 at 05:27:36AM -0800, Yi Liu wrote:
> The iommu_attach_handle is optional in the RID attach/replace API and the
> PASID attach APIs. But it is a mandatory argument for the PASID replace API.
> Without it, the PASID replace path cannot get the old domain. Hence, the
> PASID path (attach/replace) requires the attach handle. As iommufd is the
> major user of the RID attach/replace with iommu_attach_handle, this also
> makes the iommufd always pass the attach handle for the RID path as well.
> This keeps the RID and PASID path much aligned.

> diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
> index d5c83f10d83e..6cf9c1f10e85 100644
> --- a/drivers/iommu/iommufd/iommufd_private.h
> +++ b/drivers/iommu/iommufd/iommufd_private.h
> @@ -493,28 +493,42 @@ static inline int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt,
>  	if (hwpt->fault)
>  		return iommufd_fault_domain_attach_dev(hwpt, idev);
>  
> -	return iommu_attach_group(hwpt->domain, idev->igroup->group);
> +	return iommufd_dev_attach_handle(hwpt, idev);
>  }
>  
>  static inline void iommufd_hwpt_detach_device(struct iommufd_hw_pagetable *hwpt,
>  					      struct iommufd_device *idev)
>  {
> +	struct iommufd_attach_handle *handle;
> +
>  	if (hwpt->fault) {
>  		iommufd_fault_domain_detach_dev(hwpt, idev);
>  		return;
>  	}
>  
> -	iommu_detach_group(hwpt->domain, idev->igroup->group);
> +	handle = iommufd_device_get_attach_handle(idev);
> +	iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
> +	kfree(handle);
>  }
>  
>  static inline int iommufd_hwpt_replace_device(struct iommufd_device *idev,
>  					      struct iommufd_hw_pagetable *hwpt,
>  					      struct iommufd_hw_pagetable *old)
>  {
> +	struct iommufd_attach_handle *curr;
> +	int ret;
> +
>  	if (old->fault || hwpt->fault)
>  		return iommufd_fault_domain_replace_dev(idev, hwpt, old);
>  
> -	return iommu_group_replace_domain(idev->igroup->group, hwpt->domain);
> +	curr = iommufd_device_get_attach_handle(idev);
> +
> +	ret = iommufd_dev_replace_handle(idev, hwpt, old);
> +	if (ret)
> +		return ret;

These inline functions feel heavier after this rework.

I actually have the same patch for a different reason, but haven't
sent it yet:
https://github.com/nicolinc/iommufd/commits/wip/iommufd_msi_p1-v1
   iommu: Turn fault_data to iommufd private pointer
 =>iommufd: Make attach_handle generic

I think we can align with each other on these functions with one
common patch: my series requires that fault_data and attach_handle
be common to all HWPTs, rather than exclusive to an hwpt->fault.

Thanks
Nicolin


* Re: [PATCH v6 03/14] iommufd: Move the iommufd_handle helpers to device.c
  2024-12-20  3:31   ` Baolu Lu
@ 2024-12-20  6:34     ` Yi Liu
  0 siblings, 0 replies; 71+ messages in thread
From: Yi Liu @ 2024-12-20  6:34 UTC (permalink / raw)
  To: Baolu Lu, joro, jgg, kevin.tian
  Cc: eric.auger, nicolinc, chao.p.peng, iommu, vasant.hegde, will

On 2024/12/20 11:31, Baolu Lu wrote:
> On 12/19/24 21:27, Yi Liu wrote:
>> The iommu_attach_handle is now only passed when attaching iopf-capable
>> domain, while it is not convenient for the iommu core to track the
>> attached domain of pasids. To address it, the iommu_attach_handle will
>> be passed to iommu core for non-fault-able domain as well. Hence the
>> iommufd_handle related helpers are no longer fault specific, it makes
>> more sense to move it out of fault.c.
>>
>> Signed-off-by: Yi Liu<yi.l.liu@intel.com>
>> ---
>>   drivers/iommu/iommufd/device.c          | 62 +++++++++++++++++++++++++
>>   drivers/iommu/iommufd/fault.c           | 56 +---------------------
>>   drivers/iommu/iommufd/iommufd_private.h |  8 ++++
>>   3 files changed, 72 insertions(+), 54 deletions(-)
>>
>> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
>> index dfd0898fb6c1..0e1baf84e887 100644
>> --- a/drivers/iommu/iommufd/device.c
>> +++ b/drivers/iommu/iommufd/device.c
>> @@ -293,6 +293,68 @@ u32 iommufd_device_to_id(struct iommufd_device *idev)
>>   }
>>   EXPORT_SYMBOL_NS_GPL(iommufd_device_to_id, "IOMMUFD");
>> +/**
>> + * iommufd_device_get_attach_handle - Return the attach handle for the RID
>> + *
>> + * @idev: The device to get attach_handle
>> + *
>> + * Currently there is no locking to synchronize threads that access the
>> + * returned handle with those attaching or replacing the domain which might
>> + * change the handle. It's caller's duty to guarantee no use-after-free.
> 
> It's better to make "It's caller's duty to guarantee no use-after-free"
> more specific. Something like, the caller is responsible for ensuring
> that the returned pointer is not used after the domain is removed from
> the device's RID.

ok.

>> + *
>> + * Return valid attach_handle if there is, otherwise NULL.
>> + */
>> +struct iommufd_attach_handle *
>> +iommufd_device_get_attach_handle(struct iommufd_device *idev)
>> +{
>> +    struct iommu_attach_handle *handle;
>> +
>> +    handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID, 0);
>> +    if (IS_ERR(handle))
>> +        return NULL;
>> +
>> +    return to_iommufd_handle(handle);
>> +}
>> +
>> +int iommufd_dev_attach_handle(struct iommufd_hw_pagetable *hwpt,
>> +                  struct iommufd_device *idev)
>> +{
>> +    struct iommufd_attach_handle *handle;
>> +    int ret;
>> +
>> +    handle = kzalloc(sizeof(*handle), GFP_KERNEL);
>> +    if (!handle)
>> +        return -ENOMEM;
>> +
>> +    handle->idev = idev;
>> +    ret = iommu_attach_group_handle(hwpt->domain, idev->igroup->group,
>> +                    &handle->handle);
>> +    if (ret)
>> +        kfree(handle);
>> +
>> +    return ret;
>> +}
>> +
>> +int iommufd_dev_replace_handle(struct iommufd_device *idev,
>> +                   struct iommufd_hw_pagetable *hwpt,
>> +                   struct iommufd_hw_pagetable *old)
>> +{
>> +    struct iommufd_attach_handle *handle;
>> +    int ret;
>> +
>> +    handle = kzalloc(sizeof(*handle), GFP_KERNEL);
>> +    if (!handle)
>> +        return -ENOMEM;
>> +
>> +    handle->idev = idev;
>> +    ret = iommu_replace_group_handle(idev->igroup->group,
>> +                     hwpt->domain, &handle->handle);
>> +    if (ret)
>> +        kfree(handle);
>> +
>> +    return ret;
>> +}
> 
> Where will the old handle be freed? It seems unreasonable to allocate
> the handle in these helper functions, only to have it freed by callers
> in other files.

Yes, it's freed on the caller side. See the snippet below from patch 02.
Splitting this patch from patch 04 may have been overkill; merging them
may help. Nic has also proposed a patch for the same purpose.

@@ -196,13 +187,24 @@ int iommufd_fault_domain_replace_dev(struct iommufd_device *idev,
  			return ret;
  	}

-	ret = __fault_domain_replace_dev(idev, hwpt, old);
+	curr = iommufd_device_get_attach_handle(idev);
+
+	if (hwpt->fault)
+		ret = __fault_domain_replace_dev(idev, hwpt, old);
+	else
+		ret = iommu_replace_group_handle(idev->igroup->group,
+						 hwpt->domain, NULL);
  	if (ret) {
  		if (iopf_on)
  			iommufd_fault_iopf_disable(idev);
  		return ret;
  	}

+	if (curr) {
+		iommufd_auto_response_faults(old, curr);
+		kfree(curr);
+	}
+


-- 
Regards,
Yi Liu


* Re: [PATCH v6 04/14] iommufd: Always pass iommu_attach_handle to iommu core
  2024-12-20  4:35   ` Nicolin Chen
@ 2024-12-20  6:40     ` Yi Liu
  2024-12-20  6:58       ` Nicolin Chen
  0 siblings, 1 reply; 71+ messages in thread
From: Yi Liu @ 2024-12-20  6:40 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: joro, jgg, kevin.tian, baolu.lu, eric.auger, chao.p.peng, iommu,
	vasant.hegde, will

On 2024/12/20 12:35, Nicolin Chen wrote:
> Hi Yi,
> 
> On Thu, Dec 19, 2024 at 05:27:36AM -0800, Yi Liu wrote:
>> The iommu_attach_handle is optional in the RID attach/replace API and the
>> PASID attach APIs. But it is a mandatory argument for the PASID replace API.
>> Without it, the PASID replace path cannot get the old domain. Hence, the
>> PASID path (attach/replace) requires the attach handle. As iommufd is the
>> major user of the RID attach/replace with iommu_attach_handle, this also
>> makes the iommufd always pass the attach handle for the RID path as well.
>> This keeps the RID and PASID path much aligned.
> 
>> diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
>> index d5c83f10d83e..6cf9c1f10e85 100644
>> --- a/drivers/iommu/iommufd/iommufd_private.h
>> +++ b/drivers/iommu/iommufd/iommufd_private.h
>> @@ -493,28 +493,42 @@ static inline int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt,
>>   	if (hwpt->fault)
>>   		return iommufd_fault_domain_attach_dev(hwpt, idev);
>>   
>> -	return iommu_attach_group(hwpt->domain, idev->igroup->group);
>> +	return iommufd_dev_attach_handle(hwpt, idev);
>>   }
>>   
>>   static inline void iommufd_hwpt_detach_device(struct iommufd_hw_pagetable *hwpt,
>>   					      struct iommufd_device *idev)
>>   {
>> +	struct iommufd_attach_handle *handle;
>> +
>>   	if (hwpt->fault) {
>>   		iommufd_fault_domain_detach_dev(hwpt, idev);
>>   		return;
>>   	}
>>   
>> -	iommu_detach_group(hwpt->domain, idev->igroup->group);
>> +	handle = iommufd_device_get_attach_handle(idev);
>> +	iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
>> +	kfree(handle);
>>   }
>>   
>>   static inline int iommufd_hwpt_replace_device(struct iommufd_device *idev,
>>   					      struct iommufd_hw_pagetable *hwpt,
>>   					      struct iommufd_hw_pagetable *old)
>>   {
>> +	struct iommufd_attach_handle *curr;
>> +	int ret;
>> +
>>   	if (old->fault || hwpt->fault)
>>   		return iommufd_fault_domain_replace_dev(idev, hwpt, old);
>>   
>> -	return iommu_group_replace_domain(idev->igroup->group, hwpt->domain);
>> +	curr = iommufd_device_get_attach_handle(idev);
>> +
>> +	ret = iommufd_dev_replace_handle(idev, hwpt, old);
>> +	if (ret)
>> +		return ret;
> 
> These inline functions feel heavier after this rework..
> 
> I actually have the same patch for a different reason, yet haven't
> sent yet:
> https://github.com/nicolinc/iommufd/commits/wip/iommufd_msi_p1-v1
>     iommu: Turn fault_data to iommufd private pointer
>   =>iommufd: Make attach_handle generic
> 
> I think we can align with each other on these functions with one
> common patch: my series requires the fault_data and attach_handle
> are common to all HWPTs, v.s. exclusive to an hwpt->fault.

Yeah, I'm OK with moving the helpers into device.c to avoid the inlines.
I have three remarks on your patch [1]. I'm now also considering doing
the work in one patch, which might be easier to review given the
comment from Baolu [2].

[1] https://github.com/nicolinc/iommufd/commit/26f9a2a4a5e28574295ecfe057e949dcc9a58c7e#diff-c251419ce7f13e1e22e8fe08864a50d8541d6c9addaee27f806bf9786b4a4886R374
[2] https://lore.kernel.org/linux-iommu/a2f363e7-88e4-46ef-8755-8a5e0cc47ecc@linux.intel.com/

-- 
Regards,
Yi Liu


* Re: [PATCH v6 04/14] iommufd: Always pass iommu_attach_handle to iommu core
  2024-12-20  6:40     ` Yi Liu
@ 2024-12-20  6:58       ` Nicolin Chen
  0 siblings, 0 replies; 71+ messages in thread
From: Nicolin Chen @ 2024-12-20  6:58 UTC (permalink / raw)
  To: Yi Liu
  Cc: joro, jgg, kevin.tian, baolu.lu, eric.auger, chao.p.peng, iommu,
	vasant.hegde, will

On Fri, Dec 20, 2024 at 02:40:39PM +0800, Yi Liu wrote:
> > >   static inline void iommufd_hwpt_detach_device(struct iommufd_hw_pagetable *hwpt,
> > >   					      struct iommufd_device *idev)
> > >   {
> > > +	struct iommufd_attach_handle *handle;
> > > +
> > >   	if (hwpt->fault) {
> > >   		iommufd_fault_domain_detach_dev(hwpt, idev);
> > >   		return;
> > >   	}
> > > -	iommu_detach_group(hwpt->domain, idev->igroup->group);
> > > +	handle = iommufd_device_get_attach_handle(idev);
> > > +	iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
> > > +	kfree(handle);
> > >   }
> > >   static inline int iommufd_hwpt_replace_device(struct iommufd_device *idev,
> > >   					      struct iommufd_hw_pagetable *hwpt,
> > >   					      struct iommufd_hw_pagetable *old)
> > >   {
> > > +	struct iommufd_attach_handle *curr;
> > > +	int ret;
> > > +
> > >   	if (old->fault || hwpt->fault)
> > >   		return iommufd_fault_domain_replace_dev(idev, hwpt, old);
> > > -	return iommu_group_replace_domain(idev->igroup->group, hwpt->domain);
> > > +	curr = iommufd_device_get_attach_handle(idev);
> > > +
> > > +	ret = iommufd_dev_replace_handle(idev, hwpt, old);
> > > +	if (ret)
> > > +		return ret;
> > 
> > These inline functions feel heavier after this rework..
> > 
> > I actually have the same patch for a different reason, yet haven't
> > sent yet:
> > https://github.com/nicolinc/iommufd/commits/wip/iommufd_msi_p1-v1
> >     iommu: Turn fault_data to iommufd private pointer
> >   =>iommufd: Make attach_handle generic
> > 
> > I think we can align with each other on these functions with one
> > common patch: my series requires the fault_data and attach_handle
> > are common to all HWPTs, v.s. exclusive to an hwpt->fault.
> 
> yeah, I'm ok to move the helpers to be in device.c to avoid inline.
> I've three remark on your patch [1]. I'm now also considering to do
> the work in one patch, hence it might be better for review w.r.t the
> comment from Baolu [2].

I replied: I moved attach_handle out of fault functions, leaving
them to only handle fault specific routines.

That said, you might want to change iommufd_device_get_attach_handle()
in my patch into a normal function that returns NULL when
IS_ERR(attach_handle) is true, rather than just wrapping
to_iommufd_handle(). I only realized this a minute ago.
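
For illustration only, a minimal userspace sketch of the shape described
above: a getter that maps an error-encoded pointer to NULL instead of
blindly converting it. Every model_* name is a hypothetical stand-in, the
ERR_PTR/IS_ERR helpers are simplified imitations of the kernel ones, and
model_lookup() is an assumed placeholder for the core lookup call.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace imitations of the kernel ERR_PTR()/IS_ERR() helpers. */
#define MODEL_MAX_ERRNO 4095

static inline void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Hypothetical core handle embedded in a hypothetical iommufd handle. */
struct model_core_handle {
	int domain_id;
};

struct model_iommufd_handle {
	struct model_core_handle handle;
	int idev_id;
};

/* container_of-style conversion; only valid for a real (non-error) pointer. */
#define model_to_iommufd_handle(h) \
	((struct model_iommufd_handle *)((char *)(h) - \
		offsetof(struct model_iommufd_handle, handle)))

/* Assumed lookup: returns the embedded core handle, or an error pointer. */
static struct model_core_handle *
model_lookup(struct model_iommufd_handle *attached)
{
	if (!attached)
		return model_err_ptr(-ENOENT);
	return &attached->handle;
}

/* The getter normalizes any error pointer to NULL before converting. */
static struct model_iommufd_handle *
model_get_attach_handle(struct model_iommufd_handle *attached)
{
	struct model_core_handle *h = model_lookup(attached);

	if (model_is_err(h))
		return NULL;
	return model_to_iommufd_handle(h);
}
```

With this shape a caller can pass the result straight to a free routine
that tolerates NULL, without checking for error pointers first.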

Thanks
Nic

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  2024-12-19 13:27 ` [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support Yi Liu
@ 2024-12-23  2:51   ` Baolu Lu
  2024-12-24 11:35     ` Yi Liu
  2025-01-09 15:27     ` Jason Gunthorpe
  2025-01-10  7:38   ` Tian, Kevin
  2025-01-13 20:31   ` Jason Gunthorpe
  2 siblings, 2 replies; 71+ messages in thread
From: Baolu Lu @ 2024-12-23  2:51 UTC (permalink / raw)
  To: Yi Liu, joro, jgg, kevin.tian
  Cc: eric.auger, nicolinc, chao.p.peng, iommu, vasant.hegde, will

On 12/19/24 21:27, Yi Liu wrote:
> Intel iommu driver just treats it as a nop since Intel VT-d does not have
> special requirement on domains attached to either the PASID or RID of a
> PASID-capable device.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>   drivers/iommu/intel/iommu.c  | 3 ++-
>   drivers/iommu/intel/nested.c | 2 +-
>   2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index cd5e339fd5bb..0a622a89d876 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -3347,7 +3347,8 @@ intel_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
>   	bool first_stage;
>   
>   	if (flags &
> -	    (~(IOMMU_HWPT_ALLOC_NEST_PARENT | IOMMU_HWPT_ALLOC_DIRTY_TRACKING)))
> +	    (~(IOMMU_HWPT_ALLOC_NEST_PARENT | IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
> +	       IOMMU_HWPT_ALLOC_PASID)))
>   		return ERR_PTR(-EOPNOTSUPP);
>   	if (nested_parent && !nested_supported(iommu))
>   		return ERR_PTR(-EOPNOTSUPP);
> diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
> index aba92c00b427..6ac5c534bef4 100644
> --- a/drivers/iommu/intel/nested.c
> +++ b/drivers/iommu/intel/nested.c
> @@ -198,7 +198,7 @@ intel_iommu_domain_alloc_nested(struct device *dev, struct iommu_domain *parent,
>   	struct dmar_domain *domain;
>   	int ret;
>   
> -	if (!nested_supported(iommu) || flags)
> +	if (!nested_supported(iommu) || flags & ~IOMMU_HWPT_ALLOC_PASID)
>   		return ERR_PTR(-EOPNOTSUPP);
>   
>   	/* Must be nested domain */

Would it be better to fail the domain allocation when
IOMMU_HWPT_ALLOC_PASID is set but the iommu lacks pasid support?

Another related consideration is the support for page faults in nested
domains once PASID is available in user space. Would it be reasonable to
support page faults for nested domains?

If so, perhaps it's time to open intel_iommu_domain_alloc_nested() to
support IOMMU_HWPT_FAULT_ID_VALID when PRI is enabled on the device?

--
baolu

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  2024-12-23  2:51   ` Baolu Lu
@ 2024-12-24 11:35     ` Yi Liu
  2024-12-25  1:02       ` Baolu Lu
  2025-01-09 15:27     ` Jason Gunthorpe
  1 sibling, 1 reply; 71+ messages in thread
From: Yi Liu @ 2024-12-24 11:35 UTC (permalink / raw)
  To: Baolu Lu, joro, jgg, kevin.tian
  Cc: eric.auger, nicolinc, chao.p.peng, iommu, vasant.hegde, will

On 2024/12/23 10:51, Baolu Lu wrote:
> On 12/19/24 21:27, Yi Liu wrote:
>> Intel iommu driver just treats it as a nop since Intel VT-d does not have
>> special requirement on domains attached to either the PASID or RID of a
>> PASID-capable device.
>>
>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>> ---
>>   drivers/iommu/intel/iommu.c  | 3 ++-
>>   drivers/iommu/intel/nested.c | 2 +-
>>   2 files changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>> index cd5e339fd5bb..0a622a89d876 100644
>> --- a/drivers/iommu/intel/iommu.c
>> +++ b/drivers/iommu/intel/iommu.c
>> @@ -3347,7 +3347,8 @@ intel_iommu_domain_alloc_paging_flags(struct device 
>> *dev, u32 flags,
>>       bool first_stage;
>>       if (flags &
>> -        (~(IOMMU_HWPT_ALLOC_NEST_PARENT | 
>> IOMMU_HWPT_ALLOC_DIRTY_TRACKING)))
>> +        (~(IOMMU_HWPT_ALLOC_NEST_PARENT | IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
>> +           IOMMU_HWPT_ALLOC_PASID)))
>>           return ERR_PTR(-EOPNOTSUPP);
>>       if (nested_parent && !nested_supported(iommu))
>>           return ERR_PTR(-EOPNOTSUPP);
>> diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
>> index aba92c00b427..6ac5c534bef4 100644
>> --- a/drivers/iommu/intel/nested.c
>> +++ b/drivers/iommu/intel/nested.c
>> @@ -198,7 +198,7 @@ intel_iommu_domain_alloc_nested(struct device *dev, 
>> struct iommu_domain *parent,
>>       struct dmar_domain *domain;
>>       int ret;
>> -    if (!nested_supported(iommu) || flags)
>> +    if (!nested_supported(iommu) || flags & ~IOMMU_HWPT_ALLOC_PASID)
>>           return ERR_PTR(-EOPNOTSUPP);
>>       /* Must be nested domain */
> 
> It's better to abort and fail a domain allocation when
> IOMMU_HWPT_ALLOC_PASID is set but the iommu lacks pasid support?

in concept, yes.

> Another related consideration is the support for page faults in nested
> domains once PASID is available in user space. Would it be reasonable to
> support page faults for nested domains?

yeah, it's good to discuss it.

> If so, perhaps it's time to open intel_iommu_domain_alloc_nested() to
> support IOMMU_HWPT_FAULT_ID_VALID when PRI is enabled on the device?

IMHO, PRQ support for nested domains requires some more infrastructure. A
PRQ can originate at either stage-1 or stage-2, so the iommu driver may need
to tell which and forward it to the correct domain (nested domain or parent
domain). Alternatively, the stage-2 is always pinned, as VFIO/iommufd does
today, in which case any PRQ under nested translation must be due to stage-1.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  2024-12-24 11:35     ` Yi Liu
@ 2024-12-25  1:02       ` Baolu Lu
  2024-12-25  4:30         ` Yi Liu
  0 siblings, 1 reply; 71+ messages in thread
From: Baolu Lu @ 2024-12-25  1:02 UTC (permalink / raw)
  To: Yi Liu, joro, jgg, kevin.tian
  Cc: eric.auger, nicolinc, chao.p.peng, iommu, vasant.hegde, will

On 12/24/24 19:35, Yi Liu wrote:
>> Another related consideration is the support for page faults in nested
>> domains once PASID is available in user space. Would it be reasonable to
>> support page faults for nested domains?
> 
> yeah, it's good to discuss it.
> 
>> If so, perhaps it's time to open intel_iommu_domain_alloc_nested() to
>> support IOMMU_HWPT_FAULT_ID_VALID when PRI is enabled on the device?
> 
> IMHO, the PRQ support for nested domains requires some more facility. PRQ
> can happen at either stage-1 or stage-2, iommu driver may need to tell
> it and forward to the correct domain (nested domain or parent domain). Or
> the stage-2 is always pinned just as the VFIO/iommufd does. Hence, any PRQ
> happens under nested translation should be due to stage-1.

The parent domain is currently always pinned and does not yet support
page faults. Therefore, when a page fault occurs within a nested domain,
it apparently should be routed to user space...

If we decide to support page faults on the stage-2 domain in the future,
we will need to figure out the correct destination of each page fault
and route it accordingly, either to the parent domain or the user space
nested domain. Hardware assistance would be beneficial, otherwise the
software may need to traverse the parent domain, which is not
performance friendly.

So, I don't see any reason why the iommu driver shouldn't be able to
support iopf-capable nested domains. Have I missed anything?

--
baolu

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  2024-12-25  1:02       ` Baolu Lu
@ 2024-12-25  4:30         ` Yi Liu
  2024-12-25  7:13           ` Baolu Lu
  0 siblings, 1 reply; 71+ messages in thread
From: Yi Liu @ 2024-12-25  4:30 UTC (permalink / raw)
  To: Baolu Lu, joro, jgg, kevin.tian
  Cc: eric.auger, nicolinc, chao.p.peng, iommu, vasant.hegde, will

On 2024/12/25 09:02, Baolu Lu wrote:
> On 12/24/24 19:35, Yi Liu wrote:
>>> Another related consideration is the support for page faults in nested
>>> domains once PASID is available in user space. Would it be reasonable to
>>> support page faults for nested domains?
>>
>> yeah, it's good to discuss it.
>>
>>> If so, perhaps it's time to open intel_iommu_domain_alloc_nested() to
>>> support IOMMU_HWPT_FAULT_ID_VALID when PRI is enabled on the device?
>>
>> IMHO, the PRQ support for nested domains requires some more facility. PRQ
>> can happen at either stage-1 or stage-2, iommu driver may need to tell
>> it and forward to the correct domain (nested domain or parent domain). Or
>> the stage-2 is always pinned just as the VFIO/iommufd does. Hence, any PRQ
>> happens under nested translation should be due to stage-1.
> 
> The parent domain is currently always pinned and does not yet support
> page faults. Therefore, when a page fault occurs within a nested domain,
> it apparently should be routed to user space...
> 
> If we decide to support page faults on the stage-2 domain in the future,
> we will need to figure out the correct destination of each page fault
> and route it accordingly, either to the parent domain or the user space
> nested domain. Hardware assistance would be beneficial, otherwise the
> software may need to traverse the parent domain, which is not
> performance friendly.

This is my question: is it still true that the parent domain is always
pinned after the series below? If yes, then it's fine to enable IOPF for
nested domains.

https://lore.kernel.org/linux-iommu/20241015-jag-iopfv8-v4-0-b696ca89ba29@kernel.org/

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  2024-12-25  4:30         ` Yi Liu
@ 2024-12-25  7:13           ` Baolu Lu
  2025-02-12  7:47             ` Yi Liu
  0 siblings, 1 reply; 71+ messages in thread
From: Baolu Lu @ 2024-12-25  7:13 UTC (permalink / raw)
  To: Yi Liu, joro, jgg, kevin.tian
  Cc: baolu.lu, eric.auger, nicolinc, chao.p.peng, iommu, vasant.hegde,
	will

On 2024/12/25 12:30, Yi Liu wrote:
> On 2024/12/25 09:02, Baolu Lu wrote:
>> On 12/24/24 19:35, Yi Liu wrote:
>>>> Another related consideration is the support for page faults in nested
>>>> domains once PASID is available in user space. Would it be 
>>>> reasonable to
>>>> support page faults for nested domains?
>>>
>>> yeah, it's good to discuss it.
>>>
>>>> If so, perhaps it's time to open intel_iommu_domain_alloc_nested() to
>>>> support IOMMU_HWPT_FAULT_ID_VALID when PRI is enabled on the device?
>>>
>>> IMHO, the PRQ support for nested domains requires some more facility. 
>>> PRQ
>>> can happen at either stage-1 or stage-2, iommu driver may need to tell
>>> it and forward to the correct domain (nested domain or parent 
>>> domain). Or
>>> the stage-2 is always pinned just as the VFIO/iommufd does. Hence, 
>>> any PRQ
>>> happens under nested translation should be due to stage-1.
>>
>> The parent domain is currently always pinned and does not yet support
>> page faults. Therefore, when a page fault occurs within a nested domain,
>> it apparently should be routed to user space...
>>
>> If we decide to support page faults on the stage-2 domain in the future,
>> we will need to figure out the correct destination of each page fault
>> and route it accordingly, either to the parent domain or the user space
>> nested domain. Hardware assistance would be beneficial, otherwise the
>> software may need to traverse the parent domain, which is not
>> performance friendly.
> 
> this is my question. Is it still true the parent domain is always pinned
> after the below series? If yes, then it's fine to enable IOPF for nested
> domain.
> 
> https://lore.kernel.org/linux-iommu/20241015-jag-iopfv8-v4-0-b696ca89ba29@kernel.org/

Then, perhaps we could enforce this in iommufd as a short-term measure?

When allocating a hwpt in iommufd, we should enforce that the flags
IOMMU_HWPT_ALLOC_NEST_PARENT and IOMMU_HWPT_FAULT_ID_VALID cannot be set
at the same time.
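
A minimal sketch of such a check, for illustration only. The MODEL_* bit
values and the function name are hypothetical stand-ins (the real flag
definitions live in the iommufd uAPI header), and the choice of -EOPNOTSUPP
as the error code is an assumption, not something decided in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative stand-ins for the uAPI flag bits; values are assumptions. */
#define MODEL_HWPT_ALLOC_NEST_PARENT	(1u << 0)
#define MODEL_HWPT_FAULT_ID_VALID	(1u << 2)

/*
 * Reject an allocation that asks for a nesting parent and a fault queue
 * at the same time; either flag on its own is accepted.
 */
static int model_check_hwpt_alloc_flags(uint32_t flags)
{
	const uint32_t both = MODEL_HWPT_ALLOC_NEST_PARENT |
			      MODEL_HWPT_FAULT_ID_VALID;

	if ((flags & both) == both)
		return -EOPNOTSUPP;
	return 0;
}
```

Comparing against the combined mask, rather than testing each bit
separately, keeps the rule a single condition that is easy to drop later
if stage-2 page faults become supported.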

---
baolu

^ permalink raw reply	[flat|nested] 71+ messages in thread

* RE: [PATCH v6 01/14] iommu: Introduce a replace API for device pasid
  2024-12-19 13:27 ` [PATCH v6 01/14] iommu: Introduce a replace API for device pasid Yi Liu
  2024-12-20  2:47   ` Baolu Lu
@ 2025-01-09  7:08   ` Tian, Kevin
  2025-01-09  7:20   ` Tian, Kevin
  2025-01-13 20:21   ` Jason Gunthorpe
  3 siblings, 0 replies; 71+ messages in thread
From: Tian, Kevin @ 2025-01-09  7:08 UTC (permalink / raw)
  To: Liu, Yi L, joro@8bytes.org, jgg@nvidia.com,
	baolu.lu@linux.intel.com
  Cc: eric.auger@redhat.com, nicolinc@nvidia.com,
	chao.p.peng@linux.intel.com, iommu@lists.linux.dev,
	vasant.hegde@amd.com, will@kernel.org

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, December 19, 2024 9:28 PM
> 
> +/**
> + * iommu_replace_device_pasid - Replace the domain that a pasid is
> attached to
> + * @domain: the new iommu domain
> + * @dev: the attached device.
> + * @pasid: the pasid of the device.
> + * @handle: the attach handle.
> + *
> + * This API allows the pasid to switch domains. Return 0 on success, or an
> + * error. The pasid will keep the old configuration if replacement failed.
> + * This is supposed to be used by iommufd, and iommufd can guarantee
> that
> + * both iommu_attach_device_pasid() and iommu_replace_device_pasid()
> would
> + * pass in a valid @handle.
> + */

Better explain why a valid handle is required here.

Also iommu_attach_device_pasid() allows NULL handle:

	if (handle)
		handle->domain = domain;

^ permalink raw reply	[flat|nested] 71+ messages in thread

* RE: [PATCH v6 01/14] iommu: Introduce a replace API for device pasid
  2024-12-19 13:27 ` [PATCH v6 01/14] iommu: Introduce a replace API for device pasid Yi Liu
  2024-12-20  2:47   ` Baolu Lu
  2025-01-09  7:08   ` Tian, Kevin
@ 2025-01-09  7:20   ` Tian, Kevin
  2025-01-09 14:43     ` Jason Gunthorpe
  2025-01-13 20:21   ` Jason Gunthorpe
  3 siblings, 1 reply; 71+ messages in thread
From: Tian, Kevin @ 2025-01-09  7:20 UTC (permalink / raw)
  To: Liu, Yi L, joro@8bytes.org, jgg@nvidia.com,
	baolu.lu@linux.intel.com
  Cc: eric.auger@redhat.com, nicolinc@nvidia.com,
	chao.p.peng@linux.intel.com, iommu@lists.linux.dev,
	vasant.hegde@amd.com, will@kernel.org

> From: Tian, Kevin
> Sent: Thursday, January 9, 2025 3:08 PM
> 
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Thursday, December 19, 2024 9:28 PM
> >
> > +/**
> > + * iommu_replace_device_pasid - Replace the domain that a pasid is
> > attached to
> > + * @domain: the new iommu domain
> > + * @dev: the attached device.
> > + * @pasid: the pasid of the device.
> > + * @handle: the attach handle.
> > + *
> > + * This API allows the pasid to switch domains. Return 0 on success, or an
> > + * error. The pasid will keep the old configuration if replacement failed.
> > + * This is supposed to be used by iommufd, and iommufd can guarantee
> > that
> > + * both iommu_attach_device_pasid() and iommu_replace_device_pasid()
> > would
> > + * pass in a valid @handle.
> > + */
> 
> Better explain why a valid handle is required here.

Okay, it's because __iommu_set_group_pasid() requires the old domain now
and the only way to retrieve it at this point is via a handle. It's probably also
ok to directly store a domain pointer to the xarray when the handle is missing
but that sounds more confusing.

> 
> Also iommu_attach_device_pasid() allows NULL handle:
> 
> 	if (handle)
> 		handle->domain = domain;

Given that, it would be more consistent to make attach require a valid
handle too.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* RE: [PATCH v6 04/14] iommufd: Always pass iommu_attach_handle to iommu core
  2024-12-19 13:27 ` [PATCH v6 04/14] iommufd: Always pass iommu_attach_handle to iommu core Yi Liu
  2024-12-20  4:35   ` Nicolin Chen
@ 2025-01-09  7:44   ` Tian, Kevin
  2025-01-17 12:33     ` Yi Liu
  1 sibling, 1 reply; 71+ messages in thread
From: Tian, Kevin @ 2025-01-09  7:44 UTC (permalink / raw)
  To: Liu, Yi L, joro@8bytes.org, jgg@nvidia.com,
	baolu.lu@linux.intel.com
  Cc: eric.auger@redhat.com, nicolinc@nvidia.com,
	chao.p.peng@linux.intel.com, iommu@lists.linux.dev,
	vasant.hegde@amd.com, will@kernel.org

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, December 19, 2024 9:28 PM
> 
>  static inline int iommufd_hwpt_replace_device(struct iommufd_device *idev,
>  					      struct iommufd_hw_pagetable
> *hwpt,
>  					      struct iommufd_hw_pagetable
> *old)
>  {
> +	struct iommufd_attach_handle *curr;
> +	int ret;
> +
>  	if (old->fault || hwpt->fault)
>  		return iommufd_fault_domain_replace_dev(idev, hwpt, old);
> 
> -	return iommu_group_replace_domain(idev->igroup->group, hwpt-
> >domain);
> +	curr = iommufd_device_get_attach_handle(idev);
> +
> +	ret = iommufd_dev_replace_handle(idev, hwpt, old);
> +	if (ret)
> +		return ret;
> +
> +	kfree(curr);
> +	return 0;
>  }
> 

It's not balanced to have the handle freed in different locations (one in
the fault-domain-specific handler and the other here).

It would be clearer to remove the fault-domain-specific helper and keep
everything together in this function when you move it to a C file.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* RE: [PATCH v6 05/14] iommufd: Pass pasid through the device attach/replace path
  2024-12-19 13:27 ` [PATCH v6 05/14] iommufd: Pass pasid through the device attach/replace path Yi Liu
@ 2025-01-09  7:53   ` Tian, Kevin
  2025-01-09 14:51     ` Jason Gunthorpe
  0 siblings, 1 reply; 71+ messages in thread
From: Tian, Kevin @ 2025-01-09  7:53 UTC (permalink / raw)
  To: Liu, Yi L, joro@8bytes.org, jgg@nvidia.com,
	baolu.lu@linux.intel.com
  Cc: eric.auger@redhat.com, nicolinc@nvidia.com,
	chao.p.peng@linux.intel.com, iommu@lists.linux.dev,
	vasant.hegde@amd.com, will@kernel.org

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, December 19, 2024 9:28 PM
>  void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable
> *hwpt,
> -				     struct iommufd_device *idev)
> +				     struct iommufd_device *idev,
> +				     ioasid_t pasid)
>  {
>  	struct iommufd_attach_handle *handle;
> 
> -	handle = iommufd_device_get_attach_handle(idev);
> +	handle = iommufd_device_get_attach_handle(idev, pasid);
> +	WARN_ON(pasid != IOMMU_NO_PASID);

Nit: usually we warn before using it, though this will soon be deleted by a following patch.

> @@ -184,7 +184,8 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx
> *ictx, struct iommufd_ioas *ioas,
>  	 * sequence. Once those drivers are fixed this should be removed.
>  	 */
>  	if (immediate_attach) {
> -		rc = iommufd_hw_pagetable_attach(hwpt, idev);
> +		/* Sinc this is just a trick, so passing IOMMU_NO_PASID is
> enough */
> +		rc = iommufd_hw_pagetable_attach(hwpt, idev,
> IOMMU_NO_PASID);

Not sure this comment helps. The ALLOC uAPI is defined against a device, not
device+pasid. So it's implied to use IOMMU_NO_PASID. The said trick doesn't
matter here.

While at it, @Jason, is immediate_attach still required now?

^ permalink raw reply	[flat|nested] 71+ messages in thread

* RE: [PATCH v6 06/14] iommufd: Mark PASID-compatible domain
  2024-12-19 13:27 ` [PATCH v6 06/14] iommufd: Mark PASID-compatible domain Yi Liu
@ 2025-01-09  7:56   ` Tian, Kevin
  2025-01-09 14:54   ` Jason Gunthorpe
  1 sibling, 0 replies; 71+ messages in thread
From: Tian, Kevin @ 2025-01-09  7:56 UTC (permalink / raw)
  To: Liu, Yi L, joro@8bytes.org, jgg@nvidia.com,
	baolu.lu@linux.intel.com
  Cc: eric.auger@redhat.com, nicolinc@nvidia.com,
	chao.p.peng@linux.intel.com, iommu@lists.linux.dev,
	vasant.hegde@amd.com, will@kernel.org

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, December 19, 2024 9:28 PM
> 
> AMD IOMMU requires attaching PASID-compatible domains to PASID-
> capable
> devices. This includes the domains attached to RID and PASIDs. Related
> discussions in link [1] and [2].  ARM has similar requirement but does
> not need extra hint from iommufd, Intel does not have this requirement
> but can live up with it. Hence, iommufd is going to enforce this
> requirement as it's general requirement. 

From the description, "general requirement" is clearly not the reason.
Rather, we just want to lift it to a general requirement because doing so
is not harmful to other vendors.

> Mark the PASID-capable domains
> to prepare for adding this enforcement when PASID support is added.
> 
> [1] https://lore.kernel.org/linux-
> iommu/20240709182303.GK14050@ziepe.ca/
> [2] https://lore.kernel.org/linux-
> iommu/20240822124433.GD3468552@ziepe.ca/
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* RE: [PATCH v6 07/14] iommufd: Support pasid attach/replace
  2024-12-19 13:27 ` [PATCH v6 07/14] iommufd: Support pasid attach/replace Yi Liu
@ 2025-01-09  8:25   ` Tian, Kevin
  0 siblings, 0 replies; 71+ messages in thread
From: Tian, Kevin @ 2025-01-09  8:25 UTC (permalink / raw)
  To: Liu, Yi L, joro@8bytes.org, jgg@nvidia.com,
	baolu.lu@linux.intel.com
  Cc: eric.auger@redhat.com, nicolinc@nvidia.com,
	chao.p.peng@linux.intel.com, iommu@lists.linux.dev,
	vasant.hegde@amd.com, will@kernel.org

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, December 19, 2024 9:28 PM
>
> +struct iommufd_hw_pagetable *
> +iommufd_device_pasid_do_attach(struct iommufd_device *idev, ioasid_t
> pasid,
> +			       struct iommufd_hw_pagetable *hwpt)
> +{
> +	void *curr;
> +	int rc;
> +
> +	refcount_inc(&hwpt->obj.users);
> +	curr = xa_cmpxchg(&idev->pasid_hwpts, pasid, NULL, hwpt,
> GFP_KERNEL);
> +	if (curr) {
> +		if (curr == hwpt)
> +			rc = 0;
> +		else
> +			rc = xa_err(curr) ? : -EINVAL;
> +		goto err_put_hwpt;
> +	}

The refcount can be incremented at the end, once the operation succeeds.
Then there is no need to drop it in the error unwind, and this branch can
return directly.
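
A simplified userspace model of that ordering, for illustration only. A
fixed array stands in for the pasid_hwpts xarray, a plain int stands in for
the refcount, and attach_rc is a hypothetical knob that makes the modelled
hardware attach fail. None of the model_* names exist in the kernel, and
the sketch ignores locking and object lifetime, which the real code must
still get right.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_PASID 8

struct model_hwpt {
	int users;	/* models hwpt->obj.users */
	int attach_rc;	/* result the modelled hardware attach returns */
};

struct model_idev {
	/* models the per-device pasid -> hwpt xarray */
	struct model_hwpt *pasid_hwpts[MODEL_MAX_PASID];
};

static int model_pasid_attach(struct model_idev *idev, unsigned int pasid,
			      struct model_hwpt *hwpt)
{
	struct model_hwpt *curr;

	if (pasid >= MODEL_MAX_PASID)
		return -EINVAL;

	/* Slot already taken: return directly, no reference to unwind. */
	curr = idev->pasid_hwpts[pasid];
	if (curr)
		return curr == hwpt ? 0 : -EINVAL;

	idev->pasid_hwpts[pasid] = hwpt;
	if (hwpt->attach_rc) {
		/* Attach failed: only the slot needs to be restored. */
		idev->pasid_hwpts[pasid] = NULL;
		return hwpt->attach_rc;
	}

	/* Take the reference only once everything has succeeded. */
	hwpt->users++;
	return 0;
}
```

Every early exit is then a plain return, and the only cleanup left is
clearing the slot when the attach itself fails.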

^ permalink raw reply	[flat|nested] 71+ messages in thread

* RE: [PATCH v6 08/14] iommufd: Enforce PASID-compatible domain for RID
  2024-12-19 13:27 ` [PATCH v6 08/14] iommufd: Enforce PASID-compatible domain for RID Yi Liu
@ 2025-01-09  8:31   ` Tian, Kevin
  0 siblings, 0 replies; 71+ messages in thread
From: Tian, Kevin @ 2025-01-09  8:31 UTC (permalink / raw)
  To: Liu, Yi L, joro@8bytes.org, jgg@nvidia.com,
	baolu.lu@linux.intel.com
  Cc: eric.auger@redhat.com, nicolinc@nvidia.com,
	chao.p.peng@linux.intel.com, iommu@lists.linux.dev,
	vasant.hegde@amd.com, will@kernel.org

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, December 19, 2024 9:28 PM
> 
> Per the definition of IOMMU_HWPT_ALLOC_PASID, iommufd needs to
> enforce
> the RID to use PASID-compatible domain if PASID has been attached.

And vice versa.

> 
> This enforcement requires a lock across the RID and PASID attach path,
> use the idev->igroup->lock for this sync.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/iommu/iommufd/iommufd_private.h | 18 ++++++++++++++++--
>  drivers/iommu/iommufd/pasid.c           | 14 +++++++++++++-
>  2 files changed, 29 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/iommufd/iommufd_private.h
> b/drivers/iommu/iommufd/iommufd_private.h
> index e9d6bd8b44bc..158d8e6d5a9a 100644
> --- a/drivers/iommu/iommufd/iommufd_private.h
> +++ b/drivers/iommu/iommufd/iommufd_private.h
> @@ -513,7 +513,14 @@ static inline int iommufd_hwpt_attach_device(struct
> iommufd_hw_pagetable *hwpt,
>  					     struct iommufd_device *idev,
>  					     ioasid_t pasid)
>  {
> -	if (pasid != IOMMU_NO_PASID && !hwpt->pasid_compat)
> +	lockdep_assert_held(&idev->igroup->lock);
> +
> +	if (pasid == IOMMU_NO_PASID &&
> +	    !xa_empty(&idev->pasid_hwpts) && !hwpt->pasid_compat)
> +		return -EINVAL;
> +
> +	if (pasid != IOMMU_NO_PASID &&
> +	    (!idev->igroup->hwpt->pasid_compat || !hwpt->pasid_compat))
>  		return -EINVAL;
> 
>  	if (hwpt->fault)
> @@ -549,7 +556,14 @@ static inline int
> iommufd_hwpt_replace_device(struct iommufd_device *idev,
>  	struct iommufd_attach_handle *curr;
>  	int ret;
> 
> -	if (pasid != IOMMU_NO_PASID && !hwpt->pasid_compat)
> +	lockdep_assert_held(&idev->igroup->lock);
> +
> +	if (pasid == IOMMU_NO_PASID &&
> +	    !xa_empty(&idev->pasid_hwpts) && !hwpt->pasid_compat)
> +		return -EINVAL;
> +
> +	if (pasid != IOMMU_NO_PASID &&
> +	    (!idev->igroup->hwpt->pasid_compat || !hwpt->pasid_compat))
>  		return -EINVAL;

Make the above check a helper, as both attach and replace require it.
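
One possible shape for such a helper, as a self-contained userspace sketch.
The model_* types are hypothetical stand-ins for the iommufd structures,
nr_pasid_hwpts models the !xa_empty(&idev->pasid_hwpts) test, and
MODEL_NO_PASID is assumed to be 0 like IOMMU_NO_PASID. The two conditions
mirror the ones duplicated in the quoted patch, including its assumption
that a RID hwpt is already attached when a PASID attach is checked.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NO_PASID 0u

struct model_hwpt {
	bool pasid_compat;
};

struct model_idev {
	struct model_hwpt *rid_hwpt;	/* models idev->igroup->hwpt */
	size_t nr_pasid_hwpts;		/* non-zero models a non-empty xarray */
};

/* Shared by the attach and replace paths; caller holds the group lock. */
static int model_check_pasid_compat(const struct model_idev *idev,
				    unsigned int pasid,
				    const struct model_hwpt *hwpt)
{
	/* RID path: once any PASID is attached, the RID domain must be compat. */
	if (pasid == MODEL_NO_PASID && idev->nr_pasid_hwpts &&
	    !hwpt->pasid_compat)
		return -EINVAL;

	/* PASID path: both the RID domain and the new domain must be compat. */
	if (pasid != MODEL_NO_PASID &&
	    (!idev->rid_hwpt->pasid_compat || !hwpt->pasid_compat))
		return -EINVAL;

	return 0;
}
```

Both callers would then reduce to a single call followed by the usual
error return, so the two copies of the rule cannot drift apart.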

> 
>  	if (old->fault || hwpt->fault)
> diff --git a/drivers/iommu/iommufd/pasid.c
> b/drivers/iommu/iommufd/pasid.c
> index fcdfbc01dcbb..fdf97f1d71ae 100644
> --- a/drivers/iommu/iommufd/pasid.c
> +++ b/drivers/iommu/iommufd/pasid.c
> @@ -15,6 +15,8 @@ iommufd_device_pasid_do_attach(struct
> iommufd_device *idev, ioasid_t pasid,
>  	int rc;
> 
>  	refcount_inc(&hwpt->obj.users);
> +
> +	mutex_lock(&idev->igroup->lock);
>  	curr = xa_cmpxchg(&idev->pasid_hwpts, pasid, NULL, hwpt,
> GFP_KERNEL);
>  	if (curr) {
>  		if (curr == hwpt)
> @@ -30,9 +32,11 @@ iommufd_device_pasid_do_attach(struct
> iommufd_device *idev, ioasid_t pasid,
>  		goto err_put_hwpt;
>  	}
> 
> +	mutex_unlock(&idev->igroup->lock);
>  	return NULL;
> 
>  err_put_hwpt:
> +	mutex_unlock(&idev->igroup->lock);
>  	refcount_dec(&hwpt->obj.users);
>  	return rc ? ERR_PTR(rc) : NULL;
>  }
> @@ -45,6 +49,8 @@ iommufd_device_pasid_do_replace(struct
> iommufd_device *idev, ioasid_t pasid,
>  	int rc;
> 
>  	refcount_inc(&hwpt->obj.users);
> +
> +	mutex_lock(&idev->igroup->lock);
>  	curr = xa_store(&idev->pasid_hwpts, pasid, hwpt, GFP_KERNEL);
>  	rc = xa_err(curr);
>  	if (rc)
> @@ -70,10 +76,12 @@ iommufd_device_pasid_do_replace(struct
> iommufd_device *idev, ioasid_t pasid,
>  		goto out_put_hwpt;
>  	}
> 
> +	mutex_unlock(&idev->igroup->lock);
>  	/* Caller must destroy old_hwpt */
>  	return curr;
> 
>  out_put_hwpt:
> +	mutex_unlock(&idev->igroup->lock);
>  	refcount_dec(&hwpt->obj.users);
>  	return rc ? ERR_PTR(rc) : NULL;
>  }
> @@ -152,10 +160,14 @@ void iommufd_device_pasid_detach(struct
> iommufd_device *idev, ioasid_t pasid)
>  {
>  	struct iommufd_hw_pagetable *hwpt;
> 
> +	mutex_lock(&idev->igroup->lock);
>  	hwpt = xa_erase(&idev->pasid_hwpts, pasid);
> -	if (WARN_ON(!hwpt))
> +	if (WARN_ON(!hwpt)) {
> +		mutex_unlock(&idev->igroup->lock);
>  		return;
> +	}
>  	iommufd_hwpt_detach_device(hwpt, idev, pasid);
> +	mutex_unlock(&idev->igroup->lock);
>  	iommufd_hw_pagetable_put(idev->ictx, hwpt);
>  }
>  EXPORT_SYMBOL_NS_GPL(iommufd_device_pasid_detach, "IOMMUFD");
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v6 01/14] iommu: Introduce a replace API for device pasid
  2025-01-09  7:20   ` Tian, Kevin
@ 2025-01-09 14:43     ` Jason Gunthorpe
  2025-01-10  2:31       ` Baolu Lu
  0 siblings, 1 reply; 71+ messages in thread
From: Jason Gunthorpe @ 2025-01-09 14:43 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, joro@8bytes.org, baolu.lu@linux.intel.com,
	eric.auger@redhat.com, nicolinc@nvidia.com,
	chao.p.peng@linux.intel.com, iommu@lists.linux.dev,
	vasant.hegde@amd.com, will@kernel.org

On Thu, Jan 09, 2025 at 07:20:01AM +0000, Tian, Kevin wrote:
> > From: Tian, Kevin
> > Sent: Thursday, January 9, 2025 3:08 PM
> > 
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Thursday, December 19, 2024 9:28 PM
> > >
> > > +/**
> > > + * iommu_replace_device_pasid - Replace the domain that a pasid is
> > > attached to
> > > + * @domain: the new iommu domain
> > > + * @dev: the attached device.
> > > + * @pasid: the pasid of the device.
> > > + * @handle: the attach handle.
> > > + *
> > > + * This API allows the pasid to switch domains. Return 0 on success, or an
> > > + * error. The pasid will keep the old configuration if replacement failed.
> > > + * This is supposed to be used by iommufd, and iommufd can guarantee
> > > that
> > > + * both iommu_attach_device_pasid() and iommu_replace_device_pasid()
> > > would
> > > + * pass in a valid @handle.
> > > + */
> > 
> > Better explain why a valid handle is required here.
> 
> Okay, it's because __iommu_set_group_pasid() requires the old domain now
> and the only way to retrieve it at this point is via a handle. It's probably also
> ok to directly store a domain pointer to the xarray when the handle is missing
> but that sounds more confusing.

I shared an xarray approach to do that at one point, but it apparently
was scary enough that nobody picked it up :)

https://lore.kernel.org/linux-iommu/20240322165927.GG66976@ziepe.ca/

Jason

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v6 05/14] iommufd: Pass pasid through the device attach/replace path
  2025-01-09  7:53   ` Tian, Kevin
@ 2025-01-09 14:51     ` Jason Gunthorpe
  2025-01-10  7:22       ` Tian, Kevin
  0 siblings, 1 reply; 71+ messages in thread
From: Jason Gunthorpe @ 2025-01-09 14:51 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, joro@8bytes.org, baolu.lu@linux.intel.com,
	eric.auger@redhat.com, nicolinc@nvidia.com,
	chao.p.peng@linux.intel.com, iommu@lists.linux.dev,
	vasant.hegde@amd.com, will@kernel.org

On Thu, Jan 09, 2025 at 07:53:03AM +0000, Tian, Kevin wrote:

> While at it, @Jason, is immediate_attach still required now?

Sadly yes. 

virtio-iommu hasn't been converted to domain_alloc_paging() yet and it
has a domain finalize function called in attach. I think this is easy
to fix.

arm-smmu-v2 is converted, but it still has the finalize function
during attach only (I failed to fix this).

mtk_iommu and mtk_iommu_v1 both have "domain finalise" functions
by simple grep.

Possibly more issues in the embedded drivers.

However the server drivers (smmuv3, intel, amd, riscv) are all now OK
without it.

Jason

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v6 06/14] iommufd: Mark PASID-compatible domain
  2024-12-19 13:27 ` [PATCH v6 06/14] iommufd: Mark PASID-compatible domain Yi Liu
  2025-01-09  7:56   ` Tian, Kevin
@ 2025-01-09 14:54   ` Jason Gunthorpe
  2025-01-17 10:50     ` Yi Liu
  1 sibling, 1 reply; 71+ messages in thread
From: Jason Gunthorpe @ 2025-01-09 14:54 UTC (permalink / raw)
  To: Yi Liu
  Cc: joro, kevin.tian, baolu.lu, eric.auger, nicolinc, chao.p.peng,
	iommu, vasant.hegde, will

On Thu, Dec 19, 2024 at 05:27:38AM -0800, Yi Liu wrote:
> AMD IOMMU requires attaching PASID-compatible domains to PASID-capable
> devices. This includes the domains attached to RID and PASIDs. Related
> discussions in link [1] and [2].  ARM has similar requirement but does
> not need extra hint from iommufd,

ARM does use the hint now, exactly the same as AMD, it is merged:

https://lore.kernel.org/all/2-v1-0bb8d5313a27+27b-smmuv3_paging_flags_jgg@nvidia.com/

Jason

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  2024-12-23  2:51   ` Baolu Lu
  2024-12-24 11:35     ` Yi Liu
@ 2025-01-09 15:27     ` Jason Gunthorpe
  2025-01-10  2:41       ` Baolu Lu
  1 sibling, 1 reply; 71+ messages in thread
From: Jason Gunthorpe @ 2025-01-09 15:27 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Yi Liu, joro, kevin.tian, eric.auger, nicolinc, chao.p.peng,
	iommu, vasant.hegde, will

On Mon, Dec 23, 2024 at 10:51:50AM +0800, Baolu Lu wrote:
> > @@ -198,7 +198,7 @@ intel_iommu_domain_alloc_nested(struct device *dev, struct iommu_domain *parent,
> >   	struct dmar_domain *domain;
> >   	int ret;
> > -	if (!nested_supported(iommu) || flags)
> > +	if (!nested_supported(iommu) || flags & ~IOMMU_HWPT_ALLOC_PASID)
> >   		return ERR_PTR(-EOPNOTSUPP);
> >   	/* Must be nested domain */
> 
> It's better to abort and fail a domain allocation when
> IOMMU_HWPT_ALLOC_PASID is set but the iommu lacks pasid support?

With multiple IOMMU instances in the system you still want to succeed
in creating a PASID-capable domain even on instances that don't support
PASID, since it may end up being used on other instances too.

Jason

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v6 01/14] iommu: Introduce a replace API for device pasid
  2025-01-09 14:43     ` Jason Gunthorpe
@ 2025-01-10  2:31       ` Baolu Lu
  2025-01-10  7:21         ` Tian, Kevin
  0 siblings, 1 reply; 71+ messages in thread
From: Baolu Lu @ 2025-01-10  2:31 UTC (permalink / raw)
  To: Jason Gunthorpe, Tian, Kevin
  Cc: Liu, Yi L, joro@8bytes.org, eric.auger@redhat.com,
	nicolinc@nvidia.com, chao.p.peng@linux.intel.com,
	iommu@lists.linux.dev, vasant.hegde@amd.com, will@kernel.org

On 1/9/25 22:43, Jason Gunthorpe wrote:
> On Thu, Jan 09, 2025 at 07:20:01AM +0000, Tian, Kevin wrote:
>>> From: Tian, Kevin
>>> Sent: Thursday, January 9, 2025 3:08 PM
>>>
>>>> From: Liu, Yi L<yi.l.liu@intel.com>
>>>> Sent: Thursday, December 19, 2024 9:28 PM
>>>>
>>>> +/**
>>>> + * iommu_replace_device_pasid - Replace the domain that a pasid is
>>>> attached to
>>>> + * @domain: the new iommu domain
>>>> + * @dev: the attached device.
>>>> + * @pasid: the pasid of the device.
>>>> + * @handle: the attach handle.
>>>> + *
>>>> + * This API allows the pasid to switch domains. Return 0 on success, or an
>>>> + * error. The pasid will keep the old configuration if replacement failed.
>>>> + * This is supposed to be used by iommufd, and iommufd can guarantee
>>>> that
>>>> + * both iommu_attach_device_pasid() and iommu_replace_device_pasid()
>>>> would
>>>> + * pass in a valid @handle.
>>>> + */
>>> Better explain why a valid handle is required here.
>> Okay, it's because __iommu_set_group_pasid() requires the old domain now
>> and the only way to retrieve it at this point is via a handle. It's probably also
>> ok to directly store a domain pointer to the xarray when the handle is missing
>> but that sounds more confusing.
> I had shared a xarray approach to do that at one point, it apparently
> was scary enough nobody picked it up 🙂
> 
> https://lore.kernel.org/linux-iommu/20240322165927.GG66976@ziepe.ca/

I evaluated it at that time and ultimately decided not to use it. At that
time, there was no requirement for pasid replacement, so the 'old
domain' was not yet an issue. This is why I chose to allocate on demand
at that stage. :-)

---
baolu

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  2025-01-09 15:27     ` Jason Gunthorpe
@ 2025-01-10  2:41       ` Baolu Lu
  2025-01-10  7:34         ` Tian, Kevin
  0 siblings, 1 reply; 71+ messages in thread
From: Baolu Lu @ 2025-01-10  2:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yi Liu, joro, kevin.tian, eric.auger, nicolinc, chao.p.peng,
	iommu, vasant.hegde, will

On 1/9/25 23:27, Jason Gunthorpe wrote:
> On Mon, Dec 23, 2024 at 10:51:50AM +0800, Baolu Lu wrote:
>>> @@ -198,7 +198,7 @@ intel_iommu_domain_alloc_nested(struct device *dev, struct iommu_domain *parent,
>>>    	struct dmar_domain *domain;
>>>    	int ret;
>>> -	if (!nested_supported(iommu) || flags)
>>> +	if (!nested_supported(iommu) || flags & ~IOMMU_HWPT_ALLOC_PASID)
>>>    		return ERR_PTR(-EOPNOTSUPP);
>>>    	/* Must be nested domain */
>> It's better to abort and fail a domain allocation when
>> IOMMU_HWPT_ALLOC_PASID is set but the iommu lacks pasid support?
> With multi-instances of iommus in the system you still want to succeed
> creating a PASID capable domain even on instances that don't support
> PASID since it may end up being used on an other instances too.

Okay, so this flag is just a hint that "this domain will probably be
attached to a pasid of a device", NOT a requirement of "this domain
necessitates hardware support for the PASID feature".

---
baolu

^ permalink raw reply	[flat|nested] 71+ messages in thread

* RE: [PATCH v6 01/14] iommu: Introduce a replace API for device pasid
  2025-01-10  2:31       ` Baolu Lu
@ 2025-01-10  7:21         ` Tian, Kevin
  2025-01-16 10:00           ` Yi Liu
  0 siblings, 1 reply; 71+ messages in thread
From: Tian, Kevin @ 2025-01-10  7:21 UTC (permalink / raw)
  To: Baolu Lu, Jason Gunthorpe
  Cc: Liu, Yi L, joro@8bytes.org, eric.auger@redhat.com,
	nicolinc@nvidia.com, chao.p.peng@linux.intel.com,
	iommu@lists.linux.dev, vasant.hegde@amd.com, will@kernel.org

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Friday, January 10, 2025 10:32 AM
> 
> On 1/9/25 22:43, Jason Gunthorpe wrote:
> > On Thu, Jan 09, 2025 at 07:20:01AM +0000, Tian, Kevin wrote:
> >>> From: Tian, Kevin
> >>> Sent: Thursday, January 9, 2025 3:08 PM
> >>>
> >>>> From: Liu, Yi L<yi.l.liu@intel.com>
> >>>> Sent: Thursday, December 19, 2024 9:28 PM
> >>>>
> >>>> +/**
> >>>> + * iommu_replace_device_pasid - Replace the domain that a pasid is
> >>>> attached to
> >>>> + * @domain: the new iommu domain
> >>>> + * @dev: the attached device.
> >>>> + * @pasid: the pasid of the device.
> >>>> + * @handle: the attach handle.
> >>>> + *
> >>>> + * This API allows the pasid to switch domains. Return 0 on success, or
> an
> >>>> + * error. The pasid will keep the old configuration if replacement failed.
> >>>> + * This is supposed to be used by iommufd, and iommufd can
> guarantee
> >>>> that
> >>>> + * both iommu_attach_device_pasid() and
> iommu_replace_device_pasid()
> >>>> would
> >>>> + * pass in a valid @handle.
> >>>> + */
> >>> Better explain why a valid handle is required here.
> >> Okay, it's because __iommu_set_group_pasid() requires the old domain
> now
> >> and the only way to retrieve it at this point is via a handle. It's probably
> also
> >> ok to directly store a domain pointer to the xarray when the handle is
> missing
> >> but that sounds more confusing.
> > I had shared a xarray approach to do that at one point, it apparently
> > was scary enough nobody picked it up 🙂
> >
> > https://lore.kernel.org/linux-iommu/20240322165927.GG66976@ziepe.ca/
> 
> I evaluated it at that time and ultimately decided not using it. At that
> time, there was no requirement for pasid replacement, so the 'old
> domain' was not yet an issue. This is why I chose to allocate on demand
> at that stage. :-)
> 

Not that scary when reading it again. 😊

Fewer restrictions on the kAPI are worth a bit more work inside the
helpers. I won't be the last one to wonder why the pasid variants
impose such a restriction while it's not applied to the RID.

Yi, could you take a look and check whether adopting it overlooks
anything?

^ permalink raw reply	[flat|nested] 71+ messages in thread

* RE: [PATCH v6 05/14] iommufd: Pass pasid through the device attach/replace path
  2025-01-09 14:51     ` Jason Gunthorpe
@ 2025-01-10  7:22       ` Tian, Kevin
  0 siblings, 0 replies; 71+ messages in thread
From: Tian, Kevin @ 2025-01-10  7:22 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liu, Yi L, joro@8bytes.org, baolu.lu@linux.intel.com,
	eric.auger@redhat.com, nicolinc@nvidia.com,
	chao.p.peng@linux.intel.com, iommu@lists.linux.dev,
	vasant.hegde@amd.com, will@kernel.org

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, January 9, 2025 10:51 PM
> 
> On Thu, Jan 09, 2025 at 07:53:03AM +0000, Tian, Kevin wrote:
> 
> > While at it, @Jason, is immediate_attach still required now?
> 
> Sadly yes.
> 
> virtio-iommu hasn't been converted to domain_alloc_paging() yet and it
> has a domain finalize function called in attach. I think this is easy
> to fix.
> 
> arm-smmu-v2 is converted, but it still has the finalize function
> during attach only (I failed to fix this).
> 
> mtk_iommu and mtk_iommu_v1 both have "domain finalise" functions
> by simple grep.
> 
> Possibly more issues in the embedded drivers.
> 
> However the server drivers (smmuv3, intel, amd, riscv) are all now OK
> without it.
> 

Thanks for the info. Looks like it has to stay there for a while longer.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* RE: [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  2025-01-10  2:41       ` Baolu Lu
@ 2025-01-10  7:34         ` Tian, Kevin
  2025-01-17 10:57           ` Yi Liu
  0 siblings, 1 reply; 71+ messages in thread
From: Tian, Kevin @ 2025-01-10  7:34 UTC (permalink / raw)
  To: Baolu Lu, Jason Gunthorpe
  Cc: Liu, Yi L, joro@8bytes.org, eric.auger@redhat.com,
	nicolinc@nvidia.com, chao.p.peng@linux.intel.com,
	iommu@lists.linux.dev, vasant.hegde@amd.com, will@kernel.org

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Friday, January 10, 2025 10:41 AM
> 
> On 1/9/25 23:27, Jason Gunthorpe wrote:
> > On Mon, Dec 23, 2024 at 10:51:50AM +0800, Baolu Lu wrote:
> >>> @@ -198,7 +198,7 @@ intel_iommu_domain_alloc_nested(struct device
> *dev, struct iommu_domain *parent,
> >>>    	struct dmar_domain *domain;
> >>>    	int ret;
> >>> -	if (!nested_supported(iommu) || flags)
> >>> +	if (!nested_supported(iommu) || flags &
> ~IOMMU_HWPT_ALLOC_PASID)
> >>>    		return ERR_PTR(-EOPNOTSUPP);
> >>>    	/* Must be nested domain */
> >> It's better to abort and fail a domain allocation when
> >> IOMMU_HWPT_ALLOC_PASID is set but the iommu lacks pasid support?
> > With multi-instances of iommus in the system you still want to succeed
> > creating a PASID capable domain even on instances that don't support
> > PASID since it may end up being used on an other instances too.
> 
> Okay, so this flag is just a hint that "this domain will probably be
> attached to a pasid of a device", NOT a requirement of "this domain
> necessitates hardware support for the PASID feature".
> 

And note that the pasid capability in VT-d just decides whether to serve
DMA requests with PASID. It doesn't change the format. It's legacy
mode vs. scalable mode that decides the format, which is always
pasid compatible once scalable mode is selected.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* RE: [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  2024-12-19 13:27 ` [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support Yi Liu
  2024-12-23  2:51   ` Baolu Lu
@ 2025-01-10  7:38   ` Tian, Kevin
  2025-01-14  8:13     ` Tian, Kevin
  2025-01-13 20:31   ` Jason Gunthorpe
  2 siblings, 1 reply; 71+ messages in thread
From: Tian, Kevin @ 2025-01-10  7:38 UTC (permalink / raw)
  To: Liu, Yi L, joro@8bytes.org, jgg@nvidia.com,
	baolu.lu@linux.intel.com
  Cc: eric.auger@redhat.com, nicolinc@nvidia.com,
	chao.p.peng@linux.intel.com, iommu@lists.linux.dev,
	vasant.hegde@amd.com, will@kernel.org

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, December 19, 2024 9:28 PM
> @@ -198,7 +198,7 @@ intel_iommu_domain_alloc_nested(struct device
> *dev, struct iommu_domain *parent,
>  	struct dmar_domain *domain;
>  	int ret;
> 
> -	if (!nested_supported(iommu) || flags)
> +	if (!nested_supported(iommu) || flags &
> ~IOMMU_HWPT_ALLOC_PASID)
>  		return ERR_PTR(-EOPNOTSUPP);
> 

Hmm, isn't this causing a regression? Though we don't have qemu support
ready, the kernel support has been there for quite some time, so we
can't be sure that a vIOMMU already developed against this interface
won't be broken by this change.

Actually this check is not necessary. I don't see any restriction on
VT-d which requires setting this flag for nesting. Instead ARM/AMD
would require clearing this bit for nesting, as in that case it's the
RID attaching to the CD table and there is no pasid attach in the
kernel, hence no choice of format.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v6 01/14] iommu: Introduce a replace API for device pasid
  2024-12-19 13:27 ` [PATCH v6 01/14] iommu: Introduce a replace API for device pasid Yi Liu
                     ` (2 preceding siblings ...)
  2025-01-09  7:20   ` Tian, Kevin
@ 2025-01-13 20:21   ` Jason Gunthorpe
  2025-01-14  8:10     ` Tian, Kevin
  3 siblings, 1 reply; 71+ messages in thread
From: Jason Gunthorpe @ 2025-01-13 20:21 UTC (permalink / raw)
  To: Yi Liu
  Cc: joro, kevin.tian, baolu.lu, eric.auger, nicolinc, chao.p.peng,
	iommu, vasant.hegde, will

On Thu, Dec 19, 2024 at 05:27:33AM -0800, Yi Liu wrote:
> +	mutex_lock(&group->mutex);
> +	/*
> +	 * The iommu_attach_handle of the pasid becomes inconsistent with the
> +	 * actual handle per the below operation. The concurrent PRI path will
> +	 * deliver the PRQs per the new handle, this does not have a functional
> +	 * impact. The PRI path would eventually become consistent when the
> +	 * replacement is done.
> +	 */
> +	curr = (struct iommu_attach_handle *)xa_store(&group->pasid_array,
> +						      pasid, handle,
> +						      GFP_KERNEL);

The cast is not necessary..

> +
> +	ret = __iommu_set_group_pasid(domain, group, pasid, curr->domain);
> +	if (ret)
> +		WARN_ON(handle != xa_store(&group->pasid_array, pasid,
> +					   curr, GFP_KERNEL));

I wonder about the ordering here, is it OK to have PRIs being
delivered to a domain that failed to attach? What cleans up that race
condition with domain free?

Should we replace the domain then set the xarray? (and same ordering
question for normal attach)

Jason

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  2024-12-19 13:27 ` [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support Yi Liu
  2024-12-23  2:51   ` Baolu Lu
  2025-01-10  7:38   ` Tian, Kevin
@ 2025-01-13 20:31   ` Jason Gunthorpe
  2025-01-14  8:19     ` Tian, Kevin
  2 siblings, 1 reply; 71+ messages in thread
From: Jason Gunthorpe @ 2025-01-13 20:31 UTC (permalink / raw)
  To: Yi Liu
  Cc: joro, kevin.tian, baolu.lu, eric.auger, nicolinc, chao.p.peng,
	iommu, vasant.hegde, will

On Thu, Dec 19, 2024 at 05:27:41AM -0800, Yi Liu wrote:
> diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
> index aba92c00b427..6ac5c534bef4 100644
> --- a/drivers/iommu/intel/nested.c
> +++ b/drivers/iommu/intel/nested.c
> @@ -198,7 +198,7 @@ intel_iommu_domain_alloc_nested(struct device *dev, struct iommu_domain *parent,
>  	struct dmar_domain *domain;
>  	int ret;
>  
> -	if (!nested_supported(iommu) || flags)
> +	if (!nested_supported(iommu) || flags & ~IOMMU_HWPT_ALLOC_PASID)

ARM and AMD should reject ALLOC_PASID when combined with nesting; they
cannot do that. So it would be OK for Intel to reject it too.

Jason

^ permalink raw reply	[flat|nested] 71+ messages in thread

* RE: [PATCH v6 01/14] iommu: Introduce a replace API for device pasid
  2025-01-13 20:21   ` Jason Gunthorpe
@ 2025-01-14  8:10     ` Tian, Kevin
  2025-01-14 13:45       ` Jason Gunthorpe
  0 siblings, 1 reply; 71+ messages in thread
From: Tian, Kevin @ 2025-01-14  8:10 UTC (permalink / raw)
  To: Jason Gunthorpe, Liu, Yi L
  Cc: joro@8bytes.org, baolu.lu@linux.intel.com, eric.auger@redhat.com,
	nicolinc@nvidia.com, chao.p.peng@linux.intel.com,
	iommu@lists.linux.dev, vasant.hegde@amd.com, will@kernel.org

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, January 14, 2025 4:22 AM
> 
> On Thu, Dec 19, 2024 at 05:27:33AM -0800, Yi Liu wrote:
> > +	mutex_lock(&group->mutex);
> > +	/*
> > +	 * The iommu_attach_handle of the pasid becomes inconsistent with
> the
> > +	 * actual handle per the below operation. The concurrent PRI path
> will
> > +	 * deliver the PRQs per the new handle, this does not have a
> functional
> > +	 * impact. The PRI path would eventually become consistent when
> the
> > +	 * replacement is done.
> > +	 */
> > +	curr = (struct iommu_attach_handle *)xa_store(&group->pasid_array,
> > +						      pasid, handle,
> > +						      GFP_KERNEL);
> 
> The cast is not necessary..
> 
> > +
> > +	ret = __iommu_set_group_pasid(domain, group, pasid, curr-
> >domain);
> > +	if (ret)
> > +		WARN_ON(handle != xa_store(&group->pasid_array, pasid,
> > +					   curr, GFP_KERNEL));
> 
> I wonder about the ordering here, is it OK to have PRIs being
> delivered to a domain that failed to attach? What cleans up that race
> condition with domain free?
> 
> Should we replace the domain then set the xarray? (and same ordering
> question for normal attach)
> 

That makes sense to me.

But I don't think there is a problem with attach. xa_insert() will
return an error if an entry already exists. So there won't be any
PRI being delivered at __iommu_set_group_pasid(), no matter whether
it succeeds or not.
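
For reference, the difference between the two xarray calls mentioned in
this subthread can be sketched in userspace. This is a toy single-slot
model, not the kernel xarray: the sketch_* names are made up for
illustration, and allocation failure (-ENOMEM in the real API) is not
modelled.

```c
#include <assert.h>	/* used by the checks in the note below */
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for one pasid_array index (hypothetical, for illustration). */
struct sketch_slot {
	void *entry;
};

/* Mirrors xa_insert(): only fills an empty slot, -EBUSY if it is occupied. */
static int sketch_insert(struct sketch_slot *s, void *entry)
{
	if (s->entry)
		return -EBUSY;
	s->entry = entry;
	return 0;
}

/* Mirrors xa_store(): overwrites unconditionally and returns the old entry. */
static void *sketch_store(struct sketch_slot *s, void *entry)
{
	void *old = s->entry;

	s->entry = entry;
	return old;
}
```

With an empty slot, sketch_insert() succeeds once and then returns
-EBUSY, while sketch_store() always replaces the entry and hands back
the previous one, which is how the replace path gets at the old handle.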

^ permalink raw reply	[flat|nested] 71+ messages in thread

* RE: [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  2025-01-10  7:38   ` Tian, Kevin
@ 2025-01-14  8:13     ` Tian, Kevin
  0 siblings, 0 replies; 71+ messages in thread
From: Tian, Kevin @ 2025-01-14  8:13 UTC (permalink / raw)
  To: Tian, Kevin, Liu, Yi L, joro@8bytes.org, jgg@nvidia.com,
	baolu.lu@linux.intel.com
  Cc: eric.auger@redhat.com, nicolinc@nvidia.com,
	chao.p.peng@linux.intel.com, iommu@lists.linux.dev,
	vasant.hegde@amd.com, will@kernel.org

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Friday, January 10, 2025 3:39 PM
> 
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Thursday, December 19, 2024 9:28 PM
> > @@ -198,7 +198,7 @@ intel_iommu_domain_alloc_nested(struct device
> > *dev, struct iommu_domain *parent,
> >  	struct dmar_domain *domain;
> >  	int ret;
> >
> > -	if (!nested_supported(iommu) || flags)
> > +	if (!nested_supported(iommu) || flags &
> > ~IOMMU_HWPT_ALLOC_PASID)
> >  		return ERR_PTR(-EOPNOTSUPP);
> >
> 
> Hmm isn't it causing regression? Though we don't have qemu support
> ready, the kernel support has been there for quite some time so
> we're not sure any vIOMMU already developed to utilize this interface
> then will be broken on this change.
> 

Please ignore this comment. The change was to allow the new flag
instead of requiring it.
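
To make the allow-vs-require point concrete, here is a small userspace
sketch of the mask check from the hunk above. The sketch_* names are
hypothetical, and the flag value used here is only an assumption for
illustration; the real definition is in the iommufd uAPI header.

```c
#include <assert.h>	/* used by the checks in the note below */
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative value only; see the iommufd uAPI header for the real one. */
#define SKETCH_HWPT_ALLOC_PASID	(1u << 3)

/*
 * Hypothetical stand-in for the flag check in the nested alloc path.
 * Returns 0 when the flags are acceptable, -EOPNOTSUPP otherwise.
 */
static int sketch_check_nested_flags(bool nested_supported, uint32_t flags)
{
	/* Any bit other than the PASID flag is rejected; the flag is optional. */
	if (!nested_supported || (flags & ~SKETCH_HWPT_ALLOC_PASID))
		return -EOPNOTSUPP;
	return 0;
}
```

So flags == 0 (what existing users pass) and flags == the PASID flag
are both accepted, and only unknown bits are rejected. The unparenthesized
form in the patch behaves the same way since '&' binds tighter than '||'.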

^ permalink raw reply	[flat|nested] 71+ messages in thread

* RE: [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  2025-01-13 20:31   ` Jason Gunthorpe
@ 2025-01-14  8:19     ` Tian, Kevin
  0 siblings, 0 replies; 71+ messages in thread
From: Tian, Kevin @ 2025-01-14  8:19 UTC (permalink / raw)
  To: Jason Gunthorpe, Liu, Yi L
  Cc: joro@8bytes.org, baolu.lu@linux.intel.com, eric.auger@redhat.com,
	nicolinc@nvidia.com, chao.p.peng@linux.intel.com,
	iommu@lists.linux.dev, vasant.hegde@amd.com, will@kernel.org

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, January 14, 2025 4:31 AM
> 
> On Thu, Dec 19, 2024 at 05:27:41AM -0800, Yi Liu wrote:
> > diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
> > index aba92c00b427..6ac5c534bef4 100644
> > --- a/drivers/iommu/intel/nested.c
> > +++ b/drivers/iommu/intel/nested.c
> > @@ -198,7 +198,7 @@ intel_iommu_domain_alloc_nested(struct device
> *dev, struct iommu_domain *parent,
> >  	struct dmar_domain *domain;
> >  	int ret;
> >
> > -	if (!nested_supported(iommu) || flags)
> > +	if (!nested_supported(iommu) || flags &
> ~IOMMU_HWPT_ALLOC_PASID)
> 
> ARM and AMD should reject ALLOC_PASID when combined with nesting, it
> cannot do that. So it would be OK for intel to reject it too
> 

But Intel does allow nest_hwpt attached to a PASID. It sounds clearer
to me to have the flag required on all hwpt types attached to PASID than
having inconsistent rules among types. Anyway the nested hwpt alloc
logic is per viommu so having different rules among vendors is fine...

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v6 01/14] iommu: Introduce a replace API for device pasid
  2025-01-14  8:10     ` Tian, Kevin
@ 2025-01-14 13:45       ` Jason Gunthorpe
  2025-01-15  4:43         ` Tian, Kevin
  0 siblings, 1 reply; 71+ messages in thread
From: Jason Gunthorpe @ 2025-01-14 13:45 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, joro@8bytes.org, baolu.lu@linux.intel.com,
	eric.auger@redhat.com, nicolinc@nvidia.com,
	chao.p.peng@linux.intel.com, iommu@lists.linux.dev,
	vasant.hegde@amd.com, will@kernel.org

On Tue, Jan 14, 2025 at 08:10:41AM +0000, Tian, Kevin wrote:

> > > +	ret = __iommu_set_group_pasid(domain, group, pasid, curr-
> > >domain);
> > > +	if (ret)
> > > +		WARN_ON(handle != xa_store(&group->pasid_array, pasid,
> > > +					   curr, GFP_KERNEL));
> > 
> > I wonder about the ordering here, is it OK to have PRIs being
> > delivered to a domain that failed to attach? What cleans up that race
> > condition with domain free?
> > 
> > Should we replace the domain then set the xarray? (and same ordering
> > question for normal attach)
> 
> That makes sense to me.
> 
> But I don't think there is a problem with attach. xa_insert() will
> return error if an entry already exists. So there won't be any
> PRI being delivered at __iommu_set_group_pasid(), no matter
> it succeeds or not.

It has the same issue:

	ret = xa_insert(&group->pasid_array, pasid, handle, GFP_KERNEL);
	if (ret)
		goto out_unlock;

	ret = __iommu_set_group_pasid(domain, group, pasid);
        .. Concurrently a PRI event is pushed to the domain ..
	if (ret)
		xa_erase(&group->pasid_array, pasid);

        .. Now what? Who fences the PRI event thread before the caller
	   frees the domain ..?

We arranged things so that detach would fence the PRI; if detach is
not called then there is no fence.

Jason

^ permalink raw reply	[flat|nested] 71+ messages in thread

* RE: [PATCH v6 01/14] iommu: Introduce a replace API for device pasid
  2025-01-14 13:45       ` Jason Gunthorpe
@ 2025-01-15  4:43         ` Tian, Kevin
  2025-01-15 14:43           ` Jason Gunthorpe
  0 siblings, 1 reply; 71+ messages in thread
From: Tian, Kevin @ 2025-01-15  4:43 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liu, Yi L, joro@8bytes.org, baolu.lu@linux.intel.com,
	eric.auger@redhat.com, nicolinc@nvidia.com,
	chao.p.peng@linux.intel.com, iommu@lists.linux.dev,
	vasant.hegde@amd.com, will@kernel.org

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, January 14, 2025 9:46 PM
> 
> On Tue, Jan 14, 2025 at 08:10:41AM +0000, Tian, Kevin wrote:
> 
> > > > +	ret = __iommu_set_group_pasid(domain, group, pasid, curr-
> > > >domain);
> > > > +	if (ret)
> > > > +		WARN_ON(handle != xa_store(&group->pasid_array, pasid,
> > > > +					   curr, GFP_KERNEL));
> > >
> > > I wonder about the ordering here, is it OK to have PRIs being
> > > delivered to a domain that failed to attach? What cleans up that race
> > > condition with domain free?
> > >
> > > Should we replace the domain then set the xarray? (and same ordering
> > > question for normal attach)
> >
> > That makes sense to me.
> >
> > But I don't think there is a problem with attach. xa_insert() will
> > return error if an entry already exists. So there won't be any
> > PRI being delivered at __iommu_set_group_pasid(), no matter
> > it succeeds or not.
> 
> It has the same issue:
> 
> 	ret = xa_insert(&group->pasid_array, pasid, handle, GFP_KERNEL);
> 	if (ret)
> 		goto out_unlock;
> 
> 	ret = __iommu_set_group_pasid(domain, group, pasid);
>         .. Concurrently a PRI event is pushed to the domain ..
> 	if (ret)
> 		xa_erase(&group->pasid_array, pasid);
> 
>         .. Now what? Who fences the PRI event thread before the caller
> 	   frees the domain ..?
> 
> We arranged things so that detatch would fence the PRI, if detach is
> not called then there is no fence..
> 

Though I'm fine with changing the order in attach too, as it looks more
reasonable logically, I'm trying to understand the actual impact of
the original order (e.g. is the change worth a Fixes tag?)

If no detach happened before, then it's the 1st attach to a faultable
domain and PRI will be enabled right before this function, hence no
fence is required.

Would a sane device trigger PRI in this window?

If it's a detach-then-attach flow, detach will do the fence anyway
before the attach.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v6 01/14] iommu: Introduce a replace API for device pasid
  2025-01-15  4:43         ` Tian, Kevin
@ 2025-01-15 14:43           ` Jason Gunthorpe
  2025-01-16  5:48             ` Tian, Kevin
  0 siblings, 1 reply; 71+ messages in thread
From: Jason Gunthorpe @ 2025-01-15 14:43 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, joro@8bytes.org, baolu.lu@linux.intel.com,
	eric.auger@redhat.com, nicolinc@nvidia.com,
	chao.p.peng@linux.intel.com, iommu@lists.linux.dev,
	vasant.hegde@amd.com, will@kernel.org

On Wed, Jan 15, 2025 at 04:43:41AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, January 14, 2025 9:46 PM
> > 
> > On Tue, Jan 14, 2025 at 08:10:41AM +0000, Tian, Kevin wrote:
> > 
> > > > > +	ret = __iommu_set_group_pasid(domain, group, pasid, curr-
> > > > >domain);
> > > > > +	if (ret)
> > > > > +		WARN_ON(handle != xa_store(&group->pasid_array, pasid,
> > > > > +					   curr, GFP_KERNEL));
> > > >
> > > > I wonder about the ordering here, is it OK to have PRIs being
> > > > delivered to a domain that failed to attach? What cleans up that race
> > > > condition with domain free?
> > > >
> > > > Should we replace the domain then set the xarray? (and same ordering
> > > > question for normal attach)
> > >
> > > That makes sense to me.
> > >
> > > But I don't think there is a problem with attach. xa_insert() will
> > > return error if an entry already exists. So there won't be any
> > > PRI being delivered at __iommu_set_group_pasid(), no matter
> > > it succeeds or not.
> > 
> > It has the same issue:
> > 
> > 	ret = xa_insert(&group->pasid_array, pasid, handle, GFP_KERNEL);
> > 	if (ret)
> > 		goto out_unlock;
> > 
> > 	ret = __iommu_set_group_pasid(domain, group, pasid);
> >         .. Concurrently a PRI event is pushed to the domain ..
> > 	if (ret)
> > 		xa_erase(&group->pasid_array, pasid);
> > 
> >         .. Now what? Who fences the PRI event thread before the caller
> > 	   frees the domain ..?
> > 
> > We arranged things so that detatch would fence the PRI, if detach is
> > not called then there is no fence..
> > 
> 
> Though I'm fine to change the order in attach too as it looks more
> reasonable logically, I'm trying to understand the actual impact of
> the original order (e.g. is the change worth of a Fix tag?)
> 
> If there is no detach happened before then it's the 1st attach to
> a faultable domain and PRI will be enabled right before this function
> hence no fence required.
> 
> Would a sane device trigger PRI in this window?

I would say this is not a "sane" scenario, this is a theoretical race
triggerable by the device. Perhaps a VFIO user can force the device to
trigger this race and exploit the kernel.

> If it's a detach-then-attach flow, detach will do the fence anyway
> before the attach.

The issue is the error, once we do the xa_insert() then any faults will
get routed to our domain and the fault path threads will hold pointers
to the domain. Once the xa_insert() is done we must flush the fault
path threads before allowing the domain to be freed.

If __iommu_set_group_pasid() fails then we do an xa_erase() but
nothing will flush the fault threads.
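
The two orderings can be compared with a single-threaded userspace
model. This is only a sketch under stated assumptions: the sketch_*
names are hypothetical, one pointer stands in for the pasid_array
entry, and a flag records whether a fault arriving during the attach
could have looked up the new handle.

```c
#include <assert.h>	/* used by the checks in the note below */
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_handle {
	int domain_id;
};

struct sketch_group {
	struct sketch_handle *slot;	/* stands in for one pasid_array entry */
	bool handle_seen_in_attach;	/* new handle was visible during attach */
};

/*
 * Hypothetical stand-in for __iommu_set_group_pasid(). A concurrent
 * fault would look up whatever the slot holds while this runs, so
 * record whether the new handle was already published at that point.
 */
static int sketch_set_pasid(struct sketch_group *g,
			    const struct sketch_handle *new_handle, bool fail)
{
	if (g->slot == new_handle)
		g->handle_seen_in_attach = true;
	return fail ? -EINVAL : 0;
}

/* Order discussed above: publish the handle, attach, erase on error. */
static int sketch_attach_publish_first(struct sketch_group *g,
				       struct sketch_handle *h, bool fail)
{
	int ret;

	if (g->slot)
		return -EBUSY;
	g->slot = h;
	ret = sketch_set_pasid(g, h, fail);
	if (ret)
		g->slot = NULL;	/* erased, but nothing fenced the faults */
	return ret;
}

/* Swapped order: attach first, publish the handle only on success. */
static int sketch_attach_publish_last(struct sketch_group *g,
				      struct sketch_handle *h, bool fail)
{
	int ret;

	if (g->slot)
		return -EBUSY;
	ret = sketch_set_pasid(g, h, fail);
	if (ret)
		return ret;
	g->slot = h;
	return 0;
}
```

On a failed attach both variants leave the slot empty, but only the
publish-first variant ever exposed the handle. The model ignores that
a real xarray store can itself fail after a successful attach, so the
swapped order would need the slot to be reserved up front.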

Jason

^ permalink raw reply	[flat|nested] 71+ messages in thread

* RE: [PATCH v6 01/14] iommu: Introduce a replace API for device pasid
  2025-01-15 14:43           ` Jason Gunthorpe
@ 2025-01-16  5:48             ` Tian, Kevin
  2025-01-17 10:32               ` Yi Liu
  0 siblings, 1 reply; 71+ messages in thread
From: Tian, Kevin @ 2025-01-16  5:48 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liu, Yi L, joro@8bytes.org, baolu.lu@linux.intel.com,
	eric.auger@redhat.com, nicolinc@nvidia.com,
	chao.p.peng@linux.intel.com, iommu@lists.linux.dev,
	vasant.hegde@amd.com, will@kernel.org

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, January 15, 2025 10:44 PM
> 
> On Wed, Jan 15, 2025 at 04:43:41AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Tuesday, January 14, 2025 9:46 PM
> > >
> > > On Tue, Jan 14, 2025 at 08:10:41AM +0000, Tian, Kevin wrote:
> > >
> > > > > > +	ret = __iommu_set_group_pasid(domain, group, pasid, curr-
> > > > > >domain);
> > > > > > +	if (ret)
> > > > > > +		WARN_ON(handle != xa_store(&group->pasid_array,
> pasid,
> > > > > > +					   curr, GFP_KERNEL));
> > > > >
> > > > > I wonder about the ordering here, is it OK to have PRIs being
> > > > > delivered to a domain that failed to attach? What cleans up that race
> > > > > condition with domain free?
> > > > >
> > > > > Should we replace the domain then set the xarray? (and same
> ordering
> > > > > question for normal attach)
> > > >
> > > > That makes sense to me.
> > > >
> > > > But I don't think there is a problem with attach. xa_insert() will
> > > > return error if an entry already exists. So there won't be any
> > > > PRI being delivered at __iommu_set_group_pasid(), no matter
> > > > it succeeds or not.
> > >
> > > It has the same issue:
> > >
> > > 	ret = xa_insert(&group->pasid_array, pasid, handle, GFP_KERNEL);
> > > 	if (ret)
> > > 		goto out_unlock;
> > >
> > > 	ret = __iommu_set_group_pasid(domain, group, pasid);
> > >         .. Concurrently a PRI event is pushed to the domain ..
> > > 	if (ret)
> > > 		xa_erase(&group->pasid_array, pasid);
> > >
> > >         .. Now what? Who fences the PRI event thread before the caller
> > > 	   frees the domain ..?
> > >
> > > We arranged things so that detatch would fence the PRI, if detach is
> > > not called then there is no fence..
> > >
> >
> > Though I'm fine to change the order in attach too as it looks more
> > reasonable logically, I'm trying to understand the actual impact of
> > the original order (e.g. is the change worth of a Fix tag?)
> >
> > If there is no detach happened before then it's the 1st attach to
> > a faultable domain and PRI will be enabled right before this function
> > hence no fence required.
> >
> > Would a sane device trigger PRI in this window?
> 
> I would say this is not a "sane" scenario, this is a theoretical race
> triggerable by the device. Perhaps a VFIO user can force the device to
> trigger this race and exploit the kernel.
> 
> > If it's a detach-then-attach flow, detach will do the fence anyway
> > before the attach.
> 
> The issue is the error, once we do the xa_insert() then any faults will
> get routed to our domain and the fault path threads will hold pointers
> to the domain. Once the xa_insert() is done we must flush the fault
> path threads before allowing the domain to be freed.
> 
> If __iommu_set_group_pasid() fails then we do an xa_erase() but
> nothing will flush the fault threads.
> 

If __iommu_set_group_pasid() fails iommufd will attempt to disable
PRI which reaches iopf_queue_remove_device(). The latter auto
responds to the list of pending faults.

But this is not a reliable assumption e.g. when multiple functions
(VFs and PF) share a single PRI entity.

So I agree it's conceptually clearer to swap the order of updating
xarray and doing attach.

btw in reality this won't trigger any issue on VT-d. The spec says
that PRI upon a non-present PASID entry (the state before attach
succeeds) is auto-responded by HW as 'Invalid Request'. So the
entire software faulting path won't be triggered at all, but this
might be vendor-specific behavior...



^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v6 01/14] iommu: Introduce a replace API for device pasid
  2025-01-10  7:21         ` Tian, Kevin
@ 2025-01-16 10:00           ` Yi Liu
  0 siblings, 0 replies; 71+ messages in thread
From: Yi Liu @ 2025-01-16 10:00 UTC (permalink / raw)
  To: Tian, Kevin, Baolu Lu, Jason Gunthorpe
  Cc: joro@8bytes.org, eric.auger@redhat.com, nicolinc@nvidia.com,
	chao.p.peng@linux.intel.com, iommu@lists.linux.dev,
	vasant.hegde@amd.com, will@kernel.org

On 2025/1/10 15:21, Tian, Kevin wrote:
>> From: Baolu Lu <baolu.lu@linux.intel.com>
>> Sent: Friday, January 10, 2025 10:32 AM
>>
>> On 1/9/25 22:43, Jason Gunthorpe wrote:
>>> On Thu, Jan 09, 2025 at 07:20:01AM +0000, Tian, Kevin wrote:
>>>>> From: Tian, Kevin
>>>>> Sent: Thursday, January 9, 2025 3:08 PM
>>>>>
>>>>>> From: Liu, Yi L<yi.l.liu@intel.com>
>>>>>> Sent: Thursday, December 19, 2024 9:28 PM
>>>>>>
>>>>>> +/**
>>>>>> + * iommu_replace_device_pasid - Replace the domain that a pasid is attached to
>>>>>> + * @domain: the new iommu domain
>>>>>> + * @dev: the attached device.
>>>>>> + * @pasid: the pasid of the device.
>>>>>> + * @handle: the attach handle.
>>>>>> + *
>>>>>> + * This API allows the pasid to switch domains. Return 0 on success, or an
>>>>>> + * error. The pasid will keep the old configuration if replacement failed.
>>>>>> + * This is supposed to be used by iommufd, and iommufd can guarantee that
>>>>>> + * both iommu_attach_device_pasid() and iommu_replace_device_pasid() would
>>>>>> + * pass in a valid @handle.
>>>>>> + */
>>>>> Better explain why a valid handle is required here.
>>>> Okay, it's because __iommu_set_group_pasid() requires the old domain now
>>>> and the only way to retrieve it at this point is via a handle. It's probably also
>>>> ok to directly store a domain pointer to the xarray when the handle is missing
>>>> but that sounds more confusing.
>>> I had shared a xarray approach to do that at one point, it apparently
>>> was scary enough nobody picked it up 🙂
>>>
>>> https://lore.kernel.org/linux-iommu/20240322165927.GG66976@ziepe.ca/
>>
>> I evaluated it at that time and ultimately decided not using it. At that
>> time, there was no requirement for pasid replacement, so the 'old
>> domain' was not yet an issue. This is why I chose to allocate on demand
>> at that stage. :-)
>>
> 
> Not that scary when reading it again. 😊
> 
> Somehow less restrictions on the kAPI is worthy of a bit more work
> Inside the helpers. I won't be the last one to think about the difference
> why the pasid variants impose such restriction while it's not applied
> to RID.
> 
> Yi, could you take a look and think about any oversight of adopting it?

It sounds like the encoding scheme Jason once mentioned when I first asked
this. Then we would have the iommu_attach_device_pasid() and
iommu_attach_device_pasid_handle() pair just like the RID path.

I think it should be ok. The iommufd pasid code will use the _handle()
version for both the RID and PASID paths, while other drivers in the kernel
will only use the normal version. And I don't think we would add
iommu_replace_device_pasid() since no known drivers need it.
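
As an illustration of what such an encoding scheme could look like, here
is a small user-space C sketch that stores either a domain pointer or a
handle pointer in one array slot, telling them apart with a low-bit tag.
This is an assumption about the general technique, not the scheme from the
linked mail: the enc_* names and the tag value are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; real domains and handles are much larger. */
struct enc_domain { int id; };
struct enc_handle { struct enc_domain *domain; };

/* Low-bit tag; relies on both structs being at least 2-byte aligned. */
#define ENC_ENTRY_IS_HANDLE ((uintptr_t)0x1)

/* A bare domain pointer is stored untagged. */
static void *enc_entry_from_domain(struct enc_domain *domain)
{
	return domain;
}

/* A handle pointer is stored with the tag bit set. */
static void *enc_entry_from_handle(struct enc_handle *handle)
{
	assert(((uintptr_t)handle & ENC_ENTRY_IS_HANDLE) == 0);
	return (void *)((uintptr_t)handle | ENC_ENTRY_IS_HANDLE);
}

/* Returns the handle, or NULL if the slot only holds a domain. */
static struct enc_handle *enc_entry_to_handle(void *entry)
{
	if (!((uintptr_t)entry & ENC_ENTRY_IS_HANDLE))
		return NULL;
	return (struct enc_handle *)((uintptr_t)entry & ~ENC_ENTRY_IS_HANDLE);
}

/* The old domain is recoverable from either kind of entry. */
static struct enc_domain *enc_entry_to_domain(void *entry)
{
	struct enc_handle *handle = enc_entry_to_handle(entry);

	return handle ? handle->domain : (struct enc_domain *)entry;
}
```

With something along these lines the "old domain" can be looked up whether
or not the caller supplied a handle, which is what would let the plain and
_handle() variants coexist. In the kernel the tagging would presumably go
through the xarray helpers such as xa_tag_pointer() rather than open-coded
casts.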

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v6 01/14] iommu: Introduce a replace API for device pasid
  2025-01-16  5:48             ` Tian, Kevin
@ 2025-01-17 10:32               ` Yi Liu
  0 siblings, 0 replies; 71+ messages in thread
From: Yi Liu @ 2025-01-17 10:32 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe, Lu Baolu
  Cc: joro@8bytes.org, baolu.lu@linux.intel.com, eric.auger@redhat.com,
	nicolinc@nvidia.com, chao.p.peng@linux.intel.com,
	iommu@lists.linux.dev, vasant.hegde@amd.com, will@kernel.org

On 2025/1/16 13:48, Tian, Kevin wrote:
>> From: Jason Gunthorpe <jgg@nvidia.com>
>> Sent: Wednesday, January 15, 2025 10:44 PM
>>
>> On Wed, Jan 15, 2025 at 04:43:41AM +0000, Tian, Kevin wrote:
>>>> From: Jason Gunthorpe <jgg@nvidia.com>
>>>> Sent: Tuesday, January 14, 2025 9:46 PM
>>>>
>>>> On Tue, Jan 14, 2025 at 08:10:41AM +0000, Tian, Kevin wrote:
>>>>
>>>>>>> +	ret = __iommu_set_group_pasid(domain, group, pasid, curr->domain);
>>>>>>> +	if (ret)
>>>>>>> +		WARN_ON(handle != xa_store(&group->pasid_array, pasid,
>>>>>>> +					   curr, GFP_KERNEL));
>>>>>>
>>>>>> I wonder about the ordering here, is it OK to have PRIs being
>>>>>> delivered to a domain that failed to attach? What cleans up that race
>>>>>> condition with domain free?
>>>>>>
>>>>>> Should we replace the domain then set the xarray? (and same ordering
>>>>>> question for normal attach)
>>>>>
>>>>> That makes sense to me.
>>>>>
>>>>> But I don't think there is a problem with attach. xa_insert() will
>>>>> return error if an entry already exists. So there won't be any
>>>>> PRI being delivered at __iommu_set_group_pasid(), no matter
>>>>> it succeeds or not.
>>>>
>>>> It has the same issue:
>>>>
>>>> 	ret = xa_insert(&group->pasid_array, pasid, handle, GFP_KERNEL);
>>>> 	if (ret)
>>>> 		goto out_unlock;
>>>>
>>>> 	ret = __iommu_set_group_pasid(domain, group, pasid);
>>>>          .. Concurrently a PRI event is pushed to the domain ..
>>>> 	if (ret)
>>>> 		xa_erase(&group->pasid_array, pasid);
>>>>
>>>>          .. Now what? Who fences the PRI event thread before the caller
>>>> 	   frees the domain ..?
>>>>
>>>> We arranged things so that detatch would fence the PRI, if detach is
>>>> not called then there is no fence..
>>>>
>>>
>>> Though I'm fine to change the order in attach too as it looks more
>>> reasonable logically, I'm trying to understand the actual impact of
>>> the original order (e.g. is the change worth of a Fix tag?)
>>>
>>> If there is no detach happened before then it's the 1st attach to
>>> a faultable domain and PRI will be enabled right before this function
>>> hence no fence required.
>>>
>>> Would a sane device trigger PRI in this window?
>>
>> I would say this is not a "sane" scenario, this is a theoretical race
>> triggerable by the device. Perhaps a VFIO user can force the device to
>> trigger this race and exploit the kernel.
>>
>>> If it's a detach-then-attach flow, detach will do the fence anyway
>>> before the attach.
>>
>> The issue is the error, once we do the xa_insert() then any faults will
>> get routed to our domain and the fault path threads will hold pointers
>> to the domain. Once the xa_insert() is done we must flush the fault
>> path threads before allowing the domain to be freed.
>>
>> If __iommu_set_group_pasid() fails then we do an xa_erase() but
>> nothing will flush the fault threads.
>>
> 
> If __iommu_set_group_pasid() fails iommufd will attempt to disable
> PRI which reaches iopf_queue_remove_device(). The latter auto
> responds to the list of pending faults.
> 
> But this is not a reliable assumption e.g. when multiple functions
> (VFs and PF) share a single PRI entity.

I think iommufd fault only supports PF so far. VF is not supported. :) But
it's better not to make decisions based on this limitation when considering
the PRI flushing issue.

> So I agree it's conceptually clearer to swap the order of updating
> xarray and doing attach.
> 
> btw in reality this won't trigger any issue on VT-d. The spec says
> that PRI upon a non-present PASID entry (the state before attach
> succeeds) is auto-responded by HW as 'Invalid Request'. So the
> entire software faulting path won't be triggered at all, but this might
> be vendor-specific behavior...

I'll swap the order of the group->pasid_array update and the
__iommu_set_group_pasid() call in both the PASID attach and replace paths.
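
For the replace path, the swapped order can be sketched in user-space C
as below. This is a model under stated assumptions, not the actual patch:
the demo_* names, the array standing in for group->pasid_array, and the
hw_domain[] bookkeeping are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_MAX_PASID 8

/* Hypothetical stand-in for an attach handle. */
struct demo_handle { int domain_id; };

struct demo_group {
	/* Models group->pasid_array: what the fault path looks up. */
	struct demo_handle *pasid_array[DEMO_MAX_PASID];
	/* Domain the modeled hardware currently translates with. */
	int hw_domain[DEMO_MAX_PASID];
	/* Test knob: makes the modeled hardware replace fail. */
	int hw_replace_should_fail;
};

/* Models __iommu_set_group_pasid() taking both new and old domain. */
static int demo_set_group_pasid(struct demo_group *group, unsigned int pasid,
				int new_domain, int old_domain)
{
	assert(group->hw_domain[pasid] == old_domain);
	if (group->hw_replace_should_fail)
		return -EIO;	/* hardware keeps the old domain */
	group->hw_domain[pasid] = new_domain;
	return 0;
}

/* Switch the hardware first; update the array only on success. */
static int demo_replace_pasid(struct demo_group *group, unsigned int pasid,
			      struct demo_handle *handle)
{
	struct demo_handle *curr;
	int ret;

	assert(pasid < DEMO_MAX_PASID);

	curr = group->pasid_array[pasid];
	if (!curr)
		return -EINVAL;	/* replace needs an existing attachment */

	ret = demo_set_group_pasid(group, pasid, handle->domain_id,
				   curr->domain_id);
	if (ret)
		return ret;	/* array still points at the old handle */

	group->pasid_array[pasid] = handle;
	return 0;
}
```

On failure the array never held the new handle, so no fault can have been
routed to the new domain and there is no store to roll back.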

I think the RID attach/replace paths have the same problem since the
PRI forwarding path also retrieves the iommu_attach_handle from the
group->pasid_array. Even worse, I didn't see the RID path store the
handle in group->pasid_array. @Baolu, perhaps you can help to fix
the RID path? :)

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v6 06/14] iommufd: Mark PASID-compatible domain
  2025-01-09 14:54   ` Jason Gunthorpe
@ 2025-01-17 10:50     ` Yi Liu
  0 siblings, 0 replies; 71+ messages in thread
From: Yi Liu @ 2025-01-17 10:50 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: joro, kevin.tian, baolu.lu, eric.auger, nicolinc, chao.p.peng,
	iommu, vasant.hegde, will

On 2025/1/9 22:54, Jason Gunthorpe wrote:
> On Thu, Dec 19, 2024 at 05:27:38AM -0800, Yi Liu wrote:
>> AMD IOMMU requires attaching PASID-compatible domains to PASID-capable
>> devices. This includes the domains attached to RID and PASIDs. Related
>> discussions in link [1] and [2].  ARM has similar requirement but does
>> not need extra hint from iommufd,
> 
> ARM does use the hint now, exactly the same as AMD, it is merged:
> 
> https://lore.kernel.org/all/2-v1-0bb8d5313a27+27b-smmuv3_paging_flags_jgg@nvidia.com/

Got it. Then I will mention that both AMD and ARM have such a requirement,
while Intel just treats it as a nop.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  2025-01-10  7:34         ` Tian, Kevin
@ 2025-01-17 10:57           ` Yi Liu
  0 siblings, 0 replies; 71+ messages in thread
From: Yi Liu @ 2025-01-17 10:57 UTC (permalink / raw)
  To: Tian, Kevin, Baolu Lu, Jason Gunthorpe
  Cc: joro@8bytes.org, eric.auger@redhat.com, nicolinc@nvidia.com,
	chao.p.peng@linux.intel.com, iommu@lists.linux.dev,
	vasant.hegde@amd.com, will@kernel.org

On 2025/1/10 15:34, Tian, Kevin wrote:
>> From: Baolu Lu <baolu.lu@linux.intel.com>
>> Sent: Friday, January 10, 2025 10:41 AM
>>
>> On 1/9/25 23:27, Jason Gunthorpe wrote:
>>> On Mon, Dec 23, 2024 at 10:51:50AM +0800, Baolu Lu wrote:
>>>>> @@ -198,7 +198,7 @@ intel_iommu_domain_alloc_nested(struct device *dev, struct iommu_domain *parent,
>>>>>     	struct dmar_domain *domain;
>>>>>     	int ret;
>>>>> -	if (!nested_supported(iommu) || flags)
>>>>> +	if (!nested_supported(iommu) || flags & ~IOMMU_HWPT_ALLOC_PASID)
>>>>>     		return ERR_PTR(-EOPNOTSUPP);
>>>>>     	/* Must be nested domain */
>>>> It's better to abort and fail a domain allocation when
>>>> IOMMU_HWPT_ALLOC_PASID is set but the iommu lacks pasid support?
>>> With multi-instances of iommus in the system you still want to succeed
>>> creating a PASID capable domain even on instances that don't support
>>> PASID since it may end up being used on an other instances too.
>>
>> Okay, so this flag is just a hint that "this domain will probably be
>> attached to a pasid of a device", NOT a requirement of "this domain
>> necessitates hardware support for the PASID feature".
>>
> 
> And note that pasid capability in VT-d just decides whether to serve
> DMA request with PASID. It doesn't change the format. It's legacy
> Mode vs. scalable mode for deciding the format, which is always
> pasid compatible once scalable mode is selected.

Yes, pasid compatibility is more about the page table format. And I'd prefer
to treat this flag as just a nop since VT-d does not have the
aforementioned pasid-compatible limitation. So I may not add too many
words here. :)
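
A minimal sketch of what "treat the flag as a nop" means for the flag
check in the quoted hunk: the PASID hint is masked out and ignored, and any
other flag is still refused. The DEMO_* constants and the function name
are made up for this example; the real flag values live in the iommufd
uAPI header.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative bit positions only, not taken from the uAPI header. */
#define DEMO_HWPT_ALLOC_NEST_PARENT	(1u << 0)
#define DEMO_HWPT_ALLOC_PASID		(1u << 3)

/* Hypothetical flag check for a nested-domain allocation. */
static int demo_check_nested_alloc_flags(uint32_t flags)
{
	/* The PASID hint is accepted and ignored; anything else is refused. */
	if (flags & ~DEMO_HWPT_ALLOC_PASID)
		return -EOPNOTSUPP;
	return 0;
}
```

Note that in the quoted hunk the unparenthesized form works because `&`
binds tighter than `||` in C.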

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v6 04/14] iommufd: Always pass iommu_attach_handle to iommu core
  2025-01-09  7:44   ` Tian, Kevin
@ 2025-01-17 12:33     ` Yi Liu
  2025-01-17 19:03       ` Nicolin Chen
  0 siblings, 1 reply; 71+ messages in thread
From: Yi Liu @ 2025-01-17 12:33 UTC (permalink / raw)
  To: Tian, Kevin, joro@8bytes.org, jgg@nvidia.com,
	baolu.lu@linux.intel.com
  Cc: eric.auger@redhat.com, nicolinc@nvidia.com,
	chao.p.peng@linux.intel.com, iommu@lists.linux.dev,
	vasant.hegde@amd.com, will@kernel.org

On 2025/1/9 15:44, Tian, Kevin wrote:
>> From: Liu, Yi L <yi.l.liu@intel.com>
>> Sent: Thursday, December 19, 2024 9:28 PM
>>
>>   static inline int iommufd_hwpt_replace_device(struct iommufd_device *idev,
>>   					      struct iommufd_hw_pagetable *hwpt,
>>   					      struct iommufd_hw_pagetable *old)
>>   {
>> +	struct iommufd_attach_handle *curr;
>> +	int ret;
>> +
>>   	if (old->fault || hwpt->fault)
>>   		return iommufd_fault_domain_replace_dev(idev, hwpt, old);
>>
>> -	return iommu_group_replace_domain(idev->igroup->group, hwpt->domain);
>> +	curr = iommufd_device_get_attach_handle(idev);
>> +
>> +	ret = iommufd_dev_replace_handle(idev, hwpt, old);
>> +	if (ret)
>> +		return ret;
>> +
>> +	kfree(curr);
>> +	return 0;
>>   }
>>
> 
> It's not balanced to have handle freed in different locations (one in
> Fault domain specific handler and the other here).
> 
> It sounds clearer to remove the fault domain specific helper and have
> everything together in this function when you move it to a C file.

Sure. I'm also coordinating with Nic about this since he also needs to
make the handle code generic across both the fault and non-fault paths.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v6 04/14] iommufd: Always pass iommu_attach_handle to iommu core
  2025-01-17 12:33     ` Yi Liu
@ 2025-01-17 19:03       ` Nicolin Chen
  0 siblings, 0 replies; 71+ messages in thread
From: Nicolin Chen @ 2025-01-17 19:03 UTC (permalink / raw)
  To: Yi Liu
  Cc: Tian, Kevin, joro@8bytes.org, jgg@nvidia.com,
	baolu.lu@linux.intel.com, eric.auger@redhat.com,
	chao.p.peng@linux.intel.com, iommu@lists.linux.dev,
	vasant.hegde@amd.com, will@kernel.org

On Fri, Jan 17, 2025 at 08:33:43PM +0800, Yi Liu wrote:
> On 2025/1/9 15:44, Tian, Kevin wrote:
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Thursday, December 19, 2024 9:28 PM
> > > 
> > >   static inline int iommufd_hwpt_replace_device(struct iommufd_device *idev,
> > >   					      struct iommufd_hw_pagetable
> > > *hwpt,
> > >   					      struct iommufd_hw_pagetable
> > > *old)
> > >   {
> > > +	struct iommufd_attach_handle *curr;
> > > +	int ret;
> > > +
> > >   	if (old->fault || hwpt->fault)
> > >   		return iommufd_fault_domain_replace_dev(idev, hwpt, old);
> > > 
> > > -	return iommu_group_replace_domain(idev->igroup->group, hwpt-
> > > > domain);
> > > +	curr = iommufd_device_get_attach_handle(idev);
> > > +
> > > +	ret = iommufd_dev_replace_handle(idev, hwpt, old);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	kfree(curr);
> > > +	return 0;
> > >   }
> > > 
> > 
> > It's not balanced to have handle freed in different locations (one in
> > Fault domain specific handler and the other here).
> > 
> > It sounds clearer to remove the fault domain specific helper and have
> > everything together in this function when you move it to a C file.
> 
> Sure. I'm also coordinating with Nic about this since he also needs to
> make the handle code generic across both the fault and non-fault paths.

Yea, it would look like the one in the MSI series [1]. A common
flow attaches/detaches/replaces the handles. If it's on a fault
path, do the extra work to enable/disable the fault-specific stuff.

[1] https://lore.kernel.org/kvm/c708aedc678c63e2466b43ab9d4f8ac876e49aa1.1736550979.git.nicolinc@nvidia.com/
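
A rough user-space C sketch of the "common flow plus fault-specific
extras" shape described above, with the old handle freed in exactly one
place. This is not the code from [1]: every flow_* name, the iopf_enabled
flag, and the failure knob are assumptions for illustration, and the
flushing of pending faults is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for hwpt, handle and device. */
struct flow_hwpt { int has_fault; };
struct flow_handle { struct flow_hwpt *hwpt; };

struct flow_dev {
	struct flow_handle *handle;	/* currently attached handle */
	int iopf_enabled;		/* models fault delivery being on */
	int replace_should_fail;	/* test knob */
};

/* Models the iommu core replace taking the new handle. */
static int flow_core_replace(struct flow_dev *dev, struct flow_handle *new_handle)
{
	if (dev->replace_should_fail)
		return -EIO;
	dev->handle = new_handle;
	return 0;
}

/* One replace helper for both the fault and non-fault cases. */
static int flow_hwpt_replace(struct flow_dev *dev, struct flow_hwpt *hwpt)
{
	struct flow_handle *old = dev->handle;
	struct flow_handle *new_handle;
	int was_enabled = dev->iopf_enabled;
	int ret;

	assert(old);

	new_handle = calloc(1, sizeof(*new_handle));
	if (!new_handle)
		return -ENOMEM;
	new_handle->hwpt = hwpt;

	/* Fault-specific extra: enable before the new hwpt can fault. */
	if (hwpt->has_fault)
		dev->iopf_enabled = 1;

	ret = flow_core_replace(dev, new_handle);
	if (ret) {
		dev->iopf_enabled = was_enabled;
		free(new_handle);
		return ret;
	}

	/* Fault-specific extra: disable when the new hwpt has no fault. */
	if (!hwpt->has_fault)
		dev->iopf_enabled = 0;

	free(old);	/* the only place the old handle is freed */
	return 0;
}
```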

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  2024-12-25  7:13           ` Baolu Lu
@ 2025-02-12  7:47             ` Yi Liu
  2025-02-12 12:59               ` Jason Gunthorpe
  2025-02-12 13:00               ` Robin Murphy
  0 siblings, 2 replies; 71+ messages in thread
From: Yi Liu @ 2025-02-12  7:47 UTC (permalink / raw)
  To: Baolu Lu, joro, jgg, kevin.tian
  Cc: eric.auger, nicolinc, chao.p.peng, iommu, vasant.hegde, will,
	Suravee Suthikulpanit, Robin Murphy

On 2024/12/25 15:13, Baolu Lu wrote:
> On 2024/12/25 12:30, Yi Liu wrote:
>> On 2024/12/25 09:02, Baolu Lu wrote:
>>> On 12/24/24 19:35, Yi Liu wrote:
>>>>> Another related consideration is the support for page faults in nested
>>>>> domains once PASID is available in user space. Would it be reasonable to
>>>>> support page faults for nested domains?
>>>>
>>>> yeah, it's good to discuss it.
>>>>
>>>>> If so, perhaps it's time to open intel_iommu_domain_alloc_nested() to
>>>>> support IOMMU_HWPT_FAULT_ID_VALID when PRI is enabled on the device?
>>>>
>>>> IMHO, the PRQ support for nested domains requires some more facility. PRQ
>>>> can happen at either stage-1 or stage-2, iommu driver may need to tell
>>>> it and forward to the correct domain (nested domain or parent domain). Or
>>>> the stage-2 is always pinned just as the VFIO/iommufd does. Hence, any PRQ
>>>> happens under nested translation should be due to stage-1.
>>>
>>> The parent domain is currently always pinned and does not yet support
>>> page faults. Therefore, when a page fault occurs within a nested domain,
>>> it apparently should be routed to user space...
>>>
>>> If we decide to support page faults on the stage-2 domain in the future,
>>> we will need to figure out the correct destination of each page fault
>>> and route it accordingly, either to the parent domain or the user space
>>> nested domain. Hardware assistance would be beneficial, otherwise the
>>> software may need to traverse the parent domain, which is not
>>> performance friendly.
>>
>> this is my question. Is it still true the parent domain is always pinned
>> after the below series? If yes, then it's fine to enable IOPF for nested
>> domain.
>>
>> https://lore.kernel.org/linux-iommu/20241015-jag-iopfv8-v4-0-b696ca89ba29@kernel.org/
> 
> Then, perhaps we could enforce this in iommufd for a short-term purpose?
> 
> When allocating a hwpt in iommufd, we should enforce that the flags
> IOMMU_HWPT_ALLOC_NEST_PARENT and IOMMU_HWPT_FAULT_ID_VALID cannot be set
> at the same time.

Let's consult with other vendors. We need this enforcement because we lack
a straightforward method to distinguish between PRIs in stage-1 and stage-2
under nested translation. If other vendors share this requirement, it would
be appropriate to implement this enforcement in IOMMUFD. Otherwise, we may
check it in intel iommu driver.

@arm and amd folks. :)

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  2025-02-12  7:47             ` Yi Liu
@ 2025-02-12 12:59               ` Jason Gunthorpe
  2025-02-13  9:34                 ` Yi Liu
  2025-02-12 13:00               ` Robin Murphy
  1 sibling, 1 reply; 71+ messages in thread
From: Jason Gunthorpe @ 2025-02-12 12:59 UTC (permalink / raw)
  To: Yi Liu
  Cc: Baolu Lu, joro, kevin.tian, eric.auger, nicolinc, chao.p.peng,
	iommu, vasant.hegde, will, Suravee Suthikulpanit, Robin Murphy

On Wed, Feb 12, 2025 at 03:47:57PM +0800, Yi Liu wrote:
> Let's consult with other vendors. We need this enforcement because we lack
> a straightforward method to distinguish between PRIs in stage-1 and stage-2
> under nested translation. If other vendors share this requirement, it would
> be appropriate to implement this enforcement in IOMMUFD. Otherwise, we may
> check it in intel iommu driver.

I do expect HW to be able to distinguish S1/S2 faults for the purpose
of PRI, otherwise you cannot use non-present pages in the S2 at all,
which is something that is very desirable.

E.g. AMD has a GN bit in the PAGE_SERVICE_REQUEST

ARM has a S2 bit in their F_TRANSLATION/etc

IMHO not being able to do this is an Intel limitation (that you should
get your HW team to fix)

However, right now there is no driver or core implementation for any
of this, so setting up a S2 with fault should be blocked in the core
code.
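
A minimal sketch of such a core-side check, assuming it is expressed as
"nest parent and fault ID must not be requested together" as proposed
earlier in the thread. The DEMO_* bit positions, the function name, and
the choice of -EOPNOTSUPP are assumptions for this example, not the merged
code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative bit positions only, not taken from the uAPI header. */
#define DEMO_HWPT_ALLOC_NEST_PARENT	(1u << 0)
#define DEMO_HWPT_FAULT_ID_VALID	(1u << 2)

/* Hypothetical check run when a hwpt allocation request arrives. */
static int demo_hwpt_alloc_check(uint32_t flags)
{
	const uint32_t both = DEMO_HWPT_ALLOC_NEST_PARENT |
			      DEMO_HWPT_FAULT_ID_VALID;

	/* A stage-2 (nest parent) domain with fault reporting is refused. */
	if ((flags & both) == both)
		return -EOPNOTSUPP;
	return 0;
}
```

Either flag on its own still passes; only the combination is rejected.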

Jason

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  2025-02-12  7:47             ` Yi Liu
  2025-02-12 12:59               ` Jason Gunthorpe
@ 2025-02-12 13:00               ` Robin Murphy
  2025-02-12 13:08                 ` Jason Gunthorpe
  2025-02-13 10:10                 ` Yi Liu
  1 sibling, 2 replies; 71+ messages in thread
From: Robin Murphy @ 2025-02-12 13:00 UTC (permalink / raw)
  To: Yi Liu, Baolu Lu, joro, jgg, kevin.tian
  Cc: eric.auger, nicolinc, chao.p.peng, iommu, vasant.hegde, will,
	Suravee Suthikulpanit

On 2025-02-12 7:47 am, Yi Liu wrote:
> On 2024/12/25 15:13, Baolu Lu wrote:
>> On 2024/12/25 12:30, Yi Liu wrote:
>>> On 2024/12/25 09:02, Baolu Lu wrote:
>>>> On 12/24/24 19:35, Yi Liu wrote:
>>>>>> Another related consideration is the support for page faults in 
>>>>>> nested
>>>>>> domains once PASID is available in user space. Would it be 
>>>>>> reasonable to
>>>>>> support page faults for nested domains?
>>>>>
>>>>> yeah, it's good to discuss it.
>>>>>
>>>>>> If so, perhaps it's time to open intel_iommu_domain_alloc_nested() to
>>>>>> support IOMMU_HWPT_FAULT_ID_VALID when PRI is enabled on the device?
>>>>>
>>>>> IMHO, the PRQ support for nested domains requires some more 
>>>>> facility. PRQ
>>>>> can happen at either stage-1 or stage-2, iommu driver may need to tell
>>>>> it and forward to the correct domain (nested domain or parent 
>>>>> domain). Or
>>>>> the stage-2 is always pinned just as the VFIO/iommufd does. Hence, 
>>>>> any PRQ
>>>>> happens under nested translation should be due to stage-1.
>>>>
>>>> The parent domain is currently always pinned and does not yet support
>>>> page faults. Therefore, when a page fault occurs within a nested 
>>>> domain,
>>>> it apparently should be routed to user space...
>>>>
>>>> If we decide to support page faults on the stage-2 domain in the 
>>>> future,
>>>> we will need to figure out the correct destination of each page fault
>>>> and route it accordingly, either to the parent domain or the user space
>>>> nested domain. Hardware assistance would be beneficial, otherwise the
>>>> software may need to traverse the parent domain, which is not
>>>> performance friendly.
>>>
>>> this is my question. Is it still true the parent domain is always pinned
>>> after the below series? If yes, then it's fine to enable IOPF for nested
>>> domain.
>>>
>>> https://lore.kernel.org/linux-iommu/20241015-jag-iopfv8-v4-0-b696ca89ba29@kernel.org/
>>
>> Then, perhaps we could enforce this in iommufd for a short-term purpose?
>>
>> When allocating a hwpt in iommufd, we should enforce that the flags
>> IOMMU_HWPT_ALLOC_NEST_PARENT and IOMMU_HWPT_FAULT_ID_VALID cannot be set
>> at the same time.
> 
> Let's consult with other vendors. We need this enforcement because we lack
> a straightforward method to distinguish between PRIs in stage-1 and stage-2
> under nested translation. If other vendors share this requirement, it would
> be appropriate to implement this enforcement in IOMMUFD. Otherwise, we may
> check it in intel iommu driver.
> 
> @arm and amd folks. :)

Yup, SMMUv3 is more or less in the same boat - we *could* reasonably 
manage unpinned S2 for non-PCI devices using the stall model where the 
F_TRANSLATION or F_PERMISSION event tells us all we need, but for ATS 
it's the same thing where by the time the fault has taken a round-trip 
through an ATS response and a PRI request, we've lost the details of 
exactly how it faulted (or if the PRI request was sent eagerly without a 
prior translation request, then we simply have no idea at all). Thus the 
prospective mechanism would be to inject a virtual PRI, wait until we 
see the guest issue a matching CMD_PRI_RESP, then sniff the IPA/GPA for 
the given input address out of the guest pagetables to see if there's 
anything to do at S2 as well/instead. Yuck.

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  2025-02-12 13:00               ` Robin Murphy
@ 2025-02-12 13:08                 ` Jason Gunthorpe
  2025-02-13 10:10                 ` Yi Liu
  1 sibling, 0 replies; 71+ messages in thread
From: Jason Gunthorpe @ 2025-02-12 13:08 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Yi Liu, Baolu Lu, joro, kevin.tian, eric.auger, nicolinc,
	chao.p.peng, iommu, vasant.hegde, will, Suravee Suthikulpanit

On Wed, Feb 12, 2025 at 01:00:11PM +0000, Robin Murphy wrote:
> Yup, SMMUv3 is more or less in the same boat - we *could* reasonably manage
> unpinned S2 for non-PCI devices using the stall model where the
> F_TRANSLATION or F_PERMISSION event tells us all we need, but for ATS it's
> the same thing where by the time the fault has taken a round-trip through an
> ATS response and a PRI request,

I believe there is interest in PCISIG about this topic.

> we've lost the details of exactly how it faulted (or if the PRI
> request was sent eagerly without a prior translation request, then
> we simply have no idea at all). Thus the prospective mechanism would
> be to inject a virtual PRI, wait until we see the guest issue a
> matching CMD_PRI_RESP, then sniff the IPA/GPA for the given input
> address out of the guest pagetables to see if there's anything to do
> at S2 as well/instead. Yuck.

I vaguely recall AMD HW will run the PRI through the page tables to
figure out if it is S1/S2... Vasant?

Jason

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  2025-02-12 12:59               ` Jason Gunthorpe
@ 2025-02-13  9:34                 ` Yi Liu
  2025-02-13 12:56                   ` Jason Gunthorpe
  0 siblings, 1 reply; 71+ messages in thread
From: Yi Liu @ 2025-02-13  9:34 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Baolu Lu, joro, kevin.tian, eric.auger, nicolinc, chao.p.peng,
	iommu, vasant.hegde, will, Suravee Suthikulpanit, Robin Murphy

On 2025/2/12 20:59, Jason Gunthorpe wrote:
> On Wed, Feb 12, 2025 at 03:47:57PM +0800, Yi Liu wrote:
>> Let's consult with other vendors. We need this enforcement because we lack
>> a straightforward method to distinguish between PRIs in stage-1 and stage-2
>> under nested translation. If other vendors share this requirement, it would
>> be appropriate to implement this enforcement in IOMMUFD. Otherwise, we may
>> check it in intel iommu driver.
> 
> I do expect HW to be able to distinguish S1/S2 faults for the purpose
> of PRI, otherwise you cannot use non-present pages in the S2 at all,
> which is something that is very desirable.
> 
> E.g. AMD has a GN bit in the PAGE_SERVICE_REQUEST
> 
> ARM has a S2 bit in their F_TRANSLATION/etc

thanks for this info.

> IMHO not being able to do this is an Intel limitation (that you should
> get your HW team to fix)
> 
> However, right now there is no driver or core implementation for any
> of this, so setting up a S2 with fault should be blocked in the core
> code.

Yes, as iommufd always pins and maps user pages for the paging domain,
I don't think there will be any PRIs on S2 due to non-present pages.
However, PRIs on S2 might occur due to insufficient permissions,
potentially caused by userspace. Nonetheless, I am skeptical about the
existence of such use cases. What do you think about it?

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  2025-02-12 13:00               ` Robin Murphy
  2025-02-12 13:08                 ` Jason Gunthorpe
@ 2025-02-13 10:10                 ` Yi Liu
  2025-02-13 10:24                   ` Robin Murphy
  1 sibling, 1 reply; 71+ messages in thread
From: Yi Liu @ 2025-02-13 10:10 UTC (permalink / raw)
  To: Robin Murphy, Baolu Lu, joro, jgg, kevin.tian
  Cc: eric.auger, nicolinc, chao.p.peng, iommu, vasant.hegde, will,
	Suravee Suthikulpanit

On 2025/2/12 21:00, Robin Murphy wrote:
> On 2025-02-12 7:47 am, Yi Liu wrote:
>> On 2024/12/25 15:13, Baolu Lu wrote:
>>> On 2024/12/25 12:30, Yi Liu wrote:
>>>> On 2024/12/25 09:02, Baolu Lu wrote:
>>>>> On 12/24/24 19:35, Yi Liu wrote:
>>>>>>> Another related consideration is the support for page faults in nested
>>>>>>> domains once PASID is available in user space. Would it be 
>>>>>>> reasonable to
>>>>>>> support page faults for nested domains?
>>>>>>
>>>>>> yeah, it's good to discuss it.
>>>>>>
>>>>>>> If so, perhaps it's time to open intel_iommu_domain_alloc_nested() to
>>>>>>> support IOMMU_HWPT_FAULT_ID_VALID when PRI is enabled on the device?
>>>>>>
>>>>>> IMHO, the PRQ support for nested domains requires some more facility. 
>>>>>> PRQ
>>>>>> can happen at either stage-1 or stage-2, iommu driver may need to tell
>>>>>> it and forward to the correct domain (nested domain or parent 
>>>>>> domain). Or
>>>>>> the stage-2 is always pinned just as the VFIO/iommufd does. Hence, 
>>>>>> any PRQ
>>>>>> happens under nested translation should be due to stage-1.
>>>>>
>>>>> The parent domain is currently always pinned and does not yet support
>>>>> page faults. Therefore, when a page fault occurs within a nested domain,
>>>>> it apparently should be routed to user space...
>>>>>
>>>>> If we decide to support page faults on the stage-2 domain in the future,
>>>>> we will need to figure out the correct destination of each page fault
>>>>> and route it accordingly, either to the parent domain or the user space
>>>>> nested domain. Hardware assistance would be beneficial, otherwise the
>>>>> software may need to traverse the parent domain, which is not
>>>>> performance friendly.
>>>>
>>>> this is my question. Is it still true the parent domain is always pinned
>>>> after the below series? If yes, then it's fine to enable IOPF for nested
>>>> domain.
>>>>
>>>> https://lore.kernel.org/linux-iommu/20241015-jag-iopfv8-v4-0-b696ca89ba29@kernel.org/
>>>
>>> Then, perhaps we could enforce this in iommufd for a short-term purpose?
>>>
>>> When allocating a hwpt in iommufd, we should enforce that the flags
>>> IOMMU_HWPT_ALLOC_NEST_PARENT and IOMMU_HWPT_FAULT_ID_VALID cannot be set
>>> at the same time.
>>
>> Let's consult with other vendors. We need this enforcement because we lack
>> a straightforward method to distinguish between PRIs in stage-1 and stage-2
>> under nested translation. If other vendors share this requirement, it would
>> be appropriate to implement this enforcement in IOMMUFD. Otherwise, we may
>> check it in intel iommu driver.
>>
>> @arm and amd folks. :)
> 
> Yup, SMMUv3 is more or less in the same boat - we *could* reasonably manage 
> unpinned S2 for non-PCI devices using the stall model where the 
> F_TRANSLATION or F_PERMISSION event tells us all we need, but for ATS it's 
> the same thing where by the time the fault has taken a round-trip through 
> an ATS response and a PRI request, we've lost the details of exactly how it 
> faulted (or if the PRI request was sent eagerly without a prior translation 
> request, then we simply have no idea at all). Thus the prospective 
> mechanism would be to inject a virtual PRI, wait until we see the guest 
> issue a matching CMD_PRI_RESP, then sniff the IPA/GPA for the given input 
> address out of the guest pagetables to see if there's anything to do at S2 
> as well/instead. Yuck.

Thanks for the response, Robin. It seems we're in the same situation. :(
Out of curiosity, will the vPRI response provide the IPA/GPA to the
hypervisor? If not, the hypervisor will need to derive it from the GVA on
its own, possibly requiring software to traverse the guest page table.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  2025-02-13 10:10                 ` Yi Liu
@ 2025-02-13 10:24                   ` Robin Murphy
  2025-02-13 12:53                     ` Yi Liu
  2025-02-19  8:02                     ` Tian, Kevin
  0 siblings, 2 replies; 71+ messages in thread
From: Robin Murphy @ 2025-02-13 10:24 UTC (permalink / raw)
  To: Yi Liu, Baolu Lu, joro, jgg, kevin.tian
  Cc: eric.auger, nicolinc, chao.p.peng, iommu, vasant.hegde, will,
	Suravee Suthikulpanit

On 2025-02-13 10:10 am, Yi Liu wrote:
> On 2025/2/12 21:00, Robin Murphy wrote:
>> On 2025-02-12 7:47 am, Yi Liu wrote:
>>> On 2024/12/25 15:13, Baolu Lu wrote:
>>>> On 2024/12/25 12:30, Yi Liu wrote:
>>>>> On 2024/12/25 09:02, Baolu Lu wrote:
>>>>>> On 12/24/24 19:35, Yi Liu wrote:
>>>>>>>> Another related consideration is the support for page faults in 
>>>>>>>> nested
>>>>>>>> domains once PASID is available in user space. Would it be 
>>>>>>>> reasonable to
>>>>>>>> support page faults for nested domains?
>>>>>>>
>>>>>>> yeah, it's good to discuss it.
>>>>>>>
>>>>>>>> If so, perhaps it's time to open 
>>>>>>>> intel_iommu_domain_alloc_nested() to
>>>>>>>> support IOMMU_HWPT_FAULT_ID_VALID when PRI is enabled on the 
>>>>>>>> device?
>>>>>>>
>>>>>>> IMHO, the PRQ support for nested domains requires some more 
>>>>>>> facility. PRQ
>>>>>>> can happen at either stage-1 or stage-2, iommu driver may need to 
>>>>>>> tell
>>>>>>> it and forward to the correct domain (nested domain or parent 
>>>>>>> domain). Or
>>>>>>> the stage-2 is always pinned just as the VFIO/iommufd does. 
>>>>>>> Hence, any PRQ
>>>>>>> happens under nested translation should be due to stage-1.
>>>>>>
>>>>>> The parent domain is currently always pinned and does not yet support
>>>>>> page faults. Therefore, when a page fault occurs within a nested 
>>>>>> domain,
>>>>>> it apparently should be routed to user space...
>>>>>>
>>>>>> If we decide to support page faults on the stage-2 domain in the 
>>>>>> future,
>>>>>> we will need to figure out the correct destination of each page fault
>>>>>> and route it accordingly, either to the parent domain or the user 
>>>>>> space
>>>>>> nested domain. Hardware assistance would be beneficial, otherwise the
>>>>>> software may need to traverse the parent domain, which is not
>>>>>> performance friendly.
>>>>>
>>>>> this is my question. Is it still true the parent domain is always 
>>>>> pinned
>>>>> after the below series? If yes, then it's fine to enable IOPF for 
>>>>> nested
>>>>> domain.
>>>>>
>>>>> https://lore.kernel.org/linux-iommu/20241015-jag-iopfv8-v4-0- 
>>>>> b696ca89ba29@kernel.org/
>>>>
>>>> Then, perhaps we could enforce this in iommufd for a short-term 
>>>> purpose?
>>>>
>>>> When allocating a hwpt in iommufd, we should enforce that the flags
>>>> IOMMU_HWPT_ALLOC_NEST_PARENT and IOMMU_HWPT_FAULT_ID_VALID cannot be 
>>>> set
>>>> at the same time.
>>>
>>> Let's consult with other vendors. We need this enforcement because we 
>>> lack
>>> a straightforward method to distinguish between PRIs in stage-1 and 
>>> stage-2
>>> under nested translation. If other vendors share this requirement, it 
>>> would
>>> be appropriate to implement this enforcement in IOMMUFD. Otherwise, 
>>> we may
>>> check it in intel iommu driver.
>>>
>>> @arm and amd folks. :)
>>
>> Yup, SMMUv3 is more or less in the same boat - we *could* reasonably 
>> manage unpinned S2 for non-PCI devices using the stall model where the 
>> F_TRANSLATION or F_PERMISSION event tells us all we need, but for ATS 
>> it's the same thing where by the time the fault has taken a round-trip 
>> through an ATS response and a PRI request, we've lost the details of 
>> exactly how it faulted (or if the PRI request was sent eagerly without 
>> a prior translation request, then we simply have no idea at all). Thus 
>> the prospective mechanism would be to inject a virtual PRI, wait until 
>> we see the guest issue a matching CMD_PRI_RESP, then sniff the IPA/GPA 
>> for the given input address out of the guest pagetables to see if 
>> there's anything to do at S2 as well/instead. Yuck.
> 
> Thanks for the response, Robin. It seems we're in the same situation. :(
> Out of curiosity, will the vPRI response provide the IPA/GPA to the
> hypervisor? If not, the hypervisor will need to derive it from the GVA on
> its own, possibly requiring software to traverse the guest page table.

No, the PRI response command only contains a Page Request Group Index, 
so the hypervisor would then have to look up all the outstanding 
requests with that index to retrieve their input addresses and then 
translate them. The SMMU architecture does technically accommodate a 
hardware-assisted translation feature (ATOS), but it's optional and 
rarely implemented - certainly Arm's implementations don't - so in 
reality that is indeed going to mean software walks as well.

Thanks,
Robin.
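
The bookkeeping described above (remember each forwarded request, then find
every request of a group once the guest's response shows up) could be
sketched roughly as follows. This is a minimal user-space illustration, not
code from any driver: struct pending_req, track_request() and collect_group()
are hypothetical names, and the sketch assumes the response identifies its
group by stream ID, PASID and Page Request Group Index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping record; not a structure from any real driver. */
struct pending_req {
	uint32_t sid;   /* stream/device ID the request came from */
	uint32_t pasid; /* PASID the request was tagged with */
	uint16_t prgi;  /* Page Request Group Index */
	uint64_t iova;  /* input address (GVA) the device asked for */
	int in_use;     /* slot is occupied */
};

#define MAX_PENDING 16
static struct pending_req pending[MAX_PENDING];

/* Record a request at the time the virtual PRI is injected into the guest. */
static int track_request(uint32_t sid, uint32_t pasid, uint16_t prgi,
			 uint64_t iova)
{
	for (size_t i = 0; i < MAX_PENDING; i++) {
		if (!pending[i].in_use) {
			pending[i] = (struct pending_req){
				.sid = sid, .pasid = pasid, .prgi = prgi,
				.iova = iova, .in_use = 1,
			};
			return 0;
		}
	}
	return -1; /* table full */
}

/*
 * Called when the guest's response for (sid, pasid, prgi) is observed.
 * Copies up to 'max' input addresses of that group into 'out', retires
 * those entries, and returns how many addresses were written. Each of
 * them would still need a software walk to obtain the IPA/GPA.
 */
static size_t collect_group(uint32_t sid, uint32_t pasid, uint16_t prgi,
			    uint64_t *out, size_t max)
{
	size_t n = 0;

	for (size_t i = 0; i < MAX_PENDING && n < max; i++) {
		struct pending_req *r = &pending[i];

		if (!r->in_use || r->sid != sid || r->pasid != pasid ||
		    r->prgi != prgi)
			continue;
		out[n++] = r->iova;
		r->in_use = 0;
	}
	return n;
}
```

The point of the sketch is only that the response carries no address, so the
hypervisor has to keep its own table; the translation of each collected
address is the costly part and is not shown.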


* Re: [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  2025-02-13 10:24                   ` Robin Murphy
@ 2025-02-13 12:53                     ` Yi Liu
  2025-02-19  8:02                     ` Tian, Kevin
  1 sibling, 0 replies; 71+ messages in thread
From: Yi Liu @ 2025-02-13 12:53 UTC (permalink / raw)
  To: Robin Murphy, Baolu Lu, joro, jgg, kevin.tian
  Cc: eric.auger, nicolinc, chao.p.peng, iommu, vasant.hegde, will,
	Suravee Suthikulpanit

On 2025/2/13 18:24, Robin Murphy wrote:
> On 2025-02-13 10:10 am, Yi Liu wrote:
>> On 2025/2/12 21:00, Robin Murphy wrote:
>>> On 2025-02-12 7:47 am, Yi Liu wrote:
>>>> On 2024/12/25 15:13, Baolu Lu wrote:
>>>>> On 2024/12/25 12:30, Yi Liu wrote:
>>>>>> On 2024/12/25 09:02, Baolu Lu wrote:
>>>>>>> On 12/24/24 19:35, Yi Liu wrote:
>>>>>>>>> Another related consideration is the support for page faults in 
>>>>>>>>> nested
>>>>>>>>> domains once PASID is available in user space. Would it be 
>>>>>>>>> reasonable to
>>>>>>>>> support page faults for nested domains?
>>>>>>>>
>>>>>>>> yeah, it's good to discuss it.
>>>>>>>>
>>>>>>>>> If so, perhaps it's time to open intel_iommu_domain_alloc_nested() to
>>>>>>>>> support IOMMU_HWPT_FAULT_ID_VALID when PRI is enabled on the device?
>>>>>>>>
>>>>>>>> IMHO, the PRQ support for nested domains requires some more 
>>>>>>>> facility. PRQ
>>>>>>>> can happen at either stage-1 or stage-2, iommu driver may need to tell
>>>>>>>> it and forward to the correct domain (nested domain or parent 
>>>>>>>> domain). Or
>>>>>>>> the stage-2 is always pinned just as the VFIO/iommufd does. Hence, 
>>>>>>>> any PRQ
>>>>>>>> happens under nested translation should be due to stage-1.
>>>>>>>
>>>>>>> The parent domain is currently always pinned and does not yet support
>>>>>>> page faults. Therefore, when a page fault occurs within a nested 
>>>>>>> domain,
>>>>>>> it apparently should be routed to user space...
>>>>>>>
>>>>>>> If we decide to support page faults on the stage-2 domain in the 
>>>>>>> future,
>>>>>>> we will need to figure out the correct destination of each page fault
>>>>>>> and route it accordingly, either to the parent domain or the user space
>>>>>>> nested domain. Hardware assistance would be beneficial, otherwise the
>>>>>>> software may need to traverse the parent domain, which is not
>>>>>>> performance friendly.
>>>>>>
>>>>>> this is my question. Is it still true the parent domain is always pinned
>>>>>> after the below series? If yes, then it's fine to enable IOPF for nested
>>>>>> domain.
>>>>>>
>>>>>> https://lore.kernel.org/linux-iommu/20241015-jag-iopfv8-v4-0- 
>>>>>> b696ca89ba29@kernel.org/
>>>>>
>>>>> Then, perhaps we could enforce this in iommufd for a short-term purpose?
>>>>>
>>>>> When allocating a hwpt in iommufd, we should enforce that the flags
>>>>> IOMMU_HWPT_ALLOC_NEST_PARENT and IOMMU_HWPT_FAULT_ID_VALID cannot be set
>>>>> at the same time.
>>>>
>>>> Let's consult with other vendors. We need this enforcement because we lack
>>>> a straightforward method to distinguish between PRIs in stage-1 and 
>>>> stage-2
>>>> under nested translation. If other vendors share this requirement, it 
>>>> would
>>>> be appropriate to implement this enforcement in IOMMUFD. Otherwise, we may
>>>> check it in intel iommu driver.
>>>>
>>>> @arm and amd folks. :)
>>>
>>> Yup, SMMUv3 is more or less in the same boat - we *could* reasonably 
>>> manage unpinned S2 for non-PCI devices using the stall model where the 
>>> F_TRANSLATION or F_PERMISSION event tells us all we need, but for ATS 
>>> it's the same thing where by the time the fault has taken a round-trip 
>>> through an ATS response and a PRI request, we've lost the details of 
>>> exactly how it faulted (or if the PRI request was sent eagerly without a 
>>> prior translation request, then we simply have no idea at all). Thus the 
>>> prospective mechanism would be to inject a virtual PRI, wait until we 
>>> see the guest issue a matching CMD_PRI_RESP, then sniff the IPA/GPA for 
>>> the given input address out of the guest pagetables to see if there's 
>>> anything to do at S2 as well/instead. Yuck.
>>
>> Thanks for the response, Robin. It seems we're in the same situation. :(
>> Out of curiosity, will the vPRI response provide the IPA/GPA to the
>> hypervisor? If not, the hypervisor will need to derive it from the GVA on
>> its own, possibly requiring software to traverse the guest page table.
> 
> No, the PRI response command only contains a Page Request Group Index, so 
> the hypervisor would then have to look up all the outstanding requests with 
> that index to retrieve their input addresses and then translate them. The 
> SMMU architecture does technically accommodate a hardware-assisted 
> translation feature (ATOS), but it's optional and rarely implemented - 
> certainly Arm's implementations don't - so in reality that is indeed
> going to mean software walks as well.

Got it. Thanks for the explanation. :)

-- 
Regards,
Yi Liu


* Re: [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  2025-02-13  9:34                 ` Yi Liu
@ 2025-02-13 12:56                   ` Jason Gunthorpe
  2025-02-14  3:24                     ` Yi Liu
  0 siblings, 1 reply; 71+ messages in thread
From: Jason Gunthorpe @ 2025-02-13 12:56 UTC (permalink / raw)
  To: Yi Liu
  Cc: Baolu Lu, joro, kevin.tian, eric.auger, nicolinc, chao.p.peng,
	iommu, vasant.hegde, will, Suravee Suthikulpanit, Robin Murphy

On Thu, Feb 13, 2025 at 05:34:52PM +0800, Yi Liu wrote:
> yes, as iommufd always pin and map user pages for the paging domain, so
> I don't think there will be any PRIs on S2 due to non-present pages.

That isn't strictly true: a misbehaving guest could deliberately set up
an S1 that points to a non-present S2. Ideally this would cause no PRI
into the guest and the VMM would fault the device.

> However, PRIs on S2 might occur due to insufficient permissions,
> potentially caused by userspace. Nonetheless, I am skeptical about the
> existence of such use cases. What do you think about it?

I suspect it is "OK". We already have to be safe against a misbehaving
guest creating a PRI storm; whether a misbehaving guest does it by
pointing to the wrong S2 or by just not updating the S1 doesn't seem to
make much difference.

IMHO it is desirable for the HW to provide a way to disambiguate the
PRI.

Jason


* Re: [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  2025-02-13 12:56                   ` Jason Gunthorpe
@ 2025-02-14  3:24                     ` Yi Liu
  0 siblings, 0 replies; 71+ messages in thread
From: Yi Liu @ 2025-02-14  3:24 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Baolu Lu, joro, kevin.tian, eric.auger, nicolinc, chao.p.peng,
	iommu, vasant.hegde, will, Suravee Suthikulpanit, Robin Murphy

On 2025/2/13 20:56, Jason Gunthorpe wrote:
> On Thu, Feb 13, 2025 at 05:34:52PM +0800, Yi Liu wrote:
>> yes, as iommufd always pin and map user pages for the paging domain, so
>> I don't think there will be any PRIs on S2 due to non-present pages.
> 
> That isn't strictly true, a misbehaving guest could deliberately setup
> a S1 that points to a non-present S2. Ideally this would cause no PRI
> into the guest and the VMM would fault the device.

Yes, a malicious guest can surely do that.

>> However, PRIs on S2 might occur due to insufficient permissions,
>> potentially caused by userspace. Nonetheless, I am skeptical about the
>> existence of such use cases. What do you think about it?
> 
> I suspect it is "OK", we already have to be safe against a misbehaving
> guest creating a PRI storm, if a misbehaving guest does it via
> pointing to the wrong S2 or just not updating the S1 doesn't seem to
> make much difference..

OK, given these facts, we can consider that iommufd does not support valid
PRIs on S2 for now. Hence, adding the aforementioned enforcement is acceptable.

> IMHO it is desirable for the HW to provide a way to disambiguate the
> PRI.

yes.

Regards,
Yi Liu
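
The enforcement discussed in this sub-thread (reject a hwpt allocation that
asks for both a nesting parent and a fault queue) could look roughly like the
check below. This is a stand-alone sketch under stated assumptions: the SK_*
macros merely stand in for IOMMU_HWPT_ALLOC_NEST_PARENT and
IOMMU_HWPT_FAULT_ID_VALID, their bit positions are chosen for illustration,
and the helper name and the -EOPNOTSUPP return value are hypothetical rather
than taken from iommufd.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Illustrative stand-ins for the uAPI flags named in the thread. The bit
 * positions are assumptions of this sketch, not copied from the header.
 */
#define SK_HWPT_ALLOC_NEST_PARENT (1u << 0)
#define SK_HWPT_FAULT_ID_VALID    (1u << 2)

/*
 * Hypothetical helper: refuse a fault-capable nesting parent, because a
 * PRI raised under nested translation cannot be attributed to S1 or S2.
 * Returns 0 if the combination is acceptable, a negative errno otherwise.
 */
static int sk_check_hwpt_alloc_flags(uint32_t flags)
{
	const uint32_t both = SK_HWPT_ALLOC_NEST_PARENT |
			      SK_HWPT_FAULT_ID_VALID;

	if ((flags & both) == both)
		return -EOPNOTSUPP;
	return 0;
}
```

Either flag alone passes the check; only the combination is refused. Whether
such a check belongs in iommufd or in each driver is exactly the question the
thread is settling.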


* RE: [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  2025-02-13 10:24                   ` Robin Murphy
  2025-02-13 12:53                     ` Yi Liu
@ 2025-02-19  8:02                     ` Tian, Kevin
  2025-02-19 12:50                       ` Yi Liu
  1 sibling, 1 reply; 71+ messages in thread
From: Tian, Kevin @ 2025-02-19  8:02 UTC (permalink / raw)
  To: Robin Murphy, Liu, Yi L, Baolu Lu, joro@8bytes.org,
	jgg@nvidia.com
  Cc: eric.auger@redhat.com, nicolinc@nvidia.com,
	chao.p.peng@linux.intel.com, iommu@lists.linux.dev,
	vasant.hegde@amd.com, will@kernel.org, Suravee Suthikulpanit

> From: Robin Murphy <robin.murphy@arm.com>
> Sent: Thursday, February 13, 2025 6:25 PM
> 
> On 2025-02-13 10:10 am, Yi Liu wrote:
> > On 2025/2/12 21:00, Robin Murphy wrote:
> >>
> >> Yup, SMMUv3 is more or less in the same boat - we *could* reasonably
> >> manage unpinned S2 for non-PCI devices using the stall model where the
> >> F_TRANSLATION or F_PERMISSION event tells us all we need, but for ATS
> >> it's the same thing where by the time the fault has taken a round-trip
> >> through an ATS response and a PRI request, we've lost the details of
> >> exactly how it faulted (or if the PRI request was sent eagerly without
> >> a prior translation request, then we simply have no idea at all). Thus
> >> the prospective mechanism would be to inject a virtual PRI, wait until
> >> we see the guest issue a matching CMD_PRI_RESP, then sniff the IPA/GPA
> >> for the given input address out of the guest pagetables to see if
> >> there's anything to do at S2 as well/instead. Yuck.
> >
> > Thanks for the response, Robin. It seems we're in the same situation. :(
> > Out of curiosity, will the vPRI response provide the IPA/GPA to the
> > hypervisor? If not, the hypervisor will need to derive it from the GVA on
> > its own, possibly requiring software to traverse the guest page table.
> 
> No, the PRI response command only contains a Page Request Group Index,
> so the hypervisor would then have to look up all the outstanding
> requests with that index to retrieve their input addresses and then
> translate them. The SMMU architecture does technically accommodate a
> hardware-assisted translation feature (ATOS), but it's optional and
> rarely implemented - certainly Arm's implementations don't - so in
> reality that is indeed going to mean software walks as well.
> 

In cases where software walks are inevitable, it's probably more
efficient to do the walk early to figure out whether it's an S1 or S2
fault (and then only forward S1 faults to the guest) than to always
inject a virtual PRI into the guest and then figure out whether S2
should be fixed too after receiving the vPRI response.

Software walks are costly. I'd prefer not to support S2 faulting in
nested configurations until there is proper support from hw to
report the level (even a simple hw-conducted walk is more
efficient than doing it in sw).
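
To make the "walk early" alternative concrete, here is a toy model of
classifying a fault as S1 or S2 before deciding where to forward it. It is a
sketch only: the single-level tables, the toy_* names and the 8-page address
space are all assumptions made for illustration, and a real walk would have
to translate every level of the guest page table through S2 as well.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGES      8
#define TOY_PAGE_SHIFT 12

enum toy_fault { TOY_NO_FAULT, TOY_FAULT_S1, TOY_FAULT_S2 };

struct toy_s1_pte {
	int present;
	uint32_t gpn; /* guest-physical page number */
};

struct toy_nested {
	uint32_t s1_table_gpn;           /* GPA page holding the S1 table */
	struct toy_s1_pte s1[TOY_PAGES]; /* GVA page -> GPA page */
	int s2_present[TOY_PAGES];       /* GPA page mapped at S2? */
};

/* Walk S1, checking each GPA the walk touches against S2. */
static enum toy_fault toy_classify(const struct toy_nested *n, uint64_t gva)
{
	uint64_t gvpn = gva >> TOY_PAGE_SHIFT;

	if (gvpn >= TOY_PAGES)
		return TOY_FAULT_S1; /* no S1 mapping can exist */

	/* The S1 table itself lives in guest-physical memory. */
	if (n->s1_table_gpn >= TOY_PAGES || !n->s2_present[n->s1_table_gpn])
		return TOY_FAULT_S2;

	if (!n->s1[gvpn].present)
		return TOY_FAULT_S1; /* forward to the guest */

	if (n->s1[gvpn].gpn >= TOY_PAGES || !n->s2_present[n->s1[gvpn].gpn])
		return TOY_FAULT_S2; /* handle in the host */

	return TOY_NO_FAULT;
}
```

Even in this reduced form the walk touches S2 for every S1 step, which is the
cost the message above wants hardware to absorb.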


* Re: [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  2025-02-19  8:02                     ` Tian, Kevin
@ 2025-02-19 12:50                       ` Yi Liu
  2025-02-20  6:57                         ` Tian, Kevin
  0 siblings, 1 reply; 71+ messages in thread
From: Yi Liu @ 2025-02-19 12:50 UTC (permalink / raw)
  To: Tian, Kevin, Robin Murphy, Baolu Lu, joro@8bytes.org,
	jgg@nvidia.com
  Cc: eric.auger@redhat.com, nicolinc@nvidia.com,
	chao.p.peng@linux.intel.com, iommu@lists.linux.dev,
	vasant.hegde@amd.com, will@kernel.org, Suravee Suthikulpanit

On 2025/2/19 16:02, Tian, Kevin wrote:
>> From: Robin Murphy <robin.murphy@arm.com>
>> Sent: Thursday, February 13, 2025 6:25 PM
>>
>> On 2025-02-13 10:10 am, Yi Liu wrote:
>>> On 2025/2/12 21:00, Robin Murphy wrote:
>>>>
>>>> Yup, SMMUv3 is more or less in the same boat - we *could* reasonably
>>>> manage unpinned S2 for non-PCI devices using the stall model where the
>>>> F_TRANSLATION or F_PERMISSION event tells us all we need, but for ATS
>>>> it's the same thing where by the time the fault has taken a round-trip
>>>> through an ATS response and a PRI request, we've lost the details of
>>>> exactly how it faulted (or if the PRI request was sent eagerly without
>>>> a prior translation request, then we simply have no idea at all). Thus
>>>> the prospective mechanism would be to inject a virtual PRI, wait until
>>>> we see the guest issue a matching CMD_PRI_RESP, then sniff the IPA/GPA
>>>> for the given input address out of the guest pagetables to see if
>>>> there's anything to do at S2 as well/instead. Yuck.
>>>
>>> Thanks for the response, Robin. It seems we're in the same situation. :(
>>> Out of curiosity, will the vPRI response provide the IPA/GPA to the
>>> hypervisor? If not, the hypervisor will need to derive it from the GVA on
>>> its own, possibly requiring software to traverse the guest page table.
>>
>> No, the PRI response command only contains a Page Request Group Index,
>> so the hypervisor would then have to look up all the outstanding
>> requests with that index to retrieve their input addresses and then
>> translate them. The SMMU architecture does technically accommodate a
>> hardware-assisted translation feature (ATOS), but it's optional and
>> rarely implemented - certainly Arm's implementations don't - so in
>> reality that is indeed going to mean software walks as well.
>>
> 
> In cases where software walks are inevitable, it's probably more
> efficient to do the walk early to figure out whether it's S1 or S2
> fault (then only forward s1 fault to guest), than always injecting
> a virtual PRI to guest and then figuring out whether s2 should be
> fixed too after receiving vPRI response.
> 
> Software walks are costly. Prefer to not supporting S2 faulting in
> nested configuration until there is proper support from hw to
> report the level (even a simple hw-conducted walking is more
> efficient than doing it in sw).

The AMD IOMMU specification includes a bit to indicate whether the PRI is
S2 or S1. However, the AMD IOMMU driver treats GN==0 as an invalid PRI.
Given that, it looks like we don't have S2 PRI support in Linux so far.

Spec (Revision 3.09):
Table 74: PAGE_SERVICE_REQUEST PPR Log Buffer Entry Fields

GN: Guest/nested. 1=Address[63:12] is a GVA and PASID is valid.
0=Address[63:12] is a GPA and PASID should be ignored by software.

Code:
static bool ppr_is_valid(struct amd_iommu *iommu, u64 *raw)
{
	struct device *dev = iommu->iommu.dev;
	u16 devid = PPR_DEVID(raw[0]);

	if (!(PPR_FLAGS(raw[0]) & PPR_FLAG_GN)) {
		dev_dbg(dev, "PPR logged [Request ignored due to GN=0 (device=%04x:%02x:%02x.%x "
			"pasid=0x%05llx address=0x%llx flags=0x%04llx tag=0x%03llx]\n",
			iommu->pci_seg->id, PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
			PPR_PASID(raw[0]), raw[1], PPR_FLAGS(raw[0]), PPR_TAG(raw[0]));
		return false;
	}

https://github.com/torvalds/linux/blob/master/drivers/iommu/amd/ppr.c#L86

-- 
Regards,
Yi Liu
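
As a reading aid for the GN semantics quoted above, the sketch below shows
how a log entry could be sorted into S1 or S2 from that one bit, and how the
current "GN=0 is invalid" behaviour falls out of it. Everything here is
hypothetical: struct toy_ppr, the toy_* helpers and the position chosen for
TOY_PPR_GN are made up for illustration and do not reflect the real PPR log
layout or the driver's macros.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag position; the real log entry layout differs. */
#define TOY_PPR_GN (1u << 0)

enum toy_ppr_stage { TOY_PPR_STAGE1, TOY_PPR_STAGE2 };

/* Hypothetical decoded form of one page request log entry. */
struct toy_ppr {
	uint32_t flags;
	uint32_t pasid;   /* meaningful only when GN=1 */
	uint64_t address; /* GVA when GN=1, GPA when GN=0 */
};

/* GN=1: address is a GVA, so the fault is at S1; GN=0: GPA, so S2. */
static enum toy_ppr_stage toy_ppr_stage(const struct toy_ppr *p)
{
	return (p->flags & TOY_PPR_GN) ? TOY_PPR_STAGE1 : TOY_PPR_STAGE2;
}

/* Mirrors the behaviour described above: only GN=1 requests are handled. */
static int toy_ppr_is_supported(const struct toy_ppr *p)
{
	return toy_ppr_stage(p) == TOY_PPR_STAGE1;
}
```

Supporting GN=0 would mean replacing the second helper with a path that
routes the request to the parent (S2) domain instead of dropping it.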


* RE: [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  2025-02-19 12:50                       ` Yi Liu
@ 2025-02-20  6:57                         ` Tian, Kevin
  0 siblings, 0 replies; 71+ messages in thread
From: Tian, Kevin @ 2025-02-20  6:57 UTC (permalink / raw)
  To: Liu, Yi L, Robin Murphy, Baolu Lu, joro@8bytes.org,
	jgg@nvidia.com
  Cc: eric.auger@redhat.com, nicolinc@nvidia.com,
	chao.p.peng@linux.intel.com, iommu@lists.linux.dev,
	vasant.hegde@amd.com, will@kernel.org, Suravee Suthikulpanit

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Wednesday, February 19, 2025 8:51 PM
> 
> On 2025/2/19 16:02, Tian, Kevin wrote:
> >> From: Robin Murphy <robin.murphy@arm.com>
> >> Sent: Thursday, February 13, 2025 6:25 PM
> >>
> >> On 2025-02-13 10:10 am, Yi Liu wrote:
> >>> On 2025/2/12 21:00, Robin Murphy wrote:
> >>>>
> >>>> Yup, SMMUv3 is more or less in the same boat - we *could* reasonably
> >>>> manage unpinned S2 for non-PCI devices using the stall model where
> the
> >>>> F_TRANSLATION or F_PERMISSION event tells us all we need, but for
> ATS
> >>>> it's the same thing where by the time the fault has taken a round-trip
> >>>> through an ATS response and a PRI request, we've lost the details of
> >>>> exactly how it faulted (or if the PRI request was sent eagerly without
> >>>> a prior translation request, then we simply have no idea at all). Thus
> >>>> the prospective mechanism would be to inject a virtual PRI, wait until
> >>>> we see the guest issue a matching CMD_PRI_RESP, then sniff the
> IPA/GPA
> >>>> for the given input address out of the guest pagetables to see if
> >>>> there's anything to do at S2 as well/instead. Yuck.
> >>>
> >>> Thanks for the response, Robin. It seems we're in the same situation. :(
> >>> Out of curiosity, will the vPRI response provide the IPA/GPA to the
> >>> hypervisor? If not, the hypervisor will need to derive it from the GVA on
> >>> its own, possibly requiring software to traverse the guest page table.
> >>
> >> No, the PRI response command only contains a Page Request Group Index,
> >> so the hypervisor would then have to look up all the outstanding
> >> requests with that index to retrieve their input addresses and then
> >> translate them. The SMMU architecture does technically accommodate a
> >> hardware-assisted translation feature (ATOS), but it's optional and
> >> rarely implemented - certainly Arm's implementations don't - so in
> >> reality that is indeed going to mean software walks as well.
> >>
> >
> > In cases where software walks are inevitable, it's probably more
> > efficient to do the walk early to figure out whether it's S1 or S2
> > fault (then only forward s1 fault to guest), than always injecting
> > a virtual PRI to guest and then figuring out whether s2 should be
> > fixed too after receiving vPRI response.
> >
> > Software walks are costly. Prefer to not supporting S2 faulting in
> > nested configuration until there is proper support from hw to
> > report the level (even a simple hw-conducted walking is more
> > efficient than doing it in sw).
> 
> The AMD IOMMU specification includes a bit to indicate whether the PRI is
> S2 or S1. However, the AMD IOMMU driver treats GN==0 as an invalid PRI.
> Given the fact, looks like we don't have S2 PRI support in Linux so far.
> 

That's for sure, given we don't have such support in core yet. 😊

So the AMD driver treats GN=0 as an error, given it shouldn't happen
in supported configurations.

If someone wants to support GN=0, they will need to add the required
logic in iommufd/iommu and then in the AMD driver. For now it's safe
to explicitly disallow S2 faulting in nesting.


Thread overview: 71+ messages
2024-12-19 13:27 [PATCH v6 00/14] iommufd support pasid attach/replace Yi Liu
2024-12-19 13:27 ` [PATCH v6 01/14] iommu: Introduce a replace API for device pasid Yi Liu
2024-12-20  2:47   ` Baolu Lu
2025-01-09  7:08   ` Tian, Kevin
2025-01-09  7:20   ` Tian, Kevin
2025-01-09 14:43     ` Jason Gunthorpe
2025-01-10  2:31       ` Baolu Lu
2025-01-10  7:21         ` Tian, Kevin
2025-01-16 10:00           ` Yi Liu
2025-01-13 20:21   ` Jason Gunthorpe
2025-01-14  8:10     ` Tian, Kevin
2025-01-14 13:45       ` Jason Gunthorpe
2025-01-15  4:43         ` Tian, Kevin
2025-01-15 14:43           ` Jason Gunthorpe
2025-01-16  5:48             ` Tian, Kevin
2025-01-17 10:32               ` Yi Liu
2024-12-19 13:27 ` [PATCH v6 02/14] iommufd: Refactor __fault_domain_replace_dev() to be a wrapper of iommu_replace_group_handle() Yi Liu
2024-12-19 13:27 ` [PATCH v6 03/14] iommufd: Move the iommufd_handle helpers to device.c Yi Liu
2024-12-20  3:31   ` Baolu Lu
2024-12-20  6:34     ` Yi Liu
2024-12-19 13:27 ` [PATCH v6 04/14] iommufd: Always pass iommu_attach_handle to iommu core Yi Liu
2024-12-20  4:35   ` Nicolin Chen
2024-12-20  6:40     ` Yi Liu
2024-12-20  6:58       ` Nicolin Chen
2025-01-09  7:44   ` Tian, Kevin
2025-01-17 12:33     ` Yi Liu
2025-01-17 19:03       ` Nicolin Chen
2024-12-19 13:27 ` [PATCH v6 05/14] iommufd: Pass pasid through the device attach/replace path Yi Liu
2025-01-09  7:53   ` Tian, Kevin
2025-01-09 14:51     ` Jason Gunthorpe
2025-01-10  7:22       ` Tian, Kevin
2024-12-19 13:27 ` [PATCH v6 06/14] iommufd: Mark PASID-compatible domain Yi Liu
2025-01-09  7:56   ` Tian, Kevin
2025-01-09 14:54   ` Jason Gunthorpe
2025-01-17 10:50     ` Yi Liu
2024-12-19 13:27 ` [PATCH v6 07/14] iommufd: Support pasid attach/replace Yi Liu
2025-01-09  8:25   ` Tian, Kevin
2024-12-19 13:27 ` [PATCH v6 08/14] iommufd: Enforce PASID-compatible domain for RID Yi Liu
2025-01-09  8:31   ` Tian, Kevin
2024-12-19 13:27 ` [PATCH v6 09/14] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support Yi Liu
2024-12-23  2:51   ` Baolu Lu
2024-12-24 11:35     ` Yi Liu
2024-12-25  1:02       ` Baolu Lu
2024-12-25  4:30         ` Yi Liu
2024-12-25  7:13           ` Baolu Lu
2025-02-12  7:47             ` Yi Liu
2025-02-12 12:59               ` Jason Gunthorpe
2025-02-13  9:34                 ` Yi Liu
2025-02-13 12:56                   ` Jason Gunthorpe
2025-02-14  3:24                     ` Yi Liu
2025-02-12 13:00               ` Robin Murphy
2025-02-12 13:08                 ` Jason Gunthorpe
2025-02-13 10:10                 ` Yi Liu
2025-02-13 10:24                   ` Robin Murphy
2025-02-13 12:53                     ` Yi Liu
2025-02-19  8:02                     ` Tian, Kevin
2025-02-19 12:50                       ` Yi Liu
2025-02-20  6:57                         ` Tian, Kevin
2025-01-09 15:27     ` Jason Gunthorpe
2025-01-10  2:41       ` Baolu Lu
2025-01-10  7:34         ` Tian, Kevin
2025-01-17 10:57           ` Yi Liu
2025-01-10  7:38   ` Tian, Kevin
2025-01-14  8:13     ` Tian, Kevin
2025-01-13 20:31   ` Jason Gunthorpe
2025-01-14  8:19     ` Tian, Kevin
2024-12-19 13:27 ` [PATCH v6 10/14] iommufd: Allow allocating PASID-compatible domain Yi Liu
2024-12-19 13:27 ` [PATCH v6 11/14] iommufd/selftest: Add set_dev_pasid in mock iommu Yi Liu
2024-12-19 13:27 ` [PATCH v6 12/14] iommufd/selftest: Add a helper to get test device Yi Liu
2024-12-19 13:27 ` [PATCH v6 13/14] iommufd/selftest: Add test ops to test pasid attach/detach Yi Liu
2024-12-19 13:27 ` [PATCH v6 14/14] iommufd/selftest: Add coverage for iommufd " Yi Liu
