* [PATCH v10 00/18] iommufd support pasid attach/replace
@ 2025-03-20 13:47 Yi Liu
  2025-03-20 13:47 ` [PATCH v10 01/18] iommu: Require passing new handles to APIs supporting handle Yi Liu
                   ` (18 more replies)
  0 siblings, 19 replies; 78+ messages in thread
From: Yi Liu @ 2025-03-20 13:47 UTC (permalink / raw)
  To: kevin.tian, jgg; +Cc: joro, baolu.lu, yi.l.liu, iommu, nicolinc

PASID (Process Address Space ID) is a PCIe extension that tags the DMA
transactions from a physical device. Most modern IOMMU hardware supports
PASID-granular address translation. This allows a PASID-capable device
to be attached to multiple hardware page tables (hwpts, also known as
domains), with each attachment tagged by a PASID.
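
As a rough userspace illustration of the paragraph above (not kernel code), one
device can be modelled as a small table that maps each PASID to the hwpt that
translates DMA tagged with it. Every name below (toy_device, toy_attach,
toy_lookup, TOY_MAX_PASIDS) is hypothetical and exists only for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NO_PASID   0u /* stands in for untagged (RID) DMA */
#define TOY_MAX_PASIDS 8u /* arbitrary small limit for this sketch */

struct toy_hwpt {
	int id;
};

struct toy_device {
	/* slot[pasid] is the hwpt that translates DMA tagged with pasid */
	struct toy_hwpt *slot[TOY_MAX_PASIDS];
};

/* Attach hwpt to one PASID of dev; fails if the PASID is invalid or busy. */
static int toy_attach(struct toy_device *dev, uint32_t pasid,
		      struct toy_hwpt *hwpt)
{
	assert(dev != NULL);
	if (pasid >= TOY_MAX_PASIDS || hwpt == NULL)
		return -1;
	if (dev->slot[pasid] != NULL)
		return -1;
	dev->slot[pasid] = hwpt;
	return 0;
}

/* Return the hwpt that would translate a DMA carrying this PASID tag. */
static struct toy_hwpt *toy_lookup(const struct toy_device *dev, uint32_t pasid)
{
	assert(dev != NULL);
	return pasid < TOY_MAX_PASIDS ? dev->slot[pasid] : NULL;
}
```

In this model the same device can hold one hwpt in slot TOY_NO_PASID and a
different hwpt in another slot at the same time, which is the property the
series relies on.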

This series builds on previous series [1]. It begins by adding a missing
IOMMU API to replace the domain for a PASID. Utilizing the IOMMU PASID
attach/replace/detach APIs, this series introduces iommufd APIs for device
drivers to attach, replace, or detach PASIDs to/from hwpts at the request
of userspace. It also enforces PASID compatibility with domain requirements,
allocates PASID-compatible hwpts in iommufd, and includes self-tests to
validate the iommufd APIs.

The complete code is available at the following link [2]. Please note that
the existing iommufd self-test was broken, and a temporary fix patch is at
the top of the branch [2]. If you wish to run the iommufd self-test, please
apply that fix. We apologize for any inconvenience.

The series is based on Jason's for-next branch plus one more patch [3].

https://web.git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd.git/commit/?h=for-next&id=a05df03a88bc1088be8e9d958f208d6484691e43

[1] https://lore.kernel.org/linux-iommu/20250226011849.5102-1-yi.l.liu@intel.com/
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_pasid
[3] https://lore.kernel.org/linux-iommu/20250306034842.5950-1-yi.l.liu@intel.com/

Change log:

v10:
 - Drop patches 01 and 02 of v9 as they cannot make handle reuse safe. As a
   result, patches 08, 09 and 11 are also dropped as the handle cannot be reused.
 - Add patch 01 to highlight that the handle cannot be reused.
 - Introduce a new data structure for tracking per-PASID attachments. The
   igroup->hwpt and igroup->device_list are moved to the new structure.
 - Rename patch 07 of v9 to be patch 06 of this series
 - Patch 02, 08, 09 and 11 of this series have minor tweaks due to the drop of
   reusing handle.
 - Refine the description of the ALLOC_PASID flag in patch 14 of this series.
 - Add a MOCK_PASID_WIDTH to avoid using 20 everywhere (Nic)

v9: https://lore.kernel.org/linux-iommu/20250313123532.103522-1-yi.l.liu@intel.com/
 - Drop patch 01 of v8 (Nic)
 - Add patch 01 and 02 to support reusing iommufd_attach_handle
 - Pass @pasid to iommufd_hwpt_paging_alloc() instead of passing IOMMU_NO_PASID
   when calling iommufd_hw_pagetable_attach()
 - Consolidate the RID and PASID paths. The RID and PASID paths share almost all
   of the code from the driver-facing API to the place invoking the underlying
   iommu driver callback. (Jason)
   *) Add patch 06, 07, 08, 09, 10 and 11 to use iommufd_attach_handle to track
      the attached hwpt and devices of a group.
   *) Patch 11 adds pasid_attach array to track iommufd_attach_handles used in
      the RID path and PASID path.
   *) Extend the existing attach/detach/replace APIs with a pasid parameter
      to support PASID.
 - Rename "[PATCH v8 05/12] iommufd: Mark PASID-compatible domain" to be
   "[PATCH v9 13/21] iommufd: Enforce PASID-compatible domain in PASID path"
   as this patch now adds the pasid_compat flag and its check, following Jason's
   remark.
 - Add a description that a user who wants to use PASID should first attach
   its RID path to a pasid-compatible domain.
 - Add back patch 11 of v7 as it is not included in any other series now
 - Drop the mixed_replace tests of v8 as it is not considered to be iommufd
   selftest
 - Add the r-b tags received in v8. For the patches that have minor changes, I
   kept the r-b tags with a "v8 - v9" change log in individual patches. For the
   patches that have non-trivial changes, I dropped the r-b tags to get more
   attention from reviewers.

v8: https://lore.kernel.org/linux-iommu/20250226114032.4591-1-yi.l.liu@intel.com/
 - Rebase on top of the dependency patch series
 - Check both handle and domain in the iommu_replace_device_pasid_handle()
   to support replace between domain and handle.
 - Fix a typo in patch 02 of v7 (Kevin)
 - r-b tag on patch 01 of v7 (Kevin)
 - Add selftest for replace between handle and non-handle case

v7: https://lore.kernel.org/linux-iommu/20250216035228.23831-1-yi.l.liu@intel.com/
 - Remove the iommu_attach_handle related refactors, as they have been addressed
   by Nic's series.
 - Address the comments on patch 01 of v6 in a separate series [1]. Store either
   the domain or handle in group->pasid_array, and swap the order of setting
   group->pasid_array and invoking the attach operation of IOMMU drivers.
 - Introduce iommu_attach_device_pasid_handle() and iommu_replace_device_pasid_handle(),
   and remove iommu_replace_device_pasid() since iommufd consistently uses the
   _handle() API.
 - Add patch 04 to include reserved_iova only for the RID path, as the underlying
   helpers are shared by both the RID and PASID paths, but only the RID path needs
   to add reserved_iova.
 - Remove the iommu_dev.max_pasids check in patch 11 of v6. The mock IOMMU always
   supports 20-bit PASIDs, so there is no need to verify PASID support in the mock
   IOMMU driver. Additionally, the IOMMU_HWPT_ALLOC_PASID flag does not imply PASID
   support, so it should be removed to avoid misleading IOMMU driver programming.

v6: https://lore.kernel.org/linux-iommu/20241219132746.16193-1-yi.l.liu@intel.com/
 - Add kdoc to iommufd_device_get_attach_handle() to note the returned handle
   should be used with care. (Baolu)
 - Reworked the patch 07 and 08 of v5 to avoid domain allocation failure on VT-d
   after applying patch 07 of v5.
     1) Split the intel iommu driver IOMMU_HWPT_ALLOC_PASID support out of
	patch 08
     2) Rework the PASID-compatible domain enforcement by checking the RID domain
	and idev->pasid_hwpts under the idev->igroup->lock.
 - iommufd_device_pasid_do_attach() returns -EINVAL if there is an old hwpt and it
   is not the same as the new hwpt. This aligns with how iommufd_device_do_attach()
   deals with it. Otherwise, attaching the same pasid to the same ioas would fail
   before the auto_domain loop reaches the correct hwpt, which is not reasonable.
 - Enhanced the pasid selftest to have non-pasid-capable device and pasid-capable
   device.
 - The order of the series is tweaked to first prepare iommufd for pasid attach,
   then add pasid attach, then add the PASID-compat domain enforcement, and finally
   add the PASID-compat hwpt allocation.
 - Rebased on top of 6.13-rc3 and some already applied patches.

v5: https://lore.kernel.org/linux-iommu/20241104132513.15890-1-yi.l.liu@intel.com/
 - Fix a mistake in patch 02 of v4 (Kevin)
 - Move the iommufd_handle helpers to device.c
 - Add IOMMU_HWPT_ALLOC_PASID check to enforce pasid-compatible domain for pasid
   capable device in iommufd
 - Update the iommufd selftest to use IOMMU_HWPT_ALLOC_PASID

v4: https://lore.kernel.org/linux-iommu/20240912131255.13305-1-yi.l.liu@intel.com/
 - Replace remove_dev_pasid() by supporting set_dev_pasid() for blocking domain (Kevin)
	- This is done by the preparation series "Support attaching PASID to the blocked_domain"
 - Misc tweaks to follow the merging of the iommufd iopf series. Three new patches are added:
	- iommufd: Always pass iommu_attach_handle to iommu core
	- iommufd: Move the iommufd_handle helpers to iommufd_private.h
	- iommufd: Refactor __fault_domain_replace_dev() to be a wrapper of iommu_replace_group_handle()
 - Rename patch 03 of v3 to be "iommufd: Support pasid attach/replace"
 - Add test case for attaching/replacing iopf-capable hwpt to pasid

v3: https://lore.kernel.org/kvm/20240628090557.50898-1-yi.l.liu@intel.com/
 - Split the set_dev_pasid op enhancements for domain replacement to be a
   separate series "Make set_dev_pasid op supportting domain replacement" [1].
   The below changes are made in the separate series.
   *) set_dev_pasid() callback should keep the old config if failed to attach to
      a domain. This simplifies the caller a lot as caller does not need to attach
      it back to old domain explicitly. This also avoids some corner cases in which
      the core may do duplicated domain attachment as described in below link (Jason)
      https://lore.kernel.org/linux-iommu/BN9PR11MB52768C98314A95AFCD2FA6478C0F2@BN9PR11MB5276.namprd11.prod.outlook.com/
   *) Drop patch 10 of v2 as it's a bug fix and can be submitted separately (Kevin)
   *) Rebase on top of Baolu's domain_alloc_paging refactor series (Jason)
 - Drop the attach_data which includes attach_fn and pasid, and instead pass the
   pasid through the device attach path. (Jason)
 - Add a pasid-num-bits property to mock dev to make pasid selftest work (Kevin)

v2: https://lore.kernel.org/linux-iommu/20240412081516.31168-1-yi.l.liu@intel.com/
 - Domain replace for pasid should be handled in the set_dev_pasid() callbacks
   instead of calling remove_dev_pasid and then set_dev_pasid afterward in the
   iommu layer (Jason)
 - Make xarray operations more self-contained in iommufd pasid attach/replace/detach
   (Jason)
 - Tweak the dev_iommu_get_max_pasids() to allow iommu driver to populate the
   max_pasids. This makes the iommufd selftest simpler to meet the max_pasids
   check in iommu_attach_device_pasid()  (Jason)

v1: https://lore.kernel.org/kvm/20231127063428.127436-1-yi.l.liu@intel.com/#r
 - Implement iommu_replace_device_pasid() to fall back to the original domain
   if this replacement failed (Kevin)
 - Add check in do_attach() to check the corresponding attach_fn per the pasid value.

rfc: https://lore.kernel.org/linux-iommu/20230926092651.17041-1-yi.l.liu@intel.com/

Regards,
	Yi Liu

Yi Liu (18):
  iommu: Require passing new handles to APIs supporting handle
  iommu: Introduce a replace API for device pasid
  iommufd: Pass @pasid through the device attach/replace path
  iommufd/device: Only add reserved_iova in non-pasid path
  iommufd/device: Replace idev->igroup with local variable
  iommufd/device: Add helper to detect the first attach of a group
  iommufd/device: Wrap igroup->hwpt and igroup->device_list into attach
    struct
  iommufd/device: Replace device_list with device_array
  iommufd/device: Add pasid_attach array to track per-PASID attach
  iommufd: Enforce PASID-compatible domain in PASID path
  iommufd: Support pasid attach/replace
  iommufd: Enforce PASID-compatible domain for RID
  iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  iommufd: Allow allocating PASID-compatible domain
  iommufd/selftest: Add set_dev_pasid in mock iommu
  iommufd/selftest: Add a helper to get test device
  iommufd/selftest: Add test ops to test pasid attach/detach
  iommufd/selftest: Add coverage for iommufd pasid attach/detach

 drivers/iommu/intel/iommu.c                   |   3 +-
 drivers/iommu/intel/nested.c                  |   2 +-
 drivers/iommu/iommu-priv.h                    |   4 +
 drivers/iommu/iommu.c                         | 126 ++++++-
 drivers/iommu/iommufd/device.c                | 344 +++++++++++++-----
 drivers/iommu/iommufd/hw_pagetable.c          |  23 +-
 drivers/iommu/iommufd/iommufd_private.h       |  14 +-
 drivers/iommu/iommufd/iommufd_test.h          |  35 ++
 drivers/iommu/iommufd/selftest.c              | 228 ++++++++++--
 drivers/vfio/iommufd.c                        |  10 +-
 include/linux/iommufd.h                       |   9 +-
 include/uapi/linux/iommufd.h                  |   3 +
 tools/testing/selftests/iommu/iommufd.c       | 344 ++++++++++++++++++
 .../selftests/iommu/iommufd_fail_nth.c        |  41 ++-
 tools/testing/selftests/iommu/iommufd_utils.h | 102 ++++++
 15 files changed, 1139 insertions(+), 149 deletions(-)

-- 
2.34.1



* [PATCH v10 01/18] iommu: Require passing new handles to APIs supporting handle
  2025-03-20 13:47 [PATCH v10 00/18] iommufd support pasid attach/replace Yi Liu
@ 2025-03-20 13:47 ` Yi Liu
  2025-03-20 15:23   ` Jason Gunthorpe
  2025-03-21  2:35   ` Baolu Lu
  2025-03-20 13:47 ` [PATCH v10 02/18] iommu: Introduce a replace API for device pasid Yi Liu
                   ` (17 subsequent siblings)
  18 siblings, 2 replies; 78+ messages in thread
From: Yi Liu @ 2025-03-20 13:47 UTC (permalink / raw)
  To: kevin.tian, jgg; +Cc: joro, baolu.lu, yi.l.liu, iommu, nicolinc

Add kdoc to highlight that the caller of iommu_[attach|replace]_group_handle()
and iommu_attach_device_pasid() should always provide a new handle.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommu.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 0ee17893810f..cffd96e3efd2 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3352,6 +3352,9 @@ static void __iommu_remove_group_pasid(struct iommu_group *group,
  * @pasid: the pasid of the device.
  * @handle: the attach handle.
  *
+ * Caller should always provide a new handle to avoid race with the paths
+ * that have lockless reference to handle if it intends to pass a valid handle.
+ *
  * Return: 0 on success, or an error.
  */
 int iommu_attach_device_pasid(struct iommu_domain *domain,
@@ -3512,6 +3515,9 @@ EXPORT_SYMBOL_NS_GPL(iommu_attach_handle_get, "IOMMUFD_INTERNAL");
  * This is a variant of iommu_attach_group(). It allows the caller to provide
  * an attach handle and use it when the domain is attached. This is currently
  * used by IOMMUFD to deliver the I/O page faults.
+ *
+ * Caller should always provide a new handle to avoid race with the paths
+ * that have lockless reference to handle.
  */
 int iommu_attach_group_handle(struct iommu_domain *domain,
 			      struct iommu_group *group,
@@ -3581,6 +3587,9 @@ EXPORT_SYMBOL_NS_GPL(iommu_detach_group_handle, "IOMMUFD_INTERNAL");
  *
  * If the currently attached domain is a core domain (e.g. a default_domain),
  * it will act just like the iommu_attach_group_handle().
+ *
+ * Caller should always provide a new handle to avoid race with the paths
+ * that have lockless reference to handle.
  */
 int iommu_replace_group_handle(struct iommu_group *group,
 			       struct iommu_domain *new_domain,
-- 
2.34.1



* [PATCH v10 02/18] iommu: Introduce a replace API for device pasid
  2025-03-20 13:47 [PATCH v10 00/18] iommufd support pasid attach/replace Yi Liu
  2025-03-20 13:47 ` [PATCH v10 01/18] iommu: Require passing new handles to APIs supporting handle Yi Liu
@ 2025-03-20 13:47 ` Yi Liu
  2025-03-20 17:24   ` Nicolin Chen
  2025-03-21  3:08   ` Baolu Lu
  2025-03-20 13:47 ` [PATCH v10 03/18] iommufd: Pass @pasid through the device attach/replace path Yi Liu
                   ` (16 subsequent siblings)
  18 siblings, 2 replies; 78+ messages in thread
From: Yi Liu @ 2025-03-20 13:47 UTC (permalink / raw)
  To: kevin.tian, jgg; +Cc: joro, baolu.lu, yi.l.liu, iommu, nicolinc

Provide a high-level API to allow replacements of one domain with another
for specific pasid of a device. This is similar to
iommu_replace_group_handle() and it is expected to be used only by IOMMUFD.

Co-developed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
v9 -> v10: Convert to the v8 version, added a check to fail the case
           in which the passed handle is equal to the existing one.
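
The replace semantics described above (the pasid must already be attached, a
reused handle is rejected, and the old configuration survives a failed
replacement) can be sketched with a toy userspace model. This is only an
illustration under stated assumptions: the toy_* types, the fail_attach knob
and the -EIO error code are hypothetical and are not part of the kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_domain {
	int id;
	int fail_attach; /* hypothetical knob: simulate a driver attach failure */
};

struct toy_handle {
	struct toy_domain *domain;
};

/* One PASID slot of a group; NULL handle means nothing is attached. */
struct toy_pasid_slot {
	struct toy_handle *handle;
};

static int toy_replace(struct toy_pasid_slot *slot,
		       struct toy_domain *new_domain,
		       struct toy_handle *new_handle)
{
	assert(slot != NULL);
	if (new_domain == NULL || new_handle == NULL)
		return -EINVAL;
	if (slot->handle == NULL)	/* nothing attached: not a replace */
		return -EINVAL;
	if (slot->handle == new_handle)	/* handle reuse is rejected */
		return -EINVAL;
	/* Only touch the "hardware" when the domain really changes. */
	if (slot->handle->domain != new_domain && new_domain->fail_attach)
		return -EIO;		/* slot still points at the old handle */
	new_handle->domain = new_domain;
	slot->handle = new_handle;
	return 0;
}
```

A caller in this model allocates a fresh toy_handle for every replace and
releases the previous one only after toy_replace() returns 0.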
---
 drivers/iommu/iommu-priv.h |   4 ++
 drivers/iommu/iommu.c      | 117 +++++++++++++++++++++++++++++++++++--
 2 files changed, 117 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
index b4508423e13b..2985f05d699f 100644
--- a/drivers/iommu/iommu-priv.h
+++ b/drivers/iommu/iommu-priv.h
@@ -43,4 +43,8 @@ void iommu_detach_group_handle(struct iommu_domain *domain,
 int iommu_replace_group_handle(struct iommu_group *group,
 			       struct iommu_domain *new_domain,
 			       struct iommu_attach_handle *handle);
+
+int iommu_replace_device_pasid(struct iommu_domain *domain,
+			       struct device *dev, ioasid_t pasid,
+			       struct iommu_attach_handle *handle);
 #endif /* __LINUX_IOMMU_PRIV_H */
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index cffd96e3efd2..07134bb85c00 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -513,6 +513,13 @@ static void iommu_deinit_device(struct device *dev)
 	dev_iommu_free(dev);
 }
 
+static inline struct iommu_domain *pasid_array_entry_to_domain(void *entry)
+{
+	if (xa_pointer_tag(entry) == IOMMU_PASID_ARRAY_DOMAIN)
+		return xa_untag_pointer(entry);
+	return ((struct iommu_attach_handle *)xa_untag_pointer(entry))->domain;
+}
+
 DEFINE_MUTEX(iommu_probe_device_lock);
 
 static int __iommu_probe_device(struct device *dev, struct list_head *group_list)
@@ -3311,14 +3318,15 @@ static void iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid,
 }
 
 static int __iommu_set_group_pasid(struct iommu_domain *domain,
-				   struct iommu_group *group, ioasid_t pasid)
+				   struct iommu_group *group, ioasid_t pasid,
+				   struct iommu_domain *old)
 {
 	struct group_device *device, *last_gdev;
 	int ret;
 
 	for_each_group_device(group, device) {
 		ret = domain->ops->set_dev_pasid(domain, device->dev,
-						 pasid, NULL);
+						 pasid, old);
 		if (ret)
 			goto err_revert;
 	}
@@ -3330,7 +3338,15 @@ static int __iommu_set_group_pasid(struct iommu_domain *domain,
 	for_each_group_device(group, device) {
 		if (device == last_gdev)
 			break;
-		iommu_remove_dev_pasid(device->dev, pasid, domain);
+		/*
+		 * If no old domain, undo the succeeded devices/pasid.
+		 * Otherwise, rollback the succeeded devices/pasid to the old
+		 * domain. And it is a driver bug to fail attaching with a
+		 * previously good domain.
+		 */
+		if (!old || WARN_ON(old->ops->set_dev_pasid(old, device->dev,
+							    pasid, domain)))
+			iommu_remove_dev_pasid(device->dev, pasid, domain);
 	}
 	return ret;
 }
@@ -3399,7 +3415,7 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
 	if (ret)
 		goto out_unlock;
 
-	ret = __iommu_set_group_pasid(domain, group, pasid);
+	ret = __iommu_set_group_pasid(domain, group, pasid, NULL);
 	if (ret) {
 		xa_release(&group->pasid_array, pasid);
 		goto out_unlock;
@@ -3420,6 +3436,99 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
 }
 EXPORT_SYMBOL_GPL(iommu_attach_device_pasid);
 
+/**
+ * iommu_replace_device_pasid - Replace the domain that a pasid
+ *                              is attached to
+ * @domain: the new iommu domain
+ * @dev: the attached device.
+ * @pasid: the pasid of the device.
+ * @handle: the attach handle.
+ *
+ * This API allows the pasid to switch domains. The @pasid should have been
+ * attached. Otherwise, this fails. The pasid will keep the old configuration
+ * if replacement failed.
+ *
+ * Caller should always provide a new handle to avoid race with the paths
+ * that have lockless reference to handle if it intends to pass a valid handle.
+ *
+ * Return 0 on success, or an error.
+ */
+int iommu_replace_device_pasid(struct iommu_domain *domain,
+			       struct device *dev, ioasid_t pasid,
+			       struct iommu_attach_handle *handle)
+{
+	/* Caller must be a probed driver on dev */
+	struct iommu_group *group = dev->iommu_group;
+	struct iommu_attach_handle *entry;
+	struct iommu_domain *curr_domain;
+	void *curr;
+	int ret;
+
+	if (!group)
+		return -ENODEV;
+
+	if (!domain->ops->set_dev_pasid)
+		return -EOPNOTSUPP;
+
+	if (dev_iommu_ops(dev) != domain->owner ||
+	    pasid == IOMMU_NO_PASID || !handle)
+		return -EINVAL;
+
+	mutex_lock(&group->mutex);
+	entry = iommu_make_pasid_array_entry(domain, handle);
+	curr = xa_cmpxchg(&group->pasid_array, pasid, NULL,
+			  XA_ZERO_ENTRY, GFP_KERNEL);
+	if (xa_is_err(curr)) {
+		ret = xa_err(curr);
+		goto out_unlock;
+	}
+
+	/*
+	 * No domain (with or without handle) attached, hence not
+	 * a replace case.
+	 */
+	if (!curr) {
+		xa_release(&group->pasid_array, pasid);
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	/*
+	 * Reusing handle is problematic as there are paths that refers
+	 * the handle without lock. To avoid race, reject the callers that
+	 * attempt it.
+	 */
+	if (handle && curr == entry) {
+		WARN_ON(1);
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	curr_domain = pasid_array_entry_to_domain(curr);
+	ret = 0;
+
+	if (curr_domain != domain) {
+		ret = __iommu_set_group_pasid(domain, group,
+					      pasid, curr_domain);
+		if (ret)
+			goto out_unlock;
+	}
+
+	if (curr != entry) {
+		/*
+		 * The above xa_cmpxchg() reserved the memory, and the
+		 * group->mutex is held, this cannot fail.
+		 */
+		WARN_ON(xa_is_err(xa_store(&group->pasid_array,
+					   pasid, entry, GFP_KERNEL)));
+	}
+
+out_unlock:
+	mutex_unlock(&group->mutex);
+	return ret;
+}
+EXPORT_SYMBOL_NS_GPL(iommu_replace_device_pasid, "IOMMUFD_INTERNAL");
+
 /*
  * iommu_detach_device_pasid() - Detach the domain from pasid of device
  * @domain: the iommu domain.
-- 
2.34.1



* [PATCH v10 03/18] iommufd: Pass @pasid through the device attach/replace path
  2025-03-20 13:47 [PATCH v10 00/18] iommufd support pasid attach/replace Yi Liu
  2025-03-20 13:47 ` [PATCH v10 01/18] iommu: Require passing new handles to APIs supporting handle Yi Liu
  2025-03-20 13:47 ` [PATCH v10 02/18] iommu: Introduce a replace API for device pasid Yi Liu
@ 2025-03-20 13:47 ` Yi Liu
  2025-03-21  3:13   ` Baolu Lu
  2025-03-20 13:47 ` [PATCH v10 04/18] iommufd/device: Only add reserved_iova in non-pasid path Yi Liu
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 78+ messages in thread
From: Yi Liu @ 2025-03-20 13:47 UTC (permalink / raw)
  To: kevin.tian, jgg; +Cc: joro, baolu.lu, yi.l.liu, iommu, nicolinc

Most of the core logic before conducting the actual device attach/
replace operation can be shared with pasid attach/replace. So pass
@pasid through the device attach/replace helpers to prepare adding
pasid attach/replace.

So far the @pasid should only be IOMMU_NO_PASID. No functional change.

Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
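
The shape of this change, a pasid argument threaded through one shared helper
that takes the attach or replace step as a callback, can be sketched in
userspace C as follows. The toy_* names are hypothetical stand-ins for this
sketch only; the wrappers pass a fixed "no PASID" value, mirroring the commit
message's statement that @pasid is only IOMMU_NO_PASID so far.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NO_PASID 0u

struct toy_idev {
	int attached_hwpt;	/* 0 means nothing attached */
	unsigned int last_pasid;
};

/* Same shape as the patched attach_fn: pasid sits between device and target. */
typedef int (*toy_attach_fn)(struct toy_idev *idev, unsigned int pasid,
			     int hwpt_id);

static int toy_do_attach(struct toy_idev *idev, unsigned int pasid, int hwpt_id)
{
	if (idev->attached_hwpt != 0)
		return -1;	/* already attached */
	idev->attached_hwpt = hwpt_id;
	idev->last_pasid = pasid;
	return 0;
}

static int toy_do_replace(struct toy_idev *idev, unsigned int pasid, int hwpt_id)
{
	if (idev->attached_hwpt == 0)
		return -1;	/* nothing to replace */
	idev->attached_hwpt = hwpt_id;
	idev->last_pasid = pasid;
	return 0;
}

/* Shared path: only forwards pasid to whichever step the caller selected. */
static int toy_change_pt(struct toy_idev *idev, unsigned int pasid,
			 int hwpt_id, toy_attach_fn do_attach)
{
	assert(idev != NULL && do_attach != NULL);
	return do_attach(idev, pasid, hwpt_id);
}

static int toy_device_attach(struct toy_idev *idev, int hwpt_id)
{
	return toy_change_pt(idev, TOY_NO_PASID, hwpt_id, toy_do_attach);
}

static int toy_device_replace(struct toy_idev *idev, int hwpt_id)
{
	return toy_change_pt(idev, TOY_NO_PASID, hwpt_id, toy_do_replace);
}
```
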
 drivers/iommu/iommufd/device.c          | 70 +++++++++++++++----------
 drivers/iommu/iommufd/hw_pagetable.c    | 13 ++---
 drivers/iommu/iommufd/iommufd_private.h |  8 +--
 3 files changed, 52 insertions(+), 39 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index bd50146e2ad0..3c83fb014dcb 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -483,7 +483,8 @@ static bool iommufd_device_is_attached(struct iommufd_device *idev)
 }
 
 static int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt,
-				      struct iommufd_device *idev)
+				      struct iommufd_device *idev,
+				      ioasid_t pasid)
 {
 	struct iommufd_attach_handle *handle;
 	int rc;
@@ -501,6 +502,7 @@ static int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt,
 	}
 
 	handle->idev = idev;
+	WARN_ON(pasid != IOMMU_NO_PASID);
 	rc = iommu_attach_group_handle(hwpt->domain, idev->igroup->group,
 				       &handle->handle);
 	if (rc)
@@ -517,25 +519,28 @@ static int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt,
 }
 
 static struct iommufd_attach_handle *
-iommufd_device_get_attach_handle(struct iommufd_device *idev)
+iommufd_device_get_attach_handle(struct iommufd_device *idev, ioasid_t pasid)
 {
 	struct iommu_attach_handle *handle;
 
 	lockdep_assert_held(&idev->igroup->lock);
 
 	handle =
-		iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID, 0);
+		iommu_attach_handle_get(idev->igroup->group, pasid, 0);
 	if (IS_ERR(handle))
 		return NULL;
 	return to_iommufd_handle(handle);
 }
 
 static void iommufd_hwpt_detach_device(struct iommufd_hw_pagetable *hwpt,
-				       struct iommufd_device *idev)
+				       struct iommufd_device *idev,
+				       ioasid_t pasid)
 {
 	struct iommufd_attach_handle *handle;
 
-	handle = iommufd_device_get_attach_handle(idev);
+	WARN_ON(pasid != IOMMU_NO_PASID);
+
+	handle = iommufd_device_get_attach_handle(idev, pasid);
 	iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
 	if (hwpt->fault) {
 		iommufd_auto_response_faults(hwpt, handle);
@@ -545,13 +550,17 @@ static void iommufd_hwpt_detach_device(struct iommufd_hw_pagetable *hwpt,
 }
 
 static int iommufd_hwpt_replace_device(struct iommufd_device *idev,
+				       ioasid_t pasid,
 				       struct iommufd_hw_pagetable *hwpt,
 				       struct iommufd_hw_pagetable *old)
 {
-	struct iommufd_attach_handle *handle, *old_handle =
-		iommufd_device_get_attach_handle(idev);
+	struct iommufd_attach_handle *handle, *old_handle;
 	int rc;
 
+	WARN_ON(pasid != IOMMU_NO_PASID);
+
+	old_handle = iommufd_device_get_attach_handle(idev, pasid);
+
 	handle = kzalloc(sizeof(*handle), GFP_KERNEL);
 	if (!handle)
 		return -ENOMEM;
@@ -586,7 +595,7 @@ static int iommufd_hwpt_replace_device(struct iommufd_device *idev,
 }
 
 int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
-				struct iommufd_device *idev)
+				struct iommufd_device *idev, ioasid_t pasid)
 {
 	struct iommufd_hwpt_paging *hwpt_paging = find_hwpt_paging(hwpt);
 	int rc;
@@ -612,7 +621,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 	 * attachment.
 	 */
 	if (list_empty(&idev->igroup->device_list)) {
-		rc = iommufd_hwpt_attach_device(hwpt, idev);
+		rc = iommufd_hwpt_attach_device(hwpt, idev, pasid);
 		if (rc)
 			goto err_unresv;
 		idev->igroup->hwpt = hwpt;
@@ -630,7 +639,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 }
 
 struct iommufd_hw_pagetable *
-iommufd_hw_pagetable_detach(struct iommufd_device *idev)
+iommufd_hw_pagetable_detach(struct iommufd_device *idev, ioasid_t pasid)
 {
 	struct iommufd_hw_pagetable *hwpt = idev->igroup->hwpt;
 	struct iommufd_hwpt_paging *hwpt_paging = find_hwpt_paging(hwpt);
@@ -638,7 +647,7 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev)
 	mutex_lock(&idev->igroup->lock);
 	list_del(&idev->group_item);
 	if (list_empty(&idev->igroup->device_list)) {
-		iommufd_hwpt_detach_device(hwpt, idev);
+		iommufd_hwpt_detach_device(hwpt, idev, pasid);
 		idev->igroup->hwpt = NULL;
 	}
 	if (hwpt_paging)
@@ -650,12 +659,12 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev)
 }
 
 static struct iommufd_hw_pagetable *
-iommufd_device_do_attach(struct iommufd_device *idev,
+iommufd_device_do_attach(struct iommufd_device *idev, ioasid_t pasid,
 			 struct iommufd_hw_pagetable *hwpt)
 {
 	int rc;
 
-	rc = iommufd_hw_pagetable_attach(hwpt, idev);
+	rc = iommufd_hw_pagetable_attach(hwpt, idev, pasid);
 	if (rc)
 		return ERR_PTR(rc);
 	return NULL;
@@ -704,7 +713,7 @@ iommufd_group_do_replace_reserved_iova(struct iommufd_group *igroup,
 }
 
 static struct iommufd_hw_pagetable *
-iommufd_device_do_replace(struct iommufd_device *idev,
+iommufd_device_do_replace(struct iommufd_device *idev, ioasid_t pasid,
 			  struct iommufd_hw_pagetable *hwpt)
 {
 	struct iommufd_hwpt_paging *hwpt_paging = find_hwpt_paging(hwpt);
@@ -738,7 +747,7 @@ iommufd_device_do_replace(struct iommufd_device *idev,
 			goto err_unlock;
 	}
 
-	rc = iommufd_hwpt_replace_device(idev, hwpt, old_hwpt);
+	rc = iommufd_hwpt_replace_device(idev, pasid, hwpt, old_hwpt);
 	if (rc)
 		goto err_unresv;
 
@@ -771,7 +780,8 @@ iommufd_device_do_replace(struct iommufd_device *idev,
 }
 
 typedef struct iommufd_hw_pagetable *(*attach_fn)(
-	struct iommufd_device *idev, struct iommufd_hw_pagetable *hwpt);
+	struct iommufd_device *idev, ioasid_t pasid,
+	struct iommufd_hw_pagetable *hwpt);
 
 /*
  * When automatically managing the domains we search for a compatible domain in
@@ -779,7 +789,7 @@ typedef struct iommufd_hw_pagetable *(*attach_fn)(
  * Automatic domain selection will never pick a manually created domain.
  */
 static struct iommufd_hw_pagetable *
-iommufd_device_auto_get_domain(struct iommufd_device *idev,
+iommufd_device_auto_get_domain(struct iommufd_device *idev, ioasid_t pasid,
 			       struct iommufd_ioas *ioas, u32 *pt_id,
 			       attach_fn do_attach)
 {
@@ -808,7 +818,7 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev,
 		hwpt = &hwpt_paging->common;
 		if (!iommufd_lock_obj(&hwpt->obj))
 			continue;
-		destroy_hwpt = (*do_attach)(idev, hwpt);
+		destroy_hwpt = (*do_attach)(idev, pasid, hwpt);
 		if (IS_ERR(destroy_hwpt)) {
 			iommufd_put_object(idev->ictx, &hwpt->obj);
 			/*
@@ -826,8 +836,8 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev,
 		goto out_unlock;
 	}
 
-	hwpt_paging = iommufd_hwpt_paging_alloc(idev->ictx, ioas, idev, 0,
-						immediate_attach, NULL);
+	hwpt_paging = iommufd_hwpt_paging_alloc(idev->ictx, ioas, idev, pasid,
+						0, immediate_attach, NULL);
 	if (IS_ERR(hwpt_paging)) {
 		destroy_hwpt = ERR_CAST(hwpt_paging);
 		goto out_unlock;
@@ -835,7 +845,7 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev,
 	hwpt = &hwpt_paging->common;
 
 	if (!immediate_attach) {
-		destroy_hwpt = (*do_attach)(idev, hwpt);
+		destroy_hwpt = (*do_attach)(idev, pasid, hwpt);
 		if (IS_ERR(destroy_hwpt))
 			goto out_abort;
 	} else {
@@ -856,8 +866,9 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev,
 	return destroy_hwpt;
 }
 
-static int iommufd_device_change_pt(struct iommufd_device *idev, u32 *pt_id,
-				    attach_fn do_attach)
+static int iommufd_device_change_pt(struct iommufd_device *idev,
+				    ioasid_t pasid,
+				    u32 *pt_id, attach_fn do_attach)
 {
 	struct iommufd_hw_pagetable *destroy_hwpt;
 	struct iommufd_object *pt_obj;
@@ -872,7 +883,7 @@ static int iommufd_device_change_pt(struct iommufd_device *idev, u32 *pt_id,
 		struct iommufd_hw_pagetable *hwpt =
 			container_of(pt_obj, struct iommufd_hw_pagetable, obj);
 
-		destroy_hwpt = (*do_attach)(idev, hwpt);
+		destroy_hwpt = (*do_attach)(idev, pasid, hwpt);
 		if (IS_ERR(destroy_hwpt))
 			goto out_put_pt_obj;
 		break;
@@ -881,8 +892,8 @@ static int iommufd_device_change_pt(struct iommufd_device *idev, u32 *pt_id,
 		struct iommufd_ioas *ioas =
 			container_of(pt_obj, struct iommufd_ioas, obj);
 
-		destroy_hwpt = iommufd_device_auto_get_domain(idev, ioas, pt_id,
-							      do_attach);
+		destroy_hwpt = iommufd_device_auto_get_domain(idev, pasid, ioas,
+							      pt_id, do_attach);
 		if (IS_ERR(destroy_hwpt))
 			goto out_put_pt_obj;
 		break;
@@ -919,7 +930,8 @@ int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
 {
 	int rc;
 
-	rc = iommufd_device_change_pt(idev, pt_id, &iommufd_device_do_attach);
+	rc = iommufd_device_change_pt(idev, IOMMU_NO_PASID, pt_id,
+				      &iommufd_device_do_attach);
 	if (rc)
 		return rc;
 
@@ -949,7 +961,7 @@ EXPORT_SYMBOL_NS_GPL(iommufd_device_attach, "IOMMUFD");
  */
 int iommufd_device_replace(struct iommufd_device *idev, u32 *pt_id)
 {
-	return iommufd_device_change_pt(idev, pt_id,
+	return iommufd_device_change_pt(idev, IOMMU_NO_PASID, pt_id,
 					&iommufd_device_do_replace);
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_replace, "IOMMUFD");
@@ -965,7 +977,7 @@ void iommufd_device_detach(struct iommufd_device *idev)
 {
 	struct iommufd_hw_pagetable *hwpt;
 
-	hwpt = iommufd_hw_pagetable_detach(idev);
+	hwpt = iommufd_hw_pagetable_detach(idev, IOMMU_NO_PASID);
 	iommufd_hw_pagetable_put(idev->ictx, hwpt);
 	refcount_dec(&idev->obj.users);
 }
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 9a89f3a28dc5..9bf970b6a1c3 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -90,6 +90,7 @@ iommufd_hwpt_paging_enforce_cc(struct iommufd_hwpt_paging *hwpt_paging)
  * @ictx: iommufd context
  * @ioas: IOAS to associate the domain with
  * @idev: Device to get an iommu_domain for
+ * @pasid: PASID to get an iommu_domain for
  * @flags: Flags from userspace
  * @immediate_attach: True if idev should be attached to the hwpt
  * @user_data: The user provided driver specific data describing the domain to
@@ -105,8 +106,8 @@ iommufd_hwpt_paging_enforce_cc(struct iommufd_hwpt_paging *hwpt_paging)
  */
 struct iommufd_hwpt_paging *
 iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
-			  struct iommufd_device *idev, u32 flags,
-			  bool immediate_attach,
+			  struct iommufd_device *idev, ioasid_t pasid,
+			  u32 flags, bool immediate_attach,
 			  const struct iommu_user_data *user_data)
 {
 	const u32 valid_flags = IOMMU_HWPT_ALLOC_NEST_PARENT |
@@ -189,7 +190,7 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 	 * sequence. Once those drivers are fixed this should be removed.
 	 */
 	if (immediate_attach) {
-		rc = iommufd_hw_pagetable_attach(hwpt, idev);
+		rc = iommufd_hw_pagetable_attach(hwpt, idev, pasid);
 		if (rc)
 			goto out_abort;
 	}
@@ -202,7 +203,7 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 
 out_detach:
 	if (immediate_attach)
-		iommufd_hw_pagetable_detach(idev);
+		iommufd_hw_pagetable_detach(idev, pasid);
 out_abort:
 	iommufd_object_abort_and_destroy(ictx, &hwpt->obj);
 	return ERR_PTR(rc);
@@ -364,8 +365,8 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 		ioas = container_of(pt_obj, struct iommufd_ioas, obj);
 		mutex_lock(&ioas->mutex);
 		hwpt_paging = iommufd_hwpt_paging_alloc(
-			ucmd->ictx, ioas, idev, cmd->flags, false,
-			user_data.len ? &user_data : NULL);
+			ucmd->ictx, ioas, idev, IOMMU_NO_PASID, cmd->flags,
+			false, user_data.len ? &user_data : NULL);
 		if (IS_ERR(hwpt_paging)) {
 			rc = PTR_ERR(hwpt_paging);
 			goto out_unlock;
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 8cda9c4672eb..1f81cd3787cb 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -366,13 +366,13 @@ int iommufd_hwpt_get_dirty_bitmap(struct iommufd_ucmd *ucmd);
 
 struct iommufd_hwpt_paging *
 iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
-			  struct iommufd_device *idev, u32 flags,
-			  bool immediate_attach,
+			  struct iommufd_device *idev, ioasid_t pasid,
+			  u32 flags, bool immediate_attach,
 			  const struct iommu_user_data *user_data);
 int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
-				struct iommufd_device *idev);
+				struct iommufd_device *idev, ioasid_t pasid);
 struct iommufd_hw_pagetable *
-iommufd_hw_pagetable_detach(struct iommufd_device *idev);
+iommufd_hw_pagetable_detach(struct iommufd_device *idev, ioasid_t pasid);
 void iommufd_hwpt_paging_destroy(struct iommufd_object *obj);
 void iommufd_hwpt_paging_abort(struct iommufd_object *obj);
 void iommufd_hwpt_nested_destroy(struct iommufd_object *obj);
-- 
2.34.1



* [PATCH v10 04/18] iommufd/device: Only add reserved_iova in non-pasid path
  2025-03-20 13:47 [PATCH v10 00/18] iommufd support pasid attach/replace Yi Liu
                   ` (2 preceding siblings ...)
  2025-03-20 13:47 ` [PATCH v10 03/18] iommufd: Pass @pasid through the device attach/replace path Yi Liu
@ 2025-03-20 13:47 ` Yi Liu
  2025-03-21  3:14   ` Baolu Lu
  2025-03-20 13:47 ` [PATCH v10 05/18] iommufd/device: Replace idev->igroup with local variable Yi Liu
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 78+ messages in thread
From: Yi Liu @ 2025-03-20 13:47 UTC (permalink / raw)
  To: kevin.tian, jgg; +Cc: joro, baolu.lu, yi.l.liu, iommu, nicolinc

As the pasid is passed through the attach/replace/detach helpers, it is
necessary to ensure only the non-pasid path adds reserved_iova.
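
For illustration only (not part of the patch), the rule can be modelled in
userspace as below. The struct and the helper name needs_reserved_iova() are
hypothetical stand-ins; in the patch the same expression is kept in the
local variable attach_resv. The ioasid_t and IOMMU_NO_PASID definitions are
repeated here only so the sketch builds on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local copies so the sketch is self-contained. */
typedef unsigned int ioasid_t;
#define IOMMU_NO_PASID 0U

/* Hypothetical stand-in for struct iommufd_hwpt_paging. */
struct model_hwpt_paging {
	int unused;
};

/*
 * Reserved IOVA ranges are handled only when the hwpt is a paging hwpt
 * and the attach targets the RID (non-pasid) path.
 */
static bool needs_reserved_iova(const struct model_hwpt_paging *hwpt_paging,
				ioasid_t pasid)
{
	return hwpt_paging != NULL && pasid == IOMMU_NO_PASID;
}
```

A PASID attach to a paging hwpt therefore skips the reserved_iova handling,
while a RID attach to the same hwpt still performs it.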

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/device.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 3c83fb014dcb..be5226d2883e 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -598,6 +598,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 				struct iommufd_device *idev, ioasid_t pasid)
 {
 	struct iommufd_hwpt_paging *hwpt_paging = find_hwpt_paging(hwpt);
+	bool attach_resv = hwpt_paging && pasid == IOMMU_NO_PASID;
 	int rc;
 
 	mutex_lock(&idev->igroup->lock);
@@ -607,7 +608,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 		goto err_unlock;
 	}
 
-	if (hwpt_paging) {
+	if (attach_resv) {
 		rc = iommufd_device_attach_reserved_iova(idev, hwpt_paging);
 		if (rc)
 			goto err_unlock;
@@ -631,7 +632,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 	mutex_unlock(&idev->igroup->lock);
 	return 0;
 err_unresv:
-	if (hwpt_paging)
+	if (attach_resv)
 		iopt_remove_reserved_iova(&hwpt_paging->ioas->iopt, idev->dev);
 err_unlock:
 	mutex_unlock(&idev->igroup->lock);
@@ -650,7 +651,7 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev, ioasid_t pasid)
 		iommufd_hwpt_detach_device(hwpt, idev, pasid);
 		idev->igroup->hwpt = NULL;
 	}
-	if (hwpt_paging)
+	if (hwpt_paging && pasid == IOMMU_NO_PASID)
 		iopt_remove_reserved_iova(&hwpt_paging->ioas->iopt, idev->dev);
 	mutex_unlock(&idev->igroup->lock);
 
@@ -717,6 +718,7 @@ iommufd_device_do_replace(struct iommufd_device *idev, ioasid_t pasid,
 			  struct iommufd_hw_pagetable *hwpt)
 {
 	struct iommufd_hwpt_paging *hwpt_paging = find_hwpt_paging(hwpt);
+	bool attach_resv = hwpt_paging && pasid == IOMMU_NO_PASID;
 	struct iommufd_hwpt_paging *old_hwpt_paging;
 	struct iommufd_group *igroup = idev->igroup;
 	struct iommufd_hw_pagetable *old_hwpt;
@@ -741,7 +743,7 @@ iommufd_device_do_replace(struct iommufd_device *idev, ioasid_t pasid,
 	}
 
 	old_hwpt = igroup->hwpt;
-	if (hwpt_paging) {
+	if (attach_resv) {
 		rc = iommufd_group_do_replace_reserved_iova(igroup, hwpt_paging);
 		if (rc)
 			goto err_unlock;
@@ -752,7 +754,7 @@ iommufd_device_do_replace(struct iommufd_device *idev, ioasid_t pasid,
 		goto err_unresv;
 
 	old_hwpt_paging = find_hwpt_paging(old_hwpt);
-	if (old_hwpt_paging &&
+	if (old_hwpt_paging && pasid == IOMMU_NO_PASID &&
 	    (!hwpt_paging || hwpt_paging->ioas != old_hwpt_paging->ioas))
 		iommufd_group_remove_reserved_iova(igroup, old_hwpt_paging);
 
@@ -772,7 +774,7 @@ iommufd_device_do_replace(struct iommufd_device *idev, ioasid_t pasid,
 	/* Caller must destroy old_hwpt */
 	return old_hwpt;
 err_unresv:
-	if (hwpt_paging)
+	if (attach_resv)
 		iommufd_group_remove_reserved_iova(igroup, hwpt_paging);
 err_unlock:
 	mutex_unlock(&idev->igroup->lock);
-- 
2.34.1



* [PATCH v10 05/18] iommufd/device: Replace idev->igroup with local variable
  2025-03-20 13:47 [PATCH v10 00/18] iommufd support pasid attach/replace Yi Liu
                   ` (3 preceding siblings ...)
  2025-03-20 13:47 ` [PATCH v10 04/18] iommufd/device: Only add reserved_iova in non-pasid path Yi Liu
@ 2025-03-20 13:47 ` Yi Liu
  2025-03-21  3:14   ` Baolu Lu
  2025-03-20 13:47 ` [PATCH v10 06/18] iommufd/device: Add helper to detect the first attach of a group Yi Liu
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 78+ messages in thread
From: Yi Liu @ 2025-03-20 13:47 UTC (permalink / raw)
  To: kevin.tian, jgg; +Cc: joro, baolu.lu, yi.l.liu, iommu, nicolinc

As more fields of igroup are used, use a local variable instead of
dereferencing idev->igroup repeatedly.

No functional change expected.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/device.c | 43 ++++++++++++++++++----------------
 1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index be5226d2883e..ac54d734b819 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -448,18 +448,19 @@ static int
 iommufd_device_attach_reserved_iova(struct iommufd_device *idev,
 				    struct iommufd_hwpt_paging *hwpt_paging)
 {
+	struct iommufd_group *igroup = idev->igroup;
 	int rc;
 
-	lockdep_assert_held(&idev->igroup->lock);
+	lockdep_assert_held(&igroup->lock);
 
 	rc = iopt_table_enforce_dev_resv_regions(&hwpt_paging->ioas->iopt,
 						 idev->dev,
-						 &idev->igroup->sw_msi_start);
+						 &igroup->sw_msi_start);
 	if (rc)
 		return rc;
 
-	if (list_empty(&idev->igroup->device_list)) {
-		rc = iommufd_group_setup_msi(idev->igroup, hwpt_paging);
+	if (list_empty(&igroup->device_list)) {
+		rc = iommufd_group_setup_msi(igroup, hwpt_paging);
 		if (rc) {
 			iopt_remove_reserved_iova(&hwpt_paging->ioas->iopt,
 						  idev->dev);
@@ -599,11 +600,12 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 {
 	struct iommufd_hwpt_paging *hwpt_paging = find_hwpt_paging(hwpt);
 	bool attach_resv = hwpt_paging && pasid == IOMMU_NO_PASID;
+	struct iommufd_group *igroup = idev->igroup;
 	int rc;
 
-	mutex_lock(&idev->igroup->lock);
+	mutex_lock(&igroup->lock);
 
-	if (idev->igroup->hwpt != NULL && idev->igroup->hwpt != hwpt) {
+	if (igroup->hwpt && igroup->hwpt != hwpt) {
 		rc = -EINVAL;
 		goto err_unlock;
 	}
@@ -621,39 +623,40 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 	 * reserved regions are only updated during individual device
 	 * attachment.
 	 */
-	if (list_empty(&idev->igroup->device_list)) {
+	if (list_empty(&igroup->device_list)) {
 		rc = iommufd_hwpt_attach_device(hwpt, idev, pasid);
 		if (rc)
 			goto err_unresv;
-		idev->igroup->hwpt = hwpt;
+		igroup->hwpt = hwpt;
 	}
 	refcount_inc(&hwpt->obj.users);
-	list_add_tail(&idev->group_item, &idev->igroup->device_list);
-	mutex_unlock(&idev->igroup->lock);
+	list_add_tail(&idev->group_item, &igroup->device_list);
+	mutex_unlock(&igroup->lock);
 	return 0;
 err_unresv:
 	if (attach_resv)
 		iopt_remove_reserved_iova(&hwpt_paging->ioas->iopt, idev->dev);
 err_unlock:
-	mutex_unlock(&idev->igroup->lock);
+	mutex_unlock(&igroup->lock);
 	return rc;
 }
 
 struct iommufd_hw_pagetable *
 iommufd_hw_pagetable_detach(struct iommufd_device *idev, ioasid_t pasid)
 {
-	struct iommufd_hw_pagetable *hwpt = idev->igroup->hwpt;
+	struct iommufd_group *igroup = idev->igroup;
+	struct iommufd_hw_pagetable *hwpt = igroup->hwpt;
 	struct iommufd_hwpt_paging *hwpt_paging = find_hwpt_paging(hwpt);
 
-	mutex_lock(&idev->igroup->lock);
+	mutex_lock(&igroup->lock);
 	list_del(&idev->group_item);
-	if (list_empty(&idev->igroup->device_list)) {
+	if (list_empty(&igroup->device_list)) {
 		iommufd_hwpt_detach_device(hwpt, idev, pasid);
-		idev->igroup->hwpt = NULL;
+		igroup->hwpt = NULL;
 	}
 	if (hwpt_paging && pasid == IOMMU_NO_PASID)
 		iopt_remove_reserved_iova(&hwpt_paging->ioas->iopt, idev->dev);
-	mutex_unlock(&idev->igroup->lock);
+	mutex_unlock(&igroup->lock);
 
 	/* Caller must destroy hwpt */
 	return hwpt;
@@ -725,7 +728,7 @@ iommufd_device_do_replace(struct iommufd_device *idev, ioasid_t pasid,
 	unsigned int num_devices;
 	int rc;
 
-	mutex_lock(&idev->igroup->lock);
+	mutex_lock(&igroup->lock);
 
 	if (igroup->hwpt == NULL) {
 		rc = -EINVAL;
@@ -738,7 +741,7 @@ iommufd_device_do_replace(struct iommufd_device *idev, ioasid_t pasid,
 	}
 
 	if (hwpt == igroup->hwpt) {
-		mutex_unlock(&idev->igroup->lock);
+		mutex_unlock(&igroup->lock);
 		return NULL;
 	}
 
@@ -769,7 +772,7 @@ iommufd_device_do_replace(struct iommufd_device *idev, ioasid_t pasid,
 	if (num_devices > 1)
 		WARN_ON(refcount_sub_and_test(num_devices - 1,
 					      &old_hwpt->obj.users));
-	mutex_unlock(&idev->igroup->lock);
+	mutex_unlock(&igroup->lock);
 
 	/* Caller must destroy old_hwpt */
 	return old_hwpt;
@@ -777,7 +780,7 @@ iommufd_device_do_replace(struct iommufd_device *idev, ioasid_t pasid,
 	if (attach_resv)
 		iommufd_group_remove_reserved_iova(igroup, hwpt_paging);
 err_unlock:
-	mutex_unlock(&idev->igroup->lock);
+	mutex_unlock(&igroup->lock);
 	return ERR_PTR(rc);
 }
 
-- 
2.34.1



* [PATCH v10 06/18] iommufd/device: Add helper to detect the first attach of a group
  2025-03-20 13:47 [PATCH v10 00/18] iommufd support pasid attach/replace Yi Liu
                   ` (4 preceding siblings ...)
  2025-03-20 13:47 ` [PATCH v10 05/18] iommufd/device: Replace idev->igroup with local variable Yi Liu
@ 2025-03-20 13:47 ` Yi Liu
  2025-03-20 15:36   ` Jason Gunthorpe
                     ` (2 more replies)
  2025-03-20 13:47 ` [PATCH v10 07/18] iommufd/device: Wrap igroup->hwpt and igroup->device_list into attach struct Yi Liu
                   ` (12 subsequent siblings)
  18 siblings, 3 replies; 78+ messages in thread
From: Yi Liu @ 2025-03-20 13:47 UTC (permalink / raw)
  To: kevin.tian, jgg; +Cc: joro, baolu.lu, yi.l.liu, iommu, nicolinc

The existing code detects the first attach by checking the
igroup->device_list. However, the igroup->hwpt can also be used to detect
the first attach. In future modifications, it is better to check the
igroup->hwpt instead of the device_list. To improve readability and to
prepare for further modifications to this part, add a helper for it.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
v9 -> v10: This was patch 07 of v9; it has been reworked and hence renamed.
---
 drivers/iommu/iommufd/device.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index ac54d734b819..9db36346328f 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -444,6 +444,13 @@ static int iommufd_group_setup_msi(struct iommufd_group *igroup,
 	return 0;
 }
 
+static inline bool
+igroup_first_attach(struct iommufd_group *igroup, ioasid_t pasid)
+{
+	lockdep_assert_held(&igroup->lock);
+	return !igroup->hwpt;
+}
+
 static int
 iommufd_device_attach_reserved_iova(struct iommufd_device *idev,
 				    struct iommufd_hwpt_paging *hwpt_paging)
@@ -459,7 +466,7 @@ iommufd_device_attach_reserved_iova(struct iommufd_device *idev,
 	if (rc)
 		return rc;
 
-	if (list_empty(&igroup->device_list)) {
+	if (igroup_first_attach(igroup, IOMMU_NO_PASID)) {
 		rc = iommufd_group_setup_msi(igroup, hwpt_paging);
 		if (rc) {
 			iopt_remove_reserved_iova(&hwpt_paging->ioas->iopt,
@@ -623,7 +630,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 	 * reserved regions are only updated during individual device
 	 * attachment.
 	 */
-	if (list_empty(&igroup->device_list)) {
+	if (igroup_first_attach(igroup, pasid)) {
 		rc = iommufd_hwpt_attach_device(hwpt, idev, pasid);
 		if (rc)
 			goto err_unresv;
-- 
2.34.1



* [PATCH v10 07/18] iommufd/device: Wrap igroup->hwpt and igroup->device_list into attach struct
  2025-03-20 13:47 [PATCH v10 00/18] iommufd support pasid attach/replace Yi Liu
                   ` (5 preceding siblings ...)
  2025-03-20 13:47 ` [PATCH v10 06/18] iommufd/device: Add helper to detect the first attach of a group Yi Liu
@ 2025-03-20 13:47 ` Yi Liu
  2025-03-20 15:48   ` Jason Gunthorpe
                     ` (2 more replies)
  2025-03-20 13:47 ` [PATCH v10 08/18] iommufd/device: Replace device_list with device_array Yi Liu
                   ` (11 subsequent siblings)
  18 siblings, 3 replies; 78+ messages in thread
From: Yi Liu @ 2025-03-20 13:47 UTC (permalink / raw)
  To: kevin.tian, jgg; +Cc: joro, baolu.lu, yi.l.liu, iommu, nicolinc

igroup->hwpt and igroup->device_list track the hwpt attachment of a group
in the RID path. The upcoming PASID path needs the same tracking. To
prepare for it, wrap igroup->hwpt and igroup->device_list into an attach
struct that is allocated when the first device of the group is attached
and freed when the last device of the group is detached.
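
The lifetime rule described above can be sketched in userspace as follows.
This is only a model under simplifying assumptions: the model_* names are
hypothetical, a counter replaces the device_list, and no locking is shown
(the patch holds igroup->lock around all of this).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct iommufd_attach. */
struct model_attach {
	const void *hwpt;		/* hwpt the group is attached to */
	unsigned int num_devices;	/* replaces device_list in this model */
};

/* Hypothetical stand-in for struct iommufd_group. */
struct model_group {
	struct model_attach *attach;	/* NULL until the first attach */
};

/* Allocate the attach struct on the first attach, reuse it afterwards. */
static int model_attach_device(struct model_group *g, const void *hwpt)
{
	struct model_attach *attach = g->attach;

	if (!attach) {
		attach = calloc(1, sizeof(*attach));
		if (!attach)
			return -ENOMEM;
		attach->hwpt = hwpt;
		g->attach = attach;
	} else if (attach->hwpt != hwpt) {
		/* All devices of a group must share one hwpt. */
		return -EINVAL;
	}
	attach->num_devices++;
	return 0;
}

/* Free the attach struct when the last device goes away. */
static void model_detach_device(struct model_group *g)
{
	struct model_attach *attach = g->attach;

	assert(attach && attach->num_devices > 0);
	if (--attach->num_devices == 0) {
		g->attach = NULL;
		free(attach);
	}
}
```

With this shape, "attach is NULL" and "no device is attached" are the same
state, which is what the reworked igroup_first_attach() relies on.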

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/device.c          | 76 ++++++++++++++++++-------
 drivers/iommu/iommufd/iommufd_private.h |  5 +-
 2 files changed, 58 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 9db36346328f..7a3e105362cb 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -18,12 +18,17 @@ MODULE_PARM_DESC(
 	"Allow IOMMUFD to bind to devices even if the platform cannot isolate "
 	"the MSI interrupt window. Enabling this is a security weakness.");
 
+struct iommufd_attach {
+	struct iommufd_hw_pagetable *hwpt;
+	struct list_head device_list;
+};
+
 static void iommufd_group_release(struct kref *kref)
 {
 	struct iommufd_group *igroup =
 		container_of(kref, struct iommufd_group, ref);
 
-	WARN_ON(igroup->hwpt || !list_empty(&igroup->device_list));
+	WARN_ON(igroup->attach);
 
 	xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group), igroup,
 		   NULL, GFP_KERNEL);
@@ -90,7 +95,6 @@ static struct iommufd_group *iommufd_get_group(struct iommufd_ctx *ictx,
 
 	kref_init(&new_igroup->ref);
 	mutex_init(&new_igroup->lock);
-	INIT_LIST_HEAD(&new_igroup->device_list);
 	new_igroup->sw_msi_start = PHYS_ADDR_MAX;
 	/* group reference moves into new_igroup */
 	new_igroup->group = group;
@@ -448,7 +452,7 @@ static inline bool
 igroup_first_attach(struct iommufd_group *igroup, ioasid_t pasid)
 {
 	lockdep_assert_held(&igroup->lock);
-	return !igroup->hwpt;
+	return !igroup->attach;
 }
 
 static int
@@ -484,7 +488,7 @@ static bool iommufd_device_is_attached(struct iommufd_device *idev)
 {
 	struct iommufd_device *cur;
 
-	list_for_each_entry(cur, &idev->igroup->device_list, group_item)
+	list_for_each_entry(cur, &idev->igroup->attach->device_list, group_item)
 		if (cur == idev)
 			return true;
 	return false;
@@ -608,19 +612,33 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 	struct iommufd_hwpt_paging *hwpt_paging = find_hwpt_paging(hwpt);
 	bool attach_resv = hwpt_paging && pasid == IOMMU_NO_PASID;
 	struct iommufd_group *igroup = idev->igroup;
+	struct iommufd_hw_pagetable *old_hwpt;
+	struct iommufd_attach *attach;
 	int rc;
 
 	mutex_lock(&igroup->lock);
 
-	if (igroup->hwpt && igroup->hwpt != hwpt) {
+	attach = igroup->attach;
+	if (!attach) {
+		attach = kzalloc(sizeof(*attach), GFP_KERNEL);
+		if (!attach) {
+			rc = -ENOMEM;
+			goto err_unlock;
+		}
+		INIT_LIST_HEAD(&attach->device_list);
+	}
+
+	old_hwpt = attach->hwpt;
+
+	if (old_hwpt && old_hwpt != hwpt) {
 		rc = -EINVAL;
-		goto err_unlock;
+		goto err_free_attach;
 	}
 
 	if (attach_resv) {
 		rc = iommufd_device_attach_reserved_iova(idev, hwpt_paging);
 		if (rc)
-			goto err_unlock;
+			goto err_free_attach;
 	}
 
 	/*
@@ -634,15 +652,19 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 		rc = iommufd_hwpt_attach_device(hwpt, idev, pasid);
 		if (rc)
 			goto err_unresv;
-		igroup->hwpt = hwpt;
+		attach->hwpt = hwpt;
+		igroup->attach = attach;
 	}
 	refcount_inc(&hwpt->obj.users);
-	list_add_tail(&idev->group_item, &igroup->device_list);
+	list_add_tail(&idev->group_item, &attach->device_list);
 	mutex_unlock(&igroup->lock);
 	return 0;
 err_unresv:
 	if (attach_resv)
 		iopt_remove_reserved_iova(&hwpt_paging->ioas->iopt, idev->dev);
+err_free_attach:
+	if (igroup_first_attach(igroup, pasid))
+		kfree(attach);
 err_unlock:
 	mutex_unlock(&igroup->lock);
 	return rc;
@@ -652,14 +674,20 @@ struct iommufd_hw_pagetable *
 iommufd_hw_pagetable_detach(struct iommufd_device *idev, ioasid_t pasid)
 {
 	struct iommufd_group *igroup = idev->igroup;
-	struct iommufd_hw_pagetable *hwpt = igroup->hwpt;
-	struct iommufd_hwpt_paging *hwpt_paging = find_hwpt_paging(hwpt);
+	struct iommufd_hwpt_paging *hwpt_paging;
+	struct iommufd_hw_pagetable *hwpt;
+	struct iommufd_attach *attach;
 
 	mutex_lock(&igroup->lock);
+	attach = igroup->attach;
+	hwpt = attach->hwpt;
+	hwpt_paging = find_hwpt_paging(hwpt);
+
 	list_del(&idev->group_item);
-	if (list_empty(&igroup->device_list)) {
+	if (list_empty(&attach->device_list)) {
 		iommufd_hwpt_detach_device(hwpt, idev, pasid);
-		igroup->hwpt = NULL;
+		igroup->attach = NULL;
+		kfree(attach);
 	}
 	if (hwpt_paging && pasid == IOMMU_NO_PASID)
 		iopt_remove_reserved_iova(&hwpt_paging->ioas->iopt, idev->dev);
@@ -689,7 +717,7 @@ iommufd_group_remove_reserved_iova(struct iommufd_group *igroup,
 
 	lockdep_assert_held(&igroup->lock);
 
-	list_for_each_entry(cur, &igroup->device_list, group_item)
+	list_for_each_entry(cur, &igroup->attach->device_list, group_item)
 		iopt_remove_reserved_iova(&hwpt_paging->ioas->iopt, cur->dev);
 }
 
@@ -703,9 +731,10 @@ iommufd_group_do_replace_reserved_iova(struct iommufd_group *igroup,
 
 	lockdep_assert_held(&igroup->lock);
 
-	old_hwpt_paging = find_hwpt_paging(igroup->hwpt);
+	old_hwpt_paging = find_hwpt_paging(igroup->attach->hwpt);
 	if (!old_hwpt_paging || hwpt_paging->ioas != old_hwpt_paging->ioas) {
-		list_for_each_entry(cur, &igroup->device_list, group_item) {
+		list_for_each_entry(cur,
+				    &igroup->attach->device_list, group_item) {
 			rc = iopt_table_enforce_dev_resv_regions(
 				&hwpt_paging->ioas->iopt, cur->dev, NULL);
 			if (rc)
@@ -732,27 +761,32 @@ iommufd_device_do_replace(struct iommufd_device *idev, ioasid_t pasid,
 	struct iommufd_hwpt_paging *old_hwpt_paging;
 	struct iommufd_group *igroup = idev->igroup;
 	struct iommufd_hw_pagetable *old_hwpt;
+	struct iommufd_attach *attach;
 	unsigned int num_devices;
 	int rc;
 
 	mutex_lock(&igroup->lock);
 
-	if (igroup->hwpt == NULL) {
+	attach = igroup->attach;
+	if (!attach) {
 		rc = -EINVAL;
 		goto err_unlock;
 	}
 
+	old_hwpt = attach->hwpt;
+
+	WARN_ON(!old_hwpt || list_empty(&attach->device_list));
+
 	if (!iommufd_device_is_attached(idev)) {
 		rc = -EINVAL;
 		goto err_unlock;
 	}
 
-	if (hwpt == igroup->hwpt) {
+	if (hwpt == old_hwpt) {
 		mutex_unlock(&igroup->lock);
 		return NULL;
 	}
 
-	old_hwpt = igroup->hwpt;
 	if (attach_resv) {
 		rc = iommufd_group_do_replace_reserved_iova(igroup, hwpt_paging);
 		if (rc)
@@ -768,9 +802,9 @@ iommufd_device_do_replace(struct iommufd_device *idev, ioasid_t pasid,
 	    (!hwpt_paging || hwpt_paging->ioas != old_hwpt_paging->ioas))
 		iommufd_group_remove_reserved_iova(igroup, old_hwpt_paging);
 
-	igroup->hwpt = hwpt;
+	attach->hwpt = hwpt;
 
-	num_devices = list_count_nodes(&igroup->device_list);
+	num_devices = list_count_nodes(&attach->device_list);
 	/*
 	 * Move the refcounts held by the device_list to the new hwpt. Retain a
 	 * refcount for this thread as the caller will free it.
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 1f81cd3787cb..09f5086e37cb 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -396,13 +396,14 @@ static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
 	refcount_dec(&hwpt->obj.users);
 }
 
+struct iommufd_attach;
+
 struct iommufd_group {
 	struct kref ref;
 	struct mutex lock;
 	struct iommufd_ctx *ictx;
 	struct iommu_group *group;
-	struct iommufd_hw_pagetable *hwpt;
-	struct list_head device_list;
+	struct iommufd_attach *attach;
 	struct iommufd_sw_msi_maps required_sw_msi;
 	phys_addr_t sw_msi_start;
 };
-- 
2.34.1



* [PATCH v10 08/18] iommufd/device: Replace device_list with device_array
  2025-03-20 13:47 [PATCH v10 00/18] iommufd support pasid attach/replace Yi Liu
                   ` (6 preceding siblings ...)
  2025-03-20 13:47 ` [PATCH v10 07/18] iommufd/device: Wrap igroup->hwpt and igroup->device_list into attach struct Yi Liu
@ 2025-03-20 13:47 ` Yi Liu
  2025-03-20 17:20   ` Jason Gunthorpe
  2025-03-20 18:38   ` Nicolin Chen
  2025-03-20 13:47 ` [PATCH v10 09/18] iommufd/device: Add pasid_attach array to track per-PASID attach Yi Liu
                   ` (10 subsequent siblings)
  18 siblings, 2 replies; 78+ messages in thread
From: Yi Liu @ 2025-03-20 13:47 UTC (permalink / raw)
  To: kevin.tian, jgg; +Cc: joro, baolu.lu, yi.l.liu, iommu, nicolinc

igroup->attach->device_list tracks the attached devices of a group in
the RID path. The PASID path needs the same tracking so that it can share
code with the RID path.

However, there is only one list_head in the iommufd_device, so it cannot
work if the device is attached in both the RID path and the PASID path.
To solve this, replace the device_list with an xarray. The attached
iommufd_device is stored in the entry indexed by idev->obj.id.
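
To illustrate why an id-indexed container avoids the single list_head
limitation, here is a small userspace model. It is a sketch only: the
fixed-size pointer table and the model_* names are hypothetical stand-ins
for the xarray, and dev->id plays the role of idev->obj.id.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_IDS 8	/* hypothetical bound; a real xarray grows on demand */

struct model_device {
	unsigned int id;	/* plays the role of idev->obj.id */
};

/* Hypothetical fixed-size stand-in for attach->device_array. */
struct model_device_array {
	struct model_device *slot[MODEL_MAX_IDS];
};

/* Store dev at its id; fails if the id is out of range or already used. */
static int model_insert(struct model_device_array *arr,
			struct model_device *dev)
{
	assert(arr && dev);
	if (dev->id >= MODEL_MAX_IDS)
		return -EINVAL;
	if (arr->slot[dev->id])
		return -EBUSY;
	arr->slot[dev->id] = dev;
	return 0;
}

/* Return the device stored at id, or NULL if the slot is empty. */
static struct model_device *model_load(const struct model_device_array *arr,
				       unsigned int id)
{
	return id < MODEL_MAX_IDS ? arr->slot[id] : NULL;
}

/* Clear the slot at id; other arrays holding the same device are untouched. */
static void model_erase(struct model_device_array *arr, unsigned int id)
{
	if (id < MODEL_MAX_IDS)
		arr->slot[id] = NULL;
}
```

Because the container only stores a pointer at an index, the same device can
sit in one array for the RID path and in another for a PASID at the same
time, which an embedded list_head cannot do.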

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/device.c | 58 +++++++++++++++++++++++-----------
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 7a3e105362cb..ef579b5d463d 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -20,7 +20,7 @@ MODULE_PARM_DESC(
 
 struct iommufd_attach {
 	struct iommufd_hw_pagetable *hwpt;
-	struct list_head device_list;
+	struct xarray device_array;
 };
 
 static void iommufd_group_release(struct kref *kref)
@@ -298,6 +298,20 @@ u32 iommufd_device_to_id(struct iommufd_device *idev)
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_to_id, "IOMMUFD");
 
+static int iommufd_group_device_num(struct iommufd_group *igroup)
+{
+	struct iommufd_device *idev;
+	unsigned long index;
+	int count = 0;
+
+	lockdep_assert_held(&igroup->lock);
+
+	if (igroup->attach)
+		xa_for_each(&igroup->attach->device_array, index, idev)
+			count++;
+	return count;
+}
+
 /*
  * Get a iommufd_sw_msi_map for the msi physical address requested by the irq
  * layer. The mapping to IOVA is global to the iommufd file descriptor, every
@@ -486,12 +500,7 @@ iommufd_device_attach_reserved_iova(struct iommufd_device *idev,
 /* Check if idev is attached to igroup->hwpt */
 static bool iommufd_device_is_attached(struct iommufd_device *idev)
 {
-	struct iommufd_device *cur;
-
-	list_for_each_entry(cur, &idev->igroup->attach->device_list, group_item)
-		if (cur == idev)
-			return true;
-	return false;
+	return xa_load(&idev->igroup->attach->device_array, idev->obj.id);
 }
 
 static int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt,
@@ -625,20 +634,27 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 			rc = -ENOMEM;
 			goto err_unlock;
 		}
-		INIT_LIST_HEAD(&attach->device_list);
+		xa_init(&attach->device_array);
 	}
 
 	old_hwpt = attach->hwpt;
 
+	rc = xa_insert(&attach->device_array, idev->obj.id, XA_ZERO_ENTRY,
+		       GFP_KERNEL);
+	if (rc) {
+		WARN_ON(rc == -EBUSY && !old_hwpt);
+		goto err_free_attach;
+	}
+
 	if (old_hwpt && old_hwpt != hwpt) {
 		rc = -EINVAL;
-		goto err_free_attach;
+		goto err_release_devid;
 	}
 
 	if (attach_resv) {
 		rc = iommufd_device_attach_reserved_iova(idev, hwpt_paging);
 		if (rc)
-			goto err_free_attach;
+			goto err_release_devid;
 	}
 
 	/*
@@ -656,12 +672,15 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 		igroup->attach = attach;
 	}
 	refcount_inc(&hwpt->obj.users);
-	list_add_tail(&idev->group_item, &attach->device_list);
+	WARN_ON(xa_is_err(xa_store(&attach->device_array, idev->obj.id,
+				   idev, GFP_KERNEL)));
 	mutex_unlock(&igroup->lock);
 	return 0;
 err_unresv:
 	if (attach_resv)
 		iopt_remove_reserved_iova(&hwpt_paging->ioas->iopt, idev->dev);
+err_release_devid:
+	xa_release(&attach->device_array, idev->obj.id);
 err_free_attach:
 	if (igroup_first_attach(igroup, pasid))
 		kfree(attach);
@@ -683,8 +702,8 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev, ioasid_t pasid)
 	hwpt = attach->hwpt;
 	hwpt_paging = find_hwpt_paging(hwpt);
 
-	list_del(&idev->group_item);
-	if (list_empty(&attach->device_list)) {
+	xa_erase(&attach->device_array, idev->obj.id);
+	if (xa_empty(&attach->device_array)) {
 		iommufd_hwpt_detach_device(hwpt, idev, pasid);
 		igroup->attach = NULL;
 		kfree(attach);
@@ -714,10 +733,11 @@ iommufd_group_remove_reserved_iova(struct iommufd_group *igroup,
 				   struct iommufd_hwpt_paging *hwpt_paging)
 {
 	struct iommufd_device *cur;
+	unsigned long index;
 
 	lockdep_assert_held(&igroup->lock);
 
-	list_for_each_entry(cur, &igroup->attach->device_list, group_item)
+	xa_for_each(&igroup->attach->device_array, index, cur)
 		iopt_remove_reserved_iova(&hwpt_paging->ioas->iopt, cur->dev);
 }
 
@@ -727,14 +747,14 @@ iommufd_group_do_replace_reserved_iova(struct iommufd_group *igroup,
 {
 	struct iommufd_hwpt_paging *old_hwpt_paging;
 	struct iommufd_device *cur;
+	unsigned long index;
 	int rc;
 
 	lockdep_assert_held(&igroup->lock);
 
 	old_hwpt_paging = find_hwpt_paging(igroup->attach->hwpt);
 	if (!old_hwpt_paging || hwpt_paging->ioas != old_hwpt_paging->ioas) {
-		list_for_each_entry(cur,
-				    &igroup->attach->device_list, group_item) {
+		xa_for_each(&igroup->attach->device_array, index, cur) {
 			rc = iopt_table_enforce_dev_resv_regions(
 				&hwpt_paging->ioas->iopt, cur->dev, NULL);
 			if (rc)
@@ -775,7 +795,7 @@ iommufd_device_do_replace(struct iommufd_device *idev, ioasid_t pasid,
 
 	old_hwpt = attach->hwpt;
 
-	WARN_ON(!old_hwpt || list_empty(&attach->device_list));
+	WARN_ON(!old_hwpt || xa_empty(&attach->device_array));
 
 	if (!iommufd_device_is_attached(idev)) {
 		rc = -EINVAL;
@@ -804,9 +824,9 @@ iommufd_device_do_replace(struct iommufd_device *idev, ioasid_t pasid,
 
 	attach->hwpt = hwpt;
 
-	num_devices = list_count_nodes(&attach->device_list);
+	num_devices = iommufd_group_device_num(igroup);
 	/*
-	 * Move the refcounts held by the device_list to the new hwpt. Retain a
+	 * Move the refcounts held by the device_array to the new hwpt. Retain a
 	 * refcount for this thread as the caller will free it.
 	 */
 	refcount_add(num_devices, &hwpt->obj.users);
-- 
2.34.1



* [PATCH v10 09/18] iommufd/device: Add pasid_attach array to track per-PASID attach
  2025-03-20 13:47 [PATCH v10 00/18] iommufd support pasid attach/replace Yi Liu
                   ` (7 preceding siblings ...)
  2025-03-20 13:47 ` [PATCH v10 08/18] iommufd/device: Replace device_list with device_array Yi Liu
@ 2025-03-20 13:47 ` Yi Liu
  2025-03-20 17:33   ` Jason Gunthorpe
  2025-03-20 19:19   ` Nicolin Chen
  2025-03-20 13:47 ` [PATCH v10 10/18] iommufd: Enforce PASID-compatible domain in PASID path Yi Liu
                   ` (9 subsequent siblings)
  18 siblings, 2 replies; 78+ messages in thread
From: Yi Liu @ 2025-03-20 13:47 UTC (permalink / raw)
  To: kevin.tian, jgg; +Cc: joro, baolu.lu, yi.l.liu, iommu, nicolinc

The PASIDs of a PASID-capable device can be attached to hwpts separately,
hence a pasid array is needed to track per-PASID attachment. The index
IOMMU_NO_PASID is used by the RID path, hence drop igroup->attach.
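
As a rough userspace model of the indexing scheme (not the kernel code), the
per-PASID tracking can be pictured as a table of attach pointers in which
slot IOMMU_NO_PASID belongs to the RID path. The fixed-size table and the
model_* names are hypothetical; the patch uses an xarray and xa_load().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local copies so the sketch is self-contained. */
typedef unsigned int ioasid_t;
#define IOMMU_NO_PASID 0U
#define MODEL_MAX_PASIDS 16U	/* hypothetical bound for this model */

/* Hypothetical stand-in for struct iommufd_attach. */
struct model_attach {
	const void *hwpt;
};

/* Hypothetical stand-in for the pasid_attach xarray in the group. */
struct model_group {
	struct model_attach *pasid_attach[MODEL_MAX_PASIDS];
};

/* A PASID (or the RID at index 0) is on its first attach if its slot is empty. */
static bool model_first_attach(const struct model_group *g, ioasid_t pasid)
{
	assert(pasid < MODEL_MAX_PASIDS);
	return g->pasid_attach[pasid] == NULL;
}
```

Each PASID thus has its own first-attach state, independent of the RID path
and of every other PASID of the group.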

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/device.c          | 68 +++++++++++++++++--------
 drivers/iommu/iommufd/iommufd_private.h |  2 +-
 2 files changed, 49 insertions(+), 21 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index ef579b5d463d..370aed636e8d 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -28,7 +28,7 @@ static void iommufd_group_release(struct kref *kref)
 	struct iommufd_group *igroup =
 		container_of(kref, struct iommufd_group, ref);
 
-	WARN_ON(igroup->attach);
+	WARN_ON(!xa_empty(&igroup->pasid_attach));
 
 	xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group), igroup,
 		   NULL, GFP_KERNEL);
@@ -95,6 +95,7 @@ static struct iommufd_group *iommufd_get_group(struct iommufd_ctx *ictx,
 
 	kref_init(&new_igroup->ref);
 	mutex_init(&new_igroup->lock);
+	xa_init(&new_igroup->pasid_attach);
 	new_igroup->sw_msi_start = PHYS_ADDR_MAX;
 	/* group reference moves into new_igroup */
 	new_igroup->group = group;
@@ -298,16 +299,19 @@ u32 iommufd_device_to_id(struct iommufd_device *idev)
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_to_id, "IOMMUFD");
 
-static int iommufd_group_device_num(struct iommufd_group *igroup)
+static int iommufd_group_device_num(struct iommufd_group *igroup,
+				    ioasid_t pasid)
 {
+	struct iommufd_attach *attach;
 	struct iommufd_device *idev;
 	unsigned long index;
 	int count = 0;
 
 	lockdep_assert_held(&igroup->lock);
 
-	if (igroup->attach)
-		xa_for_each(&igroup->attach->device_array, index, idev)
+	attach = xa_load(&igroup->pasid_attach, pasid);
+	if (attach)
+		xa_for_each(&attach->device_array, index, idev)
 			count++;
 	return count;
 }
@@ -466,7 +470,7 @@ static inline bool
 igroup_first_attach(struct iommufd_group *igroup, ioasid_t pasid)
 {
 	lockdep_assert_held(&igroup->lock);
-	return !igroup->attach;
+	return !xa_load(&igroup->pasid_attach, pasid);
 }
 
 static int
@@ -497,10 +501,13 @@ iommufd_device_attach_reserved_iova(struct iommufd_device *idev,
 
 /* The device attach/detach/replace helpers for attach_handle */
 
-/* Check if idev is attached to igroup->hwpt */
-static bool iommufd_device_is_attached(struct iommufd_device *idev)
+static bool iommufd_device_is_attached(struct iommufd_device *idev,
+				       ioasid_t pasid)
 {
-	return xa_load(&idev->igroup->attach->device_array, idev->obj.id);
+	struct iommufd_attach *attach;
+
+	attach = xa_load(&idev->igroup->pasid_attach, pasid);
+	return xa_load(&attach->device_array, idev->obj.id);
 }
 
 static int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt,
@@ -627,19 +634,25 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 
 	mutex_lock(&igroup->lock);
 
-	attach = igroup->attach;
+	attach = xa_cmpxchg(&igroup->pasid_attach, pasid, NULL,
+			    XA_ZERO_ENTRY, GFP_KERNEL);
+	if (xa_is_err(attach)) {
+		rc = xa_err(attach);
+		goto err_unlock;
+	}
+
 	if (!attach) {
 		attach = kzalloc(sizeof(*attach), GFP_KERNEL);
 		if (!attach) {
 			rc = -ENOMEM;
-			goto err_unlock;
+			goto err_release_pasid;
 		}
 		xa_init(&attach->device_array);
 	}
 
 	old_hwpt = attach->hwpt;
 
-	rc = xa_insert(&igroup->attach->device_array, idev->obj.id, XA_ZERO_ENTRY,
+	rc = xa_insert(&attach->device_array, idev->obj.id, XA_ZERO_ENTRY,
 		       GFP_KERNEL);
 	if (rc) {
 		WARN_ON(rc == -EBUSY && !old_hwpt);
@@ -669,7 +682,8 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 		if (rc)
 			goto err_unresv;
 		attach->hwpt = hwpt;
-		igroup->attach = attach;
+		WARN_ON(xa_is_err(xa_store(&igroup->pasid_attach, pasid, attach,
+					   GFP_KERNEL)));
 	}
 	refcount_inc(&hwpt->obj.users);
 	WARN_ON(xa_is_err(xa_store(&attach->device_array, idev->obj.id,
@@ -684,6 +698,9 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 err_free_attach:
 	if (igroup_first_attach(igroup, pasid))
 		kfree(attach);
+err_release_pasid:
+	if (igroup_first_attach(igroup, pasid))
+		xa_release(&igroup->pasid_attach, pasid);
 err_unlock:
 	mutex_unlock(&igroup->lock);
 	return rc;
@@ -698,14 +715,14 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev, ioasid_t pasid)
 	struct iommufd_attach *attach;
 
 	mutex_lock(&igroup->lock);
-	attach = igroup->attach;
+	attach = xa_load(&igroup->pasid_attach, pasid);
 	hwpt = attach->hwpt;
 	hwpt_paging = find_hwpt_paging(hwpt);
 
 	xa_erase(&attach->device_array, idev->obj.id);
 	if (xa_empty(&attach->device_array)) {
 		iommufd_hwpt_detach_device(hwpt, idev, pasid);
-		igroup->attach = NULL;
+		xa_erase(&igroup->pasid_attach, pasid);
 		kfree(attach);
 	}
 	if (hwpt_paging && pasid == IOMMU_NO_PASID)
@@ -732,12 +749,14 @@ static void
 iommufd_group_remove_reserved_iova(struct iommufd_group *igroup,
 				   struct iommufd_hwpt_paging *hwpt_paging)
 {
+	struct iommufd_attach *attach;
 	struct iommufd_device *cur;
 	unsigned long index;
 
 	lockdep_assert_held(&igroup->lock);
 
-	xa_for_each(&igroup->attach->device_array, index, cur)
+	attach = xa_load(&igroup->pasid_attach, IOMMU_NO_PASID);
+	xa_for_each(&attach->device_array, index, cur)
 		iopt_remove_reserved_iova(&hwpt_paging->ioas->iopt, cur->dev);
 }
 
@@ -746,15 +765,17 @@ iommufd_group_do_replace_reserved_iova(struct iommufd_group *igroup,
 				       struct iommufd_hwpt_paging *hwpt_paging)
 {
 	struct iommufd_hwpt_paging *old_hwpt_paging;
+	struct iommufd_attach *attach;
 	struct iommufd_device *cur;
 	unsigned long index;
 	int rc;
 
 	lockdep_assert_held(&igroup->lock);
 
-	old_hwpt_paging = find_hwpt_paging(igroup->attach->hwpt);
+	attach = xa_load(&igroup->pasid_attach, IOMMU_NO_PASID);
+	old_hwpt_paging = find_hwpt_paging(attach->hwpt);
 	if (!old_hwpt_paging || hwpt_paging->ioas != old_hwpt_paging->ioas) {
-		xa_for_each(&igroup->attach->device_array, index, cur) {
+		xa_for_each(&attach->device_array, index, cur) {
 			rc = iopt_table_enforce_dev_resv_regions(
 				&hwpt_paging->ioas->iopt, cur->dev, NULL);
 			if (rc)
@@ -787,8 +808,15 @@ iommufd_device_do_replace(struct iommufd_device *idev, ioasid_t pasid,
 
 	mutex_lock(&igroup->lock);
 
-	attach = igroup->attach;
+	attach = xa_cmpxchg(&igroup->pasid_attach, pasid, NULL,
+			    XA_ZERO_ENTRY, GFP_KERNEL);
+	if (xa_is_err(attach)) {
+		rc = xa_err(attach);
+		goto err_unlock;
+	}
+
 	if (!attach) {
+		xa_release(&igroup->pasid_attach, pasid);
 		rc = -EINVAL;
 		goto err_unlock;
 	}
@@ -797,7 +825,7 @@ iommufd_device_do_replace(struct iommufd_device *idev, ioasid_t pasid,
 
 	WARN_ON(!old_hwpt || xa_empty(&attach->device_array));
 
-	if (!iommufd_device_is_attached(idev)) {
+	if (!iommufd_device_is_attached(idev, pasid)) {
 		rc = -EINVAL;
 		goto err_unlock;
 	}
@@ -824,7 +852,7 @@ iommufd_device_do_replace(struct iommufd_device *idev, ioasid_t pasid,
 
 	attach->hwpt = hwpt;
 
-	num_devices = iommufd_group_device_num(igroup);
+	num_devices = iommufd_group_device_num(igroup, pasid);
 	/*
 	 * Move the refcounts held by the device_array to the new hwpt. Retain a
 	 * refcount for this thread as the caller will free it.
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 09f5086e37cb..03a231f8f5bf 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -403,7 +403,7 @@ struct iommufd_group {
 	struct mutex lock;
 	struct iommufd_ctx *ictx;
 	struct iommu_group *group;
-	struct iommufd_attach *attach;
+	struct xarray pasid_attach;
 	struct iommufd_sw_msi_maps required_sw_msi;
 	phys_addr_t sw_msi_start;
 };
-- 
2.34.1



* [PATCH v10 10/18] iommufd: Enforce PASID-compatible domain in PASID path
  2025-03-20 13:47 [PATCH v10 00/18] iommufd support pasid attach/replace Yi Liu
                   ` (8 preceding siblings ...)
  2025-03-20 13:47 ` [PATCH v10 09/18] iommufd/device: Add pasid_attach array to track per-PASID attach Yi Liu
@ 2025-03-20 13:47 ` Yi Liu
  2025-03-20 13:47 ` [PATCH v10 11/18] iommufd: Support pasid attach/replace Yi Liu
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 78+ messages in thread
From: Yi Liu @ 2025-03-20 13:47 UTC (permalink / raw)
  To: kevin.tian, jgg; +Cc: joro, baolu.lu, yi.l.liu, iommu, nicolinc

AMD IOMMU requires attaching PASID-compatible domains to PASID-capable
devices. This includes the domains attached to the RID and to PASIDs. See
the related discussions in [1] and [2]. ARM has the same requirement. Intel
does not need it, but can live with it. Hence, iommufd enforces this
requirement, as it is not harmful to vendors that do not need it.

Mark the PASID-compatible domains and enforce it in the PASID path.

[1] https://lore.kernel.org/linux-iommu/20240709182303.GK14050@ziepe.ca/
[2] https://lore.kernel.org/linux-iommu/20240822124433.GD3468552@ziepe.ca/

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
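As a rough illustration of the marking and the check added here, below is a
userspace sketch, not the kernel implementation. ALLOC_PASID_FLAG is assumed
to mirror the value of IOMMU_HWPT_ALLOC_PASID, and hwpt_model, hwpt_mark()
and hwpt_pasid_compat() are hypothetical names used only for this example.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define NO_PASID 0u			/* stands in for IOMMU_NO_PASID */
#define ALLOC_PASID_FLAG (1u << 3)	/* assumed IOMMU_HWPT_ALLOC_PASID value */

/* Models the one-bit marker added to struct iommufd_hw_pagetable. */
struct hwpt_model {
	bool pasid_compat : 1;
};

/*
 * Record whether the domain was allocated as PASID-compatible. Converting
 * a nonzero masked value to a bool bit-field yields 1, so no explicit
 * "!!" is needed.
 */
static void hwpt_mark(struct hwpt_model *hwpt, uint32_t flags)
{
	hwpt->pasid_compat = flags & ALLOC_PASID_FLAG;
}

/* PASID attaches require a PASID-compatible hwpt; RID attaches do not. */
static int hwpt_pasid_compat(const struct hwpt_model *hwpt, uint32_t pasid)
{
	if (pasid != NO_PASID && !hwpt->pasid_compat)
		return -EINVAL;
	return 0;
}
```

At this point in the series only the PASID path is restricted; the RID path
accepts any hwpt.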
 drivers/iommu/iommufd/device.c          | 17 +++++++++++++++++
 drivers/iommu/iommufd/hw_pagetable.c    |  3 +++
 drivers/iommu/iommufd/iommufd_private.h |  1 +
 3 files changed, 21 insertions(+)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 370aed636e8d..6d003f9f0668 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -510,6 +510,15 @@ static bool iommufd_device_is_attached(struct iommufd_device *idev,
 	return xa_load(&attach->device_array, idev->obj.id);
 }
 
+static int iommufd_hwpt_pasid_compat(struct iommufd_hw_pagetable *hwpt,
+				     struct iommufd_device *idev,
+				     ioasid_t pasid)
+{
+	if (pasid != IOMMU_NO_PASID && !hwpt->pasid_compat)
+		return -EINVAL;
+	return 0;
+}
+
 static int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt,
 				      struct iommufd_device *idev,
 				      ioasid_t pasid)
@@ -519,6 +528,10 @@ static int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt,
 
 	lockdep_assert_held(&idev->igroup->lock);
 
+	rc = iommufd_hwpt_pasid_compat(hwpt, idev, pasid);
+	if (rc)
+		return rc;
+
 	handle = kzalloc(sizeof(*handle), GFP_KERNEL);
 	if (!handle)
 		return -ENOMEM;
@@ -587,6 +600,10 @@ static int iommufd_hwpt_replace_device(struct iommufd_device *idev,
 
 	WARN_ON(pasid != IOMMU_NO_PASID);
 
+	rc = iommufd_hwpt_pasid_compat(hwpt, idev, pasid);
+	if (rc)
+		return rc;
+
 	old_handle = iommufd_device_get_attach_handle(idev, pasid);
 
 	handle = kzalloc(sizeof(*handle), GFP_KERNEL);
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 9bf970b6a1c3..b8c1e5ca7be8 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -136,6 +136,7 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 	if (IS_ERR(hwpt_paging))
 		return ERR_CAST(hwpt_paging);
 	hwpt = &hwpt_paging->common;
+	hwpt->pasid_compat = flags & IOMMU_HWPT_ALLOC_PASID;
 
 	INIT_LIST_HEAD(&hwpt_paging->hwpt_item);
 	/* Pairs with iommufd_hw_pagetable_destroy() */
@@ -244,6 +245,7 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
 	if (IS_ERR(hwpt_nested))
 		return ERR_CAST(hwpt_nested);
 	hwpt = &hwpt_nested->common;
+	hwpt->pasid_compat = flags & IOMMU_HWPT_ALLOC_PASID;
 
 	refcount_inc(&parent->common.obj.users);
 	hwpt_nested->parent = parent;
@@ -300,6 +302,7 @@ iommufd_viommu_alloc_hwpt_nested(struct iommufd_viommu *viommu, u32 flags,
 	if (IS_ERR(hwpt_nested))
 		return ERR_CAST(hwpt_nested);
 	hwpt = &hwpt_nested->common;
+	hwpt->pasid_compat = flags & IOMMU_HWPT_ALLOC_PASID;
 
 	hwpt_nested->viommu = viommu;
 	refcount_inc(&viommu->obj.users);
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 03a231f8f5bf..84c89f594ed0 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -296,6 +296,7 @@ struct iommufd_hw_pagetable {
 	struct iommufd_object obj;
 	struct iommu_domain *domain;
 	struct iommufd_fault *fault;
+	bool pasid_compat : 1;
 };
 
 struct iommufd_hwpt_paging {
-- 
2.34.1



* [PATCH v10 11/18] iommufd: Support pasid attach/replace
  2025-03-20 13:47 [PATCH v10 00/18] iommufd support pasid attach/replace Yi Liu
                   ` (9 preceding siblings ...)
  2025-03-20 13:47 ` [PATCH v10 10/18] iommufd: Enforce PASID-compatible domain in PASID path Yi Liu
@ 2025-03-20 13:47 ` Yi Liu
  2025-03-20 20:42   ` Nicolin Chen
  2025-03-20 13:47 ` [PATCH v10 12/18] iommufd: Enforce PASID-compatible domain for RID Yi Liu
                   ` (7 subsequent siblings)
  18 siblings, 1 reply; 78+ messages in thread
From: Yi Liu @ 2025-03-20 13:47 UTC (permalink / raw)
  To: kevin.tian, jgg; +Cc: joro, baolu.lu, yi.l.liu, iommu, nicolinc

Extend the APIs below to support PASID, so that device drivers can manage
PASID attach/replace/detach.

    int iommufd_device_attach(struct iommufd_device *idev,
			      ioasid_t pasid, u32 *pt_id);
    int iommufd_device_replace(struct iommufd_device *idev,
			       ioasid_t pasid, u32 *pt_id);
    void iommufd_device_detach(struct iommufd_device *idev,
			       ioasid_t pasid);

The pasid operations share underlying attach/replace/detach infrastructure
with the device operations, but still have some different implications:

 - no reserved region per pasid, otherwise the SVA architecture is already
   broken (the CPU address space doesn't account for device reserved
   regions);

 - accordingly, no sw_msi trick.

Cache coherency enforcement is still applied to pasid operations since
it is about memory accesses after the page table walk (no matter whether
the walk is per RID or per PASID).

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
v9 -> v10: minor tweaks per rebase
---
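The split between the RID and PASID paths can be summarized with a small
userspace sketch. This is a model under stated assumptions, not kernel code:
select_attach_path() and needs_reserved_iova() are hypothetical helpers that
only mirror the "pasid == IOMMU_NO_PASID" tests used in this series.

```c
#include <stdbool.h>
#include <stdint.h>

#define NO_PASID 0u	/* stands in for IOMMU_NO_PASID */

/* Which core IOMMU API family a request would be routed to. */
enum attach_path {
	PATH_GROUP_RID,		/* iommu_*_group_handle() style calls */
	PATH_DEVICE_PASID,	/* iommu_*_device_pasid() style calls */
};

/* RID requests use the group path; everything else uses the PASID path. */
static enum attach_path select_attach_path(uint32_t pasid)
{
	return pasid == NO_PASID ? PATH_GROUP_RID : PATH_DEVICE_PASID;
}

/* Reserved regions (and therefore sw_msi) only matter for the RID path. */
static bool needs_reserved_iova(uint32_t pasid)
{
	return pasid == NO_PASID;
}
```

Existing callers such as vfio keep their behavior by passing IOMMU_NO_PASID,
which selects the group path in this model.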
 drivers/iommu/iommufd/device.c   | 57 ++++++++++++++++++++------------
 drivers/iommu/iommufd/selftest.c |  8 ++---
 drivers/vfio/iommufd.c           | 10 +++---
 include/linux/iommufd.h          |  9 +++--
 4 files changed, 52 insertions(+), 32 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 6d003f9f0668..17c424d9a355 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -543,9 +543,12 @@ static int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt,
 	}
 
 	handle->idev = idev;
-	WARN_ON(pasid != IOMMU_NO_PASID);
-	rc = iommu_attach_group_handle(hwpt->domain, idev->igroup->group,
-				       &handle->handle);
+	if (pasid == IOMMU_NO_PASID)
+		rc = iommu_attach_group_handle(hwpt->domain, idev->igroup->group,
+					       &handle->handle);
+	else
+		rc = iommu_attach_device_pasid(hwpt->domain, idev->dev, pasid,
+					       &handle->handle);
 	if (rc)
 		goto out_disable_iopf;
 
@@ -579,10 +582,12 @@ static void iommufd_hwpt_detach_device(struct iommufd_hw_pagetable *hwpt,
 {
 	struct iommufd_attach_handle *handle;
 
-	WARN_ON(pasid != IOMMU_NO_PASID);
+	if (pasid == IOMMU_NO_PASID)
+		iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
+	else
+		iommu_detach_device_pasid(hwpt->domain, idev->dev, pasid);
 
 	handle = iommufd_device_get_attach_handle(idev, pasid);
-	iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
 	if (hwpt->fault) {
 		iommufd_auto_response_faults(hwpt, handle);
 		iommufd_fault_iopf_disable(idev);
@@ -598,8 +603,6 @@ static int iommufd_hwpt_replace_device(struct iommufd_device *idev,
 	struct iommufd_attach_handle *handle, *old_handle;
 	int rc;
 
-	WARN_ON(pasid != IOMMU_NO_PASID);
-
 	rc = iommufd_hwpt_pasid_compat(hwpt, idev, pasid);
 	if (rc)
 		return rc;
@@ -617,8 +620,12 @@ static int iommufd_hwpt_replace_device(struct iommufd_device *idev,
 	}
 
 	handle->idev = idev;
-	rc = iommu_replace_group_handle(idev->igroup->group, hwpt->domain,
-					&handle->handle);
+	if (pasid == IOMMU_NO_PASID)
+		rc = iommu_replace_group_handle(idev->igroup->group,
+						hwpt->domain, &handle->handle);
+	else
+		rc = iommu_replace_device_pasid(hwpt->domain, idev->dev,
+						pasid, &handle->handle);
 	if (rc)
 		goto out_disable_iopf;
 
@@ -1026,22 +1033,25 @@ static int iommufd_device_change_pt(struct iommufd_device *idev,
 }
 
 /**
- * iommufd_device_attach - Connect a device to an iommu_domain
+ * iommufd_device_attach - Connect a device/pasid to an iommu_domain
  * @idev: device to attach
+ * @pasid: pasid to attach
  * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HWPT_PAGING
  *         Output the IOMMUFD_OBJ_HWPT_PAGING ID
  *
- * This connects the device to an iommu_domain, either automatically or manually
- * selected. Once this completes the device could do DMA.
+ * This connects the device/pasid to an iommu_domain, either automatically
+ * or manually selected. Once this completes the device could do DMA with
+ * @pasid. @pasid is IOMMU_NO_PASID if this attach is for no pasid usage.
  *
  * The caller should return the resulting pt_id back to userspace.
  * This function is undone by calling iommufd_device_detach().
  */
-int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
+int iommufd_device_attach(struct iommufd_device *idev, ioasid_t pasid,
+			  u32 *pt_id)
 {
 	int rc;
 
-	rc = iommufd_device_change_pt(idev, IOMMU_NO_PASID, pt_id,
+	rc = iommufd_device_change_pt(idev, pasid, pt_id,
 				      &iommufd_device_do_attach);
 	if (rc)
 		return rc;
@@ -1056,8 +1066,9 @@ int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
 EXPORT_SYMBOL_NS_GPL(iommufd_device_attach, "IOMMUFD");
 
 /**
- * iommufd_device_replace - Change the device's iommu_domain
+ * iommufd_device_replace - Change the device/pasid's iommu_domain
  * @idev: device to change
+ * @pasid: pasid to change
  * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HWPT_PAGING
  *         Output the IOMMUFD_OBJ_HWPT_PAGING ID
  *
@@ -1068,27 +1079,31 @@ EXPORT_SYMBOL_NS_GPL(iommufd_device_attach, "IOMMUFD");
  *
  * If it fails then no change is made to the attachment. The iommu driver may
  * implement this so there is no disruption in translation. This can only be
- * called if iommufd_device_attach() has already succeeded.
+ * called if iommufd_device_attach() has already succeeded. @pasid is
+ * IOMMU_NO_PASID for no pasid usage.
  */
-int iommufd_device_replace(struct iommufd_device *idev, u32 *pt_id)
+int iommufd_device_replace(struct iommufd_device *idev, ioasid_t pasid,
+			   u32 *pt_id)
 {
-	return iommufd_device_change_pt(idev, IOMMU_NO_PASID, pt_id,
+	return iommufd_device_change_pt(idev, pasid, pt_id,
 					&iommufd_device_do_replace);
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_replace, "IOMMUFD");
 
 /**
+ * iommufd_device_detach - Disconnect a device/pasid from an iommu_domain
+ * iommufd_device_detach - Disconnect a device/device to an iommu_domain
  * @idev: device to detach
+ * @pasid: pasid to detach
  *
  * Undo iommufd_device_attach(). This disconnects the idev from the previously
  * attached pt_id. The device returns back to a blocked DMA translation.
+ * @pasid is IOMMU_NO_PASID for no pasid usage.
  */
-void iommufd_device_detach(struct iommufd_device *idev)
+void iommufd_device_detach(struct iommufd_device *idev, ioasid_t pasid)
 {
 	struct iommufd_hw_pagetable *hwpt;
 
-	hwpt = iommufd_hw_pagetable_detach(idev, IOMMU_NO_PASID);
+	hwpt = iommufd_hw_pagetable_detach(idev, pasid);
 	iommufd_hw_pagetable_put(idev->ictx, hwpt);
 	refcount_dec(&idev->obj.users);
 }
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index d55dde28e9bc..0b3f5cbf242b 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -945,7 +945,7 @@ static int iommufd_test_mock_domain(struct iommufd_ucmd *ucmd,
 	}
 	sobj->idev.idev = idev;
 
-	rc = iommufd_device_attach(idev, &pt_id);
+	rc = iommufd_device_attach(idev, IOMMU_NO_PASID, &pt_id);
 	if (rc)
 		goto out_unbind;
 
@@ -960,7 +960,7 @@ static int iommufd_test_mock_domain(struct iommufd_ucmd *ucmd,
 	return 0;
 
 out_detach:
-	iommufd_device_detach(idev);
+	iommufd_device_detach(idev, IOMMU_NO_PASID);
 out_unbind:
 	iommufd_device_unbind(idev);
 out_mdev:
@@ -994,7 +994,7 @@ static int iommufd_test_mock_domain_replace(struct iommufd_ucmd *ucmd,
 		goto out_dev_obj;
 	}
 
-	rc = iommufd_device_replace(sobj->idev.idev, &pt_id);
+	rc = iommufd_device_replace(sobj->idev.idev, IOMMU_NO_PASID, &pt_id);
 	if (rc)
 		goto out_dev_obj;
 
@@ -1655,7 +1655,7 @@ void iommufd_selftest_destroy(struct iommufd_object *obj)
 
 	switch (sobj->type) {
 	case TYPE_IDEV:
-		iommufd_device_detach(sobj->idev.idev);
+		iommufd_device_detach(sobj->idev.idev, IOMMU_NO_PASID);
 		iommufd_device_unbind(sobj->idev.idev);
 		mock_dev_destroy(sobj->idev.mock_dev);
 		break;
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 516294fd901b..37e1efa2c7bf 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -128,7 +128,7 @@ void vfio_iommufd_physical_unbind(struct vfio_device *vdev)
 	lockdep_assert_held(&vdev->dev_set->lock);
 
 	if (vdev->iommufd_attached) {
-		iommufd_device_detach(vdev->iommufd_device);
+		iommufd_device_detach(vdev->iommufd_device, IOMMU_NO_PASID);
 		vdev->iommufd_attached = false;
 	}
 	iommufd_device_unbind(vdev->iommufd_device);
@@ -146,9 +146,11 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
 		return -EINVAL;
 
 	if (vdev->iommufd_attached)
-		rc = iommufd_device_replace(vdev->iommufd_device, pt_id);
+		rc = iommufd_device_replace(vdev->iommufd_device,
+					    IOMMU_NO_PASID, pt_id);
 	else
-		rc = iommufd_device_attach(vdev->iommufd_device, pt_id);
+		rc = iommufd_device_attach(vdev->iommufd_device,
+					   IOMMU_NO_PASID, pt_id);
 	if (rc)
 		return rc;
 	vdev->iommufd_attached = true;
@@ -163,7 +165,7 @@ void vfio_iommufd_physical_detach_ioas(struct vfio_device *vdev)
 	if (WARN_ON(!vdev->iommufd_device) || !vdev->iommufd_attached)
 		return;
 
-	iommufd_device_detach(vdev->iommufd_device);
+	iommufd_device_detach(vdev->iommufd_device, IOMMU_NO_PASID);
 	vdev->iommufd_attached = false;
 }
 EXPORT_SYMBOL_GPL(vfio_iommufd_physical_detach_ioas);
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 60eff9272551..34b6e6ca4bfa 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -8,6 +8,7 @@
 
 #include <linux/err.h>
 #include <linux/errno.h>
+#include <linux/iommu.h>
 #include <linux/refcount.h>
 #include <linux/types.h>
 #include <linux/xarray.h>
@@ -54,9 +55,11 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 					   struct device *dev, u32 *id);
 void iommufd_device_unbind(struct iommufd_device *idev);
 
-int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id);
-int iommufd_device_replace(struct iommufd_device *idev, u32 *pt_id);
-void iommufd_device_detach(struct iommufd_device *idev);
+int iommufd_device_attach(struct iommufd_device *idev, ioasid_t pasid,
+			  u32 *pt_id);
+int iommufd_device_replace(struct iommufd_device *idev, ioasid_t pasid,
+			   u32 *pt_id);
+void iommufd_device_detach(struct iommufd_device *idev, ioasid_t pasid);
 
 struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev);
 u32 iommufd_device_to_id(struct iommufd_device *idev);
-- 
2.34.1



* [PATCH v10 12/18] iommufd: Enforce PASID-compatible domain for RID
  2025-03-20 13:47 [PATCH v10 00/18] iommufd support pasid attach/replace Yi Liu
                   ` (10 preceding siblings ...)
  2025-03-20 13:47 ` [PATCH v10 11/18] iommufd: Support pasid attach/replace Yi Liu
@ 2025-03-20 13:47 ` Yi Liu
  2025-03-20 17:35   ` Jason Gunthorpe
  2025-03-20 22:23   ` Nicolin Chen
  2025-03-20 13:47 ` [PATCH v10 13/18] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support Yi Liu
                   ` (6 subsequent siblings)
  18 siblings, 2 replies; 78+ messages in thread
From: Yi Liu @ 2025-03-20 13:47 UTC (permalink / raw)
  To: kevin.tian, jgg; +Cc: joro, baolu.lu, yi.l.liu, iommu, nicolinc

Per the definition of IOMMU_HWPT_ALLOC_PASID, iommufd needs to enforce that
the RID uses a PASID-compatible domain if any PASID has been attached, and
vice versa. The PASID path already enforces this. Add the enforcement to
the RID path.

This enforcement requires a lock held across the RID and PASID attach paths.
idev->igroup->lock is used, as both the RID and the PASID paths hold it.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
v9 -> v10: split the hwpt->pasid_compat check to a separate if in the
           else statement.
---
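The two-way rule can be modeled in userspace as follows. This is only a
sketch of the decision logic, assuming a simplified group state; the struct
group_state fields and pasid_compat_check() are hypothetical names, and the
real code looks the state up in igroup->pasid_attach under igroup->lock.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define NO_PASID 0u	/* stands in for IOMMU_NO_PASID */

/* Simplified view of what is currently attached in one group. */
struct group_state {
	bool rid_attached;		/* a hwpt is attached to the RID */
	bool rid_hwpt_compat;		/* that hwpt is PASID-compatible */
	unsigned int nr_pasid_attached;	/* number of PASIDs with attachments */
};

/* Decide whether attaching a hwpt for @pasid keeps the group consistent. */
static int pasid_compat_check(const struct group_state *g, bool hwpt_compat,
			      uint32_t pasid)
{
	if (pasid == NO_PASID) {
		/* RID path: an incompatible hwpt is fine until a PASID is used. */
		if (!hwpt_compat && g->nr_pasid_attached)
			return -EINVAL;
	} else {
		/* PASID path: the new hwpt itself must be compatible... */
		if (!hwpt_compat)
			return -EINVAL;
		/* ...and so must whatever the RID is already attached to. */
		if (g->rid_attached && !g->rid_hwpt_compat)
			return -EINVAL;
	}
	return 0;
}
```

Either order of operations is covered: attaching a PASID after an
incompatible RID domain fails, and so does replacing the RID domain with an
incompatible one while a PASID is attached.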
 drivers/iommu/iommufd/device.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 17c424d9a355..54ffef9c17f7 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -514,8 +514,28 @@ static int iommufd_hwpt_pasid_compat(struct iommufd_hw_pagetable *hwpt,
 				     struct iommufd_device *idev,
 				     ioasid_t pasid)
 {
-	if (pasid != IOMMU_NO_PASID && !hwpt->pasid_compat)
-		return -EINVAL;
+	struct iommufd_group *igroup = idev->igroup;
+
+	lockdep_assert_held(&igroup->lock);
+
+	if (pasid == IOMMU_NO_PASID) {
+		unsigned long start = IOMMU_NO_PASID;
+
+		if (!hwpt->pasid_compat &&
+		    xa_find_after(&igroup->pasid_attach,
+				  &start, UINT_MAX, XA_PRESENT))
+			return -EINVAL;
+	} else {
+		struct iommufd_attach *attach;
+
+		if (!hwpt->pasid_compat)
+			return -EINVAL;
+
+		attach = xa_load(&igroup->pasid_attach, IOMMU_NO_PASID);
+		if (attach && attach->hwpt && !attach->hwpt->pasid_compat)
+			return -EINVAL;
+	}
+
 	return 0;
 }
 
@@ -526,8 +546,6 @@ static int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt,
 	struct iommufd_attach_handle *handle;
 	int rc;
 
-	lockdep_assert_held(&idev->igroup->lock);
-
 	rc = iommufd_hwpt_pasid_compat(hwpt, idev, pasid);
 	if (rc)
 		return rc;
-- 
2.34.1



* [PATCH v10 13/18] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
  2025-03-20 13:47 [PATCH v10 00/18] iommufd support pasid attach/replace Yi Liu
                   ` (11 preceding siblings ...)
  2025-03-20 13:47 ` [PATCH v10 12/18] iommufd: Enforce PASID-compatible domain for RID Yi Liu
@ 2025-03-20 13:47 ` Yi Liu
  2025-03-20 13:47 ` [PATCH v10 14/18] iommufd: Allow allocating PASID-compatible domain Yi Liu
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 78+ messages in thread
From: Yi Liu @ 2025-03-20 13:47 UTC (permalink / raw)
  To: kevin.tian, jgg; +Cc: joro, baolu.lu, yi.l.liu, iommu, nicolinc

The Intel IOMMU driver treats IOMMU_HWPT_ALLOC_PASID as a nop, since Intel
VT-d has no special requirements on domains attached to either the PASID or
the RID of a PASID-capable device.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
v9 -> v10: Dropped the auto_domain terms in commit message
---
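The flag handling amounts to widening an allow-list mask. A minimal
userspace sketch of that pattern follows; the F_* constants are assumed to
mirror the corresponding IOMMU_HWPT_ALLOC_* bit positions, the two check
functions are hypothetical, and the nested_supported() test is left out.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed to mirror the IOMMU_HWPT_ALLOC_* bit positions. */
#define F_NEST_PARENT    (1u << 0)
#define F_DIRTY_TRACKING (1u << 1)
#define F_PASID          (1u << 3)

/* Paging domains: reject any flag outside the allow-list. */
static int check_paging_flags(uint32_t flags)
{
	if (flags & ~(F_NEST_PARENT | F_DIRTY_TRACKING | F_PASID))
		return -EOPNOTSUPP;
	return 0;
}

/* Nested domains: the PASID flag is the only one accepted. */
static int check_nested_flags(uint32_t flags)
{
	if (flags & ~F_PASID)
		return -EOPNOTSUPP;
	return 0;
}
```

Accepting the flag without acting on it is what makes it a nop for the
driver in this sketch.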
 drivers/iommu/intel/iommu.c  | 3 ++-
 drivers/iommu/intel/nested.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index cc46098f875b..7bc890609b90 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3338,7 +3338,8 @@ intel_iommu_domain_alloc_paging_flags(struct device *dev, u32 flags,
 	bool first_stage;
 
 	if (flags &
-	    (~(IOMMU_HWPT_ALLOC_NEST_PARENT | IOMMU_HWPT_ALLOC_DIRTY_TRACKING)))
+	    (~(IOMMU_HWPT_ALLOC_NEST_PARENT | IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
+	       IOMMU_HWPT_ALLOC_PASID)))
 		return ERR_PTR(-EOPNOTSUPP);
 	if (nested_parent && !nested_supported(iommu))
 		return ERR_PTR(-EOPNOTSUPP);
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
index aba92c00b427..6ac5c534bef4 100644
--- a/drivers/iommu/intel/nested.c
+++ b/drivers/iommu/intel/nested.c
@@ -198,7 +198,7 @@ intel_iommu_domain_alloc_nested(struct device *dev, struct iommu_domain *parent,
 	struct dmar_domain *domain;
 	int ret;
 
-	if (!nested_supported(iommu) || flags)
+	if (!nested_supported(iommu) || flags & ~IOMMU_HWPT_ALLOC_PASID)
 		return ERR_PTR(-EOPNOTSUPP);
 
 	/* Must be nested domain */
-- 
2.34.1



* [PATCH v10 14/18] iommufd: Allow allocating PASID-compatible domain
  2025-03-20 13:47 [PATCH v10 00/18] iommufd support pasid attach/replace Yi Liu
                   ` (12 preceding siblings ...)
  2025-03-20 13:47 ` [PATCH v10 13/18] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support Yi Liu
@ 2025-03-20 13:47 ` Yi Liu
  2025-03-20 17:51   ` Jason Gunthorpe
  2025-03-20 22:36   ` Nicolin Chen
  2025-03-20 13:47 ` [PATCH v10 15/18] iommufd/selftest: Add set_dev_pasid in mock iommu Yi Liu
                   ` (4 subsequent siblings)
  18 siblings, 2 replies; 78+ messages in thread
From: Yi Liu @ 2025-03-20 13:47 UTC (permalink / raw)
  To: kevin.tian, jgg; +Cc: joro, baolu.lu, yi.l.liu, iommu, nicolinc

The underlying infrastructure already supports PASID attach and the related
enforcement required by the IOMMU_HWPT_ALLOC_PASID flag. Extend iommufd so
that userspace can request PASID-compatible domains.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
v9 -> v10: Dropped r-b tag as the uapi description for ALLOC_PASID is modified
---
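For the auto-domain case, the flag choice depends only on whether the
attach is for a PASID. A small userspace sketch of that selection, where
auto_domain_flags() is a hypothetical helper and F_PASID is assumed to
mirror IOMMU_HWPT_ALLOC_PASID:

```c
#include <stdint.h>

#define NO_PASID 0u		/* stands in for IOMMU_NO_PASID */
#define F_PASID  (1u << 3)	/* assumed IOMMU_HWPT_ALLOC_PASID value */

/*
 * Flags used when iommufd allocates a paging hwpt automatically for an
 * IOAS: request a PASID-compatible domain only for PASID attaches.
 */
static uint32_t auto_domain_flags(uint32_t pasid)
{
	return pasid != NO_PASID ? F_PASID : 0;
}
```

One consequence in this model is that an auto domain created for the RID is
not PASID-compatible, which matches the uapi note that an ioas is not
recommended for the non-PASID part when PASIDs will be attached.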
 drivers/iommu/iommufd/device.c       | 4 +++-
 drivers/iommu/iommufd/hw_pagetable.c | 7 ++++---
 include/uapi/linux/iommufd.h         | 3 +++
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 54ffef9c17f7..f09dcddf777b 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -973,7 +973,9 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev, ioasid_t pasid,
 	}
 
 	hwpt_paging = iommufd_hwpt_paging_alloc(idev->ictx, ioas, idev, pasid,
-						0, immediate_attach, NULL);
+						pasid != IOMMU_NO_PASID ?
+						    IOMMU_HWPT_ALLOC_PASID : 0,
+						immediate_attach, NULL);
 	if (IS_ERR(hwpt_paging)) {
 		destroy_hwpt = ERR_CAST(hwpt_paging);
 		goto out_unlock;
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index b8c1e5ca7be8..3fba11754dad 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -112,7 +112,8 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 {
 	const u32 valid_flags = IOMMU_HWPT_ALLOC_NEST_PARENT |
 				IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
-				IOMMU_HWPT_FAULT_ID_VALID;
+				IOMMU_HWPT_FAULT_ID_VALID |
+				IOMMU_HWPT_ALLOC_PASID;
 	const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
 	struct iommufd_hwpt_paging *hwpt_paging;
 	struct iommufd_hw_pagetable *hwpt;
@@ -233,7 +234,7 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
 	struct iommufd_hw_pagetable *hwpt;
 	int rc;
 
-	if ((flags & ~IOMMU_HWPT_FAULT_ID_VALID) ||
+	if ((flags & ~(IOMMU_HWPT_FAULT_ID_VALID | IOMMU_HWPT_ALLOC_PASID)) ||
 	    !user_data->len || !ops->domain_alloc_nested)
 		return ERR_PTR(-EOPNOTSUPP);
 	if (parent->auto_domain || !parent->nest_parent ||
@@ -290,7 +291,7 @@ iommufd_viommu_alloc_hwpt_nested(struct iommufd_viommu *viommu, u32 flags,
 	struct iommufd_hw_pagetable *hwpt;
 	int rc;
 
-	if (flags & ~IOMMU_HWPT_FAULT_ID_VALID)
+	if (flags & ~(IOMMU_HWPT_FAULT_ID_VALID | IOMMU_HWPT_ALLOC_PASID))
 		return ERR_PTR(-EOPNOTSUPP);
 	if (!user_data->len)
 		return ERR_PTR(-EOPNOTSUPP);
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 5fc7e27804b7..75f3b6829624 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -393,6 +393,9 @@ struct iommu_vfio_ioas {
  *                          Any domain attached to the non-PASID part of the
  *                          device must also be flagged, otherwise attaching a
  *                          PASID will blocked.
+ *                          For the user that wants to attach PASID, ioas is
+ *                          not recommended for the non-PASID part of the
+ *                          device.
  *                          If IOMMU does not support PASID it will return
  *                          error (-EOPNOTSUPP).
  */
-- 
2.34.1



* [PATCH v10 15/18] iommufd/selftest: Add set_dev_pasid in mock iommu
  2025-03-20 13:47 [PATCH v10 00/18] iommufd support pasid attach/replace Yi Liu
                   ` (13 preceding siblings ...)
  2025-03-20 13:47 ` [PATCH v10 14/18] iommufd: Allow allocating PASID-compatible domain Yi Liu
@ 2025-03-20 13:47 ` Yi Liu
  2025-03-20 22:48   ` Nicolin Chen
  2025-03-20 13:47 ` [PATCH v10 16/18] iommufd/selftest: Add a helper to get test device Yi Liu
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 78+ messages in thread
From: Yi Liu @ 2025-03-20 13:47 UTC (permalink / raw)
  To: kevin.tian, jgg; +Cc: joro, baolu.lu, yi.l.liu, iommu, nicolinc

The callback is needed to complete the pasid attach/detach path for the
mock device. A nop is enough for set_dev_pasid.

MOCK_FLAGS_DEVICE_PASID is added to indicate a pasid-capable mock device
for the pasid test cases. Other test cases still create a non-pasid mock
device, while the mock iommu always pretends to be pasid-capable.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
v9 -> v10: Added MOCK_PASID_WIDTH to replace 20
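
For readers following along, here is a minimal user-space sketch of the
flag handling described above. It is not the selftest code: the enum
values and MOCK_PASID_WIDTH are copied from the hunks below, while
mock_dev_check_flags() and mock_dev_max_pasids() are hypothetical names
that only model the intent (unknown flag bits are rejected, and only a
device created with MOCK_FLAGS_DEVICE_PASID advertises PASIDs).

```c
#include <errno.h>
#include <stdint.h>

/* Values copied from the iommufd_test.h hunk in this patch. */
enum {
	MOCK_FLAGS_DEVICE_NO_DIRTY = 1 << 0,
	MOCK_FLAGS_DEVICE_HUGE_IOVA = 1 << 1,
	MOCK_FLAGS_DEVICE_PASID = 1 << 2,
};

#define MOCK_PASID_WIDTH 20

/* Hypothetical stand-in for the flag check at the top of mock_dev_create(). */
static int mock_dev_check_flags(unsigned long dev_flags)
{
	const unsigned long valid = MOCK_FLAGS_DEVICE_NO_DIRTY |
				    MOCK_FLAGS_DEVICE_HUGE_IOVA |
				    MOCK_FLAGS_DEVICE_PASID;

	/* Any bit outside the known set is rejected. */
	if (dev_flags & ~valid)
		return -EINVAL;
	return 0;
}

/*
 * Hypothetical helper: how many PASIDs the modelled device exposes.
 * Assumption: a device without MOCK_FLAGS_DEVICE_PASID exposes none.
 */
static uint32_t mock_dev_max_pasids(unsigned long dev_flags)
{
	if (!(dev_flags & MOCK_FLAGS_DEVICE_PASID))
		return 0;
	return UINT32_C(1) << MOCK_PASID_WIDTH;
}
```

In this model a device created with flags 0 reports no PASIDs, which
matches the intent that non-pasid test cases keep their old behaviour.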
---
 drivers/iommu/iommufd/iommufd_test.h |  4 ++++
 drivers/iommu/iommufd/selftest.c     | 33 ++++++++++++++++++++++++----
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index 87e9165cea27..1a066feb8697 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -49,6 +49,7 @@ enum {
 enum {
 	MOCK_FLAGS_DEVICE_NO_DIRTY = 1 << 0,
 	MOCK_FLAGS_DEVICE_HUGE_IOVA = 1 << 1,
+	MOCK_FLAGS_DEVICE_PASID = 1 << 2,
 };
 
 enum {
@@ -154,6 +155,9 @@ struct iommu_test_cmd {
 };
 #define IOMMU_TEST_CMD _IO(IOMMUFD_TYPE, IOMMUFD_CMD_BASE + 32)
 
+/* Mock device/iommu PASID width */
+#define MOCK_PASID_WIDTH 20
+
 /* Mock structs for IOMMU_DEVICE_GET_HW_INFO ioctl */
 #define IOMMU_HW_INFO_TYPE_SELFTEST	0xfeedbeef
 #define IOMMU_HW_INFO_SELFTEST_REGVAL	0xdeadbeef
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 0b3f5cbf242b..c26730375a72 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -223,8 +223,16 @@ static int mock_domain_nop_attach(struct iommu_domain *domain,
 	return 0;
 }
 
+static int mock_domain_set_dev_pasid_nop(struct iommu_domain *domain,
+					 struct device *dev, ioasid_t pasid,
+					 struct iommu_domain *old)
+{
+	return 0;
+}
+
 static const struct iommu_domain_ops mock_blocking_ops = {
 	.attach_dev = mock_domain_nop_attach,
+	.set_dev_pasid = mock_domain_set_dev_pasid_nop
 };
 
 static struct iommu_domain mock_blocking_domain = {
@@ -366,7 +374,7 @@ mock_domain_alloc_nested(struct device *dev, struct iommu_domain *parent,
 	struct mock_iommu_domain_nested *mock_nested;
 	struct mock_iommu_domain *mock_parent;
 
-	if (flags)
+	if (flags & ~IOMMU_HWPT_ALLOC_PASID)
 		return ERR_PTR(-EOPNOTSUPP);
 	if (!parent || parent->ops != mock_ops.default_domain_ops)
 		return ERR_PTR(-EINVAL);
@@ -388,7 +396,8 @@ mock_domain_alloc_paging_flags(struct device *dev, u32 flags,
 {
 	bool has_dirty_flag = flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
 	const u32 PAGING_FLAGS = IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
-				 IOMMU_HWPT_ALLOC_NEST_PARENT;
+				 IOMMU_HWPT_ALLOC_NEST_PARENT |
+				 IOMMU_HWPT_ALLOC_PASID;
 	struct mock_dev *mdev = to_mock_dev(dev);
 	bool no_dirty_ops = mdev->flags & MOCK_FLAGS_DEVICE_NO_DIRTY;
 	struct mock_iommu_domain *mock;
@@ -608,7 +617,7 @@ mock_viommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
 	struct mock_viommu *mock_viommu = to_mock_viommu(viommu);
 	struct mock_iommu_domain_nested *mock_nested;
 
-	if (flags)
+	if (flags & ~IOMMU_HWPT_ALLOC_PASID)
 		return ERR_PTR(-EOPNOTSUPP);
 
 	mock_nested = __mock_domain_alloc_nested(user_data);
@@ -743,6 +752,7 @@ static const struct iommu_ops mock_ops = {
 			.map_pages = mock_domain_map_pages,
 			.unmap_pages = mock_domain_unmap_pages,
 			.iova_to_phys = mock_domain_iova_to_phys,
+			.set_dev_pasid = mock_domain_set_dev_pasid_nop,
 		},
 };
 
@@ -803,6 +813,7 @@ static struct iommu_domain_ops domain_nested_ops = {
 	.free = mock_domain_free_nested,
 	.attach_dev = mock_domain_nop_attach,
 	.cache_invalidate_user = mock_domain_cache_invalidate_user,
+	.set_dev_pasid = mock_domain_set_dev_pasid_nop,
 };
 
 static inline struct iommufd_hw_pagetable *
@@ -862,11 +873,16 @@ static void mock_dev_release(struct device *dev)
 
 static struct mock_dev *mock_dev_create(unsigned long dev_flags)
 {
+	struct property_entry prop[] = {
+		PROPERTY_ENTRY_U32("pasid-num-bits", MOCK_PASID_WIDTH),
+		{},
+	};
 	struct mock_dev *mdev;
 	int rc, i;
 
 	if (dev_flags &
-	    ~(MOCK_FLAGS_DEVICE_NO_DIRTY | MOCK_FLAGS_DEVICE_HUGE_IOVA))
+	    ~(MOCK_FLAGS_DEVICE_NO_DIRTY |
+		    MOCK_FLAGS_DEVICE_HUGE_IOVA | MOCK_FLAGS_DEVICE_PASID))
 		return ERR_PTR(-EINVAL);
 
 	mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
@@ -890,6 +906,14 @@ static struct mock_dev *mock_dev_create(unsigned long dev_flags)
 	if (rc)
 		goto err_put;
 
+	if (dev_flags & MOCK_FLAGS_DEVICE_PASID) {
+		rc = device_create_managed_software_node(&mdev->dev, prop, NULL);
+		if (rc) {
+			dev_err(&mdev->dev, "add pasid-num-bits property failed, rc: %d", rc);
+			goto err_put;
+		}
+	}
+
 	rc = device_add(&mdev->dev);
 	if (rc)
 		goto err_put;
@@ -1778,6 +1802,7 @@ int __init iommufd_test_init(void)
 	init_completion(&mock_iommu.complete);
 
 	mock_iommu_iopf_queue = iopf_queue_alloc("mock-iopfq");
+	mock_iommu.iommu_dev.max_pasids = (1 << MOCK_PASID_WIDTH);
 
 	return 0;
 
-- 
2.34.1



* [PATCH v10 16/18] iommufd/selftest: Add a helper to get test device
  2025-03-20 13:47 [PATCH v10 00/18] iommufd support pasid attach/replace Yi Liu
                   ` (14 preceding siblings ...)
  2025-03-20 13:47 ` [PATCH v10 15/18] iommufd/selftest: Add set_dev_pasid in mock iommu Yi Liu
@ 2025-03-20 13:47 ` Yi Liu
  2025-03-20 13:47 ` [PATCH v10 17/18] iommufd/selftest: Add test ops to test pasid attach/detach Yi Liu
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 78+ messages in thread
From: Yi Liu @ 2025-03-20 13:47 UTC (permalink / raw)
  To: kevin.tian, jgg; +Cc: joro, baolu.lu, yi.l.liu, iommu, nicolinc

The selftest device (sobj->type == TYPE_IDEV) needs to be looked up in
multiple places, so add a helper for it.

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
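The shape of the helper (look up, check the type, drop the reference on
mismatch, return an error pointer) can be sketched in user space as
below. This is only an illustration: ERR_PTR(), PTR_ERR() and IS_ERR()
are simplified stand-ins for the kernel helpers, and struct test_obj,
get_obj(), put_obj() and get_idev_obj() are hypothetical names, not
iommufd APIs.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified user-space stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

enum obj_type { TYPE_IDEV, TYPE_OTHER };

struct test_obj {
	enum obj_type type;
	int refs;	/* models the reference taken by the lookup */
};

/* Hypothetical object table indexed by id. */
static struct test_obj table[] = {
	{ .type = TYPE_IDEV },
	{ .type = TYPE_OTHER },
};

/* Generic lookup: takes a reference or returns an error pointer. */
static struct test_obj *get_obj(unsigned int id)
{
	if (id >= sizeof(table) / sizeof(table[0]))
		return ERR_PTR(-ENOENT);
	table[id].refs++;
	return &table[id];
}

static void put_obj(struct test_obj *obj)
{
	obj->refs--;
}

/* Typed lookup: only TYPE_IDEV objects are handed back to the caller. */
static struct test_obj *get_idev_obj(unsigned int id)
{
	struct test_obj *obj = get_obj(id);

	if (IS_ERR(obj))
		return obj;
	if (obj->type != TYPE_IDEV) {
		/* Wrong type: drop the reference before failing. */
		put_obj(obj);
		return ERR_PTR(-EINVAL);
	}
	return obj;
}
```

Callers then need a single IS_ERR() check and a single put on exit,
which is the simplification the helper is after.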
 drivers/iommu/iommufd/selftest.c | 36 ++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index c26730375a72..691e7a23f300 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -994,39 +994,49 @@ static int iommufd_test_mock_domain(struct iommufd_ucmd *ucmd,
 	return rc;
 }
 
-/* Replace the mock domain with a manually allocated hw_pagetable */
-static int iommufd_test_mock_domain_replace(struct iommufd_ucmd *ucmd,
-					    unsigned int device_id, u32 pt_id,
-					    struct iommu_test_cmd *cmd)
+static struct selftest_obj *
+iommufd_test_get_selftest_obj(struct iommufd_ctx *ictx, u32 id)
 {
 	struct iommufd_object *dev_obj;
 	struct selftest_obj *sobj;
-	int rc;
 
 	/*
 	 * Prefer to use the OBJ_SELFTEST because the destroy_rwsem will ensure
 	 * it doesn't race with detach, which is not allowed.
 	 */
-	dev_obj =
-		iommufd_get_object(ucmd->ictx, device_id, IOMMUFD_OBJ_SELFTEST);
+	dev_obj = iommufd_get_object(ictx, id, IOMMUFD_OBJ_SELFTEST);
 	if (IS_ERR(dev_obj))
-		return PTR_ERR(dev_obj);
+		return ERR_CAST(dev_obj);
 
 	sobj = to_selftest_obj(dev_obj);
 	if (sobj->type != TYPE_IDEV) {
-		rc = -EINVAL;
-		goto out_dev_obj;
+		iommufd_put_object(ictx, dev_obj);
+		return ERR_PTR(-EINVAL);
 	}
+	return sobj;
+}
+
+/* Replace the mock domain with a manually allocated hw_pagetable */
+static int iommufd_test_mock_domain_replace(struct iommufd_ucmd *ucmd,
+					    unsigned int device_id, u32 pt_id,
+					    struct iommu_test_cmd *cmd)
+{
+	struct selftest_obj *sobj;
+	int rc;
+
+	sobj = iommufd_test_get_selftest_obj(ucmd->ictx, device_id);
+	if (IS_ERR(sobj))
+		return PTR_ERR(sobj);
 
 	rc = iommufd_device_replace(sobj->idev.idev, IOMMU_NO_PASID, &pt_id);
 	if (rc)
-		goto out_dev_obj;
+		goto out_sobj;
 
 	cmd->mock_domain_replace.pt_id = pt_id;
 	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
 
-out_dev_obj:
-	iommufd_put_object(ucmd->ictx, dev_obj);
+out_sobj:
+	iommufd_put_object(ucmd->ictx, &sobj->obj);
 	return rc;
 }
 
-- 
2.34.1



* [PATCH v10 17/18] iommufd/selftest: Add test ops to test pasid attach/detach
  2025-03-20 13:47 [PATCH v10 00/18] iommufd support pasid attach/replace Yi Liu
                   ` (15 preceding siblings ...)
  2025-03-20 13:47 ` [PATCH v10 16/18] iommufd/selftest: Add a helper to get test device Yi Liu
@ 2025-03-20 13:47 ` Yi Liu
  2025-03-20 23:17   ` Nicolin Chen
  2025-03-20 23:20   ` Nicolin Chen
  2025-03-20 13:47 ` [PATCH v10 18/18] iommufd/selftest: Add coverage for iommufd " Yi Liu
  2025-03-20 13:59 ` [PATCH v10 00/18] iommufd support pasid attach/replace Yi Liu
  18 siblings, 2 replies; 78+ messages in thread
From: Yi Liu @ 2025-03-20 13:47 UTC (permalink / raw)
  To: kevin.tian, jgg; +Cc: joro, baolu.lu, yi.l.liu, iommu, nicolinc

This adds 5 test ops for pasid attach/replace/detach testing. There are
ops to attach, replace and detach a pasid, and also an op to check the
attached domain of a pasid.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
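The pasid 1024 special case added to the mock set_dev_pasid callback
can be modelled in user space as below. mock_set_dev_pasid() follows
the branch structure of the callback in this patch. struct domain and
replace_domain() are hypothetical: the latter only models "keep the old
domain when the driver fails", and the rollback in the iommu core is
more involved than this.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum domain_type { DOMAIN_BLOCKED, DOMAIN_PAGING };

struct domain {
	enum domain_type type;
};

static bool pasid_1024_attached;

/* Same branches as mock_domain_set_dev_pasid_nop() in this patch. */
static int mock_set_dev_pasid(const struct domain *domain, unsigned int pasid)
{
	if (pasid == 1024) {
		if (domain->type == DOMAIN_BLOCKED) {
			/* Detach path: reset the state. */
			pasid_1024_attached = false;
		} else if (pasid_1024_attached) {
			/* Second attach: fake a driver failure. */
			pasid_1024_attached = false;
			return -ENOMEM;
		} else {
			pasid_1024_attached = true;
		}
	}
	return 0;
}

/*
 * Hypothetical caller: the current domain is only updated when the
 * driver callback succeeds, so a failure leaves the old domain in place.
 */
static int replace_domain(const struct domain **cur,
			  const struct domain *new_domain, unsigned int pasid)
{
	int rc = mock_set_dev_pasid(new_domain, pasid);

	if (rc)
		return rc;
	*cur = new_domain;
	return 0;
}
```

With this model, attaching pasid 1024 once succeeds, a following replace
fails with -ENOMEM and leaves the first domain attached, and any other
pasid is unaffected.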
 drivers/iommu/iommufd/iommufd_test.h |  31 ++++++
 drivers/iommu/iommufd/selftest.c     | 151 +++++++++++++++++++++++++++
 2 files changed, 182 insertions(+)

diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index 1a066feb8697..efcb509f7d56 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -25,6 +25,11 @@ enum {
 	IOMMU_TEST_OP_TRIGGER_IOPF,
 	IOMMU_TEST_OP_DEV_CHECK_CACHE,
 	IOMMU_TEST_OP_TRIGGER_VEVENT,
+	IOMMU_TEST_OP_PASID_ATTACH,
+	IOMMU_TEST_OP_PASID_REPLACE,
+	IOMMU_TEST_OP_PASID_MIX_REPLACE_HANDLE,
+	IOMMU_TEST_OP_PASID_DETACH,
+	IOMMU_TEST_OP_PASID_CHECK_DOMAIN,
 };
 
 enum {
@@ -150,6 +155,32 @@ struct iommu_test_cmd {
 		struct {
 			__u32 dev_id;
 		} trigger_vevent;
+		struct {
+			__u32 pasid;
+			__u32 pt_id;
+			/* @id is stdev_id
+			 * pasid#1024 is for special test, do not use it
+			 * in normal case.
+			 */
+		} pasid_attach;
+		struct {
+			__u32 pasid;
+			__u32 pt_id;
+			/* @id is stdev_id
+			 * pasid#1024 is for special test, do not use it
+			 * in normal case.
+			 */
+		} pasid_replace;
+		struct {
+			__u32 pasid;
+			/* @id is stdev_id */
+		} pasid_detach;
+		struct {
+			__u32 pasid;
+			__u32 hwpt_id;
+			__u64 out_result_ptr;
+			/* @id is stdev_id */
+		} pasid_check;
 	};
 	__u32 last;
 };
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 691e7a23f300..37c9cd285541 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -223,10 +223,29 @@ static int mock_domain_nop_attach(struct iommu_domain *domain,
 	return 0;
 }
 
+static bool pasid_1024_attached;
+
 static int mock_domain_set_dev_pasid_nop(struct iommu_domain *domain,
 					 struct device *dev, ioasid_t pasid,
 					 struct iommu_domain *old)
 {
+	/*
+	 * The first attach with pasid 1024 succeeds, the second one fails.
+	 * This helps test the case in which the iommu core needs to roll
+	 * back to the old domain due to a driver failure.
+	 */
+	if (pasid == 1024) {
+		if (domain->type == IOMMU_DOMAIN_BLOCKED) {
+			pasid_1024_attached = false;
+		} else if (pasid_1024_attached) {
+			pasid_1024_attached = false;
+			// Fake an error to fail the replacement
+			return -ENOMEM;
+		} else {
+			pasid_1024_attached = true;
+		}
+	}
+
 	return 0;
 }
 
@@ -1683,6 +1702,129 @@ static int iommufd_test_trigger_vevent(struct iommufd_ucmd *ucmd,
 	return rc;
 }
 
+static inline struct iommufd_hw_pagetable *
+iommufd_get_hwpt(struct iommufd_ucmd *ucmd, u32 id)
+{
+	struct iommufd_object *pt_obj;
+
+	pt_obj = iommufd_get_object(ucmd->ictx, id, IOMMUFD_OBJ_ANY);
+	if (IS_ERR(pt_obj))
+		return ERR_CAST(pt_obj);
+
+	if (pt_obj->type != IOMMUFD_OBJ_HWPT_NESTED &&
+	    pt_obj->type != IOMMUFD_OBJ_HWPT_PAGING) {
+		iommufd_put_object(ucmd->ictx, pt_obj);
+		return ERR_PTR(-EINVAL);
+	}
+
+	return container_of(pt_obj, struct iommufd_hw_pagetable, obj);
+}
+
+static int iommufd_test_pasid_check_domain(struct iommufd_ucmd *ucmd,
+					   struct iommu_test_cmd *cmd)
+{
+	struct iommu_domain *attached_domain, *expect_domain = NULL;
+	struct iommufd_hw_pagetable *hwpt = NULL;
+	struct iommu_attach_handle *handle;
+	struct selftest_obj *sobj;
+	struct mock_dev *mdev;
+	bool result;
+	int rc = 0;
+
+	sobj = iommufd_test_get_selftest_obj(ucmd->ictx, cmd->id);
+	if (IS_ERR(sobj))
+		return PTR_ERR(sobj);
+
+	mdev = sobj->idev.mock_dev;
+
+	handle = iommu_attach_handle_get(mdev->dev.iommu_group,
+					 cmd->pasid_check.pasid, 0);
+	if (IS_ERR(handle))
+		attached_domain = NULL;
+	else
+		attached_domain = handle->domain;
+
+	if (cmd->pasid_check.hwpt_id) {
+		hwpt = iommufd_get_hwpt(ucmd, cmd->pasid_check.hwpt_id);
+		if (IS_ERR(hwpt)) {
+			rc = PTR_ERR(hwpt);
+			goto out_put_dev;
+		}
+		expect_domain = hwpt->domain;
+	}
+
+	result = (attached_domain == expect_domain) ? 1 : 0;
+	if (copy_to_user(u64_to_user_ptr(cmd->pasid_check.out_result_ptr),
+			 &result, sizeof(result)))
+		rc = -EFAULT;
+	if (hwpt)
+		iommufd_put_object(ucmd->ictx, &hwpt->obj);
+out_put_dev:
+	iommufd_put_object(ucmd->ictx, &sobj->obj);
+	return rc;
+}
+
+static int iommufd_test_pasid_attach(struct iommufd_ucmd *ucmd,
+				     struct iommu_test_cmd *cmd)
+{
+	struct selftest_obj *sobj;
+	int rc;
+
+	sobj = iommufd_test_get_selftest_obj(ucmd->ictx, cmd->id);
+	if (IS_ERR(sobj))
+		return PTR_ERR(sobj);
+
+	rc = iommufd_device_attach(sobj->idev.idev, cmd->pasid_attach.pasid,
+				   &cmd->pasid_attach.pt_id);
+	if (rc)
+		goto out_sobj;
+
+	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+	if (rc)
+		iommufd_device_detach(sobj->idev.idev,
+				      cmd->pasid_attach.pasid);
+
+out_sobj:
+	iommufd_put_object(ucmd->ictx, &sobj->obj);
+	return rc;
+}
+
+static int iommufd_test_pasid_replace(struct iommufd_ucmd *ucmd,
+				      struct iommu_test_cmd *cmd)
+{
+	struct selftest_obj *sobj;
+	int rc;
+
+	sobj = iommufd_test_get_selftest_obj(ucmd->ictx, cmd->id);
+	if (IS_ERR(sobj))
+		return PTR_ERR(sobj);
+
+	rc = iommufd_device_replace(sobj->idev.idev, cmd->pasid_attach.pasid,
+				    &cmd->pasid_attach.pt_id);
+	if (rc)
+		goto out_sobj;
+
+	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+
+out_sobj:
+	iommufd_put_object(ucmd->ictx, &sobj->obj);
+	return rc;
+}
+
+static int iommufd_test_pasid_detach(struct iommufd_ucmd *ucmd,
+				     struct iommu_test_cmd *cmd)
+{
+	struct selftest_obj *sobj;
+
+	sobj = iommufd_test_get_selftest_obj(ucmd->ictx, cmd->id);
+	if (IS_ERR(sobj))
+		return PTR_ERR(sobj);
+
+	iommufd_device_detach(sobj->idev.idev, cmd->pasid_detach.pasid);
+	iommufd_put_object(ucmd->ictx, &sobj->obj);
+	return 0;
+}
+
 void iommufd_selftest_destroy(struct iommufd_object *obj)
 {
 	struct selftest_obj *sobj = to_selftest_obj(obj);
@@ -1766,6 +1908,14 @@ int iommufd_test(struct iommufd_ucmd *ucmd)
 		return iommufd_test_trigger_iopf(ucmd, cmd);
 	case IOMMU_TEST_OP_TRIGGER_VEVENT:
 		return iommufd_test_trigger_vevent(ucmd, cmd);
+	case IOMMU_TEST_OP_PASID_ATTACH:
+		return iommufd_test_pasid_attach(ucmd, cmd);
+	case IOMMU_TEST_OP_PASID_REPLACE:
+		return iommufd_test_pasid_replace(ucmd, cmd);
+	case IOMMU_TEST_OP_PASID_DETACH:
+		return iommufd_test_pasid_detach(ucmd, cmd);
+	case IOMMU_TEST_OP_PASID_CHECK_DOMAIN:
+		return iommufd_test_pasid_check_domain(ucmd, cmd);
 	default:
 		return -EOPNOTSUPP;
 	}
@@ -1813,6 +1963,7 @@ int __init iommufd_test_init(void)
 
 	mock_iommu_iopf_queue = iopf_queue_alloc("mock-iopfq");
 	mock_iommu.iommu_dev.max_pasids = (1 << MOCK_PASID_WIDTH);
+	pasid_1024_attached = false;
 
 	return 0;
 
-- 
2.34.1



* [PATCH v10 18/18] iommufd/selftest: Add coverage for iommufd pasid attach/detach
  2025-03-20 13:47 [PATCH v10 00/18] iommufd support pasid attach/replace Yi Liu
                   ` (16 preceding siblings ...)
  2025-03-20 13:47 ` [PATCH v10 17/18] iommufd/selftest: Add test ops to test pasid attach/detach Yi Liu
@ 2025-03-20 13:47 ` Yi Liu
  2025-03-21  0:34   ` Nicolin Chen
  2025-03-20 13:59 ` [PATCH v10 00/18] iommufd support pasid attach/replace Yi Liu
  18 siblings, 1 reply; 78+ messages in thread
From: Yi Liu @ 2025-03-20 13:47 UTC (permalink / raw)
  To: kevin.tian, jgg; +Cc: joro, baolu.lu, yi.l.liu, iommu, nicolinc

This tests iommufd pasid attach/replace/detach.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
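The semantics that the new test case checks (attach to an already
attached pasid fails with EBUSY, replace on an unattached pasid fails
with EINVAL, detach leaves the pasid without a domain) can be
summarised with a small user-space model. Everything here is
hypothetical and for illustration only: the slot table, NR_SLOTS and
the model_*() functions are not part of iommufd or of the selftest.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define NR_SLOTS 8	/* hypothetical table size, enough for a demo */

struct pasid_slot {
	uint32_t pasid;
	uint32_t hwpt_id;	/* 0 means the slot is unused */
};

static struct pasid_slot slots[NR_SLOTS];

/* Return the slot currently attached for @pasid, or NULL. */
static struct pasid_slot *lookup(uint32_t pasid)
{
	for (int i = 0; i < NR_SLOTS; i++)
		if (slots[i].hwpt_id && slots[i].pasid == pasid)
			return &slots[i];
	return NULL;
}

/* Attach never overwrites an existing attachment. */
static int model_attach(uint32_t pasid, uint32_t hwpt_id)
{
	if (!hwpt_id)
		return -EINVAL;
	if (lookup(pasid))
		return -EBUSY;
	for (int i = 0; i < NR_SLOTS; i++) {
		if (!slots[i].hwpt_id) {
			slots[i].pasid = pasid;
			slots[i].hwpt_id = hwpt_id;
			return 0;
		}
	}
	return -ENOSPC;
}

/* Replace requires an existing attachment. */
static int model_replace(uint32_t pasid, uint32_t hwpt_id)
{
	struct pasid_slot *slot = lookup(pasid);

	if (!hwpt_id || !slot)
		return -EINVAL;
	slot->hwpt_id = hwpt_id;
	return 0;
}

static void model_detach(uint32_t pasid)
{
	struct pasid_slot *slot = lookup(pasid);

	if (slot)
		slot->hwpt_id = 0;
}

/* Attached hwpt id for @pasid, or 0 if nothing is attached. */
static uint32_t model_attached(uint32_t pasid)
{
	struct pasid_slot *slot = lookup(pasid);

	return slot ? slot->hwpt_id : 0;
}
```

The sequence replace, attach, attach, replace, detach on pasid 200 in
this model returns -EINVAL, 0, -EBUSY, 0 and then leaves the pasid
unattached, which is the order the pasid 100 and pasid 200 steps of the
test follow.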
 tools/testing/selftests/iommu/iommufd.c       | 344 ++++++++++++++++++
 .../selftests/iommu/iommufd_fail_nth.c        |  41 ++-
 tools/testing/selftests/iommu/iommufd_utils.h | 102 ++++++
 3 files changed, 480 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 156c74da53cd..f756c68084c9 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -2996,4 +2996,348 @@ TEST_F(iommufd_viommu, vdevice_cache)
 	}
 }
 
+FIXTURE(iommufd_device_pasid)
+{
+	int fd;
+	uint32_t ioas_id;
+	uint32_t hwpt_id;
+	uint32_t stdev_id;
+	uint32_t device_id;
+	uint32_t no_pasid_stdev_id;
+	uint32_t no_pasid_device_id;
+};
+
+FIXTURE_VARIANT(iommufd_device_pasid)
+{
+	bool pasid_capable;
+};
+
+FIXTURE_SETUP(iommufd_device_pasid)
+{
+	self->fd = open("/dev/iommu", O_RDWR);
+	ASSERT_NE(-1, self->fd);
+	test_ioctl_ioas_alloc(&self->ioas_id);
+
+	test_cmd_mock_domain_flags(self->ioas_id,
+				   MOCK_FLAGS_DEVICE_PASID,
+				   &self->stdev_id, &self->hwpt_id,
+				   &self->device_id);
+	if (!variant->pasid_capable)
+		test_cmd_mock_domain_flags(self->ioas_id, 0,
+					   &self->no_pasid_stdev_id, NULL,
+					   &self->no_pasid_device_id);
+}
+
+FIXTURE_TEARDOWN(iommufd_device_pasid)
+{
+	teardown_iommufd(self->fd, _metadata);
+}
+
+FIXTURE_VARIANT_ADD(iommufd_device_pasid, no_pasid)
+{
+	.pasid_capable = false,
+};
+
+FIXTURE_VARIANT_ADD(iommufd_device_pasid, has_pasid)
+{
+	.pasid_capable = true,
+};
+
+TEST_F(iommufd_device_pasid, pasid_attach)
+{
+	struct iommu_hwpt_selftest data = {
+		.iotlb =  IOMMU_TEST_IOTLB_DEFAULT,
+	};
+	uint32_t nested_hwpt_id[3] = {};
+	uint32_t parent_hwpt_id = 0;
+	uint32_t fault_id, fault_fd;
+	uint32_t s2_hwpt_id = 0;
+	uint32_t iopf_hwpt_id;
+	uint32_t pasid = 100;
+	uint32_t auto_hwpt;
+	uint32_t viommu_id;
+	bool result;
+
+	/* Allocate two nested hwpts sharing one common parent hwpt */
+	test_cmd_hwpt_alloc(self->device_id, self->ioas_id,
+			    IOMMU_HWPT_ALLOC_NEST_PARENT,
+			    &parent_hwpt_id);
+	test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id,
+				   IOMMU_HWPT_ALLOC_PASID,
+				   &nested_hwpt_id[0],
+				   IOMMU_HWPT_DATA_SELFTEST,
+				   &data, sizeof(data));
+	test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id,
+				   IOMMU_HWPT_ALLOC_PASID,
+				   &nested_hwpt_id[1],
+				   IOMMU_HWPT_DATA_SELFTEST,
+				   &data, sizeof(data));
+
+	/* Fault related preparation */
+	test_ioctl_fault_alloc(&fault_id, &fault_fd);
+	test_cmd_hwpt_alloc_iopf(self->device_id, parent_hwpt_id, fault_id,
+				 IOMMU_HWPT_FAULT_ID_VALID | IOMMU_HWPT_ALLOC_PASID,
+				 &iopf_hwpt_id,
+				 IOMMU_HWPT_DATA_SELFTEST, &data,
+				 sizeof(data));
+
+	/* Allocate a regular nested hwpt based on viommu */
+	test_cmd_viommu_alloc(self->device_id, parent_hwpt_id,
+			      IOMMU_VIOMMU_TYPE_SELFTEST,
+			      &viommu_id);
+	test_cmd_hwpt_alloc_nested(self->device_id, viommu_id,
+				   IOMMU_HWPT_ALLOC_PASID,
+				   &nested_hwpt_id[2],
+				   IOMMU_HWPT_DATA_SELFTEST, &data,
+				   sizeof(data));
+
+	test_cmd_hwpt_alloc(self->device_id, self->ioas_id,
+			    IOMMU_HWPT_ALLOC_PASID,
+			    &s2_hwpt_id);
+
+	/* Attach RID to non-pasid compat domain, */
+	test_cmd_mock_domain_replace(self->stdev_id, parent_hwpt_id);
+	/* then attach to pasid should fail */
+	test_err_pasid_attach(EINVAL, pasid, s2_hwpt_id, NULL);
+
+	/* Attach RID to pasid compat domain, */
+	test_cmd_mock_domain_replace(self->stdev_id, s2_hwpt_id);
+	/* then attach to pasid should succeed, */
+	test_cmd_pasid_attach(pasid, nested_hwpt_id[0], NULL);
+	/* but attach RID to non-pasid compat domain should fail now. */
+	test_err_mock_domain_replace(EINVAL, self->stdev_id, parent_hwpt_id);
+	test_cmd_pasid_detach(pasid);
+
+	if (!variant->pasid_capable) {
+		/*
+		 * PASID-compatible domain can be used by non-PASID-capable
+		 * device.
+		 */
+		test_cmd_mock_domain_replace(self->no_pasid_stdev_id, nested_hwpt_id[0]);
+		test_cmd_mock_domain_replace(self->no_pasid_stdev_id, self->ioas_id);
+		/*
+		 * Attaching a hwpt to pasid#100 of a non-PASID-capable device
+		 * should fail, whether or not the domain is pasid-compat.
+		 */
+		EXPECT_ERRNO(EINVAL,
+			     _test_cmd_pasid_attach(self->fd, self->no_pasid_stdev_id,
+						    pasid, parent_hwpt_id, NULL));
+		EXPECT_ERRNO(EINVAL,
+			     _test_cmd_pasid_attach(self->fd, self->no_pasid_stdev_id,
+						    pasid, s2_hwpt_id, NULL));
+	}
+
+	/*
+	 * Attach non pasid compat hwpt to pasid-capable device, should
+	 * fail, and have null domain.
+	 */
+	test_err_pasid_attach(EINVAL, pasid, parent_hwpt_id, NULL);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, 0, &result));
+	EXPECT_EQ(1, result);
+
+	/*
+	 * Attach ioas to pasid 100, should succeed, domain should
+	 * be valid.
+	 */
+	test_cmd_pasid_attach(pasid, self->ioas_id, &auto_hwpt);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, auto_hwpt, &result));
+	EXPECT_EQ(1, result);
+
+	/* Attach to pasid 100 which has been attached, should fail. */
+	test_err_pasid_attach(EBUSY, pasid, self->ioas_id, &auto_hwpt);
+
+	/*
+	 * Try attach pasid 100 with another hwpt, should FAIL
+	 * as attach does not allow overwrite, use REPLACE instead.
+	 */
+	test_err_pasid_attach(EBUSY, pasid, nested_hwpt_id[0], NULL);
+
+	/*
+	 * Detach hwpt from pasid 100, and check if the pasid 100
+	 * has null domain. Should be done before the next attach.
+	 */
+	test_cmd_pasid_detach(pasid);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, 0, &result));
+	EXPECT_EQ(1, result);
+
+	/*
+	 * Attach nested hwpt to pasid 100, should succeed, domain
+	 * should be valid.
+	 */
+	test_cmd_pasid_attach(pasid, nested_hwpt_id[0], NULL);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, nested_hwpt_id[0],
+					      &result));
+	EXPECT_EQ(1, result);
+
+	/* Attach to pasid 100 which has been attached, should fail. */
+	test_err_pasid_attach(EBUSY, pasid, nested_hwpt_id[0], NULL);
+
+	/*
+	 * Detach hwpt from pasid 100, and check if the pasid 100
+	 * has null domain
+	 */
+	test_cmd_pasid_detach(pasid);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, 0, &result));
+	EXPECT_EQ(1, result);
+
+	/* Replace tests */
+
+	pasid = 200;
+	/*
+	 * Replace pasid 200 without attaching it first, should
+	 * fail with -EINVAL.
+	 */
+	test_err_cmd_pasid_replace(EINVAL, pasid, s2_hwpt_id, NULL);
+
+	/*
+	 * Attach a s2 hwpt to pasid 200, should succeed, domain should
+	 * be valid.
+	 */
+	test_cmd_pasid_attach(pasid, s2_hwpt_id, NULL);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, s2_hwpt_id,
+					      &result));
+	EXPECT_EQ(1, result);
+
+	/*
+	 * Replace pasid 200 with self->ioas_id, should succeed,
+	 * and have valid domain.
+	 */
+	test_cmd_pasid_replace(pasid, self->ioas_id, &auto_hwpt);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, auto_hwpt,
+					      &result));
+	EXPECT_EQ(1, result);
+
+	/*
+	 * Replace a nested hwpt for pasid 200, should succeed,
+	 * and have valid domain.
+	 */
+	test_cmd_pasid_replace(pasid, nested_hwpt_id[0], NULL);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, nested_hwpt_id[0],
+					      &result));
+	EXPECT_EQ(1, result);
+
+	/*
+	 * Replace with another nested hwpt for pasid 200, should
+	 * succeed, and have valid domain.
+	 */
+	test_cmd_pasid_replace(pasid, nested_hwpt_id[1], NULL);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, nested_hwpt_id[1],
+					      &result));
+	EXPECT_EQ(1, result);
+
+	/*
+	 * Detach hwpt from pasid 200, and check if the pasid 200
+	 * has null domain.
+	 */
+	test_cmd_pasid_detach(pasid);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, 0, &result));
+	EXPECT_EQ(1, result);
+
+	/* Negative Tests for pasid replace, use pasid 1024 */
+
+	/*
+	 * Attach a s2 hwpt to pasid 1024, should succeed, domain should
+	 * be valid.
+	 */
+	pasid = 1024;
+	test_cmd_pasid_attach(pasid, s2_hwpt_id, NULL);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, s2_hwpt_id,
+					      &result));
+	EXPECT_EQ(1, result);
+
+	/*
+	 * Replace pasid 1024 with self->ioas_id, should fail,
+	 * but have the old valid domain. This is a designed
+	 * negative case, normally replace with self->ioas_id
+	 * could succeed.
+	 */
+	test_err_cmd_pasid_replace(ENOMEM, pasid, self->ioas_id, NULL);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, s2_hwpt_id,
+					      &result));
+	EXPECT_EQ(1, result);
+
+	/*
+	 * Detach hwpt from pasid 1024, and check if the pasid 1024
+	 * has null domain.
+	 */
+	test_cmd_pasid_detach(pasid);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, 0, &result));
+	EXPECT_EQ(1, result);
+
+	/* Attach to iopf-capable hwpt */
+
+	/*
+	 * Attach an iopf hwpt to pasid 2048, should succeed, domain should
+	 * be valid.
+	 */
+	pasid = 2048;
+	test_cmd_pasid_attach(pasid, iopf_hwpt_id, NULL);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, iopf_hwpt_id,
+					      &result));
+	EXPECT_EQ(1, result);
+
+	/*
+	 * Replace with s2_hwpt_id for pasid 2048, should
+	 * succeed, and have valid domain.
+	 */
+	test_cmd_pasid_replace(pasid, s2_hwpt_id, NULL);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, s2_hwpt_id,
+					      &result));
+	EXPECT_EQ(1, result);
+
+	/*
+	 * Detach hwpt from pasid 2048, and check if the pasid 2048
+	 * has null domain.
+	 */
+	test_cmd_pasid_detach(pasid);
+	ASSERT_EQ(0,
+		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
+					      pasid, 0, &result));
+	EXPECT_EQ(1, result);
+
+	test_ioctl_destroy(iopf_hwpt_id);
+	close(fault_fd);
+	test_ioctl_destroy(fault_id);
+
+	/* Detach the s2_hwpt_id from RID */
+	test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id);
+
+	test_ioctl_destroy(nested_hwpt_id[0]);
+	test_ioctl_destroy(nested_hwpt_id[1]);
+	test_ioctl_destroy(nested_hwpt_id[2]);
+	test_ioctl_destroy(viommu_id);
+	test_ioctl_destroy(parent_hwpt_id);
+	test_ioctl_destroy(s2_hwpt_id);
+}
+
 TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c
index 99a7f7897bb2..8f27a40ef2d9 100644
--- a/tools/testing/selftests/iommu/iommufd_fail_nth.c
+++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c
@@ -209,12 +209,16 @@ FIXTURE(basic_fail_nth)
 {
 	int fd;
 	uint32_t access_id;
+	uint32_t stdev_id;
+	uint32_t pasid;
 };
 
 FIXTURE_SETUP(basic_fail_nth)
 {
 	self->fd = -1;
 	self->access_id = 0;
+	self->stdev_id = 0;
+	self->pasid = 0; //test should use a non-zero value
 }
 
 FIXTURE_TEARDOWN(basic_fail_nth)
@@ -226,6 +230,8 @@ FIXTURE_TEARDOWN(basic_fail_nth)
 		rc = _test_cmd_destroy_access(self->access_id);
 		assert(rc == 0);
 	}
+	if (self->pasid && self->stdev_id)
+		_test_cmd_pasid_detach(self->fd, self->stdev_id, self->pasid);
 	teardown_iommufd(self->fd, _metadata);
 }
 
@@ -624,7 +630,6 @@ TEST_FAIL_NTH(basic_fail_nth, device)
 	uint32_t fault_hwpt_id;
 	uint32_t ioas_id;
 	uint32_t ioas_id2;
-	uint32_t stdev_id;
 	uint32_t idev_id;
 	uint32_t hwpt_id;
 	uint32_t viommu_id;
@@ -655,25 +660,29 @@ TEST_FAIL_NTH(basic_fail_nth, device)
 
 	fail_nth_enable();
 
-	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, NULL,
-				  &idev_id))
+	if (_test_cmd_mock_domain_flags(self->fd, ioas_id,
+					MOCK_FLAGS_DEVICE_PASID,
+					&self->stdev_id, NULL, &idev_id))
 		return -1;
 
 	if (_test_cmd_get_hw_info(self->fd, idev_id, &info, sizeof(info), NULL))
 		return -1;
 
-	if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0, 0, &hwpt_id,
+	if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0,
+				 IOMMU_HWPT_ALLOC_PASID, &hwpt_id,
 				 IOMMU_HWPT_DATA_NONE, 0, 0))
 		return -1;
 
-	if (_test_cmd_mock_domain_replace(self->fd, stdev_id, ioas_id2, NULL))
+	if (_test_cmd_mock_domain_replace(self->fd, self->stdev_id, ioas_id2, NULL))
 		return -1;
 
-	if (_test_cmd_mock_domain_replace(self->fd, stdev_id, hwpt_id, NULL))
+	if (_test_cmd_mock_domain_replace(self->fd, self->stdev_id, hwpt_id, NULL))
 		return -1;
 
 	if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0,
-				 IOMMU_HWPT_ALLOC_NEST_PARENT, &hwpt_id,
+				 IOMMU_HWPT_ALLOC_NEST_PARENT |
+						IOMMU_HWPT_ALLOC_PASID,
+				 &hwpt_id,
 				 IOMMU_HWPT_DATA_NONE, 0, 0))
 		return -1;
 
@@ -699,6 +708,24 @@ TEST_FAIL_NTH(basic_fail_nth, device)
 		return -1;
 	close(veventq_fd);
 
+	self->pasid = 200;
+
+	/* Tests for pasid attach/replace/detach */
+	if (_test_cmd_pasid_attach(self->fd, self->stdev_id,
+				   self->pasid, ioas_id, NULL)) {
+		self->pasid = 0;
+		return -1;
+	}
+
+	if (_test_cmd_pasid_replace(self->fd, self->stdev_id,
+				    self->pasid, ioas_id2, NULL))
+		return -1;
+
+	if (_test_cmd_pasid_detach(self->fd, self->stdev_id, self->pasid))
+		return -1;
+
+	self->pasid = 0;
+
 	return 0;
 }
 
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 6f2ba2fa8f76..6140ebc83803 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -1051,3 +1051,105 @@ static int _test_cmd_read_vevents(int fd, __u32 event_fd, __u32 nvevents,
 	EXPECT_ERRNO(_errno,                                                 \
 		     _test_cmd_read_vevents(self->fd, event_fd, nvevents,    \
 					    virt_id, prev_seq))
+
+static int _test_cmd_pasid_attach(int fd, __u32 stdev_id, __u32 pasid,
+				  __u32 pt_id, __u32 *out_pt_id)
+{
+	struct iommu_test_cmd test_attach = {
+		.size = sizeof(test_attach),
+		.op = IOMMU_TEST_OP_PASID_ATTACH,
+		.id = stdev_id,
+		.pasid_attach = {
+			.pasid = pasid,
+			.pt_id = pt_id,
+		},
+	};
+	int ret;
+
+	ret = ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_PASID_ATTACH),
+		    &test_attach);
+	if (ret)
+		return ret;
+
+	if (out_pt_id)
+		*out_pt_id = test_attach.pasid_attach.pt_id;
+	return 0;
+}
+
+#define test_cmd_pasid_attach(pasid, hwpt_id, out_pt_id) \
+	ASSERT_EQ(0, _test_cmd_pasid_attach(self->fd, self->stdev_id, \
+					    pasid, hwpt_id, out_pt_id))
+
+#define test_err_pasid_attach(_errno, pasid, hwpt_id, out_pt_id) \
+	EXPECT_ERRNO(_errno, \
+		     _test_cmd_pasid_attach(self->fd, self->stdev_id, \
+					    pasid, hwpt_id, out_pt_id))
+
+static int _test_cmd_pasid_replace(int fd, __u32 stdev_id, __u32 pasid,
+				   __u32 pt_id, __u32 *out_pt_id)
+{
+	struct iommu_test_cmd test_replace = {
+		.size = sizeof(test_replace),
+		.op = IOMMU_TEST_OP_PASID_REPLACE,
+		.id = stdev_id,
+		.pasid_replace = {
+			.pasid = pasid,
+			.pt_id = pt_id,
+		},
+	};
+	int ret;
+
+	ret = ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_PASID_REPLACE),
+		    &test_replace);
+	if (ret)
+		return ret;
+
+	if (out_pt_id)
+		*out_pt_id = test_replace.pasid_replace.pt_id;
+	return 0;
+}
+
+#define test_cmd_pasid_replace(pasid, hwpt_id, out_pt_id) \
+	ASSERT_EQ(0, _test_cmd_pasid_replace(self->fd, self->stdev_id, \
+					     pasid, hwpt_id, out_pt_id))
+
+#define test_err_cmd_pasid_replace(_errno, pasid, hwpt_id, out_pt_id) \
+	EXPECT_ERRNO(_errno, \
+		     _test_cmd_pasid_replace(self->fd, self->stdev_id, \
+					     pasid, hwpt_id, out_pt_id))
+
+static int _test_cmd_pasid_detach(int fd, __u32 stdev_id, __u32 pasid)
+{
+	struct iommu_test_cmd test_detach = {
+		.size = sizeof(test_detach),
+		.op = IOMMU_TEST_OP_PASID_DETACH,
+		.id = stdev_id,
+		.pasid_detach = {
+			.pasid = pasid,
+		},
+	};
+
+	return ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_PASID_DETACH),
+		     &test_detach);
+}
+
+#define test_cmd_pasid_detach(pasid) \
+	ASSERT_EQ(0, _test_cmd_pasid_detach(self->fd, self->stdev_id, pasid))
+
+static int test_cmd_pasid_check_domain(int fd, __u32 stdev_id, __u32 pasid,
+				       __u32 hwpt_id, bool *result)
+{
+	struct iommu_test_cmd test_pasid_check = {
+		.size = sizeof(test_pasid_check),
+		.op = IOMMU_TEST_OP_PASID_CHECK_DOMAIN,
+		.id = stdev_id,
+		.pasid_check = {
+			.pasid = pasid,
+			.hwpt_id = hwpt_id,
+			.out_result_ptr = (__u64)result,
+		},
+	};
+
+	return ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_PASID_CHECK_DOMAIN),
+		     &test_pasid_check);
+}
-- 
2.34.1


* Re: [PATCH v10 00/18] iommufd support pasid attach/replace
  2025-03-20 13:47 [PATCH v10 00/18] iommufd support pasid attach/replace Yi Liu
                   ` (17 preceding siblings ...)
  2025-03-20 13:47 ` [PATCH v10 18/18] iommufd/selftest: Add coverage for iommufd " Yi Liu
@ 2025-03-20 13:59 ` Yi Liu
  18 siblings, 0 replies; 78+ messages in thread
From: Yi Liu @ 2025-03-20 13:59 UTC (permalink / raw)
  To: kevin.tian, jgg; +Cc: joro, baolu.lu, iommu, nicolinc

On 2025/3/20 21:47, Yi Liu wrote:
> PASID (Process Address Space ID) is a PCIe extension that tags the DMA
> transactions from a physical device. Most modern IOMMU hardware supports
> PASID-granular address translation. This allows a PASID-capable device
> to be attached to multiple hardware page tables (hwpts, also known as
> domains), with each attachment tagged by a PASID.
> 
> This series builds on previous series [1]. It begins by adding a missing
> IOMMU API to replace the domain for a PASID. Utilizing the IOMMU PASID
> attach/replace/detach APIs, this series introduces iommufd APIs for device
> drivers to attach, replace, or detach PASIDs to/from hwpts at the request
> of userspace. It also enforces PASID compatibility with domain requirements,
> allocates PASID-compatible hwpts in iommufd, and includes self-tests to
> validate the iommufd APIs.
> 
> The complete code is available at the following link [2]. Please note that
> the existing iommufd self-test was broken, and a temporary fix patch is at
> the top of the branch [2]. If you wish to run the iommufd self-test, please
> apply that fix. We apologize for any inconvenience.
> 
> The series is based on Jason's for-next branch plus one more patch [3].
> 
> https://web.git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd.git/commit/?h=for-next&id=a05df03a88bc1088be8e9d958f208d6484691e43

Correction to the base of this series: it is based on the latest commit of
Jason's for-next branch.

https://web.git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd.git/commit/?h=for-next&id=da0c56520e880441d0503d0cf0d6853dcfb5f1a4

-- 
Regards,
Yi Liu

* Re: [PATCH v10 01/18] iommu: Require passing new handles to APIs supporting handle
  2025-03-20 13:47 ` [PATCH v10 01/18] iommu: Require passing new handles to APIs supporting handle Yi Liu
@ 2025-03-20 15:23   ` Jason Gunthorpe
  2025-03-20 23:51     ` Yi Liu
  2025-03-21  2:35   ` Baolu Lu
  1 sibling, 1 reply; 78+ messages in thread
From: Jason Gunthorpe @ 2025-03-20 15:23 UTC (permalink / raw)
  To: Yi Liu; +Cc: kevin.tian, joro, baolu.lu, iommu, nicolinc

On Thu, Mar 20, 2025 at 06:47:27AM -0700, Yi Liu wrote:
> Add kdoc to highlight that the caller of iommu_[attach|replace]_group_handle()
> and iommu_attach_device_pasid() should always provide a new handle.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/iommu/iommu.c | 9 +++++++++
>  1 file changed, 9 insertions(+)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

You may want to provide a code pointer to the lockless paths in the fault
functions in the commit message if you respin.

Jason

* Re: [PATCH v10 06/18] iommufd/device: Add helper to detect the first attach of a group
  2025-03-20 13:47 ` [PATCH v10 06/18] iommufd/device: Add helper to detect the first attach of a group Yi Liu
@ 2025-03-20 15:36   ` Jason Gunthorpe
  2025-03-20 17:36   ` Nicolin Chen
  2025-03-21  3:18   ` Baolu Lu
  2 siblings, 0 replies; 78+ messages in thread
From: Jason Gunthorpe @ 2025-03-20 15:36 UTC (permalink / raw)
  To: Yi Liu; +Cc: kevin.tian, joro, baolu.lu, iommu, nicolinc

On Thu, Mar 20, 2025 at 06:47:32AM -0700, Yi Liu wrote:
> The existing code detects the first attach by checking the
> igroup->device_list. However, the igroup->hwpt can also be used to detect
> the first attach. In future modifications, it is better to check the
> igroup->hwpt instead of the device_list. To improve readability and also
> prepare for further modifications on this part, this adds a helper for it.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
> v9 -> v10: It is patch 07 of v9, it's reworked hence renamed as well.
> ---
>  drivers/iommu/iommufd/device.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

* Re: [PATCH v10 07/18] iommufd/device: Wrap igroup->hwpt and igroup->device_list into attach struct
  2025-03-20 13:47 ` [PATCH v10 07/18] iommufd/device: Wrap igroup->hwpt and igroup->device_list into attach struct Yi Liu
@ 2025-03-20 15:48   ` Jason Gunthorpe
  2025-03-20 18:03   ` Nicolin Chen
  2025-03-21  3:22   ` Baolu Lu
  2 siblings, 0 replies; 78+ messages in thread
From: Jason Gunthorpe @ 2025-03-20 15:48 UTC (permalink / raw)
  To: Yi Liu; +Cc: kevin.tian, joro, baolu.lu, iommu, nicolinc

On Thu, Mar 20, 2025 at 06:47:33AM -0700, Yi Liu wrote:
> The igroup->hwpt and igroup->device_list are used to track the hwpt attach
> of a group in the RID path. The upcoming PASID path also needs such
> tracking. To prepare for it, wrap igroup->hwpt and igroup->device_list into
> an attach struct, which is allocated when the first device of the group is
> attached and freed when the last device of the group is detached.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/iommu/iommufd/device.c          | 76 ++++++++++++++++++-------
>  drivers/iommu/iommufd/iommufd_private.h |  5 +-
>  2 files changed, 58 insertions(+), 23 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

* Re: [PATCH v10 08/18] iommufd/device: Replace device_list with device_array
  2025-03-20 13:47 ` [PATCH v10 08/18] iommufd/device: Replace device_list with device_array Yi Liu
@ 2025-03-20 17:20   ` Jason Gunthorpe
  2025-03-21  0:25     ` Yi Liu
  2025-03-20 18:38   ` Nicolin Chen
  1 sibling, 1 reply; 78+ messages in thread
From: Jason Gunthorpe @ 2025-03-20 17:20 UTC (permalink / raw)
  To: Yi Liu; +Cc: kevin.tian, joro, baolu.lu, iommu, nicolinc

On Thu, Mar 20, 2025 at 06:47:34AM -0700, Yi Liu wrote:
> @@ -298,6 +298,20 @@ u32 iommufd_device_to_id(struct iommufd_device *idev)
>  }
>  EXPORT_SYMBOL_NS_GPL(iommufd_device_to_id, "IOMMUFD");
>  
> +static int iommufd_group_device_num(struct iommufd_group *igroup)
> +{
> +	struct iommufd_device *idev;
> +	unsigned long index;
> +	int count = 0;

Use unsigned int here, and an unsigned int return type too.

> +	rc = xa_insert(&igroup->attach->device_array, idev->obj.id, XA_ZERO_ENTRY,
> +		       GFP_KERNEL);

Probably don't really care, but note the choice of obj.id here is
going to waste some memory in the xarray. 0 based would be more
memory efficient, but some of the other operations would be slower.
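
To make the memory remark concrete, here is a rough user-space model of how deep a radix tree has to be to reach a given index. It assumes 64 slots per node (a chunk shift of 6, which is the usual xarray configuration) and the helper name demo_levels() is hypothetical. It only estimates tree depth, not the exact number of nodes the kernel xarray allocates.

```c
#include <assert.h>

/* Assumption: 6 index bits are consumed per tree level (64 slots per node). */
#define DEMO_CHUNK_SHIFT 6

/*
 * Number of node levels needed to reach @max_index in this model.
 * An array whose only entry is at index 0 needs no node at all here.
 */
static unsigned int demo_levels(unsigned long max_index)
{
	unsigned int levels = 0;

	while (max_index) {
		levels++;
		max_index >>= DEMO_CHUNK_SHIFT;
	}
	return levels;
}
```

In this model a group with two devices indexed 0 and 1 fits in one level, while the same two devices indexed by sparse object IDs such as 4096 and 4097 need three levels, which is the extra memory being pointed out.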

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

* Re: [PATCH v10 02/18] iommu: Introduce a replace API for device pasid
  2025-03-20 13:47 ` [PATCH v10 02/18] iommu: Introduce a replace API for device pasid Yi Liu
@ 2025-03-20 17:24   ` Nicolin Chen
  2025-03-20 23:58     ` Yi Liu
  2025-03-21  3:08   ` Baolu Lu
  1 sibling, 1 reply; 78+ messages in thread
From: Nicolin Chen @ 2025-03-20 17:24 UTC (permalink / raw)
  To: Yi Liu; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On Thu, Mar 20, 2025 at 06:47:28AM -0700, Yi Liu wrote:
> Provide a high-level API to allow replacements of one domain with another
> for specific pasid of a device. This is similar to
> iommu_replace_group_handle() and it is expected to be used only by IOMMUFD.
> 
> Co-developed-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

Some nits:

> @@ -3420,6 +3436,99 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
>  }
>  EXPORT_SYMBOL_GPL(iommu_attach_device_pasid);
>  
> +/**
> + * iommu_replace_device_pasid - Replace the domain that a pasid
> + *                              is attached to

How about "... a specific pasid of the device is attached to"
aligning with the clearer narrative in commit log?

> +int iommu_replace_device_pasid(struct iommu_domain *domain,
> +			       struct device *dev, ioasid_t pasid,
> +			       struct iommu_attach_handle *handle)
> +{
> +	/* Caller must be a probed driver on dev */

What's "a probed driver on dev"? Mind rephrasing this?

Also should it be placed outside this function?

> +	struct iommu_group *group = dev->iommu_group;
> +	struct iommu_attach_handle *entry;
> +	struct iommu_domain *curr_domain;
> +	void *curr;
> +	int ret;
> +
> +	if (!group)
> +		return -ENODEV;
> +
> +	if (!domain->ops->set_dev_pasid)
> +		return -EOPNOTSUPP;
> +
> +	if (dev_iommu_ops(dev) != domain->owner ||
> +	    pasid == IOMMU_NO_PASID || !handle)
> +		return -EINVAL;
> +
> +	mutex_lock(&group->mutex);

How about guard(mutex)(&group->mutex)?
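
For readers unfamiliar with the pattern, below is a minimal user-space sketch of a scope-based lock guard, built on the compiler's cleanup attribute, which is the same mechanism the kernel's guard() relies on. The names demo_group, demo_unlock(), DEMO_GUARD and demo_replace() are hypothetical, and pthread mutexes stand in for the kernel mutex; this is not the code proposed for the patch.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the kernel's struct iommu_group. */
struct demo_group {
	pthread_mutex_t mutex;
	int attached;	/* non-zero once a domain is attached */
};

/* Cleanup handler: runs when the guard variable goes out of scope. */
static void demo_unlock(pthread_mutex_t **m)
{
	pthread_mutex_unlock(*m);
}

/* Take the lock now; it is dropped at the end of the enclosing scope. */
#define DEMO_GUARD(name, m) \
	pthread_mutex_t *name __attribute__((cleanup(demo_unlock))) = (m); \
	pthread_mutex_lock(name)

/* Returns 0 on success, -1 if nothing is attached (not a replace case). */
static int demo_replace(struct demo_group *group)
{
	DEMO_GUARD(guard, &group->mutex);

	if (!group->attached)
		return -1;	/* early return, the mutex is still released */

	return 0;
}
```

With a guard like this the error paths can return directly, so the out_unlock label and its gotos would no longer be needed.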

> +	entry = iommu_make_pasid_array_entry(domain, handle);
> +	curr = xa_cmpxchg(&group->pasid_array, pasid, NULL,
> +			  XA_ZERO_ENTRY, GFP_KERNEL);
> +	if (xa_is_err(curr)) {
> +		ret = xa_err(curr);
> +		goto out_unlock;
> +	}
> +
> +	/*
> +	 * No domain (with or without handle) attached, hence not
> +	 * a replace case.
> +	 */
> +	if (!curr) {
> +		xa_release(&group->pasid_array, pasid);
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +
> +	/*
> +	 * Reusing handle is problematic as there are paths that refers
> +	 * the handle without lock. To avoid race, reject the callers that
> +	 * attempt it.
> +	 */
> +	if (handle && curr == entry) {
> +		WARN_ON(1);
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}

We already rejected the !handle and !curr cases, so this should be enough:
	if (curr == entry) {
?

> +
> +	curr_domain = pasid_array_entry_to_domain(curr);
> +	ret = 0;
> +
> +	if (curr_domain != domain) {
> +		ret = __iommu_set_group_pasid(domain, group,
> +					      pasid, curr_domain);
> +		if (ret)
> +			goto out_unlock;
> +	}

Oh, does this mean that we can just use this function to replace
a handle if domain isn't changed? Maybe add this in the kdoc?

> +
> +	if (curr != entry) {

Hmm, since we rejected "curr == entry" already, we don't need to
double check any more?

> +		/*
> +		 * The above xa_cmpxchg() reserved the memory, and the
> +		 * group->mutex is held, this cannot fail.
> +		 */
> +		WARN_ON(xa_is_err(xa_store(&group->pasid_array,
> +					   pasid, entry, GFP_KERNEL)));
> +	}
> +
> +out_unlock:
> +	mutex_unlock(&group->mutex);
> +	return ret;
> +}
> +EXPORT_SYMBOL_NS_GPL(iommu_replace_device_pasid, "IOMMUFD_INTERNAL");
> +
>  /*
>   * iommu_detach_device_pasid() - Detach the domain from pasid of device
>   * @domain: the iommu domain.
> -- 
> 2.34.1
> 

* Re: [PATCH v10 09/18] iommufd/device: Add pasid_attach array to track per-PASID attach
  2025-03-20 13:47 ` [PATCH v10 09/18] iommufd/device: Add pasid_attach array to track per-PASID attach Yi Liu
@ 2025-03-20 17:33   ` Jason Gunthorpe
  2025-03-20 19:19   ` Nicolin Chen
  1 sibling, 0 replies; 78+ messages in thread
From: Jason Gunthorpe @ 2025-03-20 17:33 UTC (permalink / raw)
  To: Yi Liu; +Cc: kevin.tian, joro, baolu.lu, iommu, nicolinc

On Thu, Mar 20, 2025 at 06:47:35AM -0700, Yi Liu wrote:
> PASIDs of a PASID-capable device can be attached to hwpts separately, hence
> a pasid array to track per-PASID attachment is necessary. The index
> IOMMU_NO_PASID is used by the RID path. Hence drop the igroup->attach.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/iommu/iommufd/device.c          | 68 +++++++++++++++++--------
>  drivers/iommu/iommufd/iommufd_private.h |  2 +-
>  2 files changed, 49 insertions(+), 21 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

* Re: [PATCH v10 12/18] iommufd: Enforce PASID-compatible domain for RID
  2025-03-20 13:47 ` [PATCH v10 12/18] iommufd: Enforce PASID-compatible domain for RID Yi Liu
@ 2025-03-20 17:35   ` Jason Gunthorpe
  2025-03-20 22:23   ` Nicolin Chen
  1 sibling, 0 replies; 78+ messages in thread
From: Jason Gunthorpe @ 2025-03-20 17:35 UTC (permalink / raw)
  To: Yi Liu; +Cc: kevin.tian, joro, baolu.lu, iommu, nicolinc

On Thu, Mar 20, 2025 at 06:47:38AM -0700, Yi Liu wrote:
> Per the definition of IOMMU_HWPT_ALLOC_PASID, iommufd needs to enforce
> that the RID uses a PASID-compatible domain if a PASID has been attached, and
> vice versa. The PASID path has already enforced it. This adds the
> enforcement in the RID path.
> 
> This enforcement requires a lock across the RID and PASID attach paths.
> The idev->igroup->lock is used, as both the RID and the PASID paths hold
> it.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
> v9 -> v10: split the hwpt->pasid_compat check to a separate if in the
>            else statement.
> ---
>  drivers/iommu/iommufd/device.c | 26 ++++++++++++++++++++++----
>  1 file changed, 22 insertions(+), 4 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

* Re: [PATCH v10 06/18] iommufd/device: Add helper to detect the first attach of a group
  2025-03-20 13:47 ` [PATCH v10 06/18] iommufd/device: Add helper to detect the first attach of a group Yi Liu
  2025-03-20 15:36   ` Jason Gunthorpe
@ 2025-03-20 17:36   ` Nicolin Chen
  2025-03-20 17:51     ` Nicolin Chen
  2025-03-20 18:04     ` Jason Gunthorpe
  2025-03-21  3:18   ` Baolu Lu
  2 siblings, 2 replies; 78+ messages in thread
From: Nicolin Chen @ 2025-03-20 17:36 UTC (permalink / raw)
  To: Yi Liu; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On Thu, Mar 20, 2025 at 06:47:32AM -0700, Yi Liu wrote:
> The existing code detects the first attach by checking the
> igroup->device_list. However, the igroup->hwpt can also be used to detect
> the first attach. In future modifications, it is better to check the
> igroup->hwpt instead of the device_list. To improve readability and also
> prepare for further modifications on this part, this adds a helper for it.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
> v9 -> v10: It is patch 07 of v9, it's reworked hence renamed as well.
> ---
>  drivers/iommu/iommufd/device.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> index ac54d734b819..9db36346328f 100644
> --- a/drivers/iommu/iommufd/device.c
> +++ b/drivers/iommu/iommufd/device.c
> @@ -444,6 +444,13 @@ static int iommufd_group_setup_msi(struct iommufd_group *igroup,
>  	return 0;
>  }
>  
> +static inline bool
> +igroup_first_attach(struct iommufd_group *igroup, ioasid_t pasid)
> +{
> +	lockdep_assert_held(&igroup->lock);
> +	return !igroup->hwpt;
> +}
> +
>  static int
>  iommufd_device_attach_reserved_iova(struct iommufd_device *idev,
>  				    struct iommufd_hwpt_paging *hwpt_paging)
> @@ -459,7 +466,7 @@ iommufd_device_attach_reserved_iova(struct iommufd_device *idev,
>  	if (rc)
>  		return rc;
>  
> -	if (list_empty(&igroup->device_list)) {
> +	if (igroup_first_attach(igroup, IOMMU_NO_PASID)) {
>  		rc = iommufd_group_setup_msi(igroup, hwpt_paging);
>  		if (rc) {
>  			iopt_remove_reserved_iova(&hwpt_paging->ioas->iopt,
> @@ -623,7 +630,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
>  	 * reserved regions are only updated during individual device
>  	 * attachment.
>  	 */
> -	if (list_empty(&igroup->device_list)) {
> +	if (igroup_first_attach(igroup, pasid)) {
>  		rc = iommufd_hwpt_attach_device(hwpt, idev, pasid);
>  		if (rc)
>  			goto err_unresv;

We have the same list_empty in the iommufd_hw_pagetable_detach()
and iommufd_group_release() too?

And I feel "igroup_is_not_attached" could be clearer, as it fits
the detach/release context too.

Thanks
Nicolin

* Re: [PATCH v10 14/18] iommufd: Allow allocating PASID-compatible domain
  2025-03-20 13:47 ` [PATCH v10 14/18] iommufd: Allow allocating PASID-compatible domain Yi Liu
@ 2025-03-20 17:51   ` Jason Gunthorpe
  2025-03-21  0:52     ` Yi Liu
  2025-03-20 22:36   ` Nicolin Chen
  1 sibling, 1 reply; 78+ messages in thread
From: Jason Gunthorpe @ 2025-03-20 17:51 UTC (permalink / raw)
  To: Yi Liu; +Cc: kevin.tian, joro, baolu.lu, iommu, nicolinc

On Thu, Mar 20, 2025 at 06:47:40AM -0700, Yi Liu wrote:
> The underlying infrastructure has supported the PASID attach and related
> enforcement per the requirement of the IOMMU_HWPT_ALLOC_PASID flag. This
> extends iommufd to support PASID compatible domain requested by userspace.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
> v9 -> v10: Dropped r-b tag as the uapi description for ALLOC_PASID is modified
> ---
>  drivers/iommu/iommufd/device.c       | 4 +++-
>  drivers/iommu/iommufd/hw_pagetable.c | 7 ++++---
>  include/uapi/linux/iommufd.h         | 3 +++
>  3 files changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> index 54ffef9c17f7..f09dcddf777b 100644
> --- a/drivers/iommu/iommufd/device.c
> +++ b/drivers/iommu/iommufd/device.c
> @@ -973,7 +973,9 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev, ioasid_t pasid,
>  	}
>  
>  	hwpt_paging = iommufd_hwpt_paging_alloc(idev->ictx, ioas, idev, pasid,
> -						0, immediate_attach, NULL);
> +						pasid != IOMMU_NO_PASID ?
> +						    IOMMU_HWPT_ALLOC_PASID : 0,
> +						immediate_attach, NULL);

I wonder if there is any point to this since userspace couldn't
actually just use autodomains and have something work since the RID
autodomain won't have PASID. I think if userspace wants to use pasid
it has to manually allocate the HWPT for the RID and then why not also
allocate for the PASID?

Anyhow, it doesn't matter much as it is so simple for autodomains..

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

* Re: [PATCH v10 06/18] iommufd/device: Add helper to detect the first attach of a group
  2025-03-20 17:36   ` Nicolin Chen
@ 2025-03-20 17:51     ` Nicolin Chen
  2025-03-21  0:02       ` Yi Liu
  2025-03-20 18:04     ` Jason Gunthorpe
  1 sibling, 1 reply; 78+ messages in thread
From: Nicolin Chen @ 2025-03-20 17:51 UTC (permalink / raw)
  To: Yi Liu; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On Thu, Mar 20, 2025 at 10:37:01AM -0700, Nicolin Chen wrote:
> On Thu, Mar 20, 2025 at 06:47:32AM -0700, Yi Liu wrote:
> > The existing code detects the first attach by checking the
> > igroup->device_list. However, the igroup->hwpt can also be used to detect
> > the first attach. In future modifications, it is better to check the
> > igroup->hwpt instead of the device_list. To improve readability and also
> > prepare for further modifications on this part, this adds a helper for it.
> > 
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> > v9 -> v10: It is patch 07 of v9, it's reworked hence renamed as well.
> > ---
> >  drivers/iommu/iommufd/device.c | 11 +++++++++--
> >  1 file changed, 9 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> > index ac54d734b819..9db36346328f 100644
> > --- a/drivers/iommu/iommufd/device.c
> > +++ b/drivers/iommu/iommufd/device.c
> > @@ -444,6 +444,13 @@ static int iommufd_group_setup_msi(struct iommufd_group *igroup,
> >  	return 0;
> >  }
> >  
> > +static inline bool
> > +igroup_first_attach(struct iommufd_group *igroup, ioasid_t pasid)
> > +{
> > +	lockdep_assert_held(&igroup->lock);
> > +	return !igroup->hwpt;
> > +}
> > +
> >  static int
> >  iommufd_device_attach_reserved_iova(struct iommufd_device *idev,
> >  				    struct iommufd_hwpt_paging *hwpt_paging)
> > @@ -459,7 +466,7 @@ iommufd_device_attach_reserved_iova(struct iommufd_device *idev,
> >  	if (rc)
> >  		return rc;
> >  
> > -	if (list_empty(&igroup->device_list)) {
> > +	if (igroup_first_attach(igroup, IOMMU_NO_PASID)) {
> >  		rc = iommufd_group_setup_msi(igroup, hwpt_paging);
> >  		if (rc) {
> >  			iopt_remove_reserved_iova(&hwpt_paging->ioas->iopt,
> > @@ -623,7 +630,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
> >  	 * reserved regions are only updated during individual device
> >  	 * attachment.
> >  	 */
> > -	if (list_empty(&igroup->device_list)) {
> > +	if (igroup_first_attach(igroup, pasid)) {
> >  		rc = iommufd_hwpt_attach_device(hwpt, idev, pasid);
> >  		if (rc)
> >  			goto err_unresv;
> 
> We have the same list_empty in the iommufd_hw_pagetable_detach()
> and iommufd_group_release() too?
> 
> And I feel "igroup_is_not_attached" could be clearer, as it fits
> the detach/release context too.

Oh, I just found that the following patch changes those paths.

Yet, at the end of the series this igroup_first_attach() is quite
similar to iommufd_device_is_attached(). So, maybe we could align
the naming here with that: iommufd_group_is_attached?

With that,
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

* Re: [PATCH v10 07/18] iommufd/device: Wrap igroup->hwpt and igroup->device_list into attach struct
  2025-03-20 13:47 ` [PATCH v10 07/18] iommufd/device: Wrap igroup->hwpt and igroup->device_list into attach struct Yi Liu
  2025-03-20 15:48   ` Jason Gunthorpe
@ 2025-03-20 18:03   ` Nicolin Chen
  2025-03-21  3:22   ` Baolu Lu
  2 siblings, 0 replies; 78+ messages in thread
From: Nicolin Chen @ 2025-03-20 18:03 UTC (permalink / raw)
  To: Yi Liu; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On Thu, Mar 20, 2025 at 06:47:33AM -0700, Yi Liu wrote:
> The igroup->hwpt and igroup->device_list are used to track the hwpt attach
> of a group in the RID path. The upcoming PASID path also needs such
> tracking. To prepare for it, wrap igroup->hwpt and igroup->device_list into
> an attach struct, which is allocated when the first device of the group is
> attached and freed when the last device of the group is detached.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

* Re: [PATCH v10 06/18] iommufd/device: Add helper to detect the first attach of a group
  2025-03-20 17:36   ` Nicolin Chen
  2025-03-20 17:51     ` Nicolin Chen
@ 2025-03-20 18:04     ` Jason Gunthorpe
  2025-03-20 18:24       ` Nicolin Chen
  1 sibling, 1 reply; 78+ messages in thread
From: Jason Gunthorpe @ 2025-03-20 18:04 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: Yi Liu, kevin.tian, joro, baolu.lu, iommu

On Thu, Mar 20, 2025 at 10:36:59AM -0700, Nicolin Chen wrote:
> We have the same list_empty in the iommufd_hw_pagetable_detach()
> and iommufd_group_release() too?

The one in iommufd_group_release() is checking that all devices have
been cleaned up before the device and
is replaced by:

	WARN_ON(!xa_empty(&igroup->pasid_attach));

And the one in iommufd_hw_pagetable_detach().. it can't use
igroup_first_attach() because it hasn't erased the xarray yet.

Jason

* Re: [PATCH v10 06/18] iommufd/device: Add helper to detect the first attach of a group
  2025-03-20 18:04     ` Jason Gunthorpe
@ 2025-03-20 18:24       ` Nicolin Chen
  0 siblings, 0 replies; 78+ messages in thread
From: Nicolin Chen @ 2025-03-20 18:24 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Yi Liu, kevin.tian, joro, baolu.lu, iommu

On Thu, Mar 20, 2025 at 03:04:20PM -0300, Jason Gunthorpe wrote:
> On Thu, Mar 20, 2025 at 10:36:59AM -0700, Nicolin Chen wrote:
> > We have the same list_empty in the iommufd_hw_pagetable_detach()
> > and iommufd_group_release() too?
> 
> The one in iommufd_group_release() is checking that all devices have
> been cleaned up before the device and
> is replaced by:
> 
> 	WARN_ON(!xa_empty(&igroup->pasid_attach));

Yea, I hadn't read till the end.

> And the one in iommufd_hw_pagetable_detach().. it can't use
> igroup_first_attach() because it hasn't erased the xarray yet.

And yes, iommufd_hw_pagetable_detach() can't use this helper,
since igroup->hwpt = NULL is behind the check.

Thanks
Nicolin

* Re: [PATCH v10 08/18] iommufd/device: Replace device_list with device_array
  2025-03-20 13:47 ` [PATCH v10 08/18] iommufd/device: Replace device_list with device_array Yi Liu
  2025-03-20 17:20   ` Jason Gunthorpe
@ 2025-03-20 18:38   ` Nicolin Chen
  2025-03-21  0:30     ` Yi Liu
  1 sibling, 1 reply; 78+ messages in thread
From: Nicolin Chen @ 2025-03-20 18:38 UTC (permalink / raw)
  To: Yi Liu; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On Thu, Mar 20, 2025 at 06:47:34AM -0700, Yi Liu wrote:
> igroup->attach->device_list is used to track the attached devices of a group
> in the RID path. Such tracking is also needed in the PASID path in order
> to share code with the RID path.
> 
> However, there is only one list_head in the iommufd_device, so this cannot
> work if the device has been attached in both the RID path and the PASID
> path. To solve this, replace the device_list with an xarray. The attached
> iommufd_device is stored in the entry indexed by the idev->obj.id.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

Nit:

>  static int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt,
> @@ -625,20 +634,27 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
>  			rc = -ENOMEM;
>  			goto err_unlock;
>  		}
> -		INIT_LIST_HEAD(&attach->device_list);
> +		xa_init(&attach->device_array);
>  	}
>  
>  	old_hwpt = attach->hwpt;
>  
> +	rc = xa_insert(&igroup->attach->device_array, idev->obj.id, XA_ZERO_ENTRY,
> +		       GFP_KERNEL);
> +	if (rc) {
> +		WARN_ON(rc == -EBUSY && !old_hwpt);
> +		goto err_free_attach;
> +	}
> +
>  	if (old_hwpt && old_hwpt != hwpt) {
>  		rc = -EINVAL;
> -		goto err_free_attach;
> +		goto err_release_devid;
>  	}

Could we reject old_hwpt != hwpt (replace case) before xa_insert?

* Re: [PATCH v10 09/18] iommufd/device: Add pasid_attach array to track per-PASID attach
  2025-03-20 13:47 ` [PATCH v10 09/18] iommufd/device: Add pasid_attach array to track per-PASID attach Yi Liu
  2025-03-20 17:33   ` Jason Gunthorpe
@ 2025-03-20 19:19   ` Nicolin Chen
  2025-03-20 19:29     ` Jason Gunthorpe
  2025-03-21  0:15     ` Yi Liu
  1 sibling, 2 replies; 78+ messages in thread
From: Nicolin Chen @ 2025-03-20 19:19 UTC (permalink / raw)
  To: Yi Liu; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On Thu, Mar 20, 2025 at 06:47:35AM -0700, Yi Liu wrote:
> @@ -497,10 +501,13 @@ iommufd_device_attach_reserved_iova(struct iommufd_device *idev,
>  
>  /* The device attach/detach/replace helpers for attach_handle */
>  
> -/* Check if idev is attached to igroup->hwpt */
> -static bool iommufd_device_is_attached(struct iommufd_device *idev)
> +static bool iommufd_device_is_attached(struct iommufd_device *idev,
> +				       ioasid_t pasid)
>  {
> -	return xa_load(&idev->igroup->attach->device_array, idev->obj.id);
> +	struct iommufd_attach *attach;
> +
> +	attach = xa_load(&idev->igroup->pasid_attach, pasid);
> +	return xa_load(&attach->device_array, idev->obj.id);

This helper is called in iommufd_device_do_replace() after it does
xa_cmpxchg() on the same igroup->pasid_attach?

>  static int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt,
> @@ -627,19 +634,25 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
>  
>  	mutex_lock(&igroup->lock);
>  
> -	attach = igroup->attach;
> +	attach = xa_cmpxchg(&igroup->pasid_attach, pasid, NULL,
> +			    XA_ZERO_ENTRY, GFP_KERNEL);
> +	if (xa_is_err(attach)) {
> +		rc = xa_err(attach);
> +		goto err_unlock;
> +	}
> +
>  	if (!attach) {
>  		attach = kzalloc(sizeof(*attach), GFP_KERNEL);

Since this is attach() and we do xa_cmpxchg() with an "old=NULL",
should !attach always be true?

> -	rc = xa_insert(&igroup->attach->device_array, idev->obj.id, XA_ZERO_ENTRY,
> +	rc = xa_insert(&attach->device_array, idev->obj.id, XA_ZERO_ENTRY,

Nit: looks like this should have been done in the PATCH-8
"iommufd/device: Replace device_list with device_array"

> @@ -787,8 +808,15 @@ iommufd_device_do_replace(struct iommufd_device *idev, ioasid_t pasid,
>  
>  	mutex_lock(&igroup->lock);
>  
> -	attach = igroup->attach;
> +	attach = xa_cmpxchg(&igroup->pasid_attach, pasid, NULL,
> +			    XA_ZERO_ENTRY, GFP_KERNEL);

Hmm, this is replace(). Should the "pasid" position in pasid_attach
be already filled? I mean the old entry shouldn't be "NULL" so as to
call it a "replace"?

> +	if (xa_is_err(attach)) {
> +		rc = xa_err(attach);
> +		goto err_unlock;
> +	}
> +
>  	if (!attach) {
> +		xa_release(&igroup->pasid_attach, pasid);
>  		rc = -EINVAL;
>  		goto err_unlock;
>  	}

Here too? Is attach always !NULL?

Thanks
Nicolin

* Re: [PATCH v10 09/18] iommufd/device: Add pasid_attach array to track per-PASID attach
  2025-03-20 19:19   ` Nicolin Chen
@ 2025-03-20 19:29     ` Jason Gunthorpe
  2025-03-20 20:13       ` Nicolin Chen
  2025-03-21  0:15     ` Yi Liu
  1 sibling, 1 reply; 78+ messages in thread
From: Jason Gunthorpe @ 2025-03-20 19:29 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: Yi Liu, kevin.tian, joro, baolu.lu, iommu

On Thu, Mar 20, 2025 at 12:19:33PM -0700, Nicolin Chen wrote:
> > -	attach = igroup->attach;
> > +	attach = xa_cmpxchg(&igroup->pasid_attach, pasid, NULL,
> > +			    XA_ZERO_ENTRY, GFP_KERNEL);
> > +	if (xa_is_err(attach)) {
> > +		rc = xa_err(attach);
> > +		goto err_unlock;
> > +	}
> > +
> >  	if (!attach) {
> >  		attach = kzalloc(sizeof(*attach), GFP_KERNEL);
> 
> Since this is attach() and we do xa_cmpxchg() with an "old=NULL",
> should !attach always be true?

No, all this xa_cmpxchg is a sort of odd way to atomically tell what
is already in the xarray. If there is already an entry then it will
fail the cmpxchange and return the old entry. This gives the 'if
(attach)' condition.
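
To make that return-value contract concrete, here is a minimal userspace
sketch. It is not the xarray implementation: demo_slots and demo_cmpxchg()
are hypothetical stand-ins that model only "store if the slot holds @old,
return what was there before either way", with no locking, allocation or
error encoding.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SLOTS 8

/* Hypothetical stand-in for the xarray: one pointer per index. */
static void *demo_slots[DEMO_SLOTS];

/*
 * Store @new only if the slot currently holds @old. The previous content
 * of the slot is returned whether or not the store happened.
 */
static void *demo_cmpxchg(unsigned int index, void *old, void *new)
{
	void *curr = demo_slots[index];

	if (curr == old)
		demo_slots[index] = new;
	return curr;
}

static void demo_attach_twice(void)
{
	static int reserved, attach;	/* addresses used as distinct tokens */
	void *ret;

	/* Empty slot: the exchange succeeds and NULL comes back. */
	ret = demo_cmpxchg(3, NULL, &reserved);
	assert(ret == NULL);

	/* The caller would now allocate and store the real entry. */
	demo_slots[3] = &attach;

	/* Occupied slot: the exchange fails and the existing entry comes back. */
	ret = demo_cmpxchg(3, NULL, &reserved);
	assert(ret == &attach);
	assert(demo_slots[3] == &attach);
}
```

So a NULL return means "nothing was there" and a non-NULL return means
"this is what is already attached", which is the distinction both the
attach and the replace paths branch on.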

> > -	attach = igroup->attach;
> > +	attach = xa_cmpxchg(&igroup->pasid_attach, pasid, NULL,
> > +			    XA_ZERO_ENTRY, GFP_KERNEL);
> 
> Hmm, this is replace(). Should the "pasid" position in pasid_attach
> be already filled? I mean the old entry shouldn't be "NULL" so as to
> call it a "replace"?

Same xa oddness, the logic is sound but this could probably just be a
xa_load() under the mutex to confirm a non-NULL entry is present.

Jason

* Re: [PATCH v10 09/18] iommufd/device: Add pasid_attach array to track per-PASID attach
  2025-03-20 19:29     ` Jason Gunthorpe
@ 2025-03-20 20:13       ` Nicolin Chen
  0 siblings, 0 replies; 78+ messages in thread
From: Nicolin Chen @ 2025-03-20 20:13 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Yi Liu, kevin.tian, joro, baolu.lu, iommu

On Thu, Mar 20, 2025 at 04:29:14PM -0300, Jason Gunthorpe wrote:
> On Thu, Mar 20, 2025 at 12:19:33PM -0700, Nicolin Chen wrote:
> > > -	attach = igroup->attach;
> > > +	attach = xa_cmpxchg(&igroup->pasid_attach, pasid, NULL,
> > > +			    XA_ZERO_ENTRY, GFP_KERNEL);
> > > +	if (xa_is_err(attach)) {
> > > +		rc = xa_err(attach);
> > > +		goto err_unlock;
> > > +	}
> > > +
> > >  	if (!attach) {
> > >  		attach = kzalloc(sizeof(*attach), GFP_KERNEL);
> > 
> > Since this is attach() and we do xa_cmpxchg() with an "old=NULL",
> > should !attach always be true?
> 
> No, all this xa_cmpxchg is a sort of odd way to atomically tell what
> is already in the xarray. If there is already an entry then it will
> fail the cmpxchange and return the old entry. This gives the 'if
> (attach)' condition.

Ah, I forgot that xa_cmpxchg() returns the old entry anyway, unless
it is an xa_err.

> > > -	attach = igroup->attach;
> > > +	attach = xa_cmpxchg(&igroup->pasid_attach, pasid, NULL,
> > > +			    XA_ZERO_ENTRY, GFP_KERNEL);
> > 
> > Hmm, this is replace(). Should the "pasid" position in pasid_attach
> > be already filled? I mean the old entry shouldn't be "NULL" so as to
> > call it a "replace"?
> 
> Same xa oddness, the logic is sound but this could probably just be a
> xa_load() under the mutex to confirm a non-NULL entry is present.

Yea, then there is no need for xa_release() to revert the XA_ZERO_ENTRY.

Thanks
Nicolin

* Re: [PATCH v10 11/18] iommufd: Support pasid attach/replace
  2025-03-20 13:47 ` [PATCH v10 11/18] iommufd: Support pasid attach/replace Yi Liu
@ 2025-03-20 20:42   ` Nicolin Chen
  2025-03-20 23:29     ` Jason Gunthorpe
  2025-03-21  0:31     ` Yi Liu
  0 siblings, 2 replies; 78+ messages in thread
From: Nicolin Chen @ 2025-03-20 20:42 UTC (permalink / raw)
  To: Yi Liu; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On Thu, Mar 20, 2025 at 06:47:37AM -0700, Yi Liu wrote:
> @@ -579,10 +582,12 @@ static void iommufd_hwpt_detach_device(struct iommufd_hw_pagetable *hwpt,
>  {
>  	struct iommufd_attach_handle *handle;
>  
> -	WARN_ON(pasid != IOMMU_NO_PASID);
> +	if (pasid == IOMMU_NO_PASID)
> +		iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
> +	else
> +		iommu_detach_device_pasid(hwpt->domain, idev->dev, pasid);
>  
>  	handle = iommufd_device_get_attach_handle(idev, pasid);
> -	iommu_detach_group_handle(hwpt->domain, idev->igroup->group);

This changes the sequence of these calls?

Is it correct to do iommufd_device_get_attach_handle() after
iommu_detach_group_handle()?

Otherwise,
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

* Re: [PATCH v10 12/18] iommufd: Enforce PASID-compatible domain for RID
  2025-03-20 13:47 ` [PATCH v10 12/18] iommufd: Enforce PASID-compatible domain for RID Yi Liu
  2025-03-20 17:35   ` Jason Gunthorpe
@ 2025-03-20 22:23   ` Nicolin Chen
  2025-03-20 23:31     ` Jason Gunthorpe
  2025-03-21  0:41     ` Yi Liu
  1 sibling, 2 replies; 78+ messages in thread
From: Nicolin Chen @ 2025-03-20 22:23 UTC (permalink / raw)
  To: Yi Liu; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On Thu, Mar 20, 2025 at 06:47:38AM -0700, Yi Liu wrote:
> Per the definition of IOMMU_HWPT_ALLOC_PASID, iommufd needs to enforce
> the RID to use PASID-compatible domain if PASID has been attached, and
> vice versa. The PASID path has already enforced it. This adds the
> enforcement in the RID path.
> 
> This enforcement requires a lock across the RID and PASID attach path,
> the idev->igroup->lock is used as both the RID and the PASID path holds
> it.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

A question:

this isn't about pasid_compat, yet...

> @@ -514,8 +514,28 @@ static int iommufd_hwpt_pasid_compat(struct iommufd_hw_pagetable *hwpt,
>  				     struct iommufd_device *idev,
>  				     ioasid_t pasid)
>  {
> -	if (pasid != IOMMU_NO_PASID && !hwpt->pasid_compat)
> -		return -EINVAL;
> +	struct iommufd_group *igroup = idev->igroup;
> +
> +	lockdep_assert_held(&igroup->lock);
> +
> +	if (pasid == IOMMU_NO_PASID) {
> +		unsigned long start = IOMMU_NO_PASID;
> +
> +		if (!hwpt->pasid_compat &&
> +		    xa_find_after(&igroup->pasid_attach,
> +				  &start, UINT_MAX, XA_PRESENT))
> +			return -EINVAL;
> +	} else {
> +		struct iommufd_attach *attach;
> +
> +		if (!hwpt->pasid_compat)
> +			return -EINVAL;
> +
> +		attach = xa_load(&igroup->pasid_attach, IOMMU_NO_PASID);
> +		if (attach && attach->hwpt && !attach->hwpt->pasid_compat)
> +			return -EINVAL;

Should we also make sure that RID hwpt is attached before storing
any PASID hwpt to a !IOMMU_NO_PASID slot in igroup->pasid_attach?

Thanks
Nicolin

* Re: [PATCH v10 14/18] iommufd: Allow allocating PASID-compatible domain
  2025-03-20 13:47 ` [PATCH v10 14/18] iommufd: Allow allocating PASID-compatible domain Yi Liu
  2025-03-20 17:51   ` Jason Gunthorpe
@ 2025-03-20 22:36   ` Nicolin Chen
  1 sibling, 0 replies; 78+ messages in thread
From: Nicolin Chen @ 2025-03-20 22:36 UTC (permalink / raw)
  To: Yi Liu; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On Thu, Mar 20, 2025 at 06:47:40AM -0700, Yi Liu wrote:
> The underlying infrastructure has supported the PASID attach and related
> enforcement per the requirement of the IOMMU_HWPT_ALLOC_PASID flag. This
> extends iommufd to support PASID compatible domain requested by userspace.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

* Re: [PATCH v10 15/18] iommufd/selftest: Add set_dev_pasid in mock iommu
  2025-03-20 13:47 ` [PATCH v10 15/18] iommufd/selftest: Add set_dev_pasid in mock iommu Yi Liu
@ 2025-03-20 22:48   ` Nicolin Chen
  0 siblings, 0 replies; 78+ messages in thread
From: Nicolin Chen @ 2025-03-20 22:48 UTC (permalink / raw)
  To: Yi Liu; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On Thu, Mar 20, 2025 at 06:47:41AM -0700, Yi Liu wrote:
> The callback is needed to make pasid_attach/detach path complete for mock
> device. A nop is enough for set_dev_pasid.
> 
> A MOCK_FLAGS_DEVICE_PASID is added to indicate a pasid-capable mock device
> for the pasid test cases. Other test cases will still create a non-pasid
> mock device. While the mock iommu always pretends to be pasid-capable.
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

Two nits:

>  static struct mock_dev *mock_dev_create(unsigned long dev_flags)
>  {
> +	struct property_entry prop[] = {
> +		PROPERTY_ENTRY_U32("pasid-num-bits", MOCK_PASID_WIDTH),
> +		{},
> +	};
>  	struct mock_dev *mdev;
>  	int rc, i;
>  
>  	if (dev_flags &
> -	    ~(MOCK_FLAGS_DEVICE_NO_DIRTY | MOCK_FLAGS_DEVICE_HUGE_IOVA))
> +	    ~(MOCK_FLAGS_DEVICE_NO_DIRTY |
> +		    MOCK_FLAGS_DEVICE_HUGE_IOVA | MOCK_FLAGS_DEVICE_PASID))

Let's have a "const u32 valid_flags" at the top of the function.
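
A rough userspace sketch of that shape, for illustration only. The
DEMO_FLAGS_* bit values and demo_check_dev_flags() are made up here; the
real flags live in iommufd_test.h. One assumption worth flagging: the mask
is typed unsigned long to match dev_flags, because complementing a 32-bit
mask and then widening it would leave the upper bits of dev_flags
unchecked on 64-bit builds.

```c
#include <errno.h>

/* Bit values are invented for this sketch. */
#define DEMO_FLAGS_DEVICE_NO_DIRTY	(1UL << 0)
#define DEMO_FLAGS_DEVICE_HUGE_IOVA	(1UL << 1)
#define DEMO_FLAGS_DEVICE_PASID		(1UL << 2)

/* Returns 0 if only known flags are set, -EINVAL otherwise. */
static int demo_check_dev_flags(unsigned long dev_flags)
{
	/* Same width as dev_flags so that ~valid_flags covers every bit. */
	const unsigned long valid_flags = DEMO_FLAGS_DEVICE_NO_DIRTY |
					  DEMO_FLAGS_DEVICE_HUGE_IOVA |
					  DEMO_FLAGS_DEVICE_PASID;

	if (dev_flags & ~valid_flags)
		return -EINVAL;
	return 0;
}
```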

>  		return ERR_PTR(-EINVAL);
>  
>  	mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
> @@ -890,6 +906,14 @@ static struct mock_dev *mock_dev_create(unsigned long dev_flags)
>  	if (rc)
>  		goto err_put;
>  
> +	if (dev_flags & MOCK_FLAGS_DEVICE_PASID) {
> +		rc = device_create_managed_software_node(&mdev->dev, prop, NULL);
> +		if (rc) {
> +			dev_err(&mdev->dev, "add pasid-num-bits property failed, rc: %d", rc);
> +			goto err_put;
> +		}
> +	}

Since max_pasids == 0 means the device doesn't support PASID, I
think we could create the node for !MOCK_FLAGS_DEVICE_PASID too
with a "pasid-num-bits"=0?

* Re: [PATCH v10 17/18] iommufd/selftest: Add test ops to test pasid attach/detach
  2025-03-20 13:47 ` [PATCH v10 17/18] iommufd/selftest: Add test ops to test pasid attach/detach Yi Liu
@ 2025-03-20 23:17   ` Nicolin Chen
  2025-03-20 23:33     ` Jason Gunthorpe
                       ` (2 more replies)
  2025-03-20 23:20   ` Nicolin Chen
  1 sibling, 3 replies; 78+ messages in thread
From: Nicolin Chen @ 2025-03-20 23:17 UTC (permalink / raw)
  To: Yi Liu; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On Thu, Mar 20, 2025 at 06:47:43AM -0700, Yi Liu wrote:
> @@ -150,6 +155,32 @@ struct iommu_test_cmd {
>  		struct {
>  			__u32 dev_id;
>  		} trigger_vevent;
> +		struct {
> +			__u32 pasid;
> +			__u32 pt_id;
> +			/* @id is stdev_id
> +			 * pasid#1024 is for special test, do not use it
> +			 * in normal case.
> +			 */

How about adding this on top of these structs:
#define IOMMU_TEST_PASID_RESERVED 1024

Also, the coding style of the multi-line comments is a bit odd.

> +		} pasid_attach;
> +		struct {
> +			__u32 pasid;
> +			__u32 pt_id;
> +			/* @id is stdev_id
> +			 * pasid#1024 is for special test, do not use it
> +			 * in normal case.
> +			 */

Ditto

> diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
> index 691e7a23f300..37c9cd285541 100644
> --- a/drivers/iommu/iommufd/selftest.c
> +++ b/drivers/iommu/iommufd/selftest.c
> @@ -223,10 +223,29 @@ static int mock_domain_nop_attach(struct iommu_domain *domain,
>  	return 0;
>  }
>  
> +static bool pasid_1024_attached;

I recall syzkaller would do multi-threading... We might need a
global mutex or something atomic_t?

>  static int mock_domain_set_dev_pasid_nop(struct iommu_domain *domain,
>  					 struct device *dev, ioasid_t pasid,
>  					 struct iommu_domain *old)
>  {
> +	/*
> +	 * First attach with pasid 1024 succ, second attach would fail.

succeeds?

> +	 * This is helpful to test the case in which the iommu core needs
> +	 * to rollback to old domain due to driver failure.
> +	 */
> +	if (pasid == 1024) {
> +		if (domain->type == IOMMU_DOMAIN_BLOCKED) {
> +			pasid_1024_attached = false;
> +		} else if (pasid_1024_attached) {
> +			pasid_1024_attached = false;
> +			// Fake an error to fail the replacement
> +			return -ENOMEM;

/* Fake an error to fail the replacement */

While failing this, why does it detach pasid-1024? Maybe add some extra
comments on what this is doing?

> +static int iommufd_test_pasid_check_domain(struct iommufd_ucmd *ucmd,
> +					   struct iommu_test_cmd *cmd)
> +{
> +	struct iommu_domain *attached_domain, *expect_domain = NULL;
> +	struct iommufd_hw_pagetable *hwpt = NULL;
> +	struct iommu_attach_handle *handle;
> +	struct selftest_obj *sobj;
> +	struct mock_dev *mdev;
> +	bool result;
> +	int rc = 0;
> +
> +	sobj = iommufd_test_get_selftest_obj(ucmd->ictx, cmd->id);
> +	if (IS_ERR(sobj))
> +		return PTR_ERR(sobj);
> +
> +	mdev = sobj->idev.mock_dev;
> +
> +	handle = iommu_attach_handle_get(mdev->dev.iommu_group,
> +					 cmd->pasid_check.pasid, 0);
> +	if (IS_ERR(handle))
> +		attached_domain = NULL;
> +	else
> +		attached_domain = handle->domain;
> +
> +	if (cmd->pasid_check.hwpt_id) {
> +		hwpt = iommufd_get_hwpt(ucmd, cmd->pasid_check.hwpt_id);
> +		if (IS_ERR(hwpt)) {

Do we need cmd->pasid_check.hwpt_id to be optional?

> +			rc = PTR_ERR(hwpt);
> +			goto out_put_dev;
> +		}
> +		expect_domain = hwpt->domain;
> +	}
> +
> +	result = (attached_domain == expect_domain) ? 1 : 0;
> +	if (copy_to_user(u64_to_user_ptr(cmd->pasid_check.out_result_ptr),
> +			 &result, sizeof(result)))
> +		rc = -EFAULT;

If we do want it to be optional, we can't unconditionally check the
result then?

> +static int iommufd_test_pasid_attach(struct iommufd_ucmd *ucmd,
> +				     struct iommu_test_cmd *cmd)
> +{
> +	struct selftest_obj *sobj;
> +	int rc;
> +
> +	sobj = iommufd_test_get_selftest_obj(ucmd->ictx, cmd->id);
> +	if (IS_ERR(sobj))
> +		return PTR_ERR(sobj);
> +
> +	rc = iommufd_device_attach(sobj->idev.idev, cmd->pasid_attach.pasid,
> +				   &cmd->pasid_attach.pt_id);
> +	if (rc)
> +		goto out_sobj;
> +
> +	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
> +	if (rc)
> +		iommufd_device_detach(sobj->idev.idev,
> +				      cmd->pasid_attach.pasid);
> +
> +out_sobj:
> +	iommufd_put_object(ucmd->ictx, &sobj->obj);
> +	return rc;
> +}
> +
> +static int iommufd_test_pasid_replace(struct iommufd_ucmd *ucmd,
> +				      struct iommu_test_cmd *cmd)
> +{
> +	struct selftest_obj *sobj;
> +	int rc;
> +
> +	sobj = iommufd_test_get_selftest_obj(ucmd->ictx, cmd->id);
> +	if (IS_ERR(sobj))
> +		return PTR_ERR(sobj);
> +
> +	rc = iommufd_device_replace(sobj->idev.idev, cmd->pasid_attach.pasid,
> +				    &cmd->pasid_attach.pt_id);
> +	if (rc)
> +		goto out_sobj;
> +
> +	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
> +
> +out_sobj:
> +	iommufd_put_object(ucmd->ictx, &sobj->obj);
> +	return rc;

If iommufd_ucmd_respond fails, do we need to revert like we do in
iommufd_test_pasid_attach()?

Thanks
Nicolin

* Re: [PATCH v10 17/18] iommufd/selftest: Add test ops to test pasid attach/detach
  2025-03-20 13:47 ` [PATCH v10 17/18] iommufd/selftest: Add test ops to test pasid attach/detach Yi Liu
  2025-03-20 23:17   ` Nicolin Chen
@ 2025-03-20 23:20   ` Nicolin Chen
  2025-03-21  1:20     ` Yi Liu
  1 sibling, 1 reply; 78+ messages in thread
From: Nicolin Chen @ 2025-03-20 23:20 UTC (permalink / raw)
  To: Yi Liu; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On Thu, Mar 20, 2025 at 06:47:43AM -0700, Yi Liu wrote:
> This adds 5 test ops for pasid attach/replace/detach testing. There are
> ops to attach/detach pasid, and also op to check the attached domain of
> a pasid.
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/iommu/iommufd/iommufd_test.h |  31 ++++++
>  drivers/iommu/iommufd/selftest.c     | 151 +++++++++++++++++++++++++++
>  2 files changed, 182 insertions(+)
> 
> diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
> index 1a066feb8697..efcb509f7d56 100644
> --- a/drivers/iommu/iommufd/iommufd_test.h
> +++ b/drivers/iommu/iommufd/iommufd_test.h
> @@ -25,6 +25,11 @@ enum {
>  	IOMMU_TEST_OP_TRIGGER_IOPF,
>  	IOMMU_TEST_OP_DEV_CHECK_CACHE,
>  	IOMMU_TEST_OP_TRIGGER_VEVENT,
> +	IOMMU_TEST_OP_PASID_ATTACH,
> +	IOMMU_TEST_OP_PASID_REPLACE,
> +	IOMMU_TEST_OP_PASID_MIX_REPLACE_HANDLE,

And we could drop IOMMU_TEST_OP_PASID_MIX_REPLACE_HANDLE?

That makes 4 test ops in total, not the "5" stated in the commit log.

Thanks
Nicolin

* Re: [PATCH v10 11/18] iommufd: Support pasid attach/replace
  2025-03-20 20:42   ` Nicolin Chen
@ 2025-03-20 23:29     ` Jason Gunthorpe
  2025-03-21  0:31     ` Yi Liu
  1 sibling, 0 replies; 78+ messages in thread
From: Jason Gunthorpe @ 2025-03-20 23:29 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: Yi Liu, kevin.tian, joro, baolu.lu, iommu

On Thu, Mar 20, 2025 at 01:42:27PM -0700, Nicolin Chen wrote:
> On Thu, Mar 20, 2025 at 06:47:37AM -0700, Yi Liu wrote:
> > @@ -579,10 +582,12 @@ static void iommufd_hwpt_detach_device(struct iommufd_hw_pagetable *hwpt,
> >  {
> >  	struct iommufd_attach_handle *handle;
> >  
> > -	WARN_ON(pasid != IOMMU_NO_PASID);
> > +	if (pasid == IOMMU_NO_PASID)
> > +		iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
> > +	else
> > +		iommu_detach_device_pasid(hwpt->domain, idev->dev, pasid);
> >  
> >  	handle = iommufd_device_get_attach_handle(idev, pasid);
> > -	iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
> 
> This changes the sequence of these calls?
> 
> Is it correct to do iommufd_device_get_attach_handle() after
> iommu_detach_group_handle()?

Nope! It quietly leaks memory like this! Good find!
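
A toy userspace model of why the order matters. Every name here is
hypothetical, and it assumes that once the core has detached, the handle
lookup finds nothing, so the later free is handed NULL and the handle is
never released.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_handle {
	bool live;			/* true while "allocated" */
};

static struct demo_handle demo_pool;	/* one handle is enough here */
static struct demo_handle *demo_slot;	/* what the "core" tracks */

static void demo_attach(void)
{
	demo_pool.live = true;
	demo_slot = &demo_pool;
}

static struct demo_handle *demo_get_handle(void)
{
	return demo_slot;
}

/* The core forgets the handle on detach, it does not free it. */
static void demo_core_detach(void)
{
	demo_slot = NULL;
}

/* Like kfree(): a NULL argument is silently accepted. */
static void demo_free(struct demo_handle *handle)
{
	if (handle)
		handle->live = false;
}

/* Lookup after detach: returns true if the handle was leaked. */
static bool demo_detach_lookup_after(void)
{
	demo_attach();
	demo_core_detach();
	demo_free(demo_get_handle());	/* gets NULL, frees nothing */
	return demo_pool.live;
}

/* Lookup before detach: returns true if the handle was leaked. */
static bool demo_detach_lookup_before(void)
{
	struct demo_handle *handle;

	demo_attach();
	handle = demo_get_handle();
	demo_core_detach();
	demo_free(handle);
	return demo_pool.live;
}
```

In this model only the lookup-before-detach order releases the handle,
and the other order fails without any visible error.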

Jason

* Re: [PATCH v10 12/18] iommufd: Enforce PASID-compatible domain for RID
  2025-03-20 22:23   ` Nicolin Chen
@ 2025-03-20 23:31     ` Jason Gunthorpe
  2025-03-21  0:45       ` Yi Liu
  2025-03-21  0:41     ` Yi Liu
  1 sibling, 1 reply; 78+ messages in thread
From: Jason Gunthorpe @ 2025-03-20 23:31 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: Yi Liu, kevin.tian, joro, baolu.lu, iommu

On Thu, Mar 20, 2025 at 03:23:15PM -0700, Nicolin Chen wrote:

> > +		attach = xa_load(&igroup->pasid_attach, IOMMU_NO_PASID);
> > +		if (attach && attach->hwpt && !attach->hwpt->pasid_compat)
> > +			return -EINVAL;
> 
> Should we also make sure that RID hwpt is attached before storing
> any PASID hwpt to a !IOMMU_NO_PASID slot in igroup->pasid_attach?

AFAIK, no.. If a driver cannot support the combination of blocked on
RID and a paging on PASID then it should fail the attach.

smmuv3 has support for this, and I'd expect the same of other
drivers.

Jason

* Re: [PATCH v10 17/18] iommufd/selftest: Add test ops to test pasid attach/detach
  2025-03-20 23:17   ` Nicolin Chen
@ 2025-03-20 23:33     ` Jason Gunthorpe
  2025-03-20 23:42     ` Nicolin Chen
  2025-03-21  1:43     ` Yi Liu
  2 siblings, 0 replies; 78+ messages in thread
From: Jason Gunthorpe @ 2025-03-20 23:33 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: Yi Liu, kevin.tian, joro, baolu.lu, iommu

On Thu, Mar 20, 2025 at 04:17:52PM -0700, Nicolin Chen wrote:
> > @@ -223,10 +223,29 @@ static int mock_domain_nop_attach(struct iommu_domain *domain,
> >  	return 0;
> >  }
> >  
> > +static bool pasid_1024_attached;
> 
> I recall syzkaller would do multi-threading... We might need a
> global mutex or something atomic_t?

It can't be a global; it would mess up the model. Store this in
the mock_device maybe?
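
Something along these lines, as a userspace sketch only. struct
demo_mock_dev, its field name and demo_set_dev_pasid() are hypothetical,
the "blocked" argument stands in for attaching a blocked domain, and the
final else branch (marking the PASID attached on first success) is an
assumption about what the patch intends. It does not address locking; it
only shows that per-device state keeps two mock devices from interfering.

```c
#include <errno.h>
#include <stdbool.h>

#define DEMO_TEST_PASID 1024

struct demo_mock_dev {
	bool pasid_1024_attached;	/* per device instead of a global */
};

/*
 * The first attach of pasid 1024 succeeds and the next one fails, so the
 * caller's rollback path gets exercised. Other PASIDs are a nop.
 */
static int demo_set_dev_pasid(struct demo_mock_dev *mdev, unsigned int pasid,
			      bool blocked)
{
	if (pasid != DEMO_TEST_PASID)
		return 0;

	if (blocked) {
		mdev->pasid_1024_attached = false;
	} else if (mdev->pasid_1024_attached) {
		/* Cleared on failure, as in the quoted patch. */
		mdev->pasid_1024_attached = false;
		return -ENOMEM;
	} else {
		mdev->pasid_1024_attached = true;
	}
	return 0;
}
```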

Jason

* Re: [PATCH v10 17/18] iommufd/selftest: Add test ops to test pasid attach/detach
  2025-03-20 23:17   ` Nicolin Chen
  2025-03-20 23:33     ` Jason Gunthorpe
@ 2025-03-20 23:42     ` Nicolin Chen
  2025-03-21  1:43     ` Yi Liu
  2 siblings, 0 replies; 78+ messages in thread
From: Nicolin Chen @ 2025-03-20 23:42 UTC (permalink / raw)
  To: Yi Liu; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On Thu, Mar 20, 2025 at 04:17:58PM -0700, Nicolin Chen wrote:
> On Thu, Mar 20, 2025 at 06:47:43AM -0700, Yi Liu wrote:
> > +static int iommufd_test_pasid_check_domain(struct iommufd_ucmd *ucmd,
> > +					   struct iommu_test_cmd *cmd)
> > +{
> > +	struct iommu_domain *attached_domain, *expect_domain = NULL;
> > +	struct iommufd_hw_pagetable *hwpt = NULL;
> > +	struct iommu_attach_handle *handle;
> > +	struct selftest_obj *sobj;
> > +	struct mock_dev *mdev;
> > +	bool result;
> > +	int rc = 0;
> > +
> > +	sobj = iommufd_test_get_selftest_obj(ucmd->ictx, cmd->id);
> > +	if (IS_ERR(sobj))
> > +		return PTR_ERR(sobj);
> > +
> > +	mdev = sobj->idev.mock_dev;
> > +
> > +	handle = iommu_attach_handle_get(mdev->dev.iommu_group,
> > +					 cmd->pasid_check.pasid, 0);
> > +	if (IS_ERR(handle))
> > +		attached_domain = NULL;
> > +	else
> > +		attached_domain = handle->domain;
> > +
> > +	if (cmd->pasid_check.hwpt_id) {
> > +		hwpt = iommufd_get_hwpt(ucmd, cmd->pasid_check.hwpt_id);
> > +		if (IS_ERR(hwpt)) {
> 
> Do we need cmd->pasid_check.hwpt_id to be optional?

Okay. After reading PATCH-18, it looks like what we want is to
verify whether the hwpt is detached or not using hwpt_id=0.

Similar to check_iotlb() and check_cache(), I think this can be:
	/* hwpt_id=0 is to check whether pasid is detached or not */
	if (!attached_domain && !cmd->pasid_check.hwpt_id)
		goto put_and_return_0;

	hwpt = iommufd_get_hwpt(ucmd, cmd->pasid_check.hwpt_id);
	if (IS_ERR(hwpt))
		goto put_and_return_PTR_ERR;

	if (hwpt->domain != attached_domain)
		rc = -EINVAL;

Also, given that we pass in hwpt_id, let's call it:
iommufd_test_pasid_check_hwpt/IOMMU_TEST_OP_PASID_CHECK_HWPT?

Thanks
Nicolin

* Re: [PATCH v10 01/18] iommu: Require passing new handles to APIs supporting handle
  2025-03-20 15:23   ` Jason Gunthorpe
@ 2025-03-20 23:51     ` Yi Liu
  0 siblings, 0 replies; 78+ messages in thread
From: Yi Liu @ 2025-03-20 23:51 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kevin.tian, joro, baolu.lu, iommu, nicolinc

On 2025/3/20 23:23, Jason Gunthorpe wrote:
> On Thu, Mar 20, 2025 at 06:47:27AM -0700, Yi Liu wrote:
>> Add kdoc to highligt the caller of iommu_[attach|replace]_group_handle()
>> and iommu_attach_device_pasid() should always provide a new handle.
>>
>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>> ---
>>   drivers/iommu/iommu.c | 9 +++++++++
>>   1 file changed, 9 insertions(+)
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> May want to provide a code pointer to the lockless paths in the fault
> functions in the commit message if you respin

sure.

-- 
Regards,
Yi Liu

* Re: [PATCH v10 02/18] iommu: Introduce a replace API for device pasid
  2025-03-20 17:24   ` Nicolin Chen
@ 2025-03-20 23:58     ` Yi Liu
  2025-03-21  0:14       ` Yi Liu
  2025-03-21  3:21       ` Nicolin Chen
  0 siblings, 2 replies; 78+ messages in thread
From: Yi Liu @ 2025-03-20 23:58 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On 2025/3/21 01:24, Nicolin Chen wrote:
> On Thu, Mar 20, 2025 at 06:47:28AM -0700, Yi Liu wrote:
>> Provide a high-level API to allow replacements of one domain with another
>> for specific pasid of a device. This is similar to
>> iommu_replace_group_handle() and it is expected to be used only by IOMMUFD.
>>
>> Co-developed-by: Lu Baolu <baolu.lu@linux.intel.com>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> 
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
> 
> Some nits:
> 
>> @@ -3420,6 +3436,99 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
>>   }
>>   EXPORT_SYMBOL_GPL(iommu_attach_device_pasid);
>>   
>> +/**
>> + * iommu_replace_device_pasid - Replace the domain that a pasid
>> + *                              is attached to
> 
> How about "... a specific pasid of the device is attached to"
> aligning with the clearer narrative in commit log?
> 
>> +int iommu_replace_device_pasid(struct iommu_domain *domain,
>> +			       struct device *dev, ioasid_t pasid,
>> +			       struct iommu_attach_handle *handle)
>> +{
>> +	/* Caller must be a probed driver on dev */
> 
> What's "a probed driver on dev"? Mind rephrasing this?
> 
> Also should it be placed outside this function?
> 
>> +	struct iommu_group *group = dev->iommu_group;
>> +	struct iommu_attach_handle *entry;
>> +	struct iommu_domain *curr_domain;
>> +	void *curr;
>> +	int ret;
>> +
>> +	if (!group)
>> +		return -ENODEV;
>> +
>> +	if (!domain->ops->set_dev_pasid)
>> +		return -EOPNOTSUPP;
>> +
>> +	if (dev_iommu_ops(dev) != domain->owner ||
>> +	    pasid == IOMMU_NO_PASID || !handle)
>> +		return -EINVAL;
>> +
>> +	mutex_lock(&group->mutex);
> 
> How about guard(mutex)(&group->mutex)?

yeah, baolu mentioned it as well. But given the
>> +	entry = iommu_make_pasid_array_entry(domain, handle);
>> +	curr = xa_cmpxchg(&group->pasid_array, pasid, NULL,
>> +			  XA_ZERO_ENTRY, GFP_KERNEL);
>> +	if (xa_is_err(curr)) {
>> +		ret = xa_err(curr);
>> +		goto out_unlock;
>> +	}
>> +
>> +	/*
>> +	 * No domain (with or without handle) attached, hence not
>> +	 * a replace case.
>> +	 */
>> +	if (!curr) {
>> +		xa_release(&group->pasid_array, pasid);
>> +		ret = -EINVAL;
>> +		goto out_unlock;
>> +	}
>> +
>> +	/*
>> +	 * Reusing handle is problematic as there are paths that refers
>> +	 * the handle without lock. To avoid race, reject the callers that
>> +	 * attempt it.
>> +	 */
>> +	if (handle && curr == entry) {
>> +		WARN_ON(1);
>> +		ret = -EINVAL;
>> +		goto out_unlock;
>> +	}
> 
> We rejected !handle and !curr cases. So it should be enough with:
> 	if (curr == entry) {
> ?

We only want to reject the case in which the handle is valid and is
the same as the old handle. If the handle is NULL, I think it is OK to
have the same domain.

>> +
>> +	curr_domain = pasid_array_entry_to_domain(curr);
>> +	ret = 0;
>> +
>> +	if (curr_domain != domain) {
>> +		ret = __iommu_set_group_pasid(domain, group,
>> +					      pasid, curr_domain);
>> +		if (ret)
>> +			goto out_unlock;
>> +	}
> 
> Oh, does this mean that we can just use this function to replace
> a handle if domain isn't changed? Maybe add this in the kdoc?

If the handle is not the same and the domain is the same, yes, it should
succeed. Yeah, documenting it would help users.

> 
>> +
>> +	if (curr != entry) {
> 
> Hmm, since we rejected "curr == entry" already, we don't need to
> double check any more?

As in the response above, it only rejects when the handle is valid and
happens to be the same as the old one.

> 
>> +		/*
>> +		 * The above xa_cmpxchg() reserved the memory, and the
>> +		 * group->mutex is held, this cannot fail.
>> +		 */
>> +		WARN_ON(xa_is_err(xa_store(&group->pasid_array,
>> +					   pasid, entry, GFP_KERNEL)));
>> +	}
>> +
>> +out_unlock:
>> +	mutex_unlock(&group->mutex);
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(iommu_replace_device_pasid, "IOMMUFD_INTERNAL");
>> +
>>   /*
>>    * iommu_detach_device_pasid() - Detach the domain from pasid of device
>>    * @domain: the iommu domain.
>> -- 
>> 2.34.1
>>

-- 
Regards,
Yi Liu

* Re: [PATCH v10 06/18] iommufd/device: Add helper to detect the first attach of a group
  2025-03-20 17:51     ` Nicolin Chen
@ 2025-03-21  0:02       ` Yi Liu
  0 siblings, 0 replies; 78+ messages in thread
From: Yi Liu @ 2025-03-21  0:02 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On 2025/3/21 01:51, Nicolin Chen wrote:
> On Thu, Mar 20, 2025 at 10:37:01AM -0700, Nicolin Chen wrote:
>> On Thu, Mar 20, 2025 at 06:47:32AM -0700, Yi Liu wrote:
>>> The existing code detects the first attach by checking the
>>> igroup->device_list. However, the igroup->hwpt can also be used to detect
>>> the first attach. In future modifications, it is better to check the
>>> igroup->hwpt instead of the device_list. To improve readbility and also
>>> prepare for further modifications on this part, this adds a helper for it.
>>>
>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>> ---
>>> v9 -> v10: It is patch 07 of v9, it's reworked hence renamed as well.
>>> ---
>>>   drivers/iommu/iommufd/device.c | 11 +++++++++--
>>>   1 file changed, 9 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
>>> index ac54d734b819..9db36346328f 100644
>>> --- a/drivers/iommu/iommufd/device.c
>>> +++ b/drivers/iommu/iommufd/device.c
>>> @@ -444,6 +444,13 @@ static int iommufd_group_setup_msi(struct iommufd_group *igroup,
>>>   	return 0;
>>>   }
>>>   
>>> +static inline bool
>>> +igroup_first_attach(struct iommufd_group *igroup, ioasid_t pasid)
>>> +{
>>> +	lockdep_assert_held(&igroup->lock);
>>> +	return !igroup->hwpt;
>>> +}
>>> +
>>>   static int
>>>   iommufd_device_attach_reserved_iova(struct iommufd_device *idev,
>>>   				    struct iommufd_hwpt_paging *hwpt_paging)
>>> @@ -459,7 +466,7 @@ iommufd_device_attach_reserved_iova(struct iommufd_device *idev,
>>>   	if (rc)
>>>   		return rc;
>>>   
>>> -	if (list_empty(&igroup->device_list)) {
>>> +	if (igroup_first_attach(igroup, IOMMU_NO_PASID)) {
>>>   		rc = iommufd_group_setup_msi(igroup, hwpt_paging);
>>>   		if (rc) {
>>>   			iopt_remove_reserved_iova(&hwpt_paging->ioas->iopt,
>>> @@ -623,7 +630,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
>>>   	 * reserved regions are only updated during individual device
>>>   	 * attachment.
>>>   	 */
>>> -	if (list_empty(&igroup->device_list)) {
>>> +	if (igroup_first_attach(igroup, pasid)) {
>>>   		rc = iommufd_hwpt_attach_device(hwpt, idev, pasid);
>>>   		if (rc)
>>>   			goto err_unresv;
>>
>> We have the same list_empty in the iommufd_hw_pagetable_detach()
>> and iommufd_group_release() too?
>>
>> And I feel "igroup_is_not_attached" could be clearer, as it fits
>> the detach/release context too.
> 
> Oh, I just found that the following patch changes those paths.
> 
> Yet, at the end of the series this igroup_first_attach() is quite
> similar to iommufd_device_is_attached(). So, maybe we could align
> with that the naming here: iommufd_group_is_attached?

Maybe just iommufd_group_first_attach(); so far I expect it to be used
only in the attach path. I can add a kdoc for this.

> With that,
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

-- 
Regards,
Yi Liu

* Re: [PATCH v10 02/18] iommu: Introduce a replace API for device pasid
  2025-03-20 23:58     ` Yi Liu
@ 2025-03-21  0:14       ` Yi Liu
  2025-03-21  3:21       ` Nicolin Chen
  1 sibling, 0 replies; 78+ messages in thread
From: Yi Liu @ 2025-03-21  0:14 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On 2025/3/21 07:58, Yi Liu wrote:
> On 2025/3/21 01:24, Nicolin Chen wrote:
>> On Thu, Mar 20, 2025 at 06:47:28AM -0700, Yi Liu wrote:
>>> Provide a high-level API to allow replacements of one domain with another
>>> for specific pasid of a device. This is similar to
>>> iommu_replace_group_handle() and it is expected to be used only by IOMMUFD.
>>>
>>> Co-developed-by: Lu Baolu <baolu.lu@linux.intel.com>
>>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>>> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>
>> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
>>
>> Some nits:
>>
>>> @@ -3420,6 +3436,99 @@ int iommu_attach_device_pasid(struct iommu_domain 
>>> *domain,
>>>   }
>>>   EXPORT_SYMBOL_GPL(iommu_attach_device_pasid);
>>> +/**
>>> + * iommu_replace_device_pasid - Replace the domain that a pasid
>>> + *                              is attached to
>>
>> How about "... a specific pasid of the device is attached to"
>> aligning with the clearer narrative in commit log?

Clicked send too soon; some remarks have not been answered yet. Yep.

>>> +int iommu_replace_device_pasid(struct iommu_domain *domain,
>>> +                   struct device *dev, ioasid_t pasid,
>>> +                   struct iommu_attach_handle *handle)
>>> +{
>>> +    /* Caller must be a probed driver on dev */
>>
>> What's "a probed driver on dev"? Mind rephrasing this?
>>
>> Also should it be placed outside this function?

This just follows the same pattern as iommu_attach_device_pasid().
The comment was added in the commit below. I think the reasoning still
applies to this new replace API, so I just added it here.

commit e946f8e3e62bf05da21a14658f8cb05e2a616260
Author: Jason Gunthorpe <jgg@ziepe.ca>
Date:   Tue Aug 22 13:15:56 2023 -0300

     iommu: Remove useless group refcounting

     Several functions obtain the group reference and then release it before
     returning. This gives the impression that the refcount is protecting
     something for the duration of the function.

     In truth all of these functions are called in places that know a device
     driver is probed to the device and our locking rules already require
     that dev->iommu_group cannot change while a driver is attached to the
     struct device.

     If this was not the case then this code is already at risk of triggering
     UAF as it is racy if the dev->iommu_group is concurrently going to
     NULL/free. refcount debugging will throw a WARN if kobject_get() is
     called on a 0 refcount object to highlight the bug.

     Remove the confusing refcounting and leave behind a comment about the
     restriction.

>>> +    struct iommu_group *group = dev->iommu_group;
>>> +    struct iommu_attach_handle *entry;
>>> +    struct iommu_domain *curr_domain;
>>> +    void *curr;
>>> +    int ret;
>>> +
>>> +    if (!group)
>>> +        return -ENODEV;
>>> +
>>> +    if (!domain->ops->set_dev_pasid)
>>> +        return -EOPNOTSUPP;
>>> +
>>> +    if (dev_iommu_ops(dev) != domain->owner ||
>>> +        pasid == IOMMU_NO_PASID || !handle)
>>> +        return -EINVAL;
>>> +
>>> +    mutex_lock(&group->mutex);
>>
>> How about guard(mutex)(&group->mutex)?
> 
> yeah, baolu mentioned it as well. But given the

I would defer to Joerg on whether he is OK with using it. :) I haven't
seen it used in iommu.c yet.
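
For context on the guard(mutex) suggestion: the kernel's guard() helpers
in include/linux/cleanup.h are built on the compiler's cleanup attribute,
so the lock is dropped on every return path without an out_unlock label.
A minimal userspace sketch of the same idea follows. GUARD_MUTEX,
unlock_cleanup, group_mutex, attached and replace_sketch are all
hypothetical names made up for illustration; this is not the kernel
implementation, and it needs GCC or Clang.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical userspace stand-in for the kernel's guard(mutex)(). */
static void unlock_cleanup(pthread_mutex_t **m)
{
	if (*m)
		pthread_mutex_unlock(*m);
}

/* Lock now, unlock automatically when the enclosing scope is left. */
#define GUARD_MUTEX(m)                                                    \
	pthread_mutex_t *guard_                                           \
		__attribute__((cleanup(unlock_cleanup), unused)) =        \
		(pthread_mutex_lock(m), (m))

static pthread_mutex_t group_mutex = PTHREAD_MUTEX_INITIALIZER;
static int attached; /* 0 means nothing is attached (illustrative state) */

/* Every return path drops the lock; no out_unlock label is needed. */
static int replace_sketch(int new_domain)
{
	GUARD_MUTEX(&group_mutex);

	if (!attached)
		return -EINVAL; /* nothing attached, so not a replace */
	attached = new_domain;
	return 0;
}
```

Whether this style is wanted in iommu.c remains the maintainer's call,
as noted above.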

>>> +    entry = iommu_make_pasid_array_entry(domain, handle);
>>> +    curr = xa_cmpxchg(&group->pasid_array, pasid, NULL,
>>> +              XA_ZERO_ENTRY, GFP_KERNEL);
>>> +    if (xa_is_err(curr)) {
>>> +        ret = xa_err(curr);
>>> +        goto out_unlock;
>>> +    }
>>> +
>>> +    /*
>>> +     * No domain (with or without handle) attached, hence not
>>> +     * a replace case.
>>> +     */
>>> +    if (!curr) {
>>> +        xa_release(&group->pasid_array, pasid);
>>> +        ret = -EINVAL;
>>> +        goto out_unlock;
>>> +    }
>>> +
>>> +    /*
>>> +     * Reusing handle is problematic as there are paths that refers
>>> +     * the handle without lock. To avoid race, reject the callers that
>>> +     * attempt it.
>>> +     */
>>> +    if (handle && curr == entry) {
>>> +        WARN_ON(1);
>>> +        ret = -EINVAL;
>>> +        goto out_unlock;
>>> +    }
>>
>> We rejected !handle and !curr cases. So it should be enough with:
>>     if (curr == entry) {
>> ?
> 
> we only want to reject the case in which handle is valid and it is
> the same with old handle. If handle is null, I think it is ok to have the
> same domain.
> 
>>> +
>>> +    curr_domain = pasid_array_entry_to_domain(curr);
>>> +    ret = 0;
>>> +
>>> +    if (curr_domain != domain) {
>>> +        ret = __iommu_set_group_pasid(domain, group,
>>> +                          pasid, curr_domain);
>>> +        if (ret)
>>> +            goto out_unlock;
>>> +    }
>>
>> Oh, does this mean that we can just use this function to replace
>> a handle if domain isn't changed? Maybe add this in the kdoc?
> 
> if handle is not the same and domain is the same, yes it should
> succeed. yeah, doc it would help users.
> 
>>
>>> +
>>> +    if (curr != entry) {
>>
>> Hmm, since we rejected "curr == entry" already, we don't need to
>> double check any more?
> 
> as the above response, it only rejects when handle is valid and
> happens to be the same with the old one.
> 
>>
>>> +        /*
>>> +         * The above xa_cmpxchg() reserved the memory, and the
>>> +         * group->mutex is held, this cannot fail.
>>> +         */
>>> +        WARN_ON(xa_is_err(xa_store(&group->pasid_array,
>>> +                       pasid, entry, GFP_KERNEL)));
>>> +    }
>>> +
>>> +out_unlock:
>>> +    mutex_unlock(&group->mutex);
>>> +    return ret;
>>> +}
>>> +EXPORT_SYMBOL_NS_GPL(iommu_replace_device_pasid, "IOMMUFD_INTERNAL");
>>> +
>>>   /*
>>>    * iommu_detach_device_pasid() - Detach the domain from pasid of device
>>>    * @domain: the iommu domain.
>>> -- 
>>> 2.34.1
>>>
> 

-- 
Regards,
Yi Liu


* Re: [PATCH v10 09/18] iommufd/device: Add pasid_attach array to track per-PASID attach
  2025-03-20 19:19   ` Nicolin Chen
  2025-03-20 19:29     ` Jason Gunthorpe
@ 2025-03-21  0:15     ` Yi Liu
  1 sibling, 0 replies; 78+ messages in thread
From: Yi Liu @ 2025-03-21  0:15 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On 2025/3/21 03:19, Nicolin Chen wrote:
> On Thu, Mar 20, 2025 at 06:47:35AM -0700, Yi Liu wrote:
>> @@ -497,10 +501,13 @@ iommufd_device_attach_reserved_iova(struct iommufd_device *idev,
>>   
>>   /* The device attach/detach/replace helpers for attach_handle */
>>   
>> -/* Check if idev is attached to igroup->hwpt */
>> -static bool iommufd_device_is_attached(struct iommufd_device *idev)
>> +static bool iommufd_device_is_attached(struct iommufd_device *idev,
>> +				       ioasid_t pasid)
>>   {
>> -	return xa_load(&idev->igroup->attach->device_array, idev->obj.id);
>> +	struct iommufd_attach *attach;
>> +
>> +	attach = xa_load(&idev->igroup->pasid_attach, pasid);
>> +	return xa_load(&attach->device_array, idev->obj.id);
> 
> This helper is called in iommufd_device_do_replace() after it does
> xa_cmpxchg() on to the same igroup->pasid_attach?
> 
>>   static int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt,
>> @@ -627,19 +634,25 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
>>   
>>   	mutex_lock(&igroup->lock);
>>   
>> -	attach = igroup->attach;
>> +	attach = xa_cmpxchg(&igroup->pasid_attach, pasid, NULL,
>> +			    XA_ZERO_ENTRY, GFP_KERNEL);
>> +	if (xa_is_err(attach)) {
>> +		rc = xa_err(attach);
>> +		goto err_unlock;
>> +	}
>> +
>>   	if (!attach) {
>>   		attach = kzalloc(sizeof(*attach), GFP_KERNEL);
> 
> Since this is attach() and we do xa_cmpxchg() with an "old=NULL",
> should !attach always be true?
> 
>> -	rc = xa_insert(&igroup->attach->device_array, idev->obj.id, XA_ZERO_ENTRY,
>> +	rc = xa_insert(&attach->device_array, idev->obj.id, XA_ZERO_ENTRY,
> 
> Nit: looks like this should have been done in the PATCH-8
> "iommufd/device: Replace device_list with device_array"

aha, yes. we can save one line here in this patch.

-- 
Regards,
Yi Liu


* Re: [PATCH v10 08/18] iommufd/device: Replace device_list with device_array
  2025-03-20 17:20   ` Jason Gunthorpe
@ 2025-03-21  0:25     ` Yi Liu
  0 siblings, 0 replies; 78+ messages in thread
From: Yi Liu @ 2025-03-21  0:25 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kevin.tian, joro, baolu.lu, iommu, nicolinc

On 2025/3/21 01:20, Jason Gunthorpe wrote:
> On Thu, Mar 20, 2025 at 06:47:34AM -0700, Yi Liu wrote:
>> @@ -298,6 +298,20 @@ u32 iommufd_device_to_id(struct iommufd_device *idev)
>>   }
>>   EXPORT_SYMBOL_NS_GPL(iommufd_device_to_id, "IOMMUFD");
>>   
>> +static int iommufd_group_device_num(struct iommufd_group *igroup)
>> +{
>> +	struct iommufd_device *idev;
>> +	unsigned long index;
>> +	int count = 0;
> 
> unsigned int and unsigned int return code too

got it.

>> +	rc = xa_insert(&igroup->attach->device_array, idev->obj.id, XA_ZERO_ENTRY,
>> +		       GFP_KERNEL);
> 
> Probably don't really care, but note the choice of obj.id here is
> going to waste some memory in the xarray. 0 based would be more
> memory efficient, but some of the other operations would be slower.

Yes. With a 0-based index, iommufd_device_is_attached() and the detach
path might be slower since they would need to loop over all the devices.
I also considered storing an id in the idev for indexing the device_array,
but the problem is that an idev can be stored in multiple device_arrays.
A single id cannot work, just as a single list_head cannot. There seems
to be no better choice...

> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> Jason

-- 
Regards,
Yi Liu


* Re: [PATCH v10 08/18] iommufd/device: Replace device_list with device_array
  2025-03-20 18:38   ` Nicolin Chen
@ 2025-03-21  0:30     ` Yi Liu
  2025-03-21  3:25       ` Nicolin Chen
  0 siblings, 1 reply; 78+ messages in thread
From: Yi Liu @ 2025-03-21  0:30 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On 2025/3/21 02:38, Nicolin Chen wrote:
> On Thu, Mar 20, 2025 at 06:47:34AM -0700, Yi Liu wrote:
>> igroup->attach->device_list is used to track attached device of a group
>> in the RID path. Such tracking is also needed in the PASID path in order
>> to share path with the RID path.
>>
>> While there is only one list_head in the iommufd_device. It cannot work
>> if the device has been attached in both RID path and PASID path. To solve
>> it, replacing the device_list with an xarray. The attached iommufd_device
>> is stored in the entry indexed by the idev->obj.id.
>>
>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> 
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
> 
> Nit:
> 
>>   static int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt,
>> @@ -625,20 +634,27 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
>>   			rc = -ENOMEM;
>>   			goto err_unlock;
>>   		}
>> -		INIT_LIST_HEAD(&attach->device_list);
>> +		xa_init(&attach->device_array);
>>   	}
>>   
>>   	old_hwpt = attach->hwpt;
>>   
>> +	rc = xa_insert(&igroup->attach->device_array, idev->obj.id, XA_ZERO_ENTRY,
>> +		       GFP_KERNEL);
>> +	if (rc) {
>> +		WARN_ON(rc == -EBUSY && !old_hwpt);
>> +		goto err_free_attach;
>> +	}
>> +
>>   	if (old_hwpt && old_hwpt != hwpt) {
>>   		rc = -EINVAL;
>> -		goto err_free_attach;
>> +		goto err_release_devid;
>>   	}
> 
> Could we reject old_hwpt != hwpt (replace case) before xa_insert?

Rejecting it after xa_insert() has an extra benefit: it can detect a
duplicated attach of the same device and hwpt, which is supposed to be an
-EBUSY error. This is aligned with __iommu_attach_group(). I think this
was missed before.
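
The ordering argument can be sketched in plain C as below. This is only
a userspace model under stated assumptions: struct attach_model, MAX_DEVS
and attach_sketch() are hypothetical names, and a bool array stands in
for attach->device_array. It relies only on the documented behaviour that
xa_insert() returns -EBUSY when the index is already occupied.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_DEVS 8 /* arbitrary bound for the model */

struct attach_model {
	int hwpt;               /* 0 means no hwpt attached yet */
	bool present[MAX_DEVS]; /* stand-in for attach->device_array */
};

static int attach_sketch(struct attach_model *a, unsigned int dev_id, int hwpt)
{
	int old_hwpt = a->hwpt;

	if (dev_id >= MAX_DEVS)
		return -EINVAL;

	/* Reserve the slot first, like xa_insert(): a duplicate gets -EBUSY */
	if (a->present[dev_id])
		return -EBUSY;
	a->present[dev_id] = true;

	/* Only now reject a different hwpt, undoing the reservation */
	if (old_hwpt && old_hwpt != hwpt) {
		a->present[dev_id] = false;
		return -EINVAL;
	}

	a->hwpt = hwpt;
	return 0;
}
```

If the hwpt comparison came first, a second attach of the same device to
a different hwpt would report -EINVAL instead of -EBUSY.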

-- 
Regards,
Yi Liu


* Re: [PATCH v10 11/18] iommufd: Support pasid attach/replace
  2025-03-20 20:42   ` Nicolin Chen
  2025-03-20 23:29     ` Jason Gunthorpe
@ 2025-03-21  0:31     ` Yi Liu
  2025-03-21  0:35       ` Nicolin Chen
  1 sibling, 1 reply; 78+ messages in thread
From: Yi Liu @ 2025-03-21  0:31 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On 2025/3/21 04:42, Nicolin Chen wrote:
> On Thu, Mar 20, 2025 at 06:47:37AM -0700, Yi Liu wrote:
>> @@ -579,10 +582,12 @@ static void iommufd_hwpt_detach_device(struct iommufd_hw_pagetable *hwpt,
>>   {
>>   	struct iommufd_attach_handle *handle;
>>   
>> -	WARN_ON(pasid != IOMMU_NO_PASID);
>> +	if (pasid == IOMMU_NO_PASID)
>> +		iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
>> +	else
>> +		iommu_detach_device_pasid(hwpt->domain, idev->dev, pasid);
>>   
>>   	handle = iommufd_device_get_attach_handle(idev, pasid);
>> -	iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
> 
> This changes the sequence of these calls?
> 
> Is it correct to do iommufd_device_get_attach_handle() after
> iommu_detach_group_handle()?
> 

Oops, yes, it does change the sequence. The original order should be
kept. Will fix it.

> Otherwise,
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

-- 
Regards,
Yi Liu


* Re: [PATCH v10 18/18] iommufd/selftest: Add coverage for iommufd pasid attach/detach
  2025-03-20 13:47 ` [PATCH v10 18/18] iommufd/selftest: Add coverage for iommufd " Yi Liu
@ 2025-03-21  0:34   ` Nicolin Chen
  2025-03-21 15:26     ` Yi Liu
  0 siblings, 1 reply; 78+ messages in thread
From: Nicolin Chen @ 2025-03-21  0:34 UTC (permalink / raw)
  To: Yi Liu; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On Thu, Mar 20, 2025 at 06:47:44AM -0700, Yi Liu wrote:

> +TEST_F(iommufd_device_pasid, pasid_attach)
> +{
> +	struct iommu_hwpt_selftest data = {
> +		.iotlb =  IOMMU_TEST_IOTLB_DEFAULT,
> +	};
> +	uint32_t nested_hwpt_id[3] = {};
> +	uint32_t parent_hwpt_id = 0;
> +	uint32_t fault_id, fault_fd;
> +	uint32_t s2_hwpt_id = 0;
> +	uint32_t iopf_hwpt_id;
> +	uint32_t pasid = 100;
> +	uint32_t auto_hwpt;
> +	uint32_t viommu_id;
> +	bool result;
> +
> +	/* Allocate two nested hwpts sharing one common parent hwpt */
> +	test_cmd_hwpt_alloc(self->device_id, self->ioas_id,
> +			    IOMMU_HWPT_ALLOC_NEST_PARENT,
> +			    &parent_hwpt_id);
> +	test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id,
> +				   IOMMU_HWPT_ALLOC_PASID,
> +				   &nested_hwpt_id[0],
> +				   IOMMU_HWPT_DATA_SELFTEST,
> +				   &data, sizeof(data));
> +	test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id,
> +				   IOMMU_HWPT_ALLOC_PASID,
> +				   &nested_hwpt_id[1],
> +				   IOMMU_HWPT_DATA_SELFTEST,
> +				   &data, sizeof(data));
> +
> +	/* Faulte related preparation */

Fault

> +	/* Allocate a regular nested hwpt based on viommu */
> +	test_cmd_viommu_alloc(self->device_id, parent_hwpt_id,
> +			      IOMMU_VIOMMU_TYPE_SELFTEST,
> +			      &viommu_id);
> +	test_cmd_hwpt_alloc_nested(self->device_id, viommu_id,
> +				   IOMMU_HWPT_ALLOC_PASID,
> +				   &nested_hwpt_id[2],
> +				   IOMMU_HWPT_DATA_SELFTEST, &data,
> +				   sizeof(data));
> +
> +	test_cmd_hwpt_alloc(self->device_id, self->ioas_id,
> +			    IOMMU_HWPT_ALLOC_PASID,
> +			    &s2_hwpt_id);
> +
> +	/* Attach RID to non-pasid compat domain, */
> +	test_cmd_mock_domain_replace(self->stdev_id, parent_hwpt_id);
> +	/* then attach to pasid should fail */
> +	test_err_pasid_attach(EINVAL, pasid, s2_hwpt_id, NULL);
> +
> +	/* Attach RID to pasid compat domain, */
> +	test_cmd_mock_domain_replace(self->stdev_id, s2_hwpt_id);
> +	/* then attach to pasid should succeed, */
> +	test_cmd_pasid_attach(pasid, nested_hwpt_id[0], NULL);
> +	/* but attach RID to non-pasid compat domain should fail now. */
> +	test_err_mock_domain_replace(EINVAL, self->stdev_id, parent_hwpt_id);
> +	test_cmd_pasid_detach(pasid);
> +
> +	if (!variant->pasid_capable) {
> +		/*
> +		 * PASID-compatible domain can be used by non-PASID-capable
> +		 * device.
> +		 */
> +		test_cmd_mock_domain_replace(self->no_pasid_stdev_id, nested_hwpt_id[0]);
> +		test_cmd_mock_domain_replace(self->no_pasid_stdev_id, self->ioas_id);
> +		/*
> +		 * Attach hwpt to pasid#100 of non-PASID-capable device,
> +		 * should fail, no matter domain is pasid-comapt or not.
> +		 */
> +		EXPECT_ERRNO(EINVAL,
> +			     _test_cmd_pasid_attach(self->fd, self->no_pasid_stdev_id,
> +						    pasid, parent_hwpt_id, NULL));
> +		EXPECT_ERRNO(EINVAL,
> +			     _test_cmd_pasid_attach(self->fd, self->no_pasid_stdev_id,
> +						    pasid, s2_hwpt_id, NULL));
> +	}

It seems that we should test these anyway without a variant?

> +
> +	/*
> +	 * Attach non pasid compat hwpt to pasid-capable device, should
> +	 * fail, and have null domain.
> +	 */
> +	test_err_pasid_attach(EINVAL, pasid, parent_hwpt_id, NULL);
> +	ASSERT_EQ(0,
> +		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
> +					      pasid, 0, &result));
> +	EXPECT_EQ(1, result);
> +
> +	/*
> +	 * Attach ioas to pasid 100, should succeed, domain should
> +	 * be valid.
> +	 */
> +	test_cmd_pasid_attach(pasid, self->ioas_id, &auto_hwpt);
> +	ASSERT_EQ(0,
> +		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
> +					      pasid, auto_hwpt, &result));
> +	EXPECT_EQ(1, result);

Hmm, I thought that a non-RID PASID slot could only attach a PASID-
compatible HWPT. I think I am totally confused now... lol

Perhaps we need detailed documentation somewhere, at least as a
reminder?

> +
> +	/* Attach to pasid 100 which has been attached, should fail. */
> +	test_err_pasid_attach(EBUSY, pasid, self->ioas_id, &auto_hwpt);
> +
> +	/*
> +	 * Try attach pasid 100 with another hwpt, should FAIL
> +	 * as attach does not allow overwrite, use REPLACE instead.
> +	 */
> +	test_err_pasid_attach(EBUSY, pasid, nested_hwpt_id[0], NULL);
> +
> +	/*
> +	 * Detach hwpt from pasid 100, and check if the pasid 100
> +	 * has null domain. Should be done before the next attach.
> +	 */
> +	test_cmd_pasid_detach(pasid);
> +	ASSERT_EQ(0,
> +		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
> +					      pasid, 0, &result));
> +	EXPECT_EQ(1, result);
> +
> +	/*
> +	 * Attach nested hwpt to pasid 100, should succeed, domain
> +	 * should be valid.
> +	 */
> +	test_cmd_pasid_attach(pasid, nested_hwpt_id[0], NULL);
> +	ASSERT_EQ(0,
> +		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
> +					      pasid, nested_hwpt_id[0],
> +					      &result));
> +	EXPECT_EQ(1, result);
> +
> +	/* Attach to pasid 100 which has been attached, should fail. */
> +	test_err_pasid_attach(EBUSY, pasid, nested_hwpt_id[0], NULL);
> +
> +	/*
> +	 * Detach hwpt from pasid 100, and check if the pasid 100
> +	 * has null domain
> +	 */
> +	test_cmd_pasid_detach(pasid);
> +	ASSERT_EQ(0,
> +		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
> +					      pasid, 0, &result));
> +	EXPECT_EQ(1, result);
> +
> +	/* Replace tests */
> +
> +	pasid = 200;
> +	/*
> +	 * Replace pasid 200 without attaching it first, should
> +	 * fail with -EINVAL.
> +	 */
> +	test_err_cmd_pasid_replace(EINVAL, pasid, s2_hwpt_id, NULL);
> +
> +	/*
> +	 * Attach a s2 hwpt to pasid 200, should succeed, domain should

Attach the ..

> +	 * be valid.
> +	 */
> +	test_cmd_pasid_attach(pasid, s2_hwpt_id, NULL);
> +	ASSERT_EQ(0,
> +		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
> +					      pasid, s2_hwpt_id,
> +					      &result));
> +	EXPECT_EQ(1, result);
> +
> +	/*
> +	 * Replace pasid 200 with self->ioas_id, should succeed,
> +	 * and have valid domain.
> +	 */
> +	test_cmd_pasid_replace(pasid, self->ioas_id, &auto_hwpt);
> +	ASSERT_EQ(0,
> +		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
> +					      pasid, auto_hwpt,
> +					      &result));
> +	EXPECT_EQ(1, result);
> +
> +	/*
> +	 * Replace a nested hwpt for pasid 200, should succeed,
> +	 * and have valid domain.
> +	 */
> +	test_cmd_pasid_replace(pasid, nested_hwpt_id[0], NULL);
> +	ASSERT_EQ(0,
> +		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
> +					      pasid, nested_hwpt_id[0],
> +					      &result));
> +	EXPECT_EQ(1, result);
> +
> +	/*
> +	 * Replace with another nested hwpt for pasid 200, should
> +	 * succeed, and have valid domain.
> +	 */
> +	test_cmd_pasid_replace(pasid, nested_hwpt_id[1], NULL);
> +	ASSERT_EQ(0,
> +		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
> +					      pasid, nested_hwpt_id[1],
> +					      &result));
> +	EXPECT_EQ(1, result);
> +
> +	/*
> +	 * Detach hwpt from pasid 200, and check if the pasid 200
> +	 * has null domain.
> +	 */
> +	test_cmd_pasid_detach(pasid);
> +	ASSERT_EQ(0,
> +		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
> +					      pasid, 0, &result));
> +	EXPECT_EQ(1, result);
> +
> +	/* Negative Tests for pasid replace, use pasid 1024 */
> +
> +	/*
> +	 * Attach a s2 hwpt to pasid 1024, should succeed, domain should

Attach the ...

> +	 * be valid.
> +	 */
> +	pasid = 1024;
> +	test_cmd_pasid_attach(pasid, s2_hwpt_id, NULL);
> +	ASSERT_EQ(0,
> +		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
> +					      pasid, s2_hwpt_id,
> +					      &result));
> +	EXPECT_EQ(1, result);
> +
> +	/*
> +	 * Replace pasid 1024 with self->ioas_id, should fail,
> +	 * but have the old valid domain. This is a designed
> +	 * negative case, normally replace with self->ioas_id
> +	 * could succeed.
> +	 */
> +	test_err_cmd_pasid_replace(ENOMEM, pasid, self->ioas_id, NULL);
> +	ASSERT_EQ(0,
> +		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
> +					      pasid, s2_hwpt_id,
> +					      &result));
> +	EXPECT_EQ(1, result);
> +
> +	/*
> +	 * Detach hwpt from pasid 1024, and check if the pasid 1024
> +	 * has null domain.
> +	 */
> +	test_cmd_pasid_detach(pasid);

The designed "failing" replace does "pasid_1024_attached = false",
meaning that this detach() isn't necessary?

Or perhaps the designed "failing" shouldn't set "attached = false"?

> +	/* Detach the s2_hwpt_id from RID */
> +	test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id);
> +
> +	test_ioctl_destroy(nested_hwpt_id[0]);
> +	test_ioctl_destroy(nested_hwpt_id[1]);
> +	test_ioctl_destroy(nested_hwpt_id[2]);
> +	test_ioctl_destroy(viommu_id);
> +	test_ioctl_destroy(parent_hwpt_id);
> +	test_ioctl_destroy(s2_hwpt_id);

Once detached, all the destroys can be done automatically?


* Re: [PATCH v10 11/18] iommufd: Support pasid attach/replace
  2025-03-21  0:31     ` Yi Liu
@ 2025-03-21  0:35       ` Nicolin Chen
  2025-03-21  1:05         ` Yi Liu
  0 siblings, 1 reply; 78+ messages in thread
From: Nicolin Chen @ 2025-03-21  0:35 UTC (permalink / raw)
  To: Yi Liu; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On Fri, Mar 21, 2025 at 08:31:23AM +0800, Yi Liu wrote:
> On 2025/3/21 04:42, Nicolin Chen wrote:
> > On Thu, Mar 20, 2025 at 06:47:37AM -0700, Yi Liu wrote:
> > > @@ -579,10 +582,12 @@ static void iommufd_hwpt_detach_device(struct iommufd_hw_pagetable *hwpt,
> > >   {
> > >   	struct iommufd_attach_handle *handle;
> > > -	WARN_ON(pasid != IOMMU_NO_PASID);
> > > +	if (pasid == IOMMU_NO_PASID)
> > > +		iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
> > > +	else
> > > +		iommu_detach_device_pasid(hwpt->domain, idev->dev, pasid);
> > >   	handle = iommufd_device_get_attach_handle(idev, pasid);
> > > -	iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
> > 
> > This changes the sequence of these calls?
> > 
> > Is it correct to do iommufd_device_get_attach_handle() after
> > iommu_detach_group_handle()?
> > 
> 
> oops, yes it is. It should keep the order. will fix it.

Would you please also see if this could be covered by the selftest
given that all designed tests didn't catch this?

Thanks
Nicolin


* Re: [PATCH v10 12/18] iommufd: Enforce PASID-compatible domain for RID
  2025-03-20 22:23   ` Nicolin Chen
  2025-03-20 23:31     ` Jason Gunthorpe
@ 2025-03-21  0:41     ` Yi Liu
  1 sibling, 0 replies; 78+ messages in thread
From: Yi Liu @ 2025-03-21  0:41 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On 2025/3/21 06:23, Nicolin Chen wrote:
> On Thu, Mar 20, 2025 at 06:47:38AM -0700, Yi Liu wrote:
>> Per the definition of IOMMU_HWPT_ALLOC_PASID, iommufd needs to enforce
>> the RID to use PASID-compatible domain if PASID has been attached, and
>> vice versa. The PASID path has already enforced it. This adds the
>> enforcement in the RID path.
>>
>> This enforcement requires a lock across the RID and PASID attach path,
>> the idev->igroup->lock is used as both the RID and the PASID path holds
>> it.
>>
>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> 
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
> 
> A question:
> 
> this isn't about pasid_compat, yet...
> 
>> @@ -514,8 +514,28 @@ static int iommufd_hwpt_pasid_compat(struct iommufd_hw_pagetable *hwpt,
>>   				     struct iommufd_device *idev,
>>   				     ioasid_t pasid)
>>   {
>> -	if (pasid != IOMMU_NO_PASID && !hwpt->pasid_compat)
>> -		return -EINVAL;
>> +	struct iommufd_group *igroup = idev->igroup;
>> +
>> +	lockdep_assert_held(&igroup->lock);
>> +
>> +	if (pasid == IOMMU_NO_PASID) {
>> +		unsigned long start = IOMMU_NO_PASID;
>> +
>> +		if (!hwpt->pasid_compat &&
>> +		    xa_find_after(&igroup->pasid_attach,
>> +				  &start, UINT_MAX, XA_PRESENT))
>> +			return -EINVAL;
>> +	} else {
>> +		struct iommufd_attach *attach;
>> +
>> +		if (!hwpt->pasid_compat)
>> +			return -EINVAL;
>> +
>> +		attach = xa_load(&igroup->pasid_attach, IOMMU_NO_PASID);
>> +		if (attach && attach->hwpt && !attach->hwpt->pasid_compat)
>> +			return -EINVAL;
> 
> Should we also make sure that RID hwpt is attached before storing
> any PASID hwpt to a !IOMMU_NO_PASID slot in igroup->pasid_attach?

I doubt it is required conceptually. At least in VT-d scalable mode, using
a PASID before the RID is attached is allowed, although such usage does
not exist in practice.

For VM usage, I think the RID is always attached before any PASID, since
PASID usage is initiated by the guest. I am not sure about userspace
drivers; one might attach a PASID before attaching the RID if it only
uses PASIDs. In such scenarios, the RID is blocked.
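
To make the rules in the quoted hunk easier to follow, here is a
userspace model of the check. It is a sketch only: struct group_model and
pasid_compat_sketch() are hypothetical names, and the xarray lookups are
reduced to two flags and a counter. It mirrors the hunk in that a PASID
attach does not require the RID to be attached first.

```c
#include <errno.h>
#include <stdbool.h>

struct group_model {
	bool rid_attached;     /* a hwpt is attached on the RID path */
	bool rid_pasid_compat; /* that hwpt is PASID-compatible */
	unsigned int nr_pasid; /* number of PASIDs currently attached */
};

static int pasid_compat_sketch(const struct group_model *g, bool is_rid,
			       bool hwpt_pasid_compat)
{
	if (is_rid) {
		/* RID path: non-compat hwpt is refused once any PASID is attached */
		if (!hwpt_pasid_compat && g->nr_pasid)
			return -EINVAL;
		return 0;
	}

	/* PASID path: the hwpt itself must be PASID-compatible */
	if (!hwpt_pasid_compat)
		return -EINVAL;

	/* An attached RID hwpt must be compatible; no RID attach is fine */
	if (g->rid_attached && !g->rid_pasid_compat)
		return -EINVAL;

	return 0;
}
```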

-- 
Regards,
Yi Liu


* Re: [PATCH v10 12/18] iommufd: Enforce PASID-compatible domain for RID
  2025-03-20 23:31     ` Jason Gunthorpe
@ 2025-03-21  0:45       ` Yi Liu
  0 siblings, 0 replies; 78+ messages in thread
From: Yi Liu @ 2025-03-21  0:45 UTC (permalink / raw)
  To: Jason Gunthorpe, Nicolin Chen; +Cc: kevin.tian, joro, baolu.lu, iommu

On 2025/3/21 07:31, Jason Gunthorpe wrote:
> On Thu, Mar 20, 2025 at 03:23:15PM -0700, Nicolin Chen wrote:
> 
>>> +		attach = xa_load(&igroup->pasid_attach, IOMMU_NO_PASID);
>>> +		if (attach && attach->hwpt && !attach->hwpt->pasid_compat)
>>> +			return -EINVAL;
>>
>> Should we also make sure that RID hwpt is attached before storing
>> any PASID hwpt to a !IOMMU_NO_PASID slot in igroup->pasid_attach?
> 
> AFAIK, no.. If a driver cannot support the combination of blocked on
> RID and a paging on PASID then it should fail the attach.

I think this is by architecture, so there is no need to check what the
RID path is attached to, right?

> smmuv3 has support for this so, and I'd expect the same of other
> drivers.

VT-d scalable mode allows each PASID (including IOMMU_NO_PASID, which is
used by the RID path) to be attached to a different domain (blocked,
paging, identity, sva or nested). So I think the intel iommu side should
support it as well.

-- 
Regards,
Yi Liu


* Re: [PATCH v10 14/18] iommufd: Allow allocating PASID-compatible domain
  2025-03-20 17:51   ` Jason Gunthorpe
@ 2025-03-21  0:52     ` Yi Liu
  0 siblings, 0 replies; 78+ messages in thread
From: Yi Liu @ 2025-03-21  0:52 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kevin.tian, joro, baolu.lu, iommu, nicolinc

On 2025/3/21 01:51, Jason Gunthorpe wrote:
> On Thu, Mar 20, 2025 at 06:47:40AM -0700, Yi Liu wrote:
>> The underlying infrastructure has supported the PASID attach and related
>> enforcement per the requirement of the IOMMU_HWPT_ALLOC_PASID flag. This
>> extends iommufd to support PASID compatible domain requested by userspace.
>>
>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>> ---
>> v9 -> v10: Dropped r-b tag as the uapi description for ALLOC_PASID is modified
>> ---
>>   drivers/iommu/iommufd/device.c       | 4 +++-
>>   drivers/iommu/iommufd/hw_pagetable.c | 7 ++++---
>>   include/uapi/linux/iommufd.h         | 3 +++
>>   3 files changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
>> index 54ffef9c17f7..f09dcddf777b 100644
>> --- a/drivers/iommu/iommufd/device.c
>> +++ b/drivers/iommu/iommufd/device.c
>> @@ -973,7 +973,9 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev, ioasid_t pasid,
>>   	}
>>   
>>   	hwpt_paging = iommufd_hwpt_paging_alloc(idev->ictx, ioas, idev, pasid,
>> -						0, immediate_attach, NULL);
>> +						pasid != IOMMU_NO_PASID ?
>> +						    IOMMU_HWPT_ALLOC_PASID : 0,
>> +						immediate_attach, NULL);
> 
> I wonder if there is any point to this since userspace couldn't
> actually just use autodomains and have something work since the RID
> autodomain won't have PASID. I think if userspace wants to use pasid
> it has to manually allocate the HWPT for the RID and then why not also
> allocate for the PASID?
> 
> Anyhow, it doesn't matter much as it is so simple for autodomains..

Hmm, let's just drop this then. I was only trying to put the PASID path on
par with the RID path. Given that the RID path cannot use an auto domain
together with PASID usage, dropping it is better. The uapi doc can then be:

@@ -393,6 +393,9 @@ struct iommu_vfio_ioas {
   *                          Any domain attached to the non-PASID part of the
  *                          device must also be flagged, otherwise attaching a
   *                          PASID will blocked.
+ *                          For the user that wants to attach PASID, ioas is
+ *                          not recommended for both the non-PASID part
+ *                          and PASID part of the device.
   *                          If IOMMU does not support PASID it will return
   *                          error (-EOPNOTSUPP).
   */


> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> Jason

-- 
Regards,
Yi Liu


* Re: [PATCH v10 11/18] iommufd: Support pasid attach/replace
  2025-03-21  0:35       ` Nicolin Chen
@ 2025-03-21  1:05         ` Yi Liu
  2025-03-21 11:45           ` Jason Gunthorpe
  0 siblings, 1 reply; 78+ messages in thread
From: Yi Liu @ 2025-03-21  1:05 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On 2025/3/21 08:35, Nicolin Chen wrote:
> On Fri, Mar 21, 2025 at 08:31:23AM +0800, Yi Liu wrote:
>> On 2025/3/21 04:42, Nicolin Chen wrote:
>>> On Thu, Mar 20, 2025 at 06:47:37AM -0700, Yi Liu wrote:
>>>> @@ -579,10 +582,12 @@ static void iommufd_hwpt_detach_device(struct iommufd_hw_pagetable *hwpt,
>>>>    {
>>>>    	struct iommufd_attach_handle *handle;
>>>> -	WARN_ON(pasid != IOMMU_NO_PASID);
>>>> +	if (pasid == IOMMU_NO_PASID)
>>>> +		iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
>>>> +	else
>>>> +		iommu_detach_device_pasid(hwpt->domain, idev->dev, pasid);
>>>>    	handle = iommufd_device_get_attach_handle(idev, pasid);
>>>> -	iommu_detach_group_handle(hwpt->domain, idev->igroup->group);
>>>
>>> This changes the sequence of these calls?
>>>
>>> Is it correct to do iommufd_device_get_attach_handle() after
>>> iommu_detach_group_handle()?
>>>
>>
>> oops, yes it is. It should keep the order. will fix it.
> 
> Would you please also see if this could be covered by the selftest
> given that all designed tests didn't catch this?

I expected iommufd_fail_nth to catch it, but in fact it does not.
I think iommufd_auto_response_faults() should be able to catch it
since it parses the handle if there are pending PRIs. Maybe I can just
add the line below between attach and replace/detach.

test_cmd_trigger_iopf(self->device_id, fault_fd);

-- 
Regards,
Yi Liu


* Re: [PATCH v10 17/18] iommufd/selftest: Add test ops to test pasid attach/detach
  2025-03-20 23:20   ` Nicolin Chen
@ 2025-03-21  1:20     ` Yi Liu
  0 siblings, 0 replies; 78+ messages in thread
From: Yi Liu @ 2025-03-21  1:20 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On 2025/3/21 07:20, Nicolin Chen wrote:
> On Thu, Mar 20, 2025 at 06:47:43AM -0700, Yi Liu wrote:
>> This adds 5 test ops for pasid attach/replace/detach testing. There are
>> ops to attach/detach pasid, and also op to check the attached domain of
>> a pasid.
>>
>> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>> ---
>>   drivers/iommu/iommufd/iommufd_test.h |  31 ++++++
>>   drivers/iommu/iommufd/selftest.c     | 151 +++++++++++++++++++++++++++
>>   2 files changed, 182 insertions(+)
>>
>> diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
>> index 1a066feb8697..efcb509f7d56 100644
>> --- a/drivers/iommu/iommufd/iommufd_test.h
>> +++ b/drivers/iommu/iommufd/iommufd_test.h
>> @@ -25,6 +25,11 @@ enum {
>>   	IOMMU_TEST_OP_TRIGGER_IOPF,
>>   	IOMMU_TEST_OP_DEV_CHECK_CACHE,
>>   	IOMMU_TEST_OP_TRIGGER_VEVENT,
>> +	IOMMU_TEST_OP_PASID_ATTACH,
>> +	IOMMU_TEST_OP_PASID_REPLACE,
>> +	IOMMU_TEST_OP_PASID_MIX_REPLACE_HANDLE,
> 
> And we could drop IOMMU_TEST_OP_PASID_MIX_REPLACE_HANDLE?
> 
> And totally 4 test ops instead of "5" in the commit log.

Yes! This should be dropped.

-- 
Regards,
Yi Liu


* Re: [PATCH v10 17/18] iommufd/selftest: Add test ops to test pasid attach/detach
  2025-03-20 23:17   ` Nicolin Chen
  2025-03-20 23:33     ` Jason Gunthorpe
  2025-03-20 23:42     ` Nicolin Chen
@ 2025-03-21  1:43     ` Yi Liu
  2025-03-21 17:25       ` Nicolin Chen
  2 siblings, 1 reply; 78+ messages in thread
From: Yi Liu @ 2025-03-21  1:43 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On 2025/3/21 07:17, Nicolin Chen wrote:
> On Thu, Mar 20, 2025 at 06:47:43AM -0700, Yi Liu wrote:
>> @@ -150,6 +155,32 @@ struct iommu_test_cmd {
>>   		struct {
>>   			__u32 dev_id;
>>   		} trigger_vevent;
>> +		struct {
>> +			__u32 pasid;
>> +			__u32 pt_id;
>> +			/* @id is stdev_id
>> +			 * pasid#1024 is for special test, do not use it
>> +			 * in normal case.
>> +			 */
> 
> How about add on top of these structs:
> #define IOMMU_TEST_PASID_RESERVED 1024

yep

> Also, the coding style of the multi-line comments is a bit odd.

Yeah, but it does not fit on one line, and I think the comment is needed
to document how userspace should set the id and pasid fields.

>> +		} pasid_attach;
>> +		struct {
>> +			__u32 pasid;
>> +			__u32 pt_id;
>> +			/* @id is stdev_id
>> +			 * pasid#1024 is for special test, do not use it
>> +			 * in normal case.
>> +			 */
> 
> Ditto
> 
>> diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
>> index 691e7a23f300..37c9cd285541 100644
>> --- a/drivers/iommu/iommufd/selftest.c
>> +++ b/drivers/iommu/iommufd/selftest.c
>> @@ -223,10 +223,29 @@ static int mock_domain_nop_attach(struct iommu_domain *domain,
>>   	return 0;
>>   }
>>   
>> +static bool pasid_1024_attached;
> 
> I recall syzkaller would do multi-threading... We might need a
> global mutex or something atomic_t?

Maybe move it to mdev, as Jason suggested in another email.

>>   static int mock_domain_set_dev_pasid_nop(struct iommu_domain *domain,
>>   					 struct device *dev, ioasid_t pasid,
>>   					 struct iommu_domain *old)
>>   {
>> +	/*
>> +	 * First attach with pasid 1024 succ, second attach would fail.
> 
> succeeds?

yep

> 
>> +	 * This is helpful to test the case in which the iommu core needs
>> +	 * to rollback to old domain due to driver failure.
>> +	 */
>> +	if (pasid == 1024) {
>> +		if (domain->type == IOMMU_DOMAIN_BLOCKED) {
>> +			pasid_1024_attached = false;
>> +		} else if (pasid_1024_attached) {
>> +			pasid_1024_attached = false;
>> +			// Fake an error to fail the replacement
>> +			return -ENOMEM;
> 
> /* Fake an error to fail the replacement */
> 
> While failing this, why does it detach pasid-1024? Maybe some extra
> comments for what's doing?

do you mean when does it detach?
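
For reference, the pasid-1024 behaviour discussed above can be modelled in
plain userspace C. This is only a sketch under stated assumptions:
struct mock_dev_model, enum model_domain_type and model_set_dev_pasid() are
hypothetical stand-ins, not the selftest's real types, and the flag is kept
per device (as suggested for mdev) rather than in a file-scope static. The
final branch, which sets the flag on the first attach, is not visible in the
quoted hunk and is assumed here.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_PASID_RESERVED 1024 /* hypothetical name for the special pasid */

enum model_domain_type { MODEL_DOMAIN_BLOCKED, MODEL_DOMAIN_PAGING };

/* Stand-in for the mock device; holds the state that was a global in the patch. */
struct mock_dev_model {
	bool pasid_1024_attached;
};

/*
 * Mirrors the quoted logic: for the reserved pasid the first non-blocked
 * attach succeeds, the next one fails with -ENOMEM, and attaching the
 * blocked domain (a detach) clears the flag. Other pasids always succeed.
 */
static int model_set_dev_pasid(struct mock_dev_model *mdev,
			       enum model_domain_type type, unsigned int pasid)
{
	if (pasid != MODEL_PASID_RESERVED)
		return 0;

	if (type == MODEL_DOMAIN_BLOCKED) {
		mdev->pasid_1024_attached = false;
		return 0;
	}

	if (mdev->pasid_1024_attached) {
		/* Fake an error to fail the replacement; the flag is cleared as in the patch. */
		mdev->pasid_1024_attached = false;
		return -ENOMEM;
	}

	/* Assumed branch: remember that the first attach happened. */
	mdev->pasid_1024_attached = true;
	return 0;
}
```

Keeping the flag in the device means two mock devices no longer interfere
with each other, although concurrent callers on the same device would
presumably still need a lock or an atomic.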

>> +static int iommufd_test_pasid_check_domain(struct iommufd_ucmd *ucmd,
>> +					   struct iommu_test_cmd *cmd)
>> +{
>> +	struct iommu_domain *attached_domain, *expect_domain = NULL;
>> +	struct iommufd_hw_pagetable *hwpt = NULL;
>> +	struct iommu_attach_handle *handle;
>> +	struct selftest_obj *sobj;
>> +	struct mock_dev *mdev;
>> +	bool result;
>> +	int rc = 0;
>> +
>> +	sobj = iommufd_test_get_selftest_obj(ucmd->ictx, cmd->id);
>> +	if (IS_ERR(sobj))
>> +		return PTR_ERR(sobj);
>> +
>> +	mdev = sobj->idev.mock_dev;
>> +
>> +	handle = iommu_attach_handle_get(mdev->dev.iommu_group,
>> +					 cmd->pasid_check.pasid, 0);
>> +	if (IS_ERR(handle))
>> +		attached_domain = NULL;
>> +	else
>> +		attached_domain = handle->domain;
>> +
>> +	if (cmd->pasid_check.hwpt_id) {
>> +		hwpt = iommufd_get_hwpt(ucmd, cmd->pasid_check.hwpt_id);
>> +		if (IS_ERR(hwpt)) {
> 
> Do we need cmd->pasid_check.hwpt_id to be optional?

It is not intended to be optional. The idea is to use 0 as a special
value, in which case there is no need to retrieve a hwpt. That makes it
possible to check whether this pasid is attached at all.
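
A minimal userspace sketch of that convention, assuming hypothetical
model_domain and model_hwpt types and a helper named model_pasid_check()
(none of which exist in the patch): a hwpt_id of 0 is modelled as a NULL
hwpt, so the expected domain is NULL and the check passes only when nothing
is attached.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_domain { int id; };

/* Hypothetical stand-in for a hwpt: it just wraps a domain. */
struct model_hwpt { struct model_domain *domain; };

/*
 * attached: the domain currently attached to the pasid, or NULL if none.
 * hwpt: the looked-up hwpt, or NULL when userspace passed hwpt_id == 0.
 */
static bool model_pasid_check(const struct model_domain *attached,
			      const struct model_hwpt *hwpt)
{
	const struct model_domain *expect = hwpt ? hwpt->domain : NULL;

	return attached == expect;
}
```

With this shape the result is always meaningful, so copying it back to
userspace unconditionally stays valid for both the zero and non-zero case.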

> 
>> +			rc = PTR_ERR(hwpt);
>> +			goto out_put_dev;
>> +		}
>> +		expect_domain = hwpt->domain;
>> +	}
>> +
>> +	result = (attached_domain == expect_domain) ? 1 : 0;
>> +	if (copy_to_user(u64_to_user_ptr(cmd->pasid_check.out_result_ptr),
>> +			 &result, sizeof(result)))
>> +		rc = -EFAULT;
> 
> If we do want it to be optional, we can't unconditionally check the
> result then?
> 
>> +static int iommufd_test_pasid_attach(struct iommufd_ucmd *ucmd,
>> +				     struct iommu_test_cmd *cmd)
>> +{
>> +	struct selftest_obj *sobj;
>> +	int rc;
>> +
>> +	sobj = iommufd_test_get_selftest_obj(ucmd->ictx, cmd->id);
>> +	if (IS_ERR(sobj))
>> +		return PTR_ERR(sobj);
>> +
>> +	rc = iommufd_device_attach(sobj->idev.idev, cmd->pasid_attach.pasid,
>> +				   &cmd->pasid_attach.pt_id);
>> +	if (rc)
>> +		goto out_sobj;
>> +
>> +	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
>> +	if (rc)
>> +		iommufd_device_detach(sobj->idev.idev,
>> +				      cmd->pasid_attach.pasid);
>> +
>> +out_sobj:
>> +	iommufd_put_object(ucmd->ictx, &sobj->obj);
>> +	return rc;
>> +}
>> +
>> +static int iommufd_test_pasid_replace(struct iommufd_ucmd *ucmd,
>> +				      struct iommu_test_cmd *cmd)
>> +{
>> +	struct selftest_obj *sobj;
>> +	int rc;
>> +
>> +	sobj = iommufd_test_get_selftest_obj(ucmd->ictx, cmd->id);
>> +	if (IS_ERR(sobj))
>> +		return PTR_ERR(sobj);
>> +
>> +	rc = iommufd_device_replace(sobj->idev.idev, cmd->pasid_attach.pasid,
>> +				    &cmd->pasid_attach.pt_id);
>> +	if (rc)
>> +		goto out_sobj;
>> +
>> +	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
>> +
>> +out_sobj:
>> +	iommufd_put_object(ucmd->ictx, &sobj->obj);
>> +	return rc;
> 
> If iommufd_ucmd_respond fails, do we need to revert like we do in
> iommufd_test_pasid_attach()?

It should revert to the old hwpt, but so far there is no helper to get the
old hwpt. I could add one since we have the pasid_attach array now, but that
would be a helper used only by the selftest, which is not ideal. It would
also require a mock_dev->lock to serialize attach/replace/detach. Then I
noticed that iommufd_test_mock_domain_replace() also returns without
reverting, so I chose the simpler way.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v10 01/18] iommu: Require passing new handles to APIs supporting handle
  2025-03-20 13:47 ` [PATCH v10 01/18] iommu: Require passing new handles to APIs supporting handle Yi Liu
  2025-03-20 15:23   ` Jason Gunthorpe
@ 2025-03-21  2:35   ` Baolu Lu
  1 sibling, 0 replies; 78+ messages in thread
From: Baolu Lu @ 2025-03-21  2:35 UTC (permalink / raw)
  To: Yi Liu, kevin.tian, jgg; +Cc: joro, iommu, nicolinc

On 3/20/25 21:47, Yi Liu wrote:
> Add kdoc to highlight that callers of iommu_[attach|replace]_group_handle()
> and iommu_attach_device_pasid() should always provide a new handle.
> 
> Signed-off-by: Yi Liu<yi.l.liu@intel.com>

I suppose we could add a line of code to check and enforce this,
but it's fine to make it mandatory in the kdoc.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>

Thanks,
baolu

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v10 02/18] iommu: Introduce a replace API for device pasid
  2025-03-20 13:47 ` [PATCH v10 02/18] iommu: Introduce a replace API for device pasid Yi Liu
  2025-03-20 17:24   ` Nicolin Chen
@ 2025-03-21  3:08   ` Baolu Lu
  2025-03-21  4:19     ` Yi Liu
  1 sibling, 1 reply; 78+ messages in thread
From: Baolu Lu @ 2025-03-21  3:08 UTC (permalink / raw)
  To: Yi Liu, kevin.tian, jgg; +Cc: joro, iommu, nicolinc

On 3/20/25 21:47, Yi Liu wrote:
> Provide a high-level API to allow replacements of one domain with another
> for specific pasid of a device. This is similar to
> iommu_replace_group_handle() and it is expected to be used only by IOMMUFD.
> 
> Co-developed-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
> v9 - > v10: Convert to the v8 version, added a check to fail the case
>              in which the passed handle is equal to the existing one.
> ---
>   drivers/iommu/iommu-priv.h |   4 ++
>   drivers/iommu/iommu.c      | 117 +++++++++++++++++++++++++++++++++++--
>   2 files changed, 117 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
> index b4508423e13b..2985f05d699f 100644
> --- a/drivers/iommu/iommu-priv.h
> +++ b/drivers/iommu/iommu-priv.h
> @@ -43,4 +43,8 @@ void iommu_detach_group_handle(struct iommu_domain *domain,
>   int iommu_replace_group_handle(struct iommu_group *group,
>   			       struct iommu_domain *new_domain,
>   			       struct iommu_attach_handle *handle);
> +
> +int iommu_replace_device_pasid(struct iommu_domain *domain,
> +			       struct device *dev, ioasid_t pasid,
> +			       struct iommu_attach_handle *handle);
>   #endif /* __LINUX_IOMMU_PRIV_H */
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index cffd96e3efd2..07134bb85c00 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -513,6 +513,13 @@ static void iommu_deinit_device(struct device *dev)
>   	dev_iommu_free(dev);
>   }
>   
> +static inline struct iommu_domain *pasid_array_entry_to_domain(void *entry)
> +{
> +	if (xa_pointer_tag(entry) == IOMMU_PASID_ARRAY_DOMAIN)
> +		return xa_untag_pointer(entry);
> +	return ((struct iommu_attach_handle *)xa_untag_pointer(entry))->domain;
> +}

It's not good practice to put an inline helper in a C file. Probably
change it to a regular function or move it to iommu-priv.h?
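
As a rough userspace illustration of what the helper does when written as a
plain (non-inline) static function: the sketch below keeps a two-bit tag in
the low bits of the stored pointer and decodes it back to a domain. All
names (model_tag, model_make_entry, model_entry_to_domain, MODEL_TAG_*) are
hypothetical, and the tag values are assumptions chosen for the example,
not taken from the kernel headers.

```c
#include <stdint.h>

enum { MODEL_TAG_DOMAIN = 0, MODEL_TAG_HANDLE = 1 };
#define MODEL_TAG_MASK ((uintptr_t)3)

struct model_domain { int id; };
struct model_handle { struct model_domain *domain; };

/* Store a small tag in the low bits; relies on at least 4-byte alignment. */
static void *model_tag(void *p, uintptr_t tag)
{
	return (void *)((uintptr_t)p | tag);
}

/* Build an entry from either a bare domain or a handle that points at it. */
static void *model_make_entry(struct model_domain *domain,
			      struct model_handle *handle)
{
	if (handle) {
		handle->domain = domain;
		return model_tag(handle, MODEL_TAG_HANDLE);
	}
	return model_tag(domain, MODEL_TAG_DOMAIN);
}

/* Plain static function, no "inline": the compiler decides about inlining. */
static struct model_domain *model_entry_to_domain(void *entry)
{
	void *p = (void *)((uintptr_t)entry & ~MODEL_TAG_MASK);

	if (((uintptr_t)entry & MODEL_TAG_MASK) == MODEL_TAG_DOMAIN)
		return p;
	return ((struct model_handle *)p)->domain;
}
```

Either kind of entry resolves to the same domain, which is what lets the
replace path compare the current domain against the new one regardless of
whether a handle was used.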

> +
>   DEFINE_MUTEX(iommu_probe_device_lock);
>   
>   static int __iommu_probe_device(struct device *dev, struct list_head *group_list)
> @@ -3311,14 +3318,15 @@ static void iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid,
>   }
>   
>   static int __iommu_set_group_pasid(struct iommu_domain *domain,
> -				   struct iommu_group *group, ioasid_t pasid)
> +				   struct iommu_group *group, ioasid_t pasid,
> +				   struct iommu_domain *old)
>   {
>   	struct group_device *device, *last_gdev;
>   	int ret;
>   
>   	for_each_group_device(group, device) {
>   		ret = domain->ops->set_dev_pasid(domain, device->dev,
> -						 pasid, NULL);
> +						 pasid, old);
>   		if (ret)
>   			goto err_revert;
>   	}
> @@ -3330,7 +3338,15 @@ static int __iommu_set_group_pasid(struct iommu_domain *domain,
>   	for_each_group_device(group, device) {
>   		if (device == last_gdev)
>   			break;
> -		iommu_remove_dev_pasid(device->dev, pasid, domain);
> +		/*
> +		 * If no old domain, undo the succeeded devices/pasid.
> +		 * Otherwise, rollback the succeeded devices/pasid to the old
> +		 * domain. And it is a driver bug to fail attaching with a
> +		 * previously good domain.
> +		 */
> +		if (!old || WARN_ON(old->ops->set_dev_pasid(old, device->dev,
> +							    pasid, domain)))
> +			iommu_remove_dev_pasid(device->dev, pasid, domain);
>   	}
>   	return ret;
>   }
> @@ -3399,7 +3415,7 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
>   	if (ret)
>   		goto out_unlock;
>   
> -	ret = __iommu_set_group_pasid(domain, group, pasid);
> +	ret = __iommu_set_group_pasid(domain, group, pasid, NULL);
>   	if (ret) {
>   		xa_release(&group->pasid_array, pasid);
>   		goto out_unlock;
> @@ -3420,6 +3436,99 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
>   }
>   EXPORT_SYMBOL_GPL(iommu_attach_device_pasid);
>   
> +/**
> + * iommu_replace_device_pasid - Replace the domain that a pasid
> + *                              is attached to
> + * @domain: the new iommu domain
> + * @dev: the attached device.
> + * @pasid: the pasid of the device.
> + * @handle: the attach handle.
> + *
> + * This API allows the pasid to switch domains. The @pasid should have been
> + * attached. Otherwise, this fails. The pasid will keep the old configuration
> + * if replacement failed.
> + *
> + * Caller should always provide a new handle to avoid race with the paths
> + * that have lockless reference to handle if it intends to pass a valid handle.
> + *
> + * Return 0 on success, or an error.
> + */
> +int iommu_replace_device_pasid(struct iommu_domain *domain,
> +			       struct device *dev, ioasid_t pasid,
> +			       struct iommu_attach_handle *handle)
> +{
> +	/* Caller must be a probed driver on dev */
> +	struct iommu_group *group = dev->iommu_group;
> +	struct iommu_attach_handle *entry;
> +	struct iommu_domain *curr_domain;
> +	void *curr;
> +	int ret;
> +
> +	if (!group)
> +		return -ENODEV;
> +
> +	if (!domain->ops->set_dev_pasid)
> +		return -EOPNOTSUPP;
> +
> +	if (dev_iommu_ops(dev) != domain->owner ||
> +	    pasid == IOMMU_NO_PASID || !handle)
> +		return -EINVAL;
> +
> +	mutex_lock(&group->mutex);
> +	entry = iommu_make_pasid_array_entry(domain, handle);
> +	curr = xa_cmpxchg(&group->pasid_array, pasid, NULL,
> +			  XA_ZERO_ENTRY, GFP_KERNEL);
> +	if (xa_is_err(curr)) {
> +		ret = xa_err(curr);
> +		goto out_unlock;
> +	}
> +
> +	/*
> +	 * No domain (with or without handle) attached, hence not
> +	 * a replace case.
> +	 */
> +	if (!curr) {
> +		xa_release(&group->pasid_array, pasid);
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +
> +	/*
> +	 * Reusing handle is problematic as there are paths that refers
> +	 * the handle without lock. To avoid race, reject the callers that
> +	 * attempt it.
> +	 */
> +	if (handle && curr == entry) {
> +		WARN_ON(1);
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}

"handle" should never be a NULL. Or not?

> +
> +	curr_domain = pasid_array_entry_to_domain(curr);
> +	ret = 0;
> +
> +	if (curr_domain != domain) {

Is there a real use case where a caller needs to replace a domain with a
different attach handle? If not, let's start simple and just not
support the same-domain case...

> +		ret = __iommu_set_group_pasid(domain, group,
> +					      pasid, curr_domain);
> +		if (ret)
> +			goto out_unlock;
> +	}
> +
> +	if (curr != entry) {
> +		/*
> +		 * The above xa_cmpxchg() reserved the memory, and the
> +		 * group->mutex is held, this cannot fail.
> +		 */
> +		WARN_ON(xa_is_err(xa_store(&group->pasid_array,
> +					   pasid, entry, GFP_KERNEL)));
> +	}

... then the code could be simplified like this,

         curr_domain = pasid_array_entry_to_domain(curr);
         if (curr == entry || curr_domain == domain) {
                 ret = -EINVAL;
                 goto out_unlock;
         }

         ret = __iommu_set_group_pasid(domain, group, pasid, curr_domain);
         if (ret)
                 goto out_unlock;

         /*
          * The above xa_cmpxchg() reserved the memory, and the
          * group->mutex is held, this cannot fail.
          */
         WARN_ON(xa_is_err(xa_store(&group->pasid_array, pasid, entry,
                                    GFP_KERNEL)));

out_unlock:
         mutex_unlock(&group->mutex);
         return ret;

Anything overlooked?
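
The simplified flow can be exercised in a small userspace model. This is a
sketch only: the pasid array is a fixed-size array of slots instead of an
xarray, there is no locking, pasid 0 stands in for IOMMU_NO_PASID, and every
model_* name is hypothetical. model_set_group_pasid() is a stub that fails
on request so the "old configuration is kept" path can be observed.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_PASID 8

struct model_domain {
	int fail_attach; /* non-zero makes the model "driver" refuse this domain */
};

struct model_handle { int unused; };

struct model_slot {
	struct model_domain *domain; /* NULL means nothing attached */
	struct model_handle *handle;
};

struct model_group {
	struct model_slot pasid_array[MODEL_MAX_PASID];
};

/* Stub for the per-group attach; on failure the old domain stays in place. */
static int model_set_group_pasid(struct model_domain *domain)
{
	return domain->fail_attach ? -ENOMEM : 0;
}

static int model_replace_pasid(struct model_group *group,
			       struct model_domain *domain, unsigned int pasid,
			       struct model_handle *handle)
{
	struct model_slot *curr;
	int ret;

	if (pasid == 0 || pasid >= MODEL_MAX_PASID || !handle)
		return -EINVAL;

	curr = &group->pasid_array[pasid];
	if (!curr->domain)
		return -EINVAL;	/* nothing attached, so not a replace */

	/* Reject handle reuse and same-domain replacement alike. */
	if (curr->handle == handle || curr->domain == domain)
		return -EINVAL;

	ret = model_set_group_pasid(domain);
	if (ret)
		return ret;	/* slot untouched: the old configuration is kept */

	curr->domain = domain;
	curr->handle = handle;
	return 0;
}
```

One behavioural difference from the posted patch is visible in the model:
replacing a domain with itself under a new handle now returns -EINVAL
instead of only swapping the handle.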

> +
> +out_unlock:
> +	mutex_unlock(&group->mutex);
> +	return ret;
> +}
> +EXPORT_SYMBOL_NS_GPL(iommu_replace_device_pasid, "IOMMUFD_INTERNAL");
> +
>   /*
>    * iommu_detach_device_pasid() - Detach the domain from pasid of device
>    * @domain: the iommu domain.

Thanks,
baolu

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v10 03/18] iommufd: Pass @pasid through the device attach/replace path
  2025-03-20 13:47 ` [PATCH v10 03/18] iommufd: Pass @pasid through the device attach/replace path Yi Liu
@ 2025-03-21  3:13   ` Baolu Lu
  0 siblings, 0 replies; 78+ messages in thread
From: Baolu Lu @ 2025-03-21  3:13 UTC (permalink / raw)
  To: Yi Liu, kevin.tian, jgg; +Cc: joro, iommu, nicolinc

On 3/20/25 21:47, Yi Liu wrote:
> Most of the core logic before conducting the actual device attach/
> replace operation can be shared with pasid attach/replace. So pass
> @pasid through the device attach/replace helpers to prepare adding
> pasid attach/replace.
> 
> So far the @pasid should only be IOMMU_NO_PASID. No functional change.
> 
> Signed-off-by: Kevin Tian<kevin.tian@intel.com>
> Reviewed-by: Jason Gunthorpe<jgg@nvidia.com>
> Reviewed-by: Nicolin Chen<nicolinc@nvidia.com>
> Signed-off-by: Yi Liu<yi.l.liu@intel.com>

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v10 04/18] iommufd/device: Only add reserved_iova in non-pasid path
  2025-03-20 13:47 ` [PATCH v10 04/18] iommufd/device: Only add reserved_iova in non-pasid path Yi Liu
@ 2025-03-21  3:14   ` Baolu Lu
  0 siblings, 0 replies; 78+ messages in thread
From: Baolu Lu @ 2025-03-21  3:14 UTC (permalink / raw)
  To: Yi Liu, kevin.tian, jgg; +Cc: joro, iommu, nicolinc

On 3/20/25 21:47, Yi Liu wrote:
> As the pasid is passed through the attach/replace/detach helpers, it is
> necessary to ensure only the non-pasid path adds reserved_iova.
> 
> Reviewed-by: Jason Gunthorpe<jgg@nvidia.com>
> Reviewed-by: Kevin Tian<kevin.tian@intel.com>
> Reviewed-by: Nicolin Chen<nicolinc@nvidia.com>
> Signed-off-by: Yi Liu<yi.l.liu@intel.com>

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v10 05/18] iommufd/device: Replace idev->igroup with local variable
  2025-03-20 13:47 ` [PATCH v10 05/18] iommufd/device: Replace idev->igroup with local variable Yi Liu
@ 2025-03-21  3:14   ` Baolu Lu
  0 siblings, 0 replies; 78+ messages in thread
From: Baolu Lu @ 2025-03-21  3:14 UTC (permalink / raw)
  To: Yi Liu, kevin.tian, jgg; +Cc: joro, iommu, nicolinc

On 3/20/25 21:47, Yi Liu wrote:
> With more use of the fields of igroup, use a local variable instead of
> using the idev->igroup heavily.
> 
> No functional change expected.
> 
> Reviewed-by: Jason Gunthorpe<jgg@nvidia.com>
> Reviewed-by: Nicolin Chen<nicolinc@nvidia.com>
> Signed-off-by: Yi Liu<yi.l.liu@intel.com>

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v10 06/18] iommufd/device: Add helper to detect the first attach of a group
  2025-03-20 13:47 ` [PATCH v10 06/18] iommufd/device: Add helper to detect the first attach of a group Yi Liu
  2025-03-20 15:36   ` Jason Gunthorpe
  2025-03-20 17:36   ` Nicolin Chen
@ 2025-03-21  3:18   ` Baolu Lu
  2 siblings, 0 replies; 78+ messages in thread
From: Baolu Lu @ 2025-03-21  3:18 UTC (permalink / raw)
  To: Yi Liu, kevin.tian, jgg; +Cc: joro, iommu, nicolinc

On 3/20/25 21:47, Yi Liu wrote:
> The existing code detects the first attach by checking the
> igroup->device_list. However, the igroup->hwpt can also be used to detect
> the first attach. In future modifications, it is better to check the
> igroup->hwpt instead of the device_list. To improve readability and also
> prepare for further modifications on this part, this adds a helper for it.
> 
> Signed-off-by: Yi Liu<yi.l.liu@intel.com>
> ---
> v9 -> v10: It is patch 07 of v9, it's reworked hence renamed as well.
> ---
>   drivers/iommu/iommufd/device.c | 11 +++++++++--
>   1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> index ac54d734b819..9db36346328f 100644
> --- a/drivers/iommu/iommufd/device.c
> +++ b/drivers/iommu/iommufd/device.c
> @@ -444,6 +444,13 @@ static int iommufd_group_setup_msi(struct iommufd_group *igroup,
>   	return 0;
>   }
>   
> +static inline bool
> +igroup_first_attach(struct iommufd_group *igroup, ioasid_t pasid)
> +{
> +	lockdep_assert_held(&igroup->lock);
> +	return !igroup->hwpt;
> +}

Nit: avoid inline helpers in the C file. Others look good to me.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v10 02/18] iommu: Introduce a replace API for device pasid
  2025-03-20 23:58     ` Yi Liu
  2025-03-21  0:14       ` Yi Liu
@ 2025-03-21  3:21       ` Nicolin Chen
  2025-03-21  4:06         ` Yi Liu
  1 sibling, 1 reply; 78+ messages in thread
From: Nicolin Chen @ 2025-03-21  3:21 UTC (permalink / raw)
  To: Yi Liu; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On Fri, Mar 21, 2025 at 07:58:51AM +0800, Yi Liu wrote:
> > > +	entry = iommu_make_pasid_array_entry(domain, handle);
> > > +	curr = xa_cmpxchg(&group->pasid_array, pasid, NULL,
> > > +			  XA_ZERO_ENTRY, GFP_KERNEL);
> > > +	if (xa_is_err(curr)) {
> > > +		ret = xa_err(curr);
> > > +		goto out_unlock;
> > > +	}
> > > +
> > > +	/*
> > > +	 * No domain (with or without handle) attached, hence not
> > > +	 * a replace case.
> > > +	 */
> > > +	if (!curr) {
> > > +		xa_release(&group->pasid_array, pasid);
> > > +		ret = -EINVAL;
> > > +		goto out_unlock;
> > > +	}
> > > +
> > > +	/*
> > > +	 * Reusing handle is problematic as there are paths that refers
> > > +	 * the handle without lock. To avoid race, reject the callers that
> > > +	 * attempt it.
> > > +	 */
> > > +	if (handle && curr == entry) {
> > > +		WARN_ON(1);
> > > +		ret = -EINVAL;
> > > +		goto out_unlock;
> > > +	}
> > 
> > We rejected !handle and !curr cases. So it should be enough with:
> > 	if (curr == entry) {
> > ?
> 
> we only want to reject the case in which handle is valid and it is
> the same with old handle. If handle is null, I think it is ok to have the
> same domain.

But !handle is already rejected in the 3rd if validation:
+	if (dev_iommu_ops(dev) != domain->owner ||
+	    pasid == IOMMU_NO_PASID || !handle)
+		return -EINVAL;

So, it can't be NULL at this line, right?

Nic

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v10 07/18] iommufd/device: Wrap igroup->hwpt and igroup->device_list into attach struct
  2025-03-20 13:47 ` [PATCH v10 07/18] iommufd/device: Wrap igroup->hwpt and igroup->device_list into attach struct Yi Liu
  2025-03-20 15:48   ` Jason Gunthorpe
  2025-03-20 18:03   ` Nicolin Chen
@ 2025-03-21  3:22   ` Baolu Lu
  2 siblings, 0 replies; 78+ messages in thread
From: Baolu Lu @ 2025-03-21  3:22 UTC (permalink / raw)
  To: Yi Liu, kevin.tian, jgg; +Cc: joro, iommu, nicolinc

On 3/20/25 21:47, Yi Liu wrote:
> The igroup->hwpt and igroup->device_list are used to track the hwpt attach
> of a group in the RID path. While the coming PASID path also needs such
> tracking. To be prepared, wrap igroup->hwpt and igroup->device_list into
> attach struct which is allocated per attaching the first device of the
> group and freed per detaching the last device of the group.
> 
> Signed-off-by: Yi Liu<yi.l.liu@intel.com>

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v10 08/18] iommufd/device: Replace device_list with device_array
  2025-03-21  0:30     ` Yi Liu
@ 2025-03-21  3:25       ` Nicolin Chen
  0 siblings, 0 replies; 78+ messages in thread
From: Nicolin Chen @ 2025-03-21  3:25 UTC (permalink / raw)
  To: Yi Liu; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On Fri, Mar 21, 2025 at 08:30:10AM +0800, Yi Liu wrote:
> On 2025/3/21 02:38, Nicolin Chen wrote:
> > On Thu, Mar 20, 2025 at 06:47:34AM -0700, Yi Liu wrote:
> > > igroup->attach->device_list is used to track attached device of a group
> > > in the RID path. Such tracking is also needed in the PASID path in order
> > > to share path with the RID path.
> > > 
> > > While there is only one list_head in the iommufd_device. It cannot work
> > > if the device has been attached in both RID path and PASID path. To solve
> > > it, replacing the device_list with an xarray. The attached iommufd_device
> > > is stored in the entry indexed by the idev->obj.id.
> > > 
> > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > 
> > Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
> > 
> > Nit:
> > 
> > >   static int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt,
> > > @@ -625,20 +634,27 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
> > >   			rc = -ENOMEM;
> > >   			goto err_unlock;
> > >   		}
> > > -		INIT_LIST_HEAD(&attach->device_list);
> > > +		xa_init(&attach->device_array);
> > >   	}
> > >   	old_hwpt = attach->hwpt;
> > > +	rc = xa_insert(&igroup->attach->device_array, idev->obj.id, XA_ZERO_ENTRY,
> > > +		       GFP_KERNEL);
> > > +	if (rc) {
> > > +		WARN_ON(rc == -EBUSY && !old_hwpt);
> > > +		goto err_free_attach;
> > > +	}
> > > +
> > >   	if (old_hwpt && old_hwpt != hwpt) {
> > >   		rc = -EINVAL;
> > > -		goto err_free_attach;
> > > +		goto err_release_devid;
> > >   	}
> > 
> > Could we reject old_hwpt != hwpt (replace case) before xa_insert?
> 
> Rejecting it after xa_insert() has an extra benefit: it can detect a
> duplicated attach of the same device and hwpt, which is supposed to be an
> -EBUSY error. This is aligned with __iommu_attach_group(). I think this
> was missed before.

I see. --Nic
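
The ordering argument can be seen in a small userspace model. It is a
sketch with hypothetical names (model_attach, model_insert,
model_hwpt_attach), and a bool array stands in for the xarray; the only
property borrowed from xa_insert() is that it refuses to overwrite an
existing entry and reports -EBUSY.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_DEVS 4

struct model_hwpt { int id; };

struct model_attach {
	struct model_hwpt *hwpt;		/* NULL until the first attach */
	bool device_array[MODEL_MAX_DEVS];	/* true if that device id is attached */
};

/* Rough analogue of xa_insert(): refuse to overwrite an existing entry. */
static int model_insert(struct model_attach *attach, unsigned int dev_id)
{
	if (dev_id >= MODEL_MAX_DEVS)
		return -EINVAL;
	if (attach->device_array[dev_id])
		return -EBUSY;
	attach->device_array[dev_id] = true;
	return 0;
}

static int model_hwpt_attach(struct model_attach *attach,
			     struct model_hwpt *hwpt, unsigned int dev_id)
{
	struct model_hwpt *old_hwpt = attach->hwpt;
	int rc;

	/* Insert first, so a repeated attach of the same device reports -EBUSY. */
	rc = model_insert(attach, dev_id);
	if (rc)
		return rc;

	if (old_hwpt && old_hwpt != hwpt) {
		/* Mismatched hwpt: undo the insertion before failing. */
		attach->device_array[dev_id] = false;
		return -EINVAL;
	}

	attach->hwpt = hwpt;
	return 0;
}
```

With the hwpt comparison done first instead, a second attach of an already
attached device to the same hwpt would pass the comparison and only be
caught by the insertion, while a second attach to a different hwpt would
report -EINVAL rather than -EBUSY.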

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v10 02/18] iommu: Introduce a replace API for device pasid
  2025-03-21  3:21       ` Nicolin Chen
@ 2025-03-21  4:06         ` Yi Liu
  0 siblings, 0 replies; 78+ messages in thread
From: Yi Liu @ 2025-03-21  4:06 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On 2025/3/21 11:21, Nicolin Chen wrote:
> On Fri, Mar 21, 2025 at 07:58:51AM +0800, Yi Liu wrote:
>>>> +	entry = iommu_make_pasid_array_entry(domain, handle);
>>>> +	curr = xa_cmpxchg(&group->pasid_array, pasid, NULL,
>>>> +			  XA_ZERO_ENTRY, GFP_KERNEL);
>>>> +	if (xa_is_err(curr)) {
>>>> +		ret = xa_err(curr);
>>>> +		goto out_unlock;
>>>> +	}
>>>> +
>>>> +	/*
>>>> +	 * No domain (with or without handle) attached, hence not
>>>> +	 * a replace case.
>>>> +	 */
>>>> +	if (!curr) {
>>>> +		xa_release(&group->pasid_array, pasid);
>>>> +		ret = -EINVAL;
>>>> +		goto out_unlock;
>>>> +	}
>>>> +
>>>> +	/*
>>>> +	 * Reusing handle is problematic as there are paths that refers
>>>> +	 * the handle without lock. To avoid race, reject the callers that
>>>> +	 * attempt it.
>>>> +	 */
>>>> +	if (handle && curr == entry) {
>>>> +		WARN_ON(1);
>>>> +		ret = -EINVAL;
>>>> +		goto out_unlock;
>>>> +	}
>>>
>>> We rejected !handle and !curr cases. So it should be enough with:
>>> 	if (curr == entry) {
>>> ?

Hmm, yes. I'm unsure whether we should allow a NULL handle in this
API. There is really no caller doing it. To be aligned with the
attach API, it looks better to allow it and then drop the !handle check.

>> we only want to reject the case in which handle is valid and it is
>> the same with old handle. If handle is null, I think it is ok to have the
>> same domain.
> 
> But !handle is already rejected in the 3rd if validation:
> +	if (dev_iommu_ops(dev) != domain->owner ||
> +	    pasid == IOMMU_NO_PASID || !handle)
> +		return -EINVAL;
> 
> So, it can't be NULL at this line, right?

Yes. It depends on whether we want to allow a NULL handle for this API. My
gut feeling is to not allow it, in which case I can simplify the handle
reuse check.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v10 02/18] iommu: Introduce a replace API for device pasid
  2025-03-21  3:08   ` Baolu Lu
@ 2025-03-21  4:19     ` Yi Liu
  0 siblings, 0 replies; 78+ messages in thread
From: Yi Liu @ 2025-03-21  4:19 UTC (permalink / raw)
  To: Baolu Lu, kevin.tian, jgg; +Cc: joro, iommu, nicolinc

On 2025/3/21 11:08, Baolu Lu wrote:
> On 3/20/25 21:47, Yi Liu wrote:
>> Provide a high-level API to allow replacements of one domain with another
>> for specific pasid of a device. This is similar to
>> iommu_replace_group_handle() and it is expected to be used only by IOMMUFD.
>>
>> Co-developed-by: Lu Baolu <baolu.lu@linux.intel.com>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>> ---
>> v9 - > v10: Convert to the v8 version, added a check to fail the case
>>              in which the passed handle is equal to the existing one.
>> ---
>>   drivers/iommu/iommu-priv.h |   4 ++
>>   drivers/iommu/iommu.c      | 117 +++++++++++++++++++++++++++++++++++--
>>   2 files changed, 117 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
>> index b4508423e13b..2985f05d699f 100644
>> --- a/drivers/iommu/iommu-priv.h
>> +++ b/drivers/iommu/iommu-priv.h
>> @@ -43,4 +43,8 @@ void iommu_detach_group_handle(struct iommu_domain *domain,
>>   int iommu_replace_group_handle(struct iommu_group *group,
>>                      struct iommu_domain *new_domain,
>>                      struct iommu_attach_handle *handle);
>> +
>> +int iommu_replace_device_pasid(struct iommu_domain *domain,
>> +                   struct device *dev, ioasid_t pasid,
>> +                   struct iommu_attach_handle *handle);
>>   #endif /* __LINUX_IOMMU_PRIV_H */
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index cffd96e3efd2..07134bb85c00 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -513,6 +513,13 @@ static void iommu_deinit_device(struct device *dev)
>>       dev_iommu_free(dev);
>>   }
>> +static inline struct iommu_domain *pasid_array_entry_to_domain(void *entry)
>> +{
>> +    if (xa_pointer_tag(entry) == IOMMU_PASID_ARRAY_DOMAIN)
>> +        return xa_untag_pointer(entry);
>> +    return ((struct iommu_attach_handle *)xa_untag_pointer(entry))->domain;
>> +}
> 
> It's not good practice to put an inline helper in a C file. Probably
> change it to a regular function or move it to iommu-priv.h?

got it.

>> +
>>   DEFINE_MUTEX(iommu_probe_device_lock);
>>   static int __iommu_probe_device(struct device *dev, struct list_head *group_list)
>> @@ -3311,14 +3318,15 @@ static void iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid,
>>   }
>>   static int __iommu_set_group_pasid(struct iommu_domain *domain,
>> -                   struct iommu_group *group, ioasid_t pasid)
>> +                   struct iommu_group *group, ioasid_t pasid,
>> +                   struct iommu_domain *old)
>>   {
>>       struct group_device *device, *last_gdev;
>>       int ret;
>>       for_each_group_device(group, device) {
>>           ret = domain->ops->set_dev_pasid(domain, device->dev,
>> -                         pasid, NULL);
>> +                         pasid, old);
>>           if (ret)
>>               goto err_revert;
>>       }
>> @@ -3330,7 +3338,15 @@ static int __iommu_set_group_pasid(struct iommu_domain *domain,
>>       for_each_group_device(group, device) {
>>           if (device == last_gdev)
>>               break;
>> -        iommu_remove_dev_pasid(device->dev, pasid, domain);
>> +        /*
>> +         * If no old domain, undo the succeeded devices/pasid.
>> +         * Otherwise, rollback the succeeded devices/pasid to the old
>> +         * domain. And it is a driver bug to fail attaching with a
>> +         * previously good domain.
>> +         */
>> +        if (!old || WARN_ON(old->ops->set_dev_pasid(old, device->dev,
>> +                                pasid, domain)))
>> +            iommu_remove_dev_pasid(device->dev, pasid, domain);
>>       }
>>       return ret;
>>   }
>> @@ -3399,7 +3415,7 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
>>       if (ret)
>>           goto out_unlock;
>> -    ret = __iommu_set_group_pasid(domain, group, pasid);
>> +    ret = __iommu_set_group_pasid(domain, group, pasid, NULL);
>>       if (ret) {
>>           xa_release(&group->pasid_array, pasid);
>>           goto out_unlock;
>> @@ -3420,6 +3436,99 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
>>   }
>>   EXPORT_SYMBOL_GPL(iommu_attach_device_pasid);
>> +/**
>> + * iommu_replace_device_pasid - Replace the domain that a pasid
>> + *                              is attached to
>> + * @domain: the new iommu domain
>> + * @dev: the attached device.
>> + * @pasid: the pasid of the device.
>> + * @handle: the attach handle.
>> + *
>> + * This API allows the pasid to switch domains. The @pasid should have been
>> + * attached. Otherwise, this fails. The pasid will keep the old configuration
>> + * if replacement failed.
>> + *
>> + * Caller should always provide a new handle to avoid race with the paths
>> + * that have lockless reference to handle if it intends to pass a valid handle.
>> + *
>> + * Return 0 on success, or an error.
>> + */
>> +int iommu_replace_device_pasid(struct iommu_domain *domain,
>> +                   struct device *dev, ioasid_t pasid,
>> +                   struct iommu_attach_handle *handle)
>> +{
>> +    /* Caller must be a probed driver on dev */
>> +    struct iommu_group *group = dev->iommu_group;
>> +    struct iommu_attach_handle *entry;
>> +    struct iommu_domain *curr_domain;
>> +    void *curr;
>> +    int ret;
>> +
>> +    if (!group)
>> +        return -ENODEV;
>> +
>> +    if (!domain->ops->set_dev_pasid)
>> +        return -EOPNOTSUPP;
>> +
>> +    if (dev_iommu_ops(dev) != domain->owner ||
>> +        pasid == IOMMU_NO_PASID || !handle)
>> +        return -EINVAL;
>> +
>> +    mutex_lock(&group->mutex);
>> +    entry = iommu_make_pasid_array_entry(domain, handle);
>> +    curr = xa_cmpxchg(&group->pasid_array, pasid, NULL,
>> +              XA_ZERO_ENTRY, GFP_KERNEL);
>> +    if (xa_is_err(curr)) {
>> +        ret = xa_err(curr);
>> +        goto out_unlock;
>> +    }
>> +
>> +    /*
>> +     * No domain (with or without handle) attached, hence not
>> +     * a replace case.
>> +     */
>> +    if (!curr) {
>> +        xa_release(&group->pasid_array, pasid);
>> +        ret = -EINVAL;
>> +        goto out_unlock;
>> +    }
>> +
>> +    /*
>> +     * Reusing handle is problematic as there are paths that refers
>> +     * the handle without lock. To avoid race, reject the callers that
>> +     * attempt it.
>> +     */
>> +    if (handle && curr == entry) {
>> +        WARN_ON(1);
>> +        ret = -EINVAL;
>> +        goto out_unlock;
>> +    }
> 
> "handle" should never be a NULL. Or not?

yes.

>> +
>> +    curr_domain = pasid_array_entry_to_domain(curr);
>> +    ret = 0;
>> +
>> +    if (curr_domain != domain) {
> 
> Is there a real use case where a caller needs to replace a domain with a
> different attach handle? If not, let's start simple and just not
> support the same-domain case...

The RID path allows it; see __iommu_group_set_domain_internal(). I think it's
better to keep the two paths aligned, especially since there is a chance to
consolidate the RID and PASID paths in the core [1] in the future.
Some work would need to be done before that, for sure. :)

[1] https://lore.kernel.org/linux-iommu/20250228151250.GT39591@nvidia.com/
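
To illustrate the point about keeping the same-domain case: in the quoted
replace path, the domain switch and the pasid_array entry update are decided
independently. The following is only a userspace sketch of that control flow;
the model_* types and names are hypothetical stand-ins, not the kernel
definitions, and locking and the xarray are left out.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for iommu_domain and the pasid_array entry. */
struct model_domain { int id; };
struct model_entry {
	struct model_domain *domain;	/* NULL means nothing is attached */
	void *handle;
};

/*
 * Model of the replace flow: the driver is only called when the domain
 * changes, but the stored entry is always refreshed with the new handle.
 * *switched reports whether a domain switch took place.
 */
static int model_replace(struct model_entry *slot, struct model_domain *domain,
			 void *handle, int *switched)
{
	*switched = 0;

	/* No domain attached, hence not a replace case. */
	if (!slot->domain)
		return -EINVAL;

	/* A handle is mandatory, and reusing the current one is rejected. */
	if (!handle || slot->handle == handle)
		return -EINVAL;

	if (slot->domain != domain) {
		slot->domain = domain;
		*switched = 1;
	}

	slot->handle = handle;
	return 0;
}
```

With this shape, replacing with the same domain but a fresh handle succeeds
without touching the driver, which mirrors what the RID path permits.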

> 
>> +        ret = __iommu_set_group_pasid(domain, group,
>> +                          pasid, curr_domain);
>> +        if (ret)
>> +            goto out_unlock;
>> +    }
>> +
>> +    if (curr != entry) {
>> +        /*
>> +         * The above xa_cmpxchg() reserved the memory, and the
>> +         * group->mutex is held, this cannot fail.
>> +         */
>> +        WARN_ON(xa_is_err(xa_store(&group->pasid_array,
>> +                       pasid, entry, GFP_KERNEL)));
>> +    }
> 
> ... then the code could be simplified like this,
> 
>          curr_domain = pasid_array_entry_to_domain(curr);
>          if (curr == entry || curr_domain == domain) {
>                  ret = -EINVAL;
>                  goto out_unlock;
>          }
> 
>          ret = __iommu_set_group_pasid(domain, group, pasid, curr_domain);
>          if (ret)
>                  goto out_unlock;
> 
>          /*
>           * The above xa_cmpxchg() reserved the memory, and the
>           * group->mutex is held, this cannot fail.
>           */
>          WARN_ON(xa_is_err(xa_store(&group->pasid_array, pasid, entry, 
> GFP_KERNEL)));
> 
> out_unlock:
>          mutex_unlock(&group->mutex);
>          return ret;
> 
> Anything overlooked?
> 
>> +
>> +out_unlock:
>> +    mutex_unlock(&group->mutex);
>> +    return ret;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(iommu_replace_device_pasid, "IOMMUFD_INTERNAL");
>> +
>>   /*
>>    * iommu_detach_device_pasid() - Detach the domain from pasid of device
>>    * @domain: the iommu domain.
> 
> Thanks,
> baolu

-- 
Regards,
Yi Liu


* Re: [PATCH v10 11/18] iommufd: Support pasid attach/replace
  2025-03-21  1:05         ` Yi Liu
@ 2025-03-21 11:45           ` Jason Gunthorpe
  0 siblings, 0 replies; 78+ messages in thread
From: Jason Gunthorpe @ 2025-03-21 11:45 UTC (permalink / raw)
  To: Yi Liu; +Cc: Nicolin Chen, kevin.tian, joro, baolu.lu, iommu

On Fri, Mar 21, 2025 at 09:05:45AM +0800, Yi Liu wrote:
> > Would you please also see if this could be covered by the selftest
> > given that all designed tests didn't catch this?
> 
> I supposed it could be caught by iommufd_fail_nth, but in fact it is not.
> I think iommufd_auto_response_faults() should be able to catch it
> since it parses the handle if there are pending PRIs. Maybe I can just add
> the line below between attach and replace/detach.

It is just a memory leak AFAIC, the tests don't look for those unless
you run them with kmemleak or something

Jason


* Re: [PATCH v10 18/18] iommufd/selftest: Add coverage for iommufd pasid attach/detach
  2025-03-21  0:34   ` Nicolin Chen
@ 2025-03-21 15:26     ` Yi Liu
  2025-03-21 17:10       ` Nicolin Chen
  0 siblings, 1 reply; 78+ messages in thread
From: Yi Liu @ 2025-03-21 15:26 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu



On 2025/3/21 08:34, Nicolin Chen wrote:
> On Thu, Mar 20, 2025 at 06:47:44AM -0700, Yi Liu wrote:
> 
>> +TEST_F(iommufd_device_pasid, pasid_attach)
>> +{
>> +	struct iommu_hwpt_selftest data = {
>> +		.iotlb =  IOMMU_TEST_IOTLB_DEFAULT,
>> +	};
>> +	uint32_t nested_hwpt_id[3] = {};
>> +	uint32_t parent_hwpt_id = 0;
>> +	uint32_t fault_id, fault_fd;
>> +	uint32_t s2_hwpt_id = 0;
>> +	uint32_t iopf_hwpt_id;
>> +	uint32_t pasid = 100;
>> +	uint32_t auto_hwpt;
>> +	uint32_t viommu_id;
>> +	bool result;
>> +
>> +	/* Allocate two nested hwpts sharing one common parent hwpt */
>> +	test_cmd_hwpt_alloc(self->device_id, self->ioas_id,
>> +			    IOMMU_HWPT_ALLOC_NEST_PARENT,
>> +			    &parent_hwpt_id);
>> +	test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id,
>> +				   IOMMU_HWPT_ALLOC_PASID,
>> +				   &nested_hwpt_id[0],
>> +				   IOMMU_HWPT_DATA_SELFTEST,
>> +				   &data, sizeof(data));
>> +	test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id,
>> +				   IOMMU_HWPT_ALLOC_PASID,
>> +				   &nested_hwpt_id[1],
>> +				   IOMMU_HWPT_DATA_SELFTEST,
>> +				   &data, sizeof(data));
>> +
>> +	/* Faulte related preparation */
> 
> Fault
> 
>> +	/* Allocate a regular nested hwpt based on viommu */
>> +	test_cmd_viommu_alloc(self->device_id, parent_hwpt_id,
>> +			      IOMMU_VIOMMU_TYPE_SELFTEST,
>> +			      &viommu_id);
>> +	test_cmd_hwpt_alloc_nested(self->device_id, viommu_id,
>> +				   IOMMU_HWPT_ALLOC_PASID,
>> +				   &nested_hwpt_id[2],
>> +				   IOMMU_HWPT_DATA_SELFTEST, &data,
>> +				   sizeof(data));
>> +
>> +	test_cmd_hwpt_alloc(self->device_id, self->ioas_id,
>> +			    IOMMU_HWPT_ALLOC_PASID,
>> +			    &s2_hwpt_id);
>> +
>> +	/* Attach RID to non-pasid compat domain, */
>> +	test_cmd_mock_domain_replace(self->stdev_id, parent_hwpt_id);
>> +	/* then attach to pasid should fail */
>> +	test_err_pasid_attach(EINVAL, pasid, s2_hwpt_id, NULL);
>> +
>> +	/* Attach RID to pasid compat domain, */
>> +	test_cmd_mock_domain_replace(self->stdev_id, s2_hwpt_id);
>> +	/* then attach to pasid should succeed, */
>> +	test_cmd_pasid_attach(pasid, nested_hwpt_id[0], NULL);
>> +	/* but attach RID to non-pasid compat domain should fail now. */
>> +	test_err_mock_domain_replace(EINVAL, self->stdev_id, parent_hwpt_id);
>> +	test_cmd_pasid_detach(pasid);
>> +
>> +	if (!variant->pasid_capable) {
>> +		/*
>> +		 * PASID-compatible domain can be used by non-PASID-capable
>> +		 * device.
>> +		 */
>> +		test_cmd_mock_domain_replace(self->no_pasid_stdev_id, nested_hwpt_id[0]);
>> +		test_cmd_mock_domain_replace(self->no_pasid_stdev_id, self->ioas_id);
>> +		/*
>> +		 * Attach hwpt to pasid#100 of non-PASID-capable device,
>> +		 * should fail, no matter domain is pasid-compat or not.
>> +		 */
>> +		EXPECT_ERRNO(EINVAL,
>> +			     _test_cmd_pasid_attach(self->fd, self->no_pasid_stdev_id,
>> +						    pasid, parent_hwpt_id, NULL));
>> +		EXPECT_ERRNO(EINVAL,
>> +			     _test_cmd_pasid_attach(self->fd, self->no_pasid_stdev_id,
>> +						    pasid, s2_hwpt_id, NULL));
>> +	}
> 
> It seems that we should test these anyway without a variant?

These are for the non-pasid-capable device. Without the variant, we only
create the pasid-capable device, hence the above tests in the if statement
are not necessary.

>> +
>> +	/*
>> +	 * Attach non pasid compat hwpt to pasid-capable device, should
>> +	 * fail, and have null domain.
>> +	 */
>> +	test_err_pasid_attach(EINVAL, pasid, parent_hwpt_id, NULL);
>> +	ASSERT_EQ(0,
>> +		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
>> +					      pasid, 0, &result));
>> +	EXPECT_EQ(1, result);
>> +
>> +	/*
>> +	 * Attach ioas to pasid 100, should succeed, domain should
>> +	 * be valid.
>> +	 */
>> +	test_cmd_pasid_attach(pasid, self->ioas_id, &auto_hwpt);
>> +	ASSERT_EQ(0,
>> +		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
>> +					      pasid, auto_hwpt, &result));
>> +	EXPECT_EQ(1, result);
> 
> Hmm, I thought that a non-RID PASID slot could only attach a PASID-
> compatible HWPT. I think I am totally confused now... lol
> 
> Perhaps we need a detailed documentation somewhere, at least as a
> reminder or so?


In this v10, attaching a pasid to an ioas will allocate a pasid-compat
hwpt. But this is really messy. So I will make the auto_hwpt always
non-pasid-compat. That keeps the RID and PASID paths aligned.

> 
>> +
>> +	/* Attach to pasid 100 which has been attached, should fail. */
>> +	test_err_pasid_attach(EBUSY, pasid, self->ioas_id, &auto_hwpt);
>> +
>> +	/*
>> +	 * Try attach pasid 100 with another hwpt, should FAIL
>> +	 * as attach does not allow overwrite, use REPLACE instead.
>> +	 */
>> +	test_err_pasid_attach(EBUSY, pasid, nested_hwpt_id[0], NULL);
>> +
>> +	/*
>> +	 * Detach hwpt from pasid 100, and check if the pasid 100
>> +	 * has null domain. Should be done before the next attach.
>> +	 */
>> +	test_cmd_pasid_detach(pasid);
>> +	ASSERT_EQ(0,
>> +		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
>> +					      pasid, 0, &result));
>> +	EXPECT_EQ(1, result);
>> +
>> +	/*
>> +	 * Attach nested hwpt to pasid 100, should succeed, domain
>> +	 * should be valid.
>> +	 */
>> +	test_cmd_pasid_attach(pasid, nested_hwpt_id[0], NULL);
>> +	ASSERT_EQ(0,
>> +		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
>> +					      pasid, nested_hwpt_id[0],
>> +					      &result));
>> +	EXPECT_EQ(1, result);
>> +
>> +	/* Attach to pasid 100 which has been attached, should fail. */
>> +	test_err_pasid_attach(EBUSY, pasid, nested_hwpt_id[0], NULL);
>> +
>> +	/*
>> +	 * Detach hwpt from pasid 100, and check if the pasid 100
>> +	 * has null domain
>> +	 */
>> +	test_cmd_pasid_detach(pasid);
>> +	ASSERT_EQ(0,
>> +		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
>> +					      pasid, 0, &result));
>> +	EXPECT_EQ(1, result);
>> +
>> +	/* Replace tests */
>> +
>> +	pasid = 200;
>> +	/*
>> +	 * Replace pasid 200 without attaching it first, should
>> +	 * fail with -EINVAL.
>> +	 */
>> +	test_err_cmd_pasid_replace(EINVAL, pasid, s2_hwpt_id, NULL);
>> +
>> +	/*
>> +	 * Attach a s2 hwpt to pasid 200, should succeed, domain should
> 
> Attach the ..

got it

>> +	 * be valid.
>> +	 */
>> +	test_cmd_pasid_attach(pasid, s2_hwpt_id, NULL);
>> +	ASSERT_EQ(0,
>> +		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
>> +					      pasid, s2_hwpt_id,
>> +					      &result));
>> +	EXPECT_EQ(1, result);
>> +
>> +	/*
>> +	 * Replace pasid 200 with self->ioas_id, should succeed,
>> +	 * and have valid domain.
>> +	 */
>> +	test_cmd_pasid_replace(pasid, self->ioas_id, &auto_hwpt);
>> +	ASSERT_EQ(0,
>> +		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
>> +					      pasid, auto_hwpt,
>> +					      &result));
>> +	EXPECT_EQ(1, result);
>> +
>> +	/*
>> +	 * Replace a nested hwpt for pasid 200, should succeed,
>> +	 * and have valid domain.
>> +	 */
>> +	test_cmd_pasid_replace(pasid, nested_hwpt_id[0], NULL);
>> +	ASSERT_EQ(0,
>> +		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
>> +					      pasid, nested_hwpt_id[0],
>> +					      &result));
>> +	EXPECT_EQ(1, result);
>> +
>> +	/*
>> +	 * Replace with another nested hwpt for pasid 200, should
>> +	 * succeed, and have valid domain.
>> +	 */
>> +	test_cmd_pasid_replace(pasid, nested_hwpt_id[1], NULL);
>> +	ASSERT_EQ(0,
>> +		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
>> +					      pasid, nested_hwpt_id[1],
>> +					      &result));
>> +	EXPECT_EQ(1, result);
>> +
>> +	/*
>> +	 * Detach hwpt from pasid 200, and check if the pasid 200
>> +	 * has null domain.
>> +	 */
>> +	test_cmd_pasid_detach(pasid);
>> +	ASSERT_EQ(0,
>> +		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
>> +					      pasid, 0, &result));
>> +	EXPECT_EQ(1, result);
>> +
>> +	/* Negative Tests for pasid replace, use pasid 1024 */
>> +
>> +	/*
>> +	 * Attach a s2 hwpt to pasid 1024, should succeed, domain should
> 
> Attach the ...

got it

> 
>> +	 * be valid.
>> +	 */
>> +	pasid = 1024;
>> +	test_cmd_pasid_attach(pasid, s2_hwpt_id, NULL);
>> +	ASSERT_EQ(0,
>> +		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
>> +					      pasid, s2_hwpt_id,
>> +					      &result));
>> +	EXPECT_EQ(1, result);
>> +
>> +	/*
>> +	 * Replace pasid 1024 with self->ioas_id, should fail,
>> +	 * but have the old valid domain. This is a designed
>> +	 * negative case, normally replace with self->ioas_id
>> +	 * could succeed.
>> +	 */
>> +	test_err_cmd_pasid_replace(ENOMEM, pasid, self->ioas_id, NULL);
>> +	ASSERT_EQ(0,
>> +		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
>> +					      pasid, s2_hwpt_id,
>> +					      &result));
>> +	EXPECT_EQ(1, result);
>> +
>> +	/*
>> +	 * Detach hwpt from pasid 1024, and check if the pasid 1024
>> +	 * has null domain.
>> +	 */
>> +	test_cmd_pasid_detach(pasid);
> 
> The designed "failing" replace does "pasid_1024_attached = false",
> meaning that this detach() isn't necessary?
> 
> Or perhaps the designed "failing" shouldn't set "attached = false"?

Hmm, this naming is a bit tricky. The user side is still required
to detach pasid 1024. That flag is more for the convenience of faking an error.

>> +	/* Detach the s2_hwpt_id from RID */
>> +	test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id);
>> +
>> +	test_ioctl_destroy(nested_hwpt_id[0]);
>> +	test_ioctl_destroy(nested_hwpt_id[1]);
>> +	test_ioctl_destroy(nested_hwpt_id[2]);
>> +	test_ioctl_destroy(viommu_id);
>> +	test_ioctl_destroy(parent_hwpt_id);
>> +	test_ioctl_destroy(s2_hwpt_id);
> 
> Once detached, all the destroys can be done automatically?

-- 
Regards,
Yi Liu


* Re: [PATCH v10 18/18] iommufd/selftest: Add coverage for iommufd pasid attach/detach
  2025-03-21 15:26     ` Yi Liu
@ 2025-03-21 17:10       ` Nicolin Chen
  0 siblings, 0 replies; 78+ messages in thread
From: Nicolin Chen @ 2025-03-21 17:10 UTC (permalink / raw)
  To: Yi Liu; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On Fri, Mar 21, 2025 at 11:26:02PM +0800, Yi Liu wrote:
> On 2025/3/21 08:34, Nicolin Chen wrote:
> > > +	if (!variant->pasid_capable) {
> > > +		/*
> > > +		 * PASID-compatible domain can be used by non-PASID-capable
> > > +		 * device.
> > > +		 */
> > > +		test_cmd_mock_domain_replace(self->no_pasid_stdev_id, nested_hwpt_id[0]);
> > > +		test_cmd_mock_domain_replace(self->no_pasid_stdev_id, self->ioas_id);
> > > +		/*
> > > +		 * Attach hwpt to pasid#100 of non-PASID-capable device,
> > > +		 * should fail, no matter domain is pasid-compat or not.
> > > +		 */
> > > +		EXPECT_ERRNO(EINVAL,
> > > +			     _test_cmd_pasid_attach(self->fd, self->no_pasid_stdev_id,
> > > +						    pasid, parent_hwpt_id, NULL));
> > > +		EXPECT_ERRNO(EINVAL,
> > > +			     _test_cmd_pasid_attach(self->fd, self->no_pasid_stdev_id,
> > > +						    pasid, s2_hwpt_id, NULL));
> > > +	}
> > 
> > It seems that we should test these anyway without a variant?
> 
> These are for the non-pasid-capable device. Without the variant, we only
> create the pasid-capable device, hence the above tests in the if statement
> are not necessary.

Yea, I mean, just create two devices unconditionally, one for
pasid and another for non-pasid. I think it's good to have a
non-pasid device to cover these negative pasid attach cases.

> > > +
> > > +	/*
> > > +	 * Attach non pasid compat hwpt to pasid-capable device, should
> > > +	 * fail, and have null domain.
> > > +	 */
> > > +	test_err_pasid_attach(EINVAL, pasid, parent_hwpt_id, NULL);
> > > +	ASSERT_EQ(0,
> > > +		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
> > > +					      pasid, 0, &result));
> > > +	EXPECT_EQ(1, result);
> > > +
> > > +	/*
> > > +	 * Attach ioas to pasid 100, should succeed, domain should
> > > +	 * be valid.
> > > +	 */
> > > +	test_cmd_pasid_attach(pasid, self->ioas_id, &auto_hwpt);
> > > +	ASSERT_EQ(0,
> > > +		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
> > > +					      pasid, auto_hwpt, &result));
> > > +	EXPECT_EQ(1, result);
> > 
> > Hmm, I thought that a non-RID PASID slot could only attach a PASID-
> > compatible HWPT. I think I am totally confused now... lol
> > 
> > Perhaps we need a detailed documentation somewhere, at least as a
> > reminder or so?
> 
> In this v10, attaching a pasid to an ioas will allocate a pasid-compat
> hwpt. But this is really messy. So I will make the auto_hwpt always
> non-pasid-compat. That keeps the RID and PASID paths aligned.

I see. A pasid slot can only store a pasid-compat HWPT. That's
clear now :)

> > > +	/*
> > > +	 * Replace pasid 1024 with self->ioas_id, should fail,
> > > +	 * but have the old valid domain. This is a designed
> > > +	 * negative case, normally replace with self->ioas_id
> > > +	 * could succeed.
> > > +	 */
> > > +	test_err_cmd_pasid_replace(ENOMEM, pasid, self->ioas_id, NULL);
> > > +	ASSERT_EQ(0,
> > > +		  test_cmd_pasid_check_domain(self->fd, self->stdev_id,
> > > +					      pasid, s2_hwpt_id,
> > > +					      &result));
> > > +	EXPECT_EQ(1, result);
> > > +
> > > +	/*
> > > +	 * Detach hwpt from pasid 1024, and check if the pasid 1024
> > > +	 * has null domain.
> > > +	 */
> > > +	test_cmd_pasid_detach(pasid);
> > 
> > The designed "failing" replace does "pasid_1024_attached = false",
> > meaning that this detach() isn't necessary?
> > 
> > Or perhaps the designed "failing" shouldn't set "attached = false"?
> 
> Hmm, this naming is a bit tricky. The user side is still required
> to detach pasid 1024. That flag is more for the convenience of faking an error.

Ah, so maybe that "pasid_1024_attached" should be named just
"fake_attach_error"?

Thanks
Nicolin


* Re: [PATCH v10 17/18] iommufd/selftest: Add test ops to test pasid attach/detach
  2025-03-21  1:43     ` Yi Liu
@ 2025-03-21 17:25       ` Nicolin Chen
  0 siblings, 0 replies; 78+ messages in thread
From: Nicolin Chen @ 2025-03-21 17:25 UTC (permalink / raw)
  To: Yi Liu; +Cc: kevin.tian, jgg, joro, baolu.lu, iommu

On Fri, Mar 21, 2025 at 09:43:48AM +0800, Yi Liu wrote:
> On 2025/3/21 07:17, Nicolin Chen wrote:
> > On Thu, Mar 20, 2025 at 06:47:43AM -0700, Yi Liu wrote:
> > > @@ -150,6 +155,32 @@ struct iommu_test_cmd {
> > >   		struct {
> > >   			__u32 dev_id;
> > >   		} trigger_vevent;
> > > +		struct {
> > > +			__u32 pasid;
> > > +			__u32 pt_id;
> > > +			/* @id is stdev_id
> > > +			 * pasid#1024 is for special test, do not use it
> > > +			 * in normal case.
> > > +			 */
> > 
> > How about add on top of these structs:
> > #define IOMMU_TEST_PASID_RESERVED 1024
> 
> yep
> 
> > Also, the coding style of the multi-line comments is a bit odd.
> 
> yeah, but it cannot be finished in one line. And I think it is necessary
> to add it to note how userspace should set the id field and pasid field.

At least multi-line comments in general should be:
 /*
  * abc
  * efg
  */

And I think now we have IOMMU_TEST_PASID_RESERVED, we can move
that line of "pasid" to the macro, so what's left will be just
"@id is stdev_id".
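
As a rough sketch of that suggestion, the reserved PASID note moves onto the
macro and the struct comment shrinks to one line. The struct below is a
hypothetical userspace mirror of the quoted fields (with uint32_t in place of
__u32), and model_pasid_is_reserved() is an illustrative helper that is not
part of the patch.

```c
#include <stdint.h>

/*
 * PASID reserved for the special fault-injection test; do not use it
 * in normal cases.
 */
#define IOMMU_TEST_PASID_RESERVED 1024

/* Hypothetical mirror of the pasid attach fields in struct iommu_test_cmd. */
struct model_pasid_attach {
	uint32_t pasid;
	uint32_t pt_id;
	/* @id is stdev_id */
};

/* Illustrative helper: tells whether a PASID is the reserved test value. */
static int model_pasid_is_reserved(uint32_t pasid)
{
	return pasid == IOMMU_TEST_PASID_RESERVED;
}
```

The mock driver check "pasid == 1024" could then compare against the macro
instead of the bare number.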

> > > diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
> > > index 691e7a23f300..37c9cd285541 100644
> > > --- a/drivers/iommu/iommufd/selftest.c
> > > +++ b/drivers/iommu/iommufd/selftest.c
> > > @@ -223,10 +223,29 @@ static int mock_domain_nop_attach(struct iommu_domain *domain,
> > >   	return 0;
> > >   }
> > > +static bool pasid_1024_attached;
> > 
> > I recall syzkaller would do multi-threading... We might need a
> > global mutex or something atomic_t?
> 
> maybe move it to mdev as Jason suggested in another email.

Yes

> > > +	 * This is helpful to test the case in which the iommu core needs
> > > +	 * to rollback to old domain due to driver failure.
> > > +	 */
> > > +	if (pasid == 1024) {
> > > +		if (domain->type == IOMMU_DOMAIN_BLOCKED) {
> > > +			pasid_1024_attached = false;
> > > +		} else if (pasid_1024_attached) {
> > > +			pasid_1024_attached = false;
> > > +			// Fake an error to fail the replacement
> > > +			return -ENOMEM;
> > 
> > /* Fake an error to fail the replacement */
> > 
> > While failing this, why does it detach pasid-1024? Maybe some extra
> > comments for what's doing?
> 
> do you mean when does it detach?

So, this after all is a "toggle-to-fake-an-error" thing, right?
Let's make it straightforward then: "fake_attach_error" or so?
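
A userspace sketch of what such a toggle could look like, with the flag kept
per device rather than in a global (as suggested for mdev). All names here are
hypothetical. The branch that arms the flag on a successful attach is an
assumption, since the quoted hunk does not show it.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_PASID_RESERVED 1024	/* assumed to match the reserved test PASID */

/* Hypothetical per-device state standing in for struct mock_dev. */
struct model_mock_dev {
	bool fake_attach_error;	/* when set, the next attach of the reserved PASID fails */
};

/*
 * Model of the mock set_dev_pasid callback. blocked_domain stands in for
 * "domain->type == IOMMU_DOMAIN_BLOCKED", i.e. the detach case.
 */
static int model_set_dev_pasid(struct model_mock_dev *mdev, unsigned int pasid,
			       bool blocked_domain)
{
	if (pasid != MODEL_PASID_RESERVED)
		return 0;

	if (blocked_domain) {
		/* Detach always succeeds and disarms the toggle. */
		mdev->fake_attach_error = false;
		return 0;
	}

	if (mdev->fake_attach_error) {
		/* Fake an error to fail the replacement, then disarm. */
		mdev->fake_attach_error = false;
		return -ENOMEM;
	}

	/* Assumption: the first successful attach arms the toggle. */
	mdev->fake_attach_error = true;
	return 0;
}
```

In this model the first attach of the reserved PASID succeeds, the following
replace fails with -ENOMEM, and userspace still has to detach afterwards,
which matches the sequence in the selftest.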

> > > +static int iommufd_test_pasid_check_domain(struct iommufd_ucmd *ucmd,
> > > +					   struct iommu_test_cmd *cmd)
> > > +{
> > > +	struct iommu_domain *attached_domain, *expect_domain = NULL;
> > > +	struct iommufd_hw_pagetable *hwpt = NULL;
> > > +	struct iommu_attach_handle *handle;
> > > +	struct selftest_obj *sobj;
> > > +	struct mock_dev *mdev;
> > > +	bool result;
> > > +	int rc = 0;
> > > +
> > > +	sobj = iommufd_test_get_selftest_obj(ucmd->ictx, cmd->id);
> > > +	if (IS_ERR(sobj))
> > > +		return PTR_ERR(sobj);
> > > +
> > > +	mdev = sobj->idev.mock_dev;
> > > +
> > > +	handle = iommu_attach_handle_get(mdev->dev.iommu_group,
> > > +					 cmd->pasid_check.pasid, 0);
> > > +	if (IS_ERR(handle))
> > > +		attached_domain = NULL;
> > > +	else
> > > +		attached_domain = handle->domain;
> > > +
> > > +	if (cmd->pasid_check.hwpt_id) {
> > > +		hwpt = iommufd_get_hwpt(ucmd, cmd->pasid_check.hwpt_id);
> > > +		if (IS_ERR(hwpt)) {
> > 
> > Do we need cmd->pasid_check.hwpt_id to be optional?
> 
> It is not intended to be optional. It just uses 0 as a special value, in
> which case there is no need to retrieve the hwpt. That makes it possible to
> check whether this pasid is attached or not.
> 
> > 
> > > +			rc = PTR_ERR(hwpt);
> > > +			goto out_put_dev;
> > > +		}
> > > +		expect_domain = hwpt->domain;
> > > +	}
> > > +
> > > +	result = (attached_domain == expect_domain) ? 1 : 0;
> > > +	if (copy_to_user(u64_to_user_ptr(cmd->pasid_check.out_result_ptr),
> > > +			 &result, sizeof(result)))
> > > +		rc = -EFAULT;
> > 
> > If we do want it to be optional, we can't unconditionally check the
> > result then?

As in my other reply, I think we may try getting rid of the
"result" and just use the ioctl return value to tell the user space
tester whether everything is okay or not, given that all the user
space expects is a successful "result".
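
A minimal sketch of that idea, reduced to the comparison itself: the check
returns 0 when the attached domain matches the expected one and an error
otherwise, so no result needs to be copied back to user space. The names are
hypothetical, and the choice of -EINVAL as the mismatch code is an assumption.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct iommu_domain. */
struct model_domain { int id; };

/*
 * Returns 0 if the PASID is attached to the expected domain. A NULL
 * expected domain models hwpt_id == 0, meaning "nothing should be attached".
 */
static int model_pasid_check_domain(const struct model_domain *attached,
				    const struct model_domain *expected)
{
	return attached == expected ? 0 : -EINVAL;
}
```

On the test side, each ASSERT_EQ/EXPECT_EQ pair on "result" would then
collapse into a single check of the ioctl return value.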

> > > +static int iommufd_test_pasid_replace(struct iommufd_ucmd *ucmd,
> > > +				      struct iommu_test_cmd *cmd)
> > > +{
> > > +	struct selftest_obj *sobj;
> > > +	int rc;
> > > +
> > > +	sobj = iommufd_test_get_selftest_obj(ucmd->ictx, cmd->id);
> > > +	if (IS_ERR(sobj))
> > > +		return PTR_ERR(sobj);
> > > +
> > > +	rc = iommufd_device_replace(sobj->idev.idev, cmd->pasid_attach.pasid,
> > > +				    &cmd->pasid_attach.pt_id);
> > > +	if (rc)
> > > +		goto out_sobj;
> > > +
> > > +	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
> > > +
> > > +out_sobj:
> > > +	iommufd_put_object(ucmd->ictx, &sobj->obj);
> > > +	return rc;
> > 
> > If iommufd_ucmd_respond fails, do we need to revert like we do in
> > iommufd_test_pasid_attach()?
> 
> It should revert to the old hwpt, but there is no helper to get the old
> hwpt so far. I can add one since we have the pasid_attach array now, but it
> would end up as a helper used only by the selftest, which is not ideal. Also,
> it requires a mock_dev->lock to sync the attach/replace/detach. Then I
> found that iommufd_test_mock_domain_replace() just returns without
> reverting, so I chose the simpler way.

Yea, I see that the existing replace() doesn't revert either, so
I think we can be fine with this too. If anything bad happens to
a basic iommufd_ucmd_respond, the test probably wouldn't finish
anyway to provide us an accurate result.

Thanks
Nicolin

Thread overview: 78+ messages
2025-03-20 13:47 [PATCH v10 00/18] iommufd support pasid attach/replace Yi Liu
2025-03-20 13:47 ` [PATCH v10 01/18] iommu: Require passing new handles to APIs supporting handle Yi Liu
2025-03-20 15:23   ` Jason Gunthorpe
2025-03-20 23:51     ` Yi Liu
2025-03-21  2:35   ` Baolu Lu
2025-03-20 13:47 ` [PATCH v10 02/18] iommu: Introduce a replace API for device pasid Yi Liu
2025-03-20 17:24   ` Nicolin Chen
2025-03-20 23:58     ` Yi Liu
2025-03-21  0:14       ` Yi Liu
2025-03-21  3:21       ` Nicolin Chen
2025-03-21  4:06         ` Yi Liu
2025-03-21  3:08   ` Baolu Lu
2025-03-21  4:19     ` Yi Liu
2025-03-20 13:47 ` [PATCH v10 03/18] iommufd: Pass @pasid through the device attach/replace path Yi Liu
2025-03-21  3:13   ` Baolu Lu
2025-03-20 13:47 ` [PATCH v10 04/18] iommufd/device: Only add reserved_iova in non-pasid path Yi Liu
2025-03-21  3:14   ` Baolu Lu
2025-03-20 13:47 ` [PATCH v10 05/18] iommufd/device: Replace idev->igroup with local variable Yi Liu
2025-03-21  3:14   ` Baolu Lu
2025-03-20 13:47 ` [PATCH v10 06/18] iommufd/device: Add helper to detect the first attach of a group Yi Liu
2025-03-20 15:36   ` Jason Gunthorpe
2025-03-20 17:36   ` Nicolin Chen
2025-03-20 17:51     ` Nicolin Chen
2025-03-21  0:02       ` Yi Liu
2025-03-20 18:04     ` Jason Gunthorpe
2025-03-20 18:24       ` Nicolin Chen
2025-03-21  3:18   ` Baolu Lu
2025-03-20 13:47 ` [PATCH v10 07/18] iommufd/device: Wrap igroup->hwpt and igroup->device_list into attach struct Yi Liu
2025-03-20 15:48   ` Jason Gunthorpe
2025-03-20 18:03   ` Nicolin Chen
2025-03-21  3:22   ` Baolu Lu
2025-03-20 13:47 ` [PATCH v10 08/18] iommufd/device: Replace device_list with device_array Yi Liu
2025-03-20 17:20   ` Jason Gunthorpe
2025-03-21  0:25     ` Yi Liu
2025-03-20 18:38   ` Nicolin Chen
2025-03-21  0:30     ` Yi Liu
2025-03-21  3:25       ` Nicolin Chen
2025-03-20 13:47 ` [PATCH v10 09/18] iommufd/device: Add pasid_attach array to track per-PASID attach Yi Liu
2025-03-20 17:33   ` Jason Gunthorpe
2025-03-20 19:19   ` Nicolin Chen
2025-03-20 19:29     ` Jason Gunthorpe
2025-03-20 20:13       ` Nicolin Chen
2025-03-21  0:15     ` Yi Liu
2025-03-20 13:47 ` [PATCH v10 10/18] iommufd: Enforce PASID-compatible domain in PASID path Yi Liu
2025-03-20 13:47 ` [PATCH v10 11/18] iommufd: Support pasid attach/replace Yi Liu
2025-03-20 20:42   ` Nicolin Chen
2025-03-20 23:29     ` Jason Gunthorpe
2025-03-21  0:31     ` Yi Liu
2025-03-21  0:35       ` Nicolin Chen
2025-03-21  1:05         ` Yi Liu
2025-03-21 11:45           ` Jason Gunthorpe
2025-03-20 13:47 ` [PATCH v10 12/18] iommufd: Enforce PASID-compatible domain for RID Yi Liu
2025-03-20 17:35   ` Jason Gunthorpe
2025-03-20 22:23   ` Nicolin Chen
2025-03-20 23:31     ` Jason Gunthorpe
2025-03-21  0:45       ` Yi Liu
2025-03-21  0:41     ` Yi Liu
2025-03-20 13:47 ` [PATCH v10 13/18] iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support Yi Liu
2025-03-20 13:47 ` [PATCH v10 14/18] iommufd: Allow allocating PASID-compatible domain Yi Liu
2025-03-20 17:51   ` Jason Gunthorpe
2025-03-21  0:52     ` Yi Liu
2025-03-20 22:36   ` Nicolin Chen
2025-03-20 13:47 ` [PATCH v10 15/18] iommufd/selftest: Add set_dev_pasid in mock iommu Yi Liu
2025-03-20 22:48   ` Nicolin Chen
2025-03-20 13:47 ` [PATCH v10 16/18] iommufd/selftest: Add a helper to get test device Yi Liu
2025-03-20 13:47 ` [PATCH v10 17/18] iommufd/selftest: Add test ops to test pasid attach/detach Yi Liu
2025-03-20 23:17   ` Nicolin Chen
2025-03-20 23:33     ` Jason Gunthorpe
2025-03-20 23:42     ` Nicolin Chen
2025-03-21  1:43     ` Yi Liu
2025-03-21 17:25       ` Nicolin Chen
2025-03-20 23:20   ` Nicolin Chen
2025-03-21  1:20     ` Yi Liu
2025-03-20 13:47 ` [PATCH v10 18/18] iommufd/selftest: Add coverage for iommufd " Yi Liu
2025-03-21  0:34   ` Nicolin Chen
2025-03-21 15:26     ` Yi Liu
2025-03-21 17:10       ` Nicolin Chen
2025-03-20 13:59 ` [PATCH v10 00/18] iommufd support pasid attach/replace Yi Liu
