linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 0/7] iommufd: Destroy vdevice on device unbind
@ 2025-07-09  4:02 Xu Yilun
  2025-07-09  4:02 ` [PATCH v4 1/7] iommufd/viommu: Roll back to use iommufd_object_alloc() for vdevice Xu Yilun
                   ` (6 more replies)
  0 siblings, 7 replies; 38+ messages in thread
From: Xu Yilun @ 2025-07-09  4:02 UTC (permalink / raw)
  To: jgg, jgg, kevin.tian, will, aneesh.kumar
  Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik,
	dan.j.williams, baolu.lu, yilun.xu

This series solves the lifecycle issue that a vdevice may outlive its
idevice. It is a prerequisite for TIO, ensuring extra secure
configurations (e.g. TSM Bind/Unbind) applied to the vdevice can be
rolled back on idevice unbind, so that VFIO can still work on the
physical device without surprises.

Changelog:
v4:
 - Rebase to iommufd for-next.
 - A new patch to roll back iommufd_object_alloc_ucmd() for vdevice.
 - Fix the mistake trying to xa_destroy ictx->groups on
   iommufd_fops_release().
 - Move 'empty' flag inside destroy loop for iommufd_fops_release().
 - Refactor vdev/idev destroy syncing.
   - Drop the iommufd_vdevice_abort() reentrant idea.
   - A new patch that adds pre_destroy() op.
   - Hold short term reference during the whole vdevice's lifecycle.
   - Wait on short term reference on idev's pre_destroy().
   - Add a 'destroying' flag for idev to prevent new reference after
     pre_destroy().
 - Rephrase/fix some comments.
 - Add review tags.

v3: https://lore.kernel.org/linux-iommu/20250627033809.1730752-1-yilun.xu@linux.intel.com/
 - Don't bother cleaning each tombstone in iommufd_fops_release().
 - Drop vdev->ictx initialization fix patch.
 - Optimize control flow in iommufd_device_remove_vdev().
 - Make iommufd_vdevice_abort() reentrant.
 - Call iommufd_vdevice_abort() directly instead of waiting for it.
 - Rephrase/fix some comments.
 - A new patch to remove vdev->dev.
 - A new patch to explicitly skip existing viommu tests for no_iommu.
 - Also skip vdevice tombstone test for no_iommu.
 - Got Aneesh's permission to add his SoB.

v2: https://lore.kernel.org/linux-iommu/20250623094946.1714996-1-yilun.xu@linux.intel.com/

v1/rfc: https://lore.kernel.org/linux-iommu/20250610065146.1321816-1-aneesh.kumar@kernel.org/

The series is based on iommufd for-next


Xu Yilun (7):
  iommufd/viommu: Roll back to use iommufd_object_alloc() for vdevice
  iommufd: Add iommufd_object_tombstone_user() helper
  iommufd: Add a pre_destroy() op for objects
  iommufd: Destroy vdevice on idevice destroy
  iommufd/vdevice: Remove struct device reference from struct vdevice
  iommufd/selftest: Explicitly skip tests for inapplicable variant
  iommufd/selftest: Add coverage for vdevice tombstone

 drivers/iommu/iommufd/device.c          |  51 ++++
 drivers/iommu/iommufd/driver.c          |   4 +-
 drivers/iommu/iommufd/iommufd_private.h |  37 ++-
 drivers/iommu/iommufd/main.c            |  30 +-
 drivers/iommu/iommufd/viommu.c          |  64 +++-
 tools/testing/selftests/iommu/iommufd.c | 388 ++++++++++++------------
 6 files changed, 373 insertions(+), 201 deletions(-)


base-commit: 3e2a9811f6a9cefd310cc33cab73d5435b4a4caa
-- 
2.25.1


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v4 1/7] iommufd/viommu: Roll back to use iommufd_object_alloc() for vdevice
  2025-07-09  4:02 [PATCH v4 0/7] iommufd: Destroy vdevice on device unbind Xu Yilun
@ 2025-07-09  4:02 ` Xu Yilun
  2025-07-10  7:32   ` Tian, Kevin
                     ` (2 more replies)
  2025-07-09  4:02 ` [PATCH v4 2/7] iommufd: Add iommufd_object_tombstone_user() helper Xu Yilun
                   ` (5 subsequent siblings)
  6 siblings, 3 replies; 38+ messages in thread
From: Xu Yilun @ 2025-07-09  4:02 UTC (permalink / raw)
  To: jgg, jgg, kevin.tian, will, aneesh.kumar
  Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik,
	dan.j.williams, baolu.lu, yilun.xu

To solve the vdevice lifecycle issue, future patches will protect
vdevice allocation with a lock. That makes iommufd_object_alloc_ucmd()
unsuitable for vdevice. Roll back to iommufd_object_alloc() in
preparation.
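The allocate/finalize/abort flow this rolls back to can be sketched in
a toy userspace model (illustrative names only, not the kernel
implementation): a freshly allocated object reserves its ID but stays
invisible to lookups until finalized, and error paths must explicitly
abort-and-destroy it to release the reservation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the two-step allocate/finalize pattern. */
enum toy_state { TOY_EMPTY, TOY_RESERVED, TOY_PUBLISHED };

#define TOY_NOBJS 4
struct toy_ctx { enum toy_state objects[TOY_NOBJS]; };

/* Reserve an ID; the object is not yet visible to lookups. */
static int toy_alloc(struct toy_ctx *ctx)
{
	for (int i = 0; i < TOY_NOBJS; i++)
		if (ctx->objects[i] == TOY_EMPTY) {
			ctx->objects[i] = TOY_RESERVED;
			return i;
		}
	return -1;
}

/* Lookups only ever see finalized objects. */
static int toy_lookup(struct toy_ctx *ctx, int id)
{
	return ctx->objects[id] == TOY_PUBLISHED ? 0 : -1;
}

/* Success path: publish the object under its reserved ID. */
static void toy_finalize(struct toy_ctx *ctx, int id)
{
	ctx->objects[id] = TOY_PUBLISHED;
}

/* Error path: undo the reservation so the ID can be reused. */
static void toy_abort_and_destroy(struct toy_ctx *ctx, int id)
{
	ctx->objects[id] = TOY_EMPTY;
}
```

The point of the explicit abort step is exactly what the patch adds
below: once iommufd_object_alloc_ucmd() no longer handles cleanup
automatically, every failure after allocation must jump to an abort
label.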

Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
 drivers/iommu/iommufd/viommu.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
index 25ac08fbb52a..f556a65f3062 100644
--- a/drivers/iommu/iommufd/viommu.c
+++ b/drivers/iommu/iommufd/viommu.c
@@ -144,7 +144,7 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
 		goto out_put_idev;
 	}
 
-	vdev = iommufd_object_alloc_ucmd(ucmd, vdev, IOMMUFD_OBJ_VDEVICE);
+	vdev = iommufd_object_alloc(ucmd->ictx, vdev, IOMMUFD_OBJ_VDEVICE);
 	if (IS_ERR(vdev)) {
 		rc = PTR_ERR(vdev);
 		goto out_put_idev;
@@ -159,12 +159,18 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
 	curr = xa_cmpxchg(&viommu->vdevs, virt_id, NULL, vdev, GFP_KERNEL);
 	if (curr) {
 		rc = xa_err(curr) ?: -EEXIST;
-		goto out_put_idev;
+		goto out_abort;
 	}
 
 	cmd->out_vdevice_id = vdev->obj.id;
 	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+	if (rc)
+		goto out_abort;
+	iommufd_object_finalize(ucmd->ictx, &vdev->obj);
+	goto out_put_idev;
 
+out_abort:
+	iommufd_object_abort_and_destroy(ucmd->ictx, &vdev->obj);
 out_put_idev:
 	iommufd_put_object(ucmd->ictx, &idev->obj);
 out_put_viommu:
-- 
2.25.1



* [PATCH v4 2/7] iommufd: Add iommufd_object_tombstone_user() helper
  2025-07-09  4:02 [PATCH v4 0/7] iommufd: Destroy vdevice on device unbind Xu Yilun
  2025-07-09  4:02 ` [PATCH v4 1/7] iommufd/viommu: Roll back to use iommufd_object_alloc() for vdevice Xu Yilun
@ 2025-07-09  4:02 ` Xu Yilun
  2025-07-10  7:35   ` Tian, Kevin
                     ` (2 more replies)
  2025-07-09  4:02 ` [PATCH v4 3/7] iommufd: Add a pre_destroy() op for objects Xu Yilun
                   ` (4 subsequent siblings)
  6 siblings, 3 replies; 38+ messages in thread
From: Xu Yilun @ 2025-07-09  4:02 UTC (permalink / raw)
  To: jgg, jgg, kevin.tian, will, aneesh.kumar
  Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik,
	dan.j.williams, baolu.lu, yilun.xu

Add the iommufd_object_tombstone_user() helper, which allows the caller
to destroy an iommufd object created by userspace.

This is useful on some destroy paths when the kernel caller finds the
object should have been removed by userspace but is still alive. With
this helper, the caller destroys the object but leaves the object ID
reserved (so-called tombstoned). The tombstone prevents repurposing the
object ID without the original user's awareness.

Since this happens only for abnormal userspace behavior, for simplicity,
the tombstoned object ID is permanently leaked until
iommufd_fops_release(). I.e. the original user gets an error when
calling ioctl(IOMMU_DESTROY) on that ID.

The first use case would be to ensure the iommufd_vdevice can't outlive
the associated iommufd_device.
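The tombstone semantics can be sketched with a toy userspace ID table
(illustrative names, not the kernel xarray code): a tombstoned slot is
neither free for reuse nor destroyable by the original user.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of ID tombstoning. */
enum slot_state { SLOT_FREE, SLOT_LIVE, SLOT_TOMBSTONE };

#define NSLOTS 8
struct id_table { enum slot_state slot[NSLOTS]; };

/* Allocate the lowest free ID; tombstoned IDs are never repurposed. */
static int id_alloc(struct id_table *t)
{
	for (int i = 0; i < NSLOTS; i++)
		if (t->slot[i] == SLOT_FREE) {
			t->slot[i] = SLOT_LIVE;
			return i;
		}
	return -1;
}

/* Userspace destroy: fails on a tombstone, as IOMMU_DESTROY would. */
static int id_destroy(struct id_table *t, int id)
{
	if (t->slot[id] != SLOT_LIVE)
		return -1;	/* stands in for -ENOENT */
	t->slot[id] = SLOT_FREE;
	return 0;
}

/* Kernel-side tombstone: the object is gone, the ID stays reserved. */
static void id_tombstone(struct id_table *t, int id)
{
	t->slot[id] = SLOT_TOMBSTONE;
}
```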

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Co-developed-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
 drivers/iommu/iommufd/iommufd_private.h | 23 ++++++++++++++++++++++-
 drivers/iommu/iommufd/main.c            | 24 ++++++++++++++++++++++--
 2 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 4f5e8cd99c96..da1bced8c945 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -188,7 +188,8 @@ void iommufd_object_finalize(struct iommufd_ctx *ictx,
 			     struct iommufd_object *obj);
 
 enum {
-	REMOVE_WAIT_SHORTTERM = 1,
+	REMOVE_WAIT_SHORTTERM	= BIT(0),
+	REMOVE_OBJ_TOMBSTONE	= BIT(1),
 };
 int iommufd_object_remove(struct iommufd_ctx *ictx,
 			  struct iommufd_object *to_destroy, u32 id,
@@ -214,6 +215,26 @@ static inline void iommufd_object_destroy_user(struct iommufd_ctx *ictx,
 	WARN_ON(ret);
 }
 
+/*
+ * Similar to iommufd_object_destroy_user(), except that the object ID is left
+ * reserved/tombstoned.
+ */
+static inline void iommufd_object_tombstone_user(struct iommufd_ctx *ictx,
+						 struct iommufd_object *obj)
+{
+	int ret;
+
+	ret = iommufd_object_remove(ictx, obj, obj->id,
+				    REMOVE_WAIT_SHORTTERM | REMOVE_OBJ_TOMBSTONE);
+
+	/*
+	 * If there is a bug and we couldn't destroy the object then we did put
+	 * back the caller's users refcount and will eventually try to free it
+	 * again during close.
+	 */
+	WARN_ON(ret);
+}
+
 /*
  * The HWPT allocated by autodomains is used in possibly many devices and
  * is automatically destroyed when its refcount reaches zero.
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 778694d7c207..25ab2f41d650 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -216,7 +216,7 @@ int iommufd_object_remove(struct iommufd_ctx *ictx,
 		goto err_xa;
 	}
 
-	xas_store(&xas, NULL);
+	xas_store(&xas, (flags & REMOVE_OBJ_TOMBSTONE) ? XA_ZERO_ENTRY : NULL);
 	if (ictx->vfio_ioas == container_of(obj, struct iommufd_ioas, obj))
 		ictx->vfio_ioas = NULL;
 	xa_unlock(&ictx->objects);
@@ -298,22 +298,42 @@ static int iommufd_fops_release(struct inode *inode, struct file *filp)
 	 * until the entire list is destroyed. If this can't progress then there
 	 * is some bug related to object refcounting.
 	 */
-	while (!xa_empty(&ictx->objects)) {
+	for (;;) {
 		unsigned int destroyed = 0;
 		unsigned long index;
+		bool empty = true;
 
+		/*
+		 * xa_for_each() will not return tombstones (zeroed entries),
+		 * which prevent the xarray from being empty. So use an empty
+		 * flag instead of xa_empty() to indicate all entries are
+		 * either NULLed or tombstoned.
+		 */
 		xa_for_each(&ictx->objects, index, obj) {
+			empty = false;
 			if (!refcount_dec_if_one(&obj->users))
 				continue;
+
 			destroyed++;
 			xa_erase(&ictx->objects, index);
 			iommufd_object_ops[obj->type].destroy(obj);
 			kfree(obj);
 		}
+
+		if (empty)
+			break;
+
 		/* Bug related to users refcount */
 		if (WARN_ON(!destroyed))
 			break;
 	}
+
+	/*
+	 * There may be some tombstones left over from
+	 * iommufd_object_tombstone_user()
+	 */
+	xa_destroy(&ictx->objects);
+
 	WARN_ON(!xa_empty(&ictx->groups));
 
 	mutex_destroy(&ictx->sw_msi_lock);
-- 
2.25.1



* [PATCH v4 3/7] iommufd: Add a pre_destroy() op for objects
  2025-07-09  4:02 [PATCH v4 0/7] iommufd: Destroy vdevice on device unbind Xu Yilun
  2025-07-09  4:02 ` [PATCH v4 1/7] iommufd/viommu: Roll back to use iommufd_object_alloc() for vdevice Xu Yilun
  2025-07-09  4:02 ` [PATCH v4 2/7] iommufd: Add iommufd_object_tombstone_user() helper Xu Yilun
@ 2025-07-09  4:02 ` Xu Yilun
  2025-07-10  7:40   ` Tian, Kevin
  2025-07-11 22:09   ` Nicolin Chen
  2025-07-09  4:02 ` [PATCH v4 4/7] iommufd: Destroy vdevice on idevice destroy Xu Yilun
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 38+ messages in thread
From: Xu Yilun @ 2025-07-09  4:02 UTC (permalink / raw)
  To: jgg, jgg, kevin.tian, will, aneesh.kumar
  Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik,
	dan.j.williams, baolu.lu, yilun.xu

Add a pre_destroy() op which gives objects a chance to clear their
short term users references before destruction. This op is intended for
objects created by external drivers (e.g. idev), which require
deterministic destruction.

In order to manage the lifecycle of interrelated objects as well as
deterministic destruction (e.g. a vdev can't outlive its idev, and idev
destruction can't fail), short term users references are allowed to
outlive an ioctl execution. An immediate use case: the vdev holds the
idev's short term users reference until the vdev destruction completes,
and the idev leverages the existing wait_shortterm mechanism to ensure
it is destroyed after the vdev.

This extended usage means the referenced object can no longer just wait
for its references to go away. It needs to actively trigger the
reference removal, as well as prevent new references before waiting.
Implement this work in pre_destroy().
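The dec-and-wait pattern the hook plugs into can be sketched as a toy
single-threaded model (assumed names; the kernel uses refcount_t, an
ops table, and wait_event_timeout() instead of the trivial "wait"
below): if references remain after the caller drops its own, the object
actively revokes outstanding holders via pre_destroy() before waiting.

```c
#include <assert.h>
#include <stddef.h>

struct toy_obj;

struct toy_ops {
	/* Revoke outstanding short-term references so the wait can end. */
	void (*pre_destroy)(struct toy_obj *obj);
};

struct toy_obj {
	int shortterm_users;	/* 1 bias + one per external holder */
	const struct toy_ops *ops;
};

static int toy_dec_wait_shortterm(struct toy_obj *obj)
{
	if (--obj->shortterm_users == 0)	/* fast path: no holders */
		return 0;

	/* Holders exist: actively trigger their removal, then "wait". */
	if (obj->ops->pre_destroy)
		obj->ops->pre_destroy(obj);

	return obj->shortterm_users == 0 ? 0 : -1;	/* -1 ~ timeout */
}

/* Example holder: a "vdev" keeping one short-term ref on an "idev". */
static struct toy_obj *held_idev;

static void toy_idev_pre_destroy(struct toy_obj *obj)
{
	if (held_idev == obj) {		/* destroy the vdev, drop its ref */
		obj->shortterm_users--;
		held_idev = NULL;
	}
}

static const struct toy_ops toy_idev_ops = {
	.pre_destroy = toy_idev_pre_destroy,
};
```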

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
 drivers/iommu/iommufd/main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 25ab2f41d650..e91a36cc02d0 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -23,6 +23,7 @@
 #include "iommufd_test.h"
 
 struct iommufd_object_ops {
+	void (*pre_destroy)(struct iommufd_object *obj);
 	void (*destroy)(struct iommufd_object *obj);
 	void (*abort)(struct iommufd_object *obj);
 };
@@ -151,6 +152,9 @@ static int iommufd_object_dec_wait_shortterm(struct iommufd_ctx *ictx,
 	if (refcount_dec_and_test(&to_destroy->shortterm_users))
 		return 0;
 
+	if (iommufd_object_ops[to_destroy->type].pre_destroy)
+		iommufd_object_ops[to_destroy->type].pre_destroy(to_destroy);
+
 	if (wait_event_timeout(ictx->destroy_wait,
 			       refcount_read(&to_destroy->shortterm_users) == 0,
 			       msecs_to_jiffies(60000)))
-- 
2.25.1



* [PATCH v4 4/7] iommufd: Destroy vdevice on idevice destroy
  2025-07-09  4:02 [PATCH v4 0/7] iommufd: Destroy vdevice on device unbind Xu Yilun
                   ` (2 preceding siblings ...)
  2025-07-09  4:02 ` [PATCH v4 3/7] iommufd: Add a pre_destroy() op for objects Xu Yilun
@ 2025-07-09  4:02 ` Xu Yilun
  2025-07-10  8:03   ` Tian, Kevin
                     ` (2 more replies)
  2025-07-09  4:02 ` [PATCH v4 5/7] iommufd/vdevice: Remove struct device reference from struct vdevice Xu Yilun
                   ` (2 subsequent siblings)
  6 siblings, 3 replies; 38+ messages in thread
From: Xu Yilun @ 2025-07-09  4:02 UTC (permalink / raw)
  To: jgg, jgg, kevin.tian, will, aneesh.kumar
  Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik,
	dan.j.williams, baolu.lu, yilun.xu

Destroy iommufd_vdevice (vdev) on iommufd_idevice (idev) destruction so
that vdev can't outlive idev.

The idev represents the physical device bound to iommufd, while the
vdev represents the virtual instance of the physical device in the VM.
The lifecycle of the vdev should not be longer than that of the idev.
This doesn't cause a real problem for existing use cases because the
vdev doesn't impact the physical device; it only provides
virtualization information. But to extend the vdev for Confidential
Computing (CC), there is a need to do secure configuration for the
vdev, e.g. TSM Bind/Unbind. These configurations should be rolled back
on idev destroy, or the external driver's (VFIO) functionality may be
impacted.

The idev is created by an external driver, so its destruction can't
fail. The idev implements the pre_destroy() op to actively remove its
associated vdev before destroying itself. There are 3 cases on idev
pre_destroy():

  1. The vdev is already destroyed by userspace. No extra handling
     needed.
  2. The vdev is still alive. Use iommufd_object_tombstone_user() to
     destroy the vdev and tombstone the vdev ID.
  3. The vdev is being destroyed by userspace. The vdev ID is already
     freed, but the vdev destroy handler has not completed yet. This
     requires multi-thread syncing - the vdev holds the idev's short
     term users reference until the vdev destruction completes, and the
     idev leverages the existing wait_shortterm mechanism for syncing.

Originally-by: Nicolin Chen <nicolinc@nvidia.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Co-developed-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
 drivers/iommu/iommufd/device.c          | 51 +++++++++++++++++++++++++
 drivers/iommu/iommufd/iommufd_private.h | 13 +++++++
 drivers/iommu/iommufd/main.c            |  2 +
 drivers/iommu/iommufd/viommu.c          | 51 +++++++++++++++++++++++--
 4 files changed, 113 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index e9b6ca47095c..e114094fbdef 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -137,6 +137,57 @@ static struct iommufd_group *iommufd_get_group(struct iommufd_ctx *ictx,
 	}
 }
 
+static void iommufd_device_remove_vdev(struct iommufd_device *idev)
+{
+	struct iommufd_vdevice *vdev;
+
+	mutex_lock(&idev->igroup->lock);
+	/* prevent new references from vdev */
+	idev->destroying = true;
+	/* vdev has been completely destroyed by userspace */
+	if (!idev->vdev)
+		goto out_unlock;
+
+	vdev = iommufd_get_vdevice(idev->ictx, idev->vdev->obj.id);
+	/*
+	 * An ongoing vdev destroy ioctl has removed the vdev from the object
+	 * xarray, but has not finished iommufd_vdevice_destroy() yet as it
+	 * needs the same mutex. We exit the locking then wait on short term
+	 * users for the vdev destruction.
+	 */
+	if (IS_ERR(vdev))
+		goto out_unlock;
+
+	/* Should never happen */
+	if (WARN_ON(vdev != idev->vdev)) {
+		iommufd_put_object(idev->ictx, &vdev->obj);
+		goto out_unlock;
+	}
+
+	/*
+	 * vdev is still alive. Hold a users refcount to prevent racing with
+	 * userspace destruction, then use iommufd_object_tombstone_user() to
+	 * destroy it and leave a tombstone.
+	 */
+	refcount_inc(&vdev->obj.users);
+	iommufd_put_object(idev->ictx, &vdev->obj);
+	mutex_unlock(&idev->igroup->lock);
+	iommufd_object_tombstone_user(idev->ictx, &vdev->obj);
+	return;
+
+out_unlock:
+	mutex_unlock(&idev->igroup->lock);
+}
+
+void iommufd_device_pre_destroy(struct iommufd_object *obj)
+{
+	struct iommufd_device *idev =
+		container_of(obj, struct iommufd_device, obj);
+
+	/* Release the short term users on this */
+	iommufd_device_remove_vdev(idev);
+}
+
 void iommufd_device_destroy(struct iommufd_object *obj)
 {
 	struct iommufd_device *idev =
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index da1bced8c945..62e5dae2a50d 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -475,6 +475,8 @@ struct iommufd_device {
 	/* always the physical device */
 	struct device *dev;
 	bool enforce_cache_coherency;
+	struct iommufd_vdevice *vdev;
+	bool destroying;
 };
 
 static inline struct iommufd_device *
@@ -485,6 +487,7 @@ iommufd_get_device(struct iommufd_ucmd *ucmd, u32 id)
 			    struct iommufd_device, obj);
 }
 
+void iommufd_device_pre_destroy(struct iommufd_object *obj);
 void iommufd_device_destroy(struct iommufd_object *obj);
 int iommufd_get_hw_info(struct iommufd_ucmd *ucmd);
 
@@ -650,14 +653,24 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd);
 void iommufd_viommu_destroy(struct iommufd_object *obj);
 int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd);
 void iommufd_vdevice_destroy(struct iommufd_object *obj);
+void iommufd_vdevice_abort(struct iommufd_object *obj);
 
 struct iommufd_vdevice {
 	struct iommufd_object obj;
 	struct iommufd_viommu *viommu;
 	struct device *dev;
 	u64 id; /* per-vIOMMU virtual ID */
+	struct iommufd_device *idev;
 };
 
+static inline struct iommufd_vdevice *
+iommufd_get_vdevice(struct iommufd_ctx *ictx, u32 id)
+{
+	return container_of(iommufd_get_object(ictx, id,
+					       IOMMUFD_OBJ_VDEVICE),
+			    struct iommufd_vdevice, obj);
+}
+
 #ifdef CONFIG_IOMMUFD_TEST
 int iommufd_test(struct iommufd_ucmd *ucmd);
 void iommufd_selftest_destroy(struct iommufd_object *obj);
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index e91a36cc02d0..bea000eed14a 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -578,6 +578,7 @@ static const struct iommufd_object_ops iommufd_object_ops[] = {
 		.destroy = iommufd_access_destroy_object,
 	},
 	[IOMMUFD_OBJ_DEVICE] = {
+		.pre_destroy = iommufd_device_pre_destroy,
 		.destroy = iommufd_device_destroy,
 	},
 	[IOMMUFD_OBJ_FAULT] = {
@@ -596,6 +597,7 @@ static const struct iommufd_object_ops iommufd_object_ops[] = {
 	},
 	[IOMMUFD_OBJ_VDEVICE] = {
 		.destroy = iommufd_vdevice_destroy,
+		.abort = iommufd_vdevice_abort,
 	},
 	[IOMMUFD_OBJ_VEVENTQ] = {
 		.destroy = iommufd_veventq_destroy,
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
index f556a65f3062..e6009678c8a5 100644
--- a/drivers/iommu/iommufd/viommu.c
+++ b/drivers/iommu/iommufd/viommu.c
@@ -104,18 +104,34 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
 	return rc;
 }
 
-void iommufd_vdevice_destroy(struct iommufd_object *obj)
+void iommufd_vdevice_abort(struct iommufd_object *obj)
 {
 	struct iommufd_vdevice *vdev =
 		container_of(obj, struct iommufd_vdevice, obj);
 	struct iommufd_viommu *viommu = vdev->viommu;
+	struct iommufd_device *idev = vdev->idev;
+
+	lockdep_assert_held(&idev->igroup->lock);
 
 	/* xa_cmpxchg is okay to fail if alloc failed xa_cmpxchg previously */
 	xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL);
 	refcount_dec(&viommu->obj.users);
+	idev->vdev = NULL;
 	put_device(vdev->dev);
 }
 
+void iommufd_vdevice_destroy(struct iommufd_object *obj)
+{
+	struct iommufd_vdevice *vdev =
+		container_of(obj, struct iommufd_vdevice, obj);
+	struct iommufd_device *idev = vdev->idev;
+
+	mutex_lock(&idev->igroup->lock);
+	iommufd_vdevice_abort(obj);
+	mutex_unlock(&idev->igroup->lock);
+	iommufd_put_object(idev->ictx, &idev->obj);
+}
+
 int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
 {
 	struct iommu_vdevice_alloc *cmd = ucmd->cmd;
@@ -144,10 +160,21 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
 		goto out_put_idev;
 	}
 
+	mutex_lock(&idev->igroup->lock);
+	if (idev->destroying) {
+		rc = -ENOENT;
+		goto out_unlock_igroup;
+	}
+
+	if (idev->vdev) {
+		rc = -EEXIST;
+		goto out_unlock_igroup;
+	}
+
 	vdev = iommufd_object_alloc(ucmd->ictx, vdev, IOMMUFD_OBJ_VDEVICE);
 	if (IS_ERR(vdev)) {
 		rc = PTR_ERR(vdev);
-		goto out_put_idev;
+		goto out_unlock_igroup;
 	}
 
 	vdev->id = virt_id;
@@ -155,6 +182,19 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
 	get_device(idev->dev);
 	vdev->viommu = viommu;
 	refcount_inc(&viommu->obj.users);
+	/*
+	 * A short term users reference is held on the idev so long as we have
+	 * the pointer. iommufd_device_pre_destroy() will revoke it before the
+	 * idev real destruction.
+	 */
+	vdev->idev = idev;
+
+	/*
+	 * iommufd_device_destroy() delays until idev->vdev is NULL before
+	 * freeing the idev, which only happens once the vdev is finished
+	 * destruction.
+	 */
+	idev->vdev = vdev;
 
 	curr = xa_cmpxchg(&viommu->vdevs, virt_id, NULL, vdev, GFP_KERNEL);
 	if (curr) {
@@ -167,12 +207,15 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
 	if (rc)
 		goto out_abort;
 	iommufd_object_finalize(ucmd->ictx, &vdev->obj);
-	goto out_put_idev;
+	goto out_unlock_igroup;
 
 out_abort:
 	iommufd_object_abort_and_destroy(ucmd->ictx, &vdev->obj);
+out_unlock_igroup:
+	mutex_unlock(&idev->igroup->lock);
 out_put_idev:
-	iommufd_put_object(ucmd->ictx, &idev->obj);
+	if (rc)
+		iommufd_put_object(ucmd->ictx, &idev->obj);
 out_put_viommu:
 	iommufd_put_object(ucmd->ictx, &viommu->obj);
 	return rc;
-- 
2.25.1



* [PATCH v4 5/7] iommufd/vdevice: Remove struct device reference from struct vdevice
  2025-07-09  4:02 [PATCH v4 0/7] iommufd: Destroy vdevice on device unbind Xu Yilun
                   ` (3 preceding siblings ...)
  2025-07-09  4:02 ` [PATCH v4 4/7] iommufd: Destroy vdevice on idevice destroy Xu Yilun
@ 2025-07-09  4:02 ` Xu Yilun
  2025-07-10  8:04   ` Tian, Kevin
  2025-07-12  2:27   ` Nicolin Chen
  2025-07-09  4:02 ` [PATCH v4 6/7] iommufd/selftest: Explicitly skip tests for inapplicable variant Xu Yilun
  2025-07-09  4:02 ` [PATCH v4 7/7] iommufd/selftest: Add coverage for vdevice tombstone Xu Yilun
  6 siblings, 2 replies; 38+ messages in thread
From: Xu Yilun @ 2025-07-09  4:02 UTC (permalink / raw)
  To: jgg, jgg, kevin.tian, will, aneesh.kumar
  Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik,
	dan.j.williams, baolu.lu, yilun.xu

Remove struct device *dev from struct vdevice.

The dev pointer is the "Plan B" for the vdevice to reference the
physical device. Now that vdev->idev is added without refcounting
concerns, just use vdev->idev->dev when needed.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
 drivers/iommu/iommufd/driver.c          | 4 ++--
 drivers/iommu/iommufd/iommufd_private.h | 1 -
 drivers/iommu/iommufd/viommu.c          | 3 ---
 3 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/iommufd/driver.c b/drivers/iommu/iommufd/driver.c
index 2fee399a148e..d620691d00a8 100644
--- a/drivers/iommu/iommufd/driver.c
+++ b/drivers/iommu/iommufd/driver.c
@@ -12,7 +12,7 @@ struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu,
 	lockdep_assert_held(&viommu->vdevs.xa_lock);
 
 	vdev = xa_load(&viommu->vdevs, vdev_id);
-	return vdev ? vdev->dev : NULL;
+	return vdev ? vdev->idev->dev : NULL;
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_viommu_find_dev, "IOMMUFD");
 
@@ -29,7 +29,7 @@ int iommufd_viommu_get_vdev_id(struct iommufd_viommu *viommu,
 
 	xa_lock(&viommu->vdevs);
 	xa_for_each(&viommu->vdevs, index, vdev) {
-		if (vdev->dev == dev) {
+		if (vdev->idev->dev == dev) {
 			*vdev_id = vdev->id;
 			rc = 0;
 			break;
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 62e5dae2a50d..2b9452b8550a 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -658,7 +658,6 @@ void iommufd_vdevice_abort(struct iommufd_object *obj);
 struct iommufd_vdevice {
 	struct iommufd_object obj;
 	struct iommufd_viommu *viommu;
-	struct device *dev;
 	u64 id; /* per-vIOMMU virtual ID */
 	struct iommufd_device *idev;
 };
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
index e6009678c8a5..702ae248df17 100644
--- a/drivers/iommu/iommufd/viommu.c
+++ b/drivers/iommu/iommufd/viommu.c
@@ -117,7 +117,6 @@ void iommufd_vdevice_abort(struct iommufd_object *obj)
 	xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL);
 	refcount_dec(&viommu->obj.users);
 	idev->vdev = NULL;
-	put_device(vdev->dev);
 }
 
 void iommufd_vdevice_destroy(struct iommufd_object *obj)
@@ -178,8 +177,6 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
 	}
 
 	vdev->id = virt_id;
-	vdev->dev = idev->dev;
-	get_device(idev->dev);
 	vdev->viommu = viommu;
 	refcount_inc(&viommu->obj.users);
 	/*
-- 
2.25.1



* [PATCH v4 6/7] iommufd/selftest: Explicitly skip tests for inapplicable variant
  2025-07-09  4:02 [PATCH v4 0/7] iommufd: Destroy vdevice on device unbind Xu Yilun
                   ` (4 preceding siblings ...)
  2025-07-09  4:02 ` [PATCH v4 5/7] iommufd/vdevice: Remove struct device reference from struct vdevice Xu Yilun
@ 2025-07-09  4:02 ` Xu Yilun
  2025-07-10  8:06   ` Tian, Kevin
  2025-07-12  2:39   ` Nicolin Chen
  2025-07-09  4:02 ` [PATCH v4 7/7] iommufd/selftest: Add coverage for vdevice tombstone Xu Yilun
  6 siblings, 2 replies; 38+ messages in thread
From: Xu Yilun @ 2025-07-09  4:02 UTC (permalink / raw)
  To: jgg, jgg, kevin.tian, will, aneesh.kumar
  Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik,
	dan.j.williams, baolu.lu, yilun.xu

Some viommu/vdevice tests are not applicable for the no_viommu variant.
Explicitly report the skip instead of doing it silently.

Only add the prints. No functional change intended.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
 tools/testing/selftests/iommu/iommufd.c | 378 ++++++++++++------------
 1 file changed, 190 insertions(+), 188 deletions(-)

diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 1a8e85afe9aa..4a9b6e3b37fa 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -2760,35 +2760,36 @@ TEST_F(iommufd_viommu, viommu_alloc_nested_iopf)
 	uint32_t fault_fd;
 	uint32_t vdev_id;
 
-	if (self->device_id) {
-		test_ioctl_fault_alloc(&fault_id, &fault_fd);
-		test_err_hwpt_alloc_iopf(
-			ENOENT, dev_id, viommu_id, UINT32_MAX,
-			IOMMU_HWPT_FAULT_ID_VALID, &iopf_hwpt_id,
-			IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data));
-		test_err_hwpt_alloc_iopf(
-			EOPNOTSUPP, dev_id, viommu_id, fault_id,
-			IOMMU_HWPT_FAULT_ID_VALID | (1 << 31), &iopf_hwpt_id,
-			IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data));
-		test_cmd_hwpt_alloc_iopf(
-			dev_id, viommu_id, fault_id, IOMMU_HWPT_FAULT_ID_VALID,
-			&iopf_hwpt_id, IOMMU_HWPT_DATA_SELFTEST, &data,
-			sizeof(data));
+	if (!dev_id)
+		SKIP(return, "Skipping test for variant no_viommu");
 
-		/* Must allocate vdevice before attaching to a nested hwpt */
-		test_err_mock_domain_replace(ENOENT, self->stdev_id,
-					     iopf_hwpt_id);
-		test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id);
-		test_cmd_mock_domain_replace(self->stdev_id, iopf_hwpt_id);
-		EXPECT_ERRNO(EBUSY,
-			     _test_ioctl_destroy(self->fd, iopf_hwpt_id));
-		test_cmd_trigger_iopf(dev_id, fault_fd);
+	test_ioctl_fault_alloc(&fault_id, &fault_fd);
+	test_err_hwpt_alloc_iopf(
+		ENOENT, dev_id, viommu_id, UINT32_MAX,
+		IOMMU_HWPT_FAULT_ID_VALID, &iopf_hwpt_id,
+		IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data));
+	test_err_hwpt_alloc_iopf(
+		EOPNOTSUPP, dev_id, viommu_id, fault_id,
+		IOMMU_HWPT_FAULT_ID_VALID | (1 << 31), &iopf_hwpt_id,
+		IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data));
+	test_cmd_hwpt_alloc_iopf(
+		dev_id, viommu_id, fault_id, IOMMU_HWPT_FAULT_ID_VALID,
+		&iopf_hwpt_id, IOMMU_HWPT_DATA_SELFTEST, &data,
+		sizeof(data));
+
+	/* Must allocate vdevice before attaching to a nested hwpt */
+	test_err_mock_domain_replace(ENOENT, self->stdev_id,
+				     iopf_hwpt_id);
+	test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id);
+	test_cmd_mock_domain_replace(self->stdev_id, iopf_hwpt_id);
+	EXPECT_ERRNO(EBUSY,
+		     _test_ioctl_destroy(self->fd, iopf_hwpt_id));
+	test_cmd_trigger_iopf(dev_id, fault_fd);
 
-		test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id);
-		test_ioctl_destroy(iopf_hwpt_id);
-		close(fault_fd);
-		test_ioctl_destroy(fault_id);
-	}
+	test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id);
+	test_ioctl_destroy(iopf_hwpt_id);
+	close(fault_fd);
+	test_ioctl_destroy(fault_id);
 }
 
 TEST_F(iommufd_viommu, vdevice_alloc)
@@ -2849,169 +2850,170 @@ TEST_F(iommufd_viommu, vdevice_cache)
 	uint32_t vdev_id = 0;
 	uint32_t num_inv;
 
-	if (dev_id) {
-		test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id);
-
-		test_cmd_dev_check_cache_all(dev_id,
-					     IOMMU_TEST_DEV_CACHE_DEFAULT);
-
-		/* Check data_type by passing zero-length array */
-		num_inv = 0;
-		test_cmd_viommu_invalidate(viommu_id, inv_reqs,
-					   sizeof(*inv_reqs), &num_inv);
-		assert(!num_inv);
-
-		/* Negative test: Invalid data_type */
-		num_inv = 1;
-		test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
-					   IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST_INVALID,
-					   sizeof(*inv_reqs), &num_inv);
-		assert(!num_inv);
-
-		/* Negative test: structure size sanity */
-		num_inv = 1;
-		test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
-					   IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
-					   sizeof(*inv_reqs) + 1, &num_inv);
-		assert(!num_inv);
-
-		num_inv = 1;
-		test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
-					   IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
-					   1, &num_inv);
-		assert(!num_inv);
-
-		/* Negative test: invalid flag is passed */
-		num_inv = 1;
-		inv_reqs[0].flags = 0xffffffff;
-		inv_reqs[0].vdev_id = 0x99;
-		test_err_viommu_invalidate(EOPNOTSUPP, viommu_id, inv_reqs,
-					   IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
-					   sizeof(*inv_reqs), &num_inv);
-		assert(!num_inv);
-
-		/* Negative test: invalid data_uptr when array is not empty */
-		num_inv = 1;
-		inv_reqs[0].flags = 0;
-		inv_reqs[0].vdev_id = 0x99;
-		test_err_viommu_invalidate(EINVAL, viommu_id, NULL,
-					   IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
-					   sizeof(*inv_reqs), &num_inv);
-		assert(!num_inv);
-
-		/* Negative test: invalid entry_len when array is not empty */
-		num_inv = 1;
-		inv_reqs[0].flags = 0;
-		inv_reqs[0].vdev_id = 0x99;
-		test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
-					   IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
-					   0, &num_inv);
-		assert(!num_inv);
-
-		/* Negative test: invalid cache_id */
-		num_inv = 1;
-		inv_reqs[0].flags = 0;
-		inv_reqs[0].vdev_id = 0x99;
-		inv_reqs[0].cache_id = MOCK_DEV_CACHE_ID_MAX + 1;
-		test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
-					   IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
-					   sizeof(*inv_reqs), &num_inv);
-		assert(!num_inv);
-
-		/* Negative test: invalid vdev_id */
-		num_inv = 1;
-		inv_reqs[0].flags = 0;
-		inv_reqs[0].vdev_id = 0x9;
-		inv_reqs[0].cache_id = 0;
-		test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
-					   IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
-					   sizeof(*inv_reqs), &num_inv);
-		assert(!num_inv);
-
-		/*
-		 * Invalidate the 1st cache entry but fail the 2nd request
-		 * due to invalid flags configuration in the 2nd request.
-		 */
-		num_inv = 2;
-		inv_reqs[0].flags = 0;
-		inv_reqs[0].vdev_id = 0x99;
-		inv_reqs[0].cache_id = 0;
-		inv_reqs[1].flags = 0xffffffff;
-		inv_reqs[1].vdev_id = 0x99;
-		inv_reqs[1].cache_id = 1;
-		test_err_viommu_invalidate(EOPNOTSUPP, viommu_id, inv_reqs,
-					   IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
-					   sizeof(*inv_reqs), &num_inv);
-		assert(num_inv == 1);
-		test_cmd_dev_check_cache(dev_id, 0, 0);
-		test_cmd_dev_check_cache(dev_id, 1,
-					 IOMMU_TEST_DEV_CACHE_DEFAULT);
-		test_cmd_dev_check_cache(dev_id, 2,
-					 IOMMU_TEST_DEV_CACHE_DEFAULT);
-		test_cmd_dev_check_cache(dev_id, 3,
-					 IOMMU_TEST_DEV_CACHE_DEFAULT);
+	if (!dev_id)
+		SKIP(return, "Skipping test for variant no_viommu");
+
+	test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id);
+
+	test_cmd_dev_check_cache_all(dev_id,
+				     IOMMU_TEST_DEV_CACHE_DEFAULT);
+
+	/* Check data_type by passing zero-length array */
+	num_inv = 0;
+	test_cmd_viommu_invalidate(viommu_id, inv_reqs,
+				   sizeof(*inv_reqs), &num_inv);
+	assert(!num_inv);
+
+	/* Negative test: Invalid data_type */
+	num_inv = 1;
+	test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+				   IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST_INVALID,
+				   sizeof(*inv_reqs), &num_inv);
+	assert(!num_inv);
+
+	/* Negative test: structure size sanity */
+	num_inv = 1;
+	test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+				   IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+				   sizeof(*inv_reqs) + 1, &num_inv);
+	assert(!num_inv);
+
+	num_inv = 1;
+	test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+				   IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+				   1, &num_inv);
+	assert(!num_inv);
+
+	/* Negative test: invalid flag is passed */
+	num_inv = 1;
+	inv_reqs[0].flags = 0xffffffff;
+	inv_reqs[0].vdev_id = 0x99;
+	test_err_viommu_invalidate(EOPNOTSUPP, viommu_id, inv_reqs,
+				   IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+				   sizeof(*inv_reqs), &num_inv);
+	assert(!num_inv);
+
+	/* Negative test: invalid data_uptr when array is not empty */
+	num_inv = 1;
+	inv_reqs[0].flags = 0;
+	inv_reqs[0].vdev_id = 0x99;
+	test_err_viommu_invalidate(EINVAL, viommu_id, NULL,
+				   IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+				   sizeof(*inv_reqs), &num_inv);
+	assert(!num_inv);
+
+	/* Negative test: invalid entry_len when array is not empty */
+	num_inv = 1;
+	inv_reqs[0].flags = 0;
+	inv_reqs[0].vdev_id = 0x99;
+	test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+				   IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+				   0, &num_inv);
+	assert(!num_inv);
+
+	/* Negative test: invalid cache_id */
+	num_inv = 1;
+	inv_reqs[0].flags = 0;
+	inv_reqs[0].vdev_id = 0x99;
+	inv_reqs[0].cache_id = MOCK_DEV_CACHE_ID_MAX + 1;
+	test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+				   IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+				   sizeof(*inv_reqs), &num_inv);
+	assert(!num_inv);
+
+	/* Negative test: invalid vdev_id */
+	num_inv = 1;
+	inv_reqs[0].flags = 0;
+	inv_reqs[0].vdev_id = 0x9;
+	inv_reqs[0].cache_id = 0;
+	test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+				   IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+				   sizeof(*inv_reqs), &num_inv);
+	assert(!num_inv);
 
-		/*
-		 * Invalidate the 1st cache entry but fail the 2nd request
-		 * due to invalid cache_id configuration in the 2nd request.
-		 */
-		num_inv = 2;
-		inv_reqs[0].flags = 0;
-		inv_reqs[0].vdev_id = 0x99;
-		inv_reqs[0].cache_id = 0;
-		inv_reqs[1].flags = 0;
-		inv_reqs[1].vdev_id = 0x99;
-		inv_reqs[1].cache_id = MOCK_DEV_CACHE_ID_MAX + 1;
-		test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
-					   IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
-					   sizeof(*inv_reqs), &num_inv);
-		assert(num_inv == 1);
-		test_cmd_dev_check_cache(dev_id, 0, 0);
-		test_cmd_dev_check_cache(dev_id, 1,
-					 IOMMU_TEST_DEV_CACHE_DEFAULT);
-		test_cmd_dev_check_cache(dev_id, 2,
-					 IOMMU_TEST_DEV_CACHE_DEFAULT);
-		test_cmd_dev_check_cache(dev_id, 3,
-					 IOMMU_TEST_DEV_CACHE_DEFAULT);
-
-		/* Invalidate the 2nd cache entry and verify */
-		num_inv = 1;
-		inv_reqs[0].flags = 0;
-		inv_reqs[0].vdev_id = 0x99;
-		inv_reqs[0].cache_id = 1;
-		test_cmd_viommu_invalidate(viommu_id, inv_reqs,
-					   sizeof(*inv_reqs), &num_inv);
-		assert(num_inv == 1);
-		test_cmd_dev_check_cache(dev_id, 0, 0);
-		test_cmd_dev_check_cache(dev_id, 1, 0);
-		test_cmd_dev_check_cache(dev_id, 2,
-					 IOMMU_TEST_DEV_CACHE_DEFAULT);
-		test_cmd_dev_check_cache(dev_id, 3,
-					 IOMMU_TEST_DEV_CACHE_DEFAULT);
-
-		/* Invalidate the 3rd and 4th cache entries and verify */
-		num_inv = 2;
-		inv_reqs[0].flags = 0;
-		inv_reqs[0].vdev_id = 0x99;
-		inv_reqs[0].cache_id = 2;
-		inv_reqs[1].flags = 0;
-		inv_reqs[1].vdev_id = 0x99;
-		inv_reqs[1].cache_id = 3;
-		test_cmd_viommu_invalidate(viommu_id, inv_reqs,
-					   sizeof(*inv_reqs), &num_inv);
-		assert(num_inv == 2);
-		test_cmd_dev_check_cache_all(dev_id, 0);
+	/*
+	 * Invalidate the 1st cache entry but fail the 2nd request
+	 * due to invalid flags configuration in the 2nd request.
+	 */
+	num_inv = 2;
+	inv_reqs[0].flags = 0;
+	inv_reqs[0].vdev_id = 0x99;
+	inv_reqs[0].cache_id = 0;
+	inv_reqs[1].flags = 0xffffffff;
+	inv_reqs[1].vdev_id = 0x99;
+	inv_reqs[1].cache_id = 1;
+	test_err_viommu_invalidate(EOPNOTSUPP, viommu_id, inv_reqs,
+				   IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+				   sizeof(*inv_reqs), &num_inv);
+	assert(num_inv == 1);
+	test_cmd_dev_check_cache(dev_id, 0, 0);
+	test_cmd_dev_check_cache(dev_id, 1,
+				 IOMMU_TEST_DEV_CACHE_DEFAULT);
+	test_cmd_dev_check_cache(dev_id, 2,
+				 IOMMU_TEST_DEV_CACHE_DEFAULT);
+	test_cmd_dev_check_cache(dev_id, 3,
+				 IOMMU_TEST_DEV_CACHE_DEFAULT);
 
-		/* Invalidate all cache entries for nested_dev_id[1] and verify */
-		num_inv = 1;
-		inv_reqs[0].vdev_id = 0x99;
-		inv_reqs[0].flags = IOMMU_TEST_INVALIDATE_FLAG_ALL;
-		test_cmd_viommu_invalidate(viommu_id, inv_reqs,
-					   sizeof(*inv_reqs), &num_inv);
-		assert(num_inv == 1);
-		test_cmd_dev_check_cache_all(dev_id, 0);
-		test_ioctl_destroy(vdev_id);
-	}
+	/*
+	 * Invalidate the 1st cache entry but fail the 2nd request
+	 * due to invalid cache_id configuration in the 2nd request.
+	 */
+	num_inv = 2;
+	inv_reqs[0].flags = 0;
+	inv_reqs[0].vdev_id = 0x99;
+	inv_reqs[0].cache_id = 0;
+	inv_reqs[1].flags = 0;
+	inv_reqs[1].vdev_id = 0x99;
+	inv_reqs[1].cache_id = MOCK_DEV_CACHE_ID_MAX + 1;
+	test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+				   IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+				   sizeof(*inv_reqs), &num_inv);
+	assert(num_inv == 1);
+	test_cmd_dev_check_cache(dev_id, 0, 0);
+	test_cmd_dev_check_cache(dev_id, 1,
+				 IOMMU_TEST_DEV_CACHE_DEFAULT);
+	test_cmd_dev_check_cache(dev_id, 2,
+				 IOMMU_TEST_DEV_CACHE_DEFAULT);
+	test_cmd_dev_check_cache(dev_id, 3,
+				 IOMMU_TEST_DEV_CACHE_DEFAULT);
+
+	/* Invalidate the 2nd cache entry and verify */
+	num_inv = 1;
+	inv_reqs[0].flags = 0;
+	inv_reqs[0].vdev_id = 0x99;
+	inv_reqs[0].cache_id = 1;
+	test_cmd_viommu_invalidate(viommu_id, inv_reqs,
+				   sizeof(*inv_reqs), &num_inv);
+	assert(num_inv == 1);
+	test_cmd_dev_check_cache(dev_id, 0, 0);
+	test_cmd_dev_check_cache(dev_id, 1, 0);
+	test_cmd_dev_check_cache(dev_id, 2,
+				 IOMMU_TEST_DEV_CACHE_DEFAULT);
+	test_cmd_dev_check_cache(dev_id, 3,
+				 IOMMU_TEST_DEV_CACHE_DEFAULT);
+
+	/* Invalidate the 3rd and 4th cache entries and verify */
+	num_inv = 2;
+	inv_reqs[0].flags = 0;
+	inv_reqs[0].vdev_id = 0x99;
+	inv_reqs[0].cache_id = 2;
+	inv_reqs[1].flags = 0;
+	inv_reqs[1].vdev_id = 0x99;
+	inv_reqs[1].cache_id = 3;
+	test_cmd_viommu_invalidate(viommu_id, inv_reqs,
+				   sizeof(*inv_reqs), &num_inv);
+	assert(num_inv == 2);
+	test_cmd_dev_check_cache_all(dev_id, 0);
+
+	/* Invalidate all cache entries for nested_dev_id[1] and verify */
+	num_inv = 1;
+	inv_reqs[0].vdev_id = 0x99;
+	inv_reqs[0].flags = IOMMU_TEST_INVALIDATE_FLAG_ALL;
+	test_cmd_viommu_invalidate(viommu_id, inv_reqs,
+				   sizeof(*inv_reqs), &num_inv);
+	assert(num_inv == 1);
+	test_cmd_dev_check_cache_all(dev_id, 0);
+	test_ioctl_destroy(vdev_id);
 }
 
 FIXTURE(iommufd_device_pasid)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v4 7/7] iommufd/selftest: Add coverage for vdevice tombstone
  2025-07-09  4:02 [PATCH v4 0/7] iommufd: Destroy vdevice on device unbind Xu Yilun
                   ` (5 preceding siblings ...)
  2025-07-09  4:02 ` [PATCH v4 6/7] iommufd/selftest: Explicitly skip tests for inapplicable variant Xu Yilun
@ 2025-07-09  4:02 ` Xu Yilun
  2025-07-10  8:10   ` Tian, Kevin
  6 siblings, 1 reply; 38+ messages in thread
From: Xu Yilun @ 2025-07-09  4:02 UTC (permalink / raw)
  To: jgg, jgg, kevin.tian, will, aneesh.kumar
  Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik,
	dan.j.williams, baolu.lu, yilun.xu

This tests the flow to tombstone vdevice when idevice is to be unbound
before vdevice destruction. The expected result is:

 - idevice unbinding tombstones vdevice ID, the ID can't be repurposed
   anymore.
 - Even ioctl(IOMMU_DESTROY) can't free the tombstoned ID.
 - iommufd_fops_release() can still free everything.

Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
 tools/testing/selftests/iommu/iommufd.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 4a9b6e3b37fa..e1470a7a42cd 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -3016,6 +3016,20 @@ TEST_F(iommufd_viommu, vdevice_cache)
 	test_ioctl_destroy(vdev_id);
 }
 
+TEST_F(iommufd_viommu, vdevice_tombstone)
+{
+	uint32_t viommu_id = self->viommu_id;
+	uint32_t dev_id = self->device_id;
+	uint32_t vdev_id = 0;
+
+	if (!dev_id)
+		SKIP(return, "Skipping test for variant no_viommu");
+
+	test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id);
+	test_ioctl_destroy(self->stdev_id);
+	EXPECT_ERRNO(ENOENT, _test_ioctl_destroy(self->fd, vdev_id));
+}
+
 FIXTURE(iommufd_device_pasid)
 {
 	int fd;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* RE: [PATCH v4 1/7] iommufd/viommu: Roll back to use iommufd_object_alloc() for vdevice
  2025-07-09  4:02 ` [PATCH v4 1/7] iommufd/viommu: Roll back to use iommufd_object_alloc() for vdevice Xu Yilun
@ 2025-07-10  7:32   ` Tian, Kevin
  2025-07-10 18:26   ` Nicolin Chen
  2025-07-11 17:42   ` Jason Gunthorpe
  2 siblings, 0 replies; 38+ messages in thread
From: Tian, Kevin @ 2025-07-10  7:32 UTC (permalink / raw)
  To: Xu Yilun, jgg@nvidia.com, jgg@ziepe.ca, will@kernel.org,
	aneesh.kumar@kernel.org
  Cc: iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org,
	nicolinc@nvidia.com, aik@amd.com, Williams, Dan J,
	baolu.lu@linux.intel.com, Xu, Yilun

> From: Xu Yilun <yilun.xu@linux.intel.com>
> Sent: Wednesday, July 9, 2025 12:02 PM
> 
> To solve the vdevice lifecycle issue, future patches make the vdevice
> allocation protected by a lock. That will make
> iommufd_object_alloc_ucmd() inapplicable for vdevice. Roll back to
> iommufd_object_alloc() in preparation.
> 
> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v4 2/7] iommufd: Add iommufd_object_tombstone_user() helper
  2025-07-09  4:02 ` [PATCH v4 2/7] iommufd: Add iommufd_object_tombstone_user() helper Xu Yilun
@ 2025-07-10  7:35   ` Tian, Kevin
  2025-07-10 19:08   ` Nicolin Chen
  2025-07-11 17:44   ` Jason Gunthorpe
  2 siblings, 0 replies; 38+ messages in thread
From: Tian, Kevin @ 2025-07-10  7:35 UTC (permalink / raw)
  To: Xu Yilun, jgg@nvidia.com, jgg@ziepe.ca, will@kernel.org,
	aneesh.kumar@kernel.org
  Cc: iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org,
	nicolinc@nvidia.com, aik@amd.com, Williams, Dan J,
	baolu.lu@linux.intel.com, Xu, Yilun

> From: Xu Yilun <yilun.xu@linux.intel.com>
> Sent: Wednesday, July 9, 2025 12:02 PM
> 
> Add the iommufd_object_tombstone_user() helper, which allows the caller
> to destroy an iommufd object created by userspace.
> 
> This is useful on some destroy paths when the kernel caller finds the
> object should have been removed by userspace but is still alive. With
> this helper, the caller destroys the object but leaves the object ID
> reserved (a so-called tombstone). The tombstone prevents repurposing the
> object ID without awareness of the original user.
> 
> Since this happens for abnormal userspace behavior, for simplicity, the
> tombstoned object ID would be permanently leaked until
> iommufd_fops_release(). I.e. the original user gets an error when
> calling ioctl(IOMMU_DESTROY) on that ID.
> 
> The first use case would be to ensure the iommufd_vdevice can't outlive
> the associated iommufd_device.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> Co-developed-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v4 3/7] iommufd: Add a pre_destroy() op for objects
  2025-07-09  4:02 ` [PATCH v4 3/7] iommufd: Add a pre_destroy() op for objects Xu Yilun
@ 2025-07-10  7:40   ` Tian, Kevin
  2025-07-10 17:15     ` Jason Gunthorpe
  2025-07-11 22:09   ` Nicolin Chen
  1 sibling, 1 reply; 38+ messages in thread
From: Tian, Kevin @ 2025-07-10  7:40 UTC (permalink / raw)
  To: Xu Yilun, jgg@nvidia.com, jgg@ziepe.ca, will@kernel.org,
	aneesh.kumar@kernel.org
  Cc: iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org,
	nicolinc@nvidia.com, aik@amd.com, Williams, Dan J,
	baolu.lu@linux.intel.com, Xu, Yilun

> From: Xu Yilun <yilun.xu@linux.intel.com>
> Sent: Wednesday, July 9, 2025 12:03 PM
> 
> Add a pre_destroy() op which gives objects a chance to clear their
> short term users references before destruction. This op is intended for
> external-driver-created objects (e.g. idev) which require deterministic
> destruction.
> 
> In order to manage the lifecycle of interrelated objects as well as the
> deterministic destruction (e.g. vdev can't outlive idev, and idev
> destruction can't fail), short term users references are allowed to
> outlive an ioctl execution. An immediate use case: vdev holds idev's
> short term user reference until vdev destruction completes, and idev
> leverages the existing wait_shortterm mechanism to ensure it is
> destroyed after vdev.
> 
> This extended usage means the referenced object can't simply wait for
> its references to go away. It needs to actively trigger the reference
> removal, as well as prevent new references before waiting. This work
> should be implemented in pre_destroy().
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

btw is it clearer to rename 'shortterm_users' as 'wait_cnt', as the
meaning now is beyond shortterm and is really about the need of
waiting?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v4 4/7] iommufd: Destroy vdevice on idevice destroy
  2025-07-09  4:02 ` [PATCH v4 4/7] iommufd: Destroy vdevice on idevice destroy Xu Yilun
@ 2025-07-10  8:03   ` Tian, Kevin
  2025-07-12  2:22   ` Nicolin Chen
  2025-07-12  2:26   ` Nicolin Chen
  2 siblings, 0 replies; 38+ messages in thread
From: Tian, Kevin @ 2025-07-10  8:03 UTC (permalink / raw)
  To: Xu Yilun, jgg@nvidia.com, jgg@ziepe.ca, will@kernel.org,
	aneesh.kumar@kernel.org
  Cc: iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org,
	nicolinc@nvidia.com, aik@amd.com, Williams, Dan J,
	baolu.lu@linux.intel.com, Xu, Yilun

> From: Xu Yilun <yilun.xu@linux.intel.com>
> Sent: Wednesday, July 9, 2025 12:03 PM
> 
> Destroy iommufd_vdevice (vdev) on iommufd_idevice (idev) destruction so
> that vdev can't outlive idev.
> 
> idev represents the physical device bound to iommufd, while the vdev
> represents the virtual instance of the physical device in the VM. The
> lifecycle of the vdev should not be longer than idev's. This doesn't
> cause a real problem in existing use cases because vdev doesn't impact
> the physical device; it only provides virtualization information. But
> to extend vdev for Confidential Computing (CC), there is a need to do
> secure configuration for the vdev, e.g. TSM Bind/Unbind. These
> configurations should be rolled back on idev destroy, or the external
> driver (VFIO) functionality may be impacted.
> 
> The idev is created by external driver so its destruction can't fail.
> The idev implements pre_destroy() op to actively remove its associated
> vdev before destroying itself. There are 3 cases on idev pre_destroy():
> 
>   1. vdev is already destroyed by userspace. No extra handling needed.
>   2. vdev is still alive. Use iommufd_object_tombstone_user() to
>      destroy vdev and tombstone the vdev ID.
>   3. vdev is being destroyed by userspace. The vdev ID is already
>      freed, but vdev destroy handler is not completed. This requires
>      multi-threads syncing - vdev holds idev's short term users
>      reference until vdev destruction completes, idev leverages
>      existing wait_shortterm mechanism for syncing.

could also mention the introduction of 'idev->destroying'

> 
> Originally-by: Nicolin Chen <nicolinc@nvidia.com>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>

I'd remove this tag as this version has substantial changes compared to
the last version.

> Co-developed-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>

please also update the description about IOMMUFD_CMD_VDEVICE_ALLOC 
in include/uapi/linux/iommufd.h, so the user is aware of the implication.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v4 5/7] iommufd/vdevice: Remove struct device reference from struct vdevice
  2025-07-09  4:02 ` [PATCH v4 5/7] iommufd/vdevice: Remove struct device reference from struct vdevice Xu Yilun
@ 2025-07-10  8:04   ` Tian, Kevin
  2025-07-12  2:27   ` Nicolin Chen
  1 sibling, 0 replies; 38+ messages in thread
From: Tian, Kevin @ 2025-07-10  8:04 UTC (permalink / raw)
  To: Xu Yilun, jgg@nvidia.com, jgg@ziepe.ca, will@kernel.org,
	aneesh.kumar@kernel.org
  Cc: iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org,
	nicolinc@nvidia.com, aik@amd.com, Williams, Dan J,
	baolu.lu@linux.intel.com, Xu, Yilun

> From: Xu Yilun <yilun.xu@linux.intel.com>
> Sent: Wednesday, July 9, 2025 12:03 PM
> 
> Remove struct device *dev from struct vdevice.
> 
> The dev pointer was the fallback ("Plan B") for vdevice to reference
> the physical device. Now that vdev->idev is added without refcounting
> concerns, just use vdev->idev->dev when needed.
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v4 6/7] iommufd/selftest: Explicitly skip tests for inapplicable variant
  2025-07-09  4:02 ` [PATCH v4 6/7] iommufd/selftest: Explicitly skip tests for inapplicable variant Xu Yilun
@ 2025-07-10  8:06   ` Tian, Kevin
  2025-07-12  2:39   ` Nicolin Chen
  1 sibling, 0 replies; 38+ messages in thread
From: Tian, Kevin @ 2025-07-10  8:06 UTC (permalink / raw)
  To: Xu Yilun, jgg@nvidia.com, jgg@ziepe.ca, will@kernel.org,
	aneesh.kumar@kernel.org
  Cc: iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org,
	nicolinc@nvidia.com, aik@amd.com, Williams, Dan J,
	baolu.lu@linux.intel.com, Xu, Yilun

> From: Xu Yilun <yilun.xu@linux.intel.com>
> Sent: Wednesday, July 9, 2025 12:03 PM
> 
> no_viommu is not applicable for some viommu/vdevice tests. Explicitly
> report the skip rather than doing it silently.
> 
> Only add the prints. No functional change intended.
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v4 7/7] iommufd/selftest: Add coverage for vdevice tombstone
  2025-07-09  4:02 ` [PATCH v4 7/7] iommufd/selftest: Add coverage for vdevice tombstone Xu Yilun
@ 2025-07-10  8:10   ` Tian, Kevin
  2025-07-10  9:25     ` Xu Yilun
  0 siblings, 1 reply; 38+ messages in thread
From: Tian, Kevin @ 2025-07-10  8:10 UTC (permalink / raw)
  To: Xu Yilun, jgg@nvidia.com, jgg@ziepe.ca, will@kernel.org,
	aneesh.kumar@kernel.org
  Cc: iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org,
	nicolinc@nvidia.com, aik@amd.com, Williams, Dan J,
	baolu.lu@linux.intel.com, Xu, Yilun

> From: Xu Yilun <yilun.xu@linux.intel.com>
> Sent: Wednesday, July 9, 2025 12:03 PM
> 
> This tests the flow to tombstone vdevice when idevice is to be unbound
> before vdevice destruction. The expected result is:
> 
>  - idevice unbinding tombstones vdevice ID, the ID can't be repurposed
>    anymore.
>  - Even ioctl(IOMMU_DESTROY) can't free the tombstoned ID.
>  - iommufd_fops_release() can still free everything.

but the test only checks 2) instead of all?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 7/7] iommufd/selftest: Add coverage for vdevice tombstone
  2025-07-10  8:10   ` Tian, Kevin
@ 2025-07-10  9:25     ` Xu Yilun
  2025-07-11  3:12       ` Tian, Kevin
  0 siblings, 1 reply; 38+ messages in thread
From: Xu Yilun @ 2025-07-10  9:25 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: jgg@nvidia.com, jgg@ziepe.ca, will@kernel.org,
	aneesh.kumar@kernel.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, joro@8bytes.org,
	robin.murphy@arm.com, shuah@kernel.org, nicolinc@nvidia.com,
	aik@amd.com, Williams, Dan J, baolu.lu@linux.intel.com, Xu, Yilun

On Thu, Jul 10, 2025 at 08:10:34AM +0000, Tian, Kevin wrote:
> > From: Xu Yilun <yilun.xu@linux.intel.com>
> > Sent: Wednesday, July 9, 2025 12:03 PM
> > 
> > This tests the flow to tombstone vdevice when idevice is to be unbound
> > before vdevice destruction. The expected result is:
> > 
> >  - idevice unbinding tombstones vdevice ID, the ID can't be repurposed
> >    anymore.
> >  - Even ioctl(IOMMU_DESTROY) can't free the tombstoned ID.
> >  - iommufd_fops_release() can still free everything.
> 
> but the test only checks 2) instead of all?

It tests 2) & 3); 3) will be exercised by FIXTURE_TEARDOWN(iommufd_viommu).

For 1) "the ID can't be repurposed anymore", I don't have a good idea how to
verify it. Allocate 2^32 objects and verify no tombstoned ID is
reused? That's too crazy...   Maybe just mark it untested in the commit
message?

Thanks,
Yilun

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 3/7] iommufd: Add a pre_destroy() op for objects
  2025-07-10  7:40   ` Tian, Kevin
@ 2025-07-10 17:15     ` Jason Gunthorpe
  2025-07-11  3:16       ` Tian, Kevin
  0 siblings, 1 reply; 38+ messages in thread
From: Jason Gunthorpe @ 2025-07-10 17:15 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Xu Yilun, will@kernel.org, aneesh.kumar@kernel.org,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org,
	nicolinc@nvidia.com, aik@amd.com, Williams, Dan J,
	baolu.lu@linux.intel.com, Xu, Yilun

On Thu, Jul 10, 2025 at 07:40:58AM +0000, Tian, Kevin wrote:
> > From: Xu Yilun <yilun.xu@linux.intel.com>
> > Sent: Wednesday, July 9, 2025 12:03 PM
> > 
> > Add a pre_destroy() op which gives objects a chance to clear their
> > short term users references before destruction. This op is intended for
> > external-driver-created objects (e.g. idev) which require deterministic
> > destruction.
> > 
> > In order to manage the lifecycle of interrelated objects as well as the
> > deterministic destruction (e.g. vdev can't outlive idev, and idev
> > destruction can't fail), short term users references are allowed to
> > outlive an ioctl execution. An immediate use case: vdev holds idev's
> > short term user reference until vdev destruction completes, and idev
> > leverages the existing wait_shortterm mechanism to ensure it is
> > destroyed after vdev.
> > 
> > This extended usage means the referenced object can't simply wait for
> > its references to go away. It needs to actively trigger the reference
> > removal, as well as prevent new references before waiting. This work
> > should be implemented in pre_destroy().
> > 
> > Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> > Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> 
> btw is it clearer to rename 'shortterm_users' as 'wait_cnt', as the
> meaning now is beyond shortterm and is really about the need of
> waiting?

Probably so, as a followup change would be fine if we don't need a v5

/*
 * Destroy will sleep and wait for wait_cnt to go to zero. This
 * allows concurrent users of the ID to reliably avoid causing a
 * spurious destroy failure. Incrementing this count should either be
 * short lived or be revoked and blocked during pre_destroy
 */
refcount_t wait_cnt;

?

Jason

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 1/7] iommufd/viommu: Roll back to use iommufd_object_alloc() for vdevice
  2025-07-09  4:02 ` [PATCH v4 1/7] iommufd/viommu: Roll back to use iommufd_object_alloc() for vdevice Xu Yilun
  2025-07-10  7:32   ` Tian, Kevin
@ 2025-07-10 18:26   ` Nicolin Chen
  2025-07-11 17:42   ` Jason Gunthorpe
  2 siblings, 0 replies; 38+ messages in thread
From: Nicolin Chen @ 2025-07-10 18:26 UTC (permalink / raw)
  To: Xu Yilun
  Cc: jgg, jgg, kevin.tian, will, aneesh.kumar, iommu, linux-kernel,
	joro, robin.murphy, shuah, aik, dan.j.williams, baolu.lu,
	yilun.xu

On Wed, Jul 09, 2025 at 12:02:28PM +0800, Xu Yilun wrote:
> To solve the vdevice lifecycle issue, future patches make the vdevice
> allocation protected by a lock. That will make
> iommufd_object_alloc_ucmd() inapplicable for vdevice. Roll back to
> iommufd_object_alloc() in preparation.
> 
> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 2/7] iommufd: Add iommufd_object_tombstone_user() helper
  2025-07-09  4:02 ` [PATCH v4 2/7] iommufd: Add iommufd_object_tombstone_user() helper Xu Yilun
  2025-07-10  7:35   ` Tian, Kevin
@ 2025-07-10 19:08   ` Nicolin Chen
  2025-07-11 15:02     ` Xu Yilun
  2025-07-11 17:44   ` Jason Gunthorpe
  2 siblings, 1 reply; 38+ messages in thread
From: Nicolin Chen @ 2025-07-10 19:08 UTC (permalink / raw)
  To: Xu Yilun
  Cc: jgg, jgg, kevin.tian, will, aneesh.kumar, iommu, linux-kernel,
	joro, robin.murphy, shuah, aik, dan.j.williams, baolu.lu,
	yilun.xu

On Wed, Jul 09, 2025 at 12:02:29PM +0800, Xu Yilun wrote:
> Add the iommufd_object_tombstone_user() helper, which allows the caller
> to destroy an iommufd object created by userspace.
> 
> This is useful on some destroy paths when the kernel caller finds the
> object should have been removed by userspace but is still alive. With
> this helper, the caller destroys the object but leaves the object ID
> reserved (a so-called tombstone). The tombstone prevents repurposing the
> object ID without awareness of the original user.
> 
> Since this happens for abnormal userspace behavior, for simplicity, the
> tombstoned object ID would be permanently leaked until
> iommufd_fops_release(). I.e. the original user gets an error when
> calling ioctl(IOMMU_DESTROY) on that ID.
> 
> The first use case would be to ensure the iommufd_vdevice can't outlive
> the associated iommufd_device.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> Co-developed-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

With one nit:

> -	while (!xa_empty(&ictx->objects)) {
> +	for (;;) {
>  		unsigned int destroyed = 0;
>  		unsigned long index;
> +		bool empty = true;
>  
> +		/*
> +		 * xa_for_each() will not return tomestones (zeroed entries),
> +		 * which prevent the xarray being empty. So use an empty flags

Since the first "empty" and the second "empty" are different things,

> +		 * instead of xa_empty() to indicate all entries are either
> +		 * NULLed or tomestoned.
> +		 */

let's write something like this (correcting typos too):

		/*
		 * We can't use xa_empty(), as a tombstone (NULLed entry) would
		 * prevent it returning true, unlike xa_for_each() ignoring the
		 * NULLed entries. So use an empty flag instead of xa_empty() to
		 * indicate all entries are either NULLed or tombstoned.
		 */

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v4 7/7] iommufd/selftest: Add coverage for vdevice tombstone
  2025-07-10  9:25     ` Xu Yilun
@ 2025-07-11  3:12       ` Tian, Kevin
  0 siblings, 0 replies; 38+ messages in thread
From: Tian, Kevin @ 2025-07-11  3:12 UTC (permalink / raw)
  To: Xu Yilun
  Cc: jgg@nvidia.com, jgg@ziepe.ca, will@kernel.org,
	aneesh.kumar@kernel.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, joro@8bytes.org,
	robin.murphy@arm.com, shuah@kernel.org, nicolinc@nvidia.com,
	aik@amd.com, Williams, Dan J, baolu.lu@linux.intel.com, Xu, Yilun

> From: Xu Yilun <yilun.xu@linux.intel.com>
> Sent: Thursday, July 10, 2025 5:25 PM
> 
> On Thu, Jul 10, 2025 at 08:10:34AM +0000, Tian, Kevin wrote:
> > > From: Xu Yilun <yilun.xu@linux.intel.com>
> > > Sent: Wednesday, July 9, 2025 12:03 PM
> > >
> > > This tests the flow to tombstone vdevice when idevice is to be unbound
> > > before vdevice destruction. The expected result is:
> > >
> > >  - idevice unbinding tombstones vdevice ID, the ID can't be repurposed
> > >    anymore.
> > >  - Even ioctl(IOMMU_DESTROY) can't free the tombstoned ID.
> > >  - iommufd_fops_release() can still free everything.
> >
> > but the test only checks 2) instead of all?
> 
> It tests 2) & 3), 3) will be executed on
> FIXTURE_TEARDOWN(iommufd_viommu)
> 
> For 1) "the ID can't be repurposed anymore", I don't have a good idea how
> to
> verify it. Allocate 2^32 objects and verify no tombstoned ID is
> reused? That's too crazy...   Maybe just mark it untested in the commit
> message?
> 

not required. Just make the commit msg clear.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v4 3/7] iommufd: Add a pre_destroy() op for objects
  2025-07-10 17:15     ` Jason Gunthorpe
@ 2025-07-11  3:16       ` Tian, Kevin
  2025-07-11  9:38         ` Xu Yilun
  0 siblings, 1 reply; 38+ messages in thread
From: Tian, Kevin @ 2025-07-11  3:16 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Xu Yilun, will@kernel.org, aneesh.kumar@kernel.org,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org,
	nicolinc@nvidia.com, aik@amd.com, Williams, Dan J,
	baolu.lu@linux.intel.com, Xu, Yilun

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, July 11, 2025 1:15 AM
> 
> On Thu, Jul 10, 2025 at 07:40:58AM +0000, Tian, Kevin wrote:
> > > From: Xu Yilun <yilun.xu@linux.intel.com>
> > > Sent: Wednesday, July 9, 2025 12:03 PM
> > >
> > > Add a pre_destroy() op which gives objects a chance to clear their
> > > short term users references before destruction. This op is intended for
> > > external driver created objects (e.g. idev) which does deterministic
> > > destruction.
> > >
> > > In order to manage the lifecycle of interrelated objects as well as the
> > > deterministic destruction (e.g. vdev can't outlive idev, and idev
> > > destruction can't fail), short term users references are allowed to
> > > live out of an ioctl execution. An immediate use case is, vdev holds
> > > idev's short term user reference until vdev destruction completes, idev
> > > leverages existing wait_shortterm mechanism to ensure it is destroyed
> > > after vdev.
> > >
> > > This extended usage makes the referenced object unable to just wait for
> > > its reference gone. It needs to actively trigger the reference removal,
> > > as well as prevent new references before wait. Should implement these
> > > work in pre_destroy().
> > >
> > > Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> > > Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
> >
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> >
> > btw is it clearer to rename 'shortterm_users' as 'wait_cnt', as the
> > meaning now is beyond shortterm and is really about the need of
> > waiting?
> 
> Probably so, as a followup change would be fine if we don't need a v5
> 
> /*
>  * Destroy will sleep and wait for wait_cnt to go to zero. This
>  * allows concurrent users of the ID to reliably avoid causing a
>  * spurious destroy failure. Incrementing this count should either be
>  * short lived or be revoked and blocked during pre_destroy
>  */
> refcount_t wait_cnt;
> 
> ?

Looks good
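For illustration, the wait semantics that comment describes can be sketched as a userspace model. struct obj and the obj_get()/obj_put()/obj_destroy() helpers below are hypothetical analogs of the proposed wait_cnt and ictx->destroy_wait, not the kernel implementation:

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical model: destroy sleeps until every transient holder
 * drops its count, so a concurrent user of the ID can never cause a
 * spurious destroy failure. */
struct obj {
	pthread_mutex_t lock;
	pthread_cond_t destroy_wait;	/* models ictx->destroy_wait */
	int wait_cnt;			/* models refcount_t wait_cnt */
	int destroyed;
};

static void obj_get(struct obj *o)
{
	pthread_mutex_lock(&o->lock);
	o->wait_cnt++;
	pthread_mutex_unlock(&o->lock);
}

static void obj_put(struct obj *o)
{
	pthread_mutex_lock(&o->lock);
	if (--o->wait_cnt == 0)
		pthread_cond_broadcast(&o->destroy_wait);
	pthread_mutex_unlock(&o->lock);
}

/* Destroy blocks until wait_cnt drains; holders must therefore be
 * short lived, or be revoked during a pre_destroy()-like step. */
static void obj_destroy(struct obj *o)
{
	pthread_mutex_lock(&o->lock);
	while (o->wait_cnt)
		pthread_cond_wait(&o->destroy_wait, &o->lock);
	o->destroyed = 1;
	pthread_mutex_unlock(&o->lock);
}

static void *short_user(void *arg)
{
	struct obj *o = arg;

	usleep(10000);		/* pretend to use the object briefly */
	obj_put(o);
	return NULL;
}
```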

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 3/7] iommufd: Add a pre_destroy() op for objects
  2025-07-11  3:16       ` Tian, Kevin
@ 2025-07-11  9:38         ` Xu Yilun
  0 siblings, 0 replies; 38+ messages in thread
From: Xu Yilun @ 2025-07-11  9:38 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Jason Gunthorpe, will@kernel.org, aneesh.kumar@kernel.org,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org,
	nicolinc@nvidia.com, aik@amd.com, Williams, Dan J,
	baolu.lu@linux.intel.com, Xu, Yilun

On Fri, Jul 11, 2025 at 03:16:38AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Friday, July 11, 2025 1:15 AM
> > 
> > On Thu, Jul 10, 2025 at 07:40:58AM +0000, Tian, Kevin wrote:
> > > > From: Xu Yilun <yilun.xu@linux.intel.com>
> > > > Sent: Wednesday, July 9, 2025 12:03 PM
> > > >
> > > > Add a pre_destroy() op which gives objects a chance to clear their
> > > > short term users references before destruction. This op is intended for
> > > > external driver created objects (e.g. idev) which does deterministic
> > > > destruction.
> > > >
> > > > In order to manage the lifecycle of interrelated objects as well as the
> > > > deterministic destruction (e.g. vdev can't outlive idev, and idev
> > > > destruction can't fail), short term users references are allowed to
> > > > live out of an ioctl execution. An immediate use case is, vdev holds
> > > > idev's short term user reference until vdev destruction completes, idev
> > > > leverages existing wait_shortterm mechanism to ensure it is destroyed
> > > > after vdev.
> > > >
> > > > This extended usage makes the referenced object unable to just wait for
> > > > its reference gone. It needs to actively trigger the reference removal,
> > > > as well as prevent new references before wait. Should implement these
> > > > work in pre_destroy().
> > > >
> > > > Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> > > > Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
> > >
> > > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > >
> > > btw is it clearer to rename 'shortterm_users' as 'wait_cnt', as the
> > > meaning now is beyond shortterm and is really about the need of
> > > waiting?
> > 
> > Probably so, as a followup change would be fine if we don't need a v5

We have some comments to quick fix. Let me make a v5 and include this
change.

> > 
> > /*
> >  * Destroy will sleep and wait for wait_cnt to go to zero. This
> >  * allows concurrent users of the ID to reliably avoid causing a
> >  * spurious destroy failure. Incrementing this count should either be
> >  * short lived or be revoked and blocked during pre_destroy
> >  */
> > refcount_t wait_cnt;
> > 
> > ?
> 
> Looks good

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 2/7] iommufd: Add iommufd_object_tombstone_user() helper
  2025-07-10 19:08   ` Nicolin Chen
@ 2025-07-11 15:02     ` Xu Yilun
  2025-07-11 19:39       ` Nicolin Chen
  2025-07-11 21:33       ` Jason Gunthorpe
  0 siblings, 2 replies; 38+ messages in thread
From: Xu Yilun @ 2025-07-11 15:02 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: jgg, jgg, kevin.tian, will, aneesh.kumar, iommu, linux-kernel,
	joro, robin.murphy, shuah, aik, dan.j.williams, baolu.lu,
	yilun.xu

> With one nit:
> 
> > -	while (!xa_empty(&ictx->objects)) {
> > +	for (;;) {
> >  		unsigned int destroyed = 0;
> >  		unsigned long index;
> > +		bool empty = true;
> >  
> > +		/*
> > +		 * xa_for_each() will not return tomestones (zeroed entries),
> > +		 * which prevent the xarray being empty. So use an empty flags
> 
> Since the first "empty" and the second "empty" are different things,
> 
> > +		 * instead of xa_empty() to indicate all entries are either
> > +		 * NULLed or tomestoned.
> > +		 */
> 
> let's write something like this (correcting typos too):
> 
> 		/*
> 		 * We can't use xa_empty(), as a tombstone (NULLed entry) would
                                                            ^
> 		 * prevent it returning true, unlike xa_for_each() ignoring the
> 		 * NULLed entries. So use an empty flag instead of xa_empty() to
                   ^
s/NULLed/zeroed, are they?

Thanks,
Yilun

> 		 * indicate all entries are either NULLed or tombstoned.
> 		 */

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 1/7] iommufd/viommu: Roll back to use iommufd_object_alloc() for vdevice
  2025-07-09  4:02 ` [PATCH v4 1/7] iommufd/viommu: Roll back to use iommufd_object_alloc() for vdevice Xu Yilun
  2025-07-10  7:32   ` Tian, Kevin
  2025-07-10 18:26   ` Nicolin Chen
@ 2025-07-11 17:42   ` Jason Gunthorpe
  2 siblings, 0 replies; 38+ messages in thread
From: Jason Gunthorpe @ 2025-07-11 17:42 UTC (permalink / raw)
  To: Xu Yilun
  Cc: kevin.tian, will, aneesh.kumar, iommu, linux-kernel, joro,
	robin.murphy, shuah, nicolinc, aik, dan.j.williams, baolu.lu,
	yilun.xu

On Wed, Jul 09, 2025 at 12:02:28PM +0800, Xu Yilun wrote:
> To solve the vdevice lifecycle issue, future patches make the vdevice
> allocation protected by lock. That will make
> iommufd_object_alloc_ucmd() not applicable for vdevice. Roll back to
> use iommufd_object_alloc() for preparation.
> 
> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
> ---
>  drivers/iommu/iommufd/viommu.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 2/7] iommufd: Add iommufd_object_tombstone_user() helper
  2025-07-09  4:02 ` [PATCH v4 2/7] iommufd: Add iommufd_object_tombstone_user() helper Xu Yilun
  2025-07-10  7:35   ` Tian, Kevin
  2025-07-10 19:08   ` Nicolin Chen
@ 2025-07-11 17:44   ` Jason Gunthorpe
  2 siblings, 0 replies; 38+ messages in thread
From: Jason Gunthorpe @ 2025-07-11 17:44 UTC (permalink / raw)
  To: Xu Yilun
  Cc: kevin.tian, will, aneesh.kumar, iommu, linux-kernel, joro,
	robin.murphy, shuah, nicolinc, aik, dan.j.williams, baolu.lu,
	yilun.xu

On Wed, Jul 09, 2025 at 12:02:29PM +0800, Xu Yilun wrote:
> Add the iommufd_object_tombstone_user() helper, which allows the caller
> to destroy an iommufd object created by userspace.
> 
> This is useful on some destroy paths when the kernel caller finds the
> object should have been removed by userspace but is still alive. With
> this helper, the caller destroys the object but leave the object ID
> reserved (so called tombstone). The tombstone prevents repurposing the
> object ID without awareness of the original user.
> 
> Since this happens for abnormal userspace behavior, for simplicity, the
> tombstoned object ID would be permanently leaked until
> iommufd_fops_release(). I.e. the original user gets an error when
> calling ioctl(IOMMU_DESTROY) on that ID.
> 
> The first use case would be to ensure the iommufd_vdevice can't outlive
> the associated iommufd_device.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> Co-developed-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
> ---
>  drivers/iommu/iommufd/iommufd_private.h | 23 ++++++++++++++++++++++-
>  drivers/iommu/iommufd/main.c            | 24 ++++++++++++++++++++++--
>  2 files changed, 44 insertions(+), 3 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 2/7] iommufd: Add iommufd_object_tombstone_user() helper
  2025-07-11 15:02     ` Xu Yilun
@ 2025-07-11 19:39       ` Nicolin Chen
  2025-07-11 21:33       ` Jason Gunthorpe
  1 sibling, 0 replies; 38+ messages in thread
From: Nicolin Chen @ 2025-07-11 19:39 UTC (permalink / raw)
  To: Xu Yilun
  Cc: jgg, jgg, kevin.tian, will, aneesh.kumar, iommu, linux-kernel,
	joro, robin.murphy, shuah, aik, dan.j.williams, baolu.lu,
	yilun.xu

On Fri, Jul 11, 2025 at 11:02:53PM +0800, Xu Yilun wrote:
> > With one nit:
> > 
> > > -	while (!xa_empty(&ictx->objects)) {
> > > +	for (;;) {
> > >  		unsigned int destroyed = 0;
> > >  		unsigned long index;
> > > +		bool empty = true;
> > >  
> > > +		/*
> > > +		 * xa_for_each() will not return tomestones (zeroed entries),
> > > +		 * which prevent the xarray being empty. So use an empty flags
> > 
> > Since the first "empty" and the second "empty" are different things,
> > 
> > > +		 * instead of xa_empty() to indicate all entries are either
> > > +		 * NULLed or tomestoned.
> > > +		 */
> > 
> > let's write something like this (correcting typos too):
> > 
> > 		/*
> > 		 * We can't use xa_empty(), as a tombstone (NULLed entry) would
>                                                             ^
> > 		 * prevent it returning true, unlike xa_for_each() ignoring the
> > 		 * NULLed entries. So use an empty flag instead of xa_empty() to
>                    ^
> s/NULLed/zeroed, are they?

They are. It's just nicer to pick one for consistency.

The typo was "tomestones", anyway.

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 2/7] iommufd: Add iommufd_object_tombstone_user() helper
  2025-07-11 15:02     ` Xu Yilun
  2025-07-11 19:39       ` Nicolin Chen
@ 2025-07-11 21:33       ` Jason Gunthorpe
  2025-07-11 22:04         ` Nicolin Chen
  1 sibling, 1 reply; 38+ messages in thread
From: Jason Gunthorpe @ 2025-07-11 21:33 UTC (permalink / raw)
  To: Xu Yilun
  Cc: Nicolin Chen, kevin.tian, will, aneesh.kumar, iommu, linux-kernel,
	joro, robin.murphy, shuah, aik, dan.j.williams, baolu.lu,
	yilun.xu

On Fri, Jul 11, 2025 at 11:02:53PM +0800, Xu Yilun wrote:
> > With one nit:
> > 
> > > -	while (!xa_empty(&ictx->objects)) {
> > > +	for (;;) {

Actually just leave this: if xa_empty() does work out then the loop
exits quickly, otherwise the break will deal with the tombstones.

> > >  		unsigned int destroyed = 0;
> > >  		unsigned long index;
> > > +		bool empty = true;
> > >  
> > > +		/*
> > > +		 * xa_for_each() will not return tomestones (zeroed entries),
> > > +		 * which prevent the xarray being empty. So use an empty flags
> > 
> > Since the first "empty" and the second "empty" are different things,
> > 
> > > +		 * instead of xa_empty() to indicate all entries are either
> > > +		 * NULLed or tomestoned.
> > > +		 */
> > 
> > let's write something like this (correcting typos too):
> > 
> > 		/*
> > 		 * We can't use xa_empty(), as a tombstone (NULLed entry) would
>                                                             ^
> > 		 * prevent it returning true, unlike xa_for_each() ignoring the
> > 		 * NULLed entries. So use an empty flag instead of xa_empty() to
>                    ^
> s/NULLed/zeroed, are they?

Maybe:

 We can't use xa_empty() to end the loop as the tombstones are stored
 as XA_ZERO_ENTRY in the xarray. However xa_for_each() automatically
 converts them to NULL and skips them causing xa_empty() to be kept
 false. Thus once xa_for_each() finds no further !NULL entries the
 loop is done.

?

Jason

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 2/7] iommufd: Add iommufd_object_tombstone_user() helper
  2025-07-11 21:33       ` Jason Gunthorpe
@ 2025-07-11 22:04         ` Nicolin Chen
  0 siblings, 0 replies; 38+ messages in thread
From: Nicolin Chen @ 2025-07-11 22:04 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Xu Yilun, kevin.tian, will, aneesh.kumar, iommu, linux-kernel,
	joro, robin.murphy, shuah, aik, dan.j.williams, baolu.lu,
	yilun.xu

On Fri, Jul 11, 2025 at 06:33:23PM -0300, Jason Gunthorpe wrote:
> On Fri, Jul 11, 2025 at 11:02:53PM +0800, Xu Yilun wrote:
> > > let's write something like this (correcting typos too):
> > > 
> > > 		/*
> > > 		 * We can't use xa_empty(), as a tombstone (NULLed entry) would
> >                                                             ^
> > > 		 * prevent it returning true, unlike xa_for_each() ignoring the
> > > 		 * NULLed entries. So use an empty flag instead of xa_empty() to
> >                    ^
> > s/NULLed/zeroed, are they?
> 
> Maybe:
> 
>  We can't use xa_empty() to end the loop as the tombstones are stored
>  as XA_ZERO_ENTRY in the xarray. However xa_for_each() automatically
>  converts them to NULL and skips them causing xa_empty() to be kept
>  false. Thus once xa_for_each() finds no further !NULL entries the
>  loop is done.
> 
> ?

+1

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 3/7] iommufd: Add a pre_destroy() op for objects
  2025-07-09  4:02 ` [PATCH v4 3/7] iommufd: Add a pre_destroy() op for objects Xu Yilun
  2025-07-10  7:40   ` Tian, Kevin
@ 2025-07-11 22:09   ` Nicolin Chen
  1 sibling, 0 replies; 38+ messages in thread
From: Nicolin Chen @ 2025-07-11 22:09 UTC (permalink / raw)
  To: Xu Yilun
  Cc: jgg, jgg, kevin.tian, will, aneesh.kumar, iommu, linux-kernel,
	joro, robin.murphy, shuah, aik, dan.j.williams, baolu.lu,
	yilun.xu

On Wed, Jul 09, 2025 at 12:02:30PM +0800, Xu Yilun wrote:
> Add a pre_destroy() op which gives objects a chance to clear their
> short term users references before destruction. This op is intended for
> external driver created objects (e.g. idev) which does deterministic
> destruction.
> 
> In order to manage the lifecycle of interrelated objects as well as the
> deterministic destruction (e.g. vdev can't outlive idev, and idev
> destruction can't fail), short term users references are allowed to
> live out of an ioctl execution. An immediate use case is, vdev holds
> idev's short term user reference until vdev destruction completes, idev
> leverages existing wait_shortterm mechanism to ensure it is destroyed
> after vdev.
> 
> This extended usage makes the referenced object unable to just wait for
> its reference gone. It needs to actively trigger the reference removal,
> as well as prevent new references before wait. Should implement these
> work in pre_destroy().
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 4/7] iommufd: Destroy vdevice on idevice destroy
  2025-07-09  4:02 ` [PATCH v4 4/7] iommufd: Destroy vdevice on idevice destroy Xu Yilun
  2025-07-10  8:03   ` Tian, Kevin
@ 2025-07-12  2:22   ` Nicolin Chen
  2025-07-12 17:47     ` Xu Yilun
  2025-07-12  2:26   ` Nicolin Chen
  2 siblings, 1 reply; 38+ messages in thread
From: Nicolin Chen @ 2025-07-12  2:22 UTC (permalink / raw)
  To: Xu Yilun
  Cc: jgg, jgg, kevin.tian, will, aneesh.kumar, iommu, linux-kernel,
	joro, robin.murphy, shuah, aik, dan.j.williams, baolu.lu,
	yilun.xu

On Wed, Jul 09, 2025 at 12:02:31PM +0800, Xu Yilun wrote:
> Destroy iommufd_vdevice (vdev) on iommufd_idevice (idev) destruction so
> that vdev can't outlive idev.
> 
> idev represents the physical device bound to iommufd, while the vdev
> represents the virtual instance of the physical device in the VM. The
> lifecycle of the vdev should not be longer than idev. This doesn't
> cause real problem on existing use cases cause vdev doesn't impact the
> physical device, only provides virtualization information. But to
> extend vdev for Confidential Computing (CC), there are needs to do
> secure configuration for the vdev, e.g. TSM Bind/Unbind. These
> configurations should be rolled back on idev destroy, or the external
> driver (VFIO) functionality may be impact.
> 
> The idev is created by external driver so its destruction can't fail.
> The idev implements pre_destroy() op to actively remove its associated
> vdev before destroying itself. There are 3 cases on idev pre_destroy():
> 
>   1. vdev is already destroyed by userspace. No extra handling needed.
>   2. vdev is still alive. Use iommufd_object_tombstone_user() to
>      destroy vdev and tombstone the vdev ID.
>   3. vdev is being destroyed by userspace. The vdev ID is already
>      freed, but vdev destroy handler is not completed. This requires
>      multi-threads syncing - vdev holds idev's short term users
>      reference until vdev destruction completes, idev leverages
>      existing wait_shortterm mechanism for syncing.
> 
> Originally-by: Nicolin Chen <nicolinc@nvidia.com>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> Co-developed-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
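The first two cases in that list can be sketched as a userspace model. The structures and the idev_pre_destroy() helper below are illustrative simplifications of the patch, and case 3 (racing with an in-flight userspace destroy, which needs the short term users wait) is deliberately left out:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; field names mirror the patch but the logic is a
 * simplified sketch, not the kernel code. */
struct vdev {
	int tombstoned;
};

struct idev {
	struct vdev *vdev;	/* NULL once userspace destroyed it */
	int destroying;		/* blocks new references after pre_destroy */
};

/* Case 1: vdev already gone, nothing to do.
 * Case 2: vdev still alive, destroy it and tombstone its ID so the ID
 * can't be repurposed (iommufd_object_tombstone_user() in the patch). */
static void idev_pre_destroy(struct idev *idev)
{
	idev->destroying = 1;
	if (idev->vdev) {
		idev->vdev->tombstoned = 1;
		idev->vdev = NULL;
	}
}
```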

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

With a nit:

> @@ -155,6 +182,19 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
>  	get_device(idev->dev);
>  	vdev->viommu = viommu;
>  	refcount_inc(&viommu->obj.users);
> +	/*
> +	 * A short term users reference is held on the idev so long as we have
> +	 * the pointer. iommufd_device_pre_destroy() will revoke it before the
> +	 * idev real destruction.
> +	 */
> +	vdev->idev = idev;

[..]

>  out_put_idev:
> -	iommufd_put_object(ucmd->ictx, &idev->obj);
> +	if (rc)
> +		iommufd_put_object(ucmd->ictx, &idev->obj);

So, this actually holds both the short term user and (covertly)
the regular user too.

Though it doesn't hurt to do that, holding the regular one seems
to be useless, because its refcount check is way behind this new
pre_destroy op:

183         if (flags & REMOVE_WAIT_SHORTTERM) {
184                 ret = iommufd_object_dec_wait_shortterm(ictx, to_destroy);
==>			/* pre_destroy op */
...
214         if (!refcount_dec_if_one(&obj->users)) {
215                 ret = -EBUSY;
216                 goto err_xa;
217         }

So, I think we could just do, exactly reflecting the comments:
+	vdev->idev = idev;
+	refcount_inc(&idev->obj.shortterm_users);

Then, keep the exit patch unchanged.
out_put_idev:
	iommufd_put_object(ucmd->ictx, &idev->obj);

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 4/7] iommufd: Destroy vdevice on idevice destroy
  2025-07-09  4:02 ` [PATCH v4 4/7] iommufd: Destroy vdevice on idevice destroy Xu Yilun
  2025-07-10  8:03   ` Tian, Kevin
  2025-07-12  2:22   ` Nicolin Chen
@ 2025-07-12  2:26   ` Nicolin Chen
  2 siblings, 0 replies; 38+ messages in thread
From: Nicolin Chen @ 2025-07-12  2:26 UTC (permalink / raw)
  To: Xu Yilun
  Cc: jgg, jgg, kevin.tian, will, aneesh.kumar, iommu, linux-kernel,
	joro, robin.murphy, shuah, aik, dan.j.williams, baolu.lu,
	yilun.xu

On Wed, Jul 09, 2025 at 12:02:31PM +0800, Xu Yilun wrote:
>  struct iommufd_vdevice {
>  	struct iommufd_object obj;
>  	struct iommufd_viommu *viommu;
>  	struct device *dev;
>  	u64 id; /* per-vIOMMU virtual ID */
> +	struct iommufd_device *idev;
>  };

And move this next to viommu, please.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 5/7] iommufd/vdevice: Remove struct device reference from struct vdevice
  2025-07-09  4:02 ` [PATCH v4 5/7] iommufd/vdevice: Remove struct device reference from struct vdevice Xu Yilun
  2025-07-10  8:04   ` Tian, Kevin
@ 2025-07-12  2:27   ` Nicolin Chen
  1 sibling, 0 replies; 38+ messages in thread
From: Nicolin Chen @ 2025-07-12  2:27 UTC (permalink / raw)
  To: Xu Yilun
  Cc: jgg, jgg, kevin.tian, will, aneesh.kumar, iommu, linux-kernel,
	joro, robin.murphy, shuah, aik, dan.j.williams, baolu.lu,
	yilun.xu

On Wed, Jul 09, 2025 at 12:02:32PM +0800, Xu Yilun wrote:
> Remove struct device *dev from struct vdevice.
> 
> The dev pointer is the Plan B for vdevice to reference the physical
> device. As now vdev->idev is added without refcounting concern, just
> use vdev->idev->dev when needed.
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
 
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 6/7] iommufd/selftest: Explicitly skip tests for inapplicable variant
  2025-07-09  4:02 ` [PATCH v4 6/7] iommufd/selftest: Explicitly skip tests for inapplicable variant Xu Yilun
  2025-07-10  8:06   ` Tian, Kevin
@ 2025-07-12  2:39   ` Nicolin Chen
  1 sibling, 0 replies; 38+ messages in thread
From: Nicolin Chen @ 2025-07-12  2:39 UTC (permalink / raw)
  To: Xu Yilun
  Cc: jgg, jgg, kevin.tian, will, aneesh.kumar, iommu, linux-kernel,
	joro, robin.murphy, shuah, aik, dan.j.williams, baolu.lu,
	yilun.xu

On Wed, Jul 09, 2025 at 12:02:33PM +0800, Xu Yilun wrote:
> no_viommu is not applicable for some viommu/vdevice tests. Explicitly
> report the skipping, don't do it silently.
> 
> Only add the prints. No functional change intended.
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

With one nit:

> +	EXPECT_ERRNO(EBUSY,
> +		     _test_ioctl_destroy(self->fd, iopf_hwpt_id));

Now, this kind of line wrappings aren't necessary any more after
the indentation changes. Would you please run this to the commit:
   git clang-format HEAD^
?

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 4/7] iommufd: Destroy vdevice on idevice destroy
  2025-07-12  2:22   ` Nicolin Chen
@ 2025-07-12 17:47     ` Xu Yilun
  2025-07-14 16:40       ` Xu Yilun
  0 siblings, 1 reply; 38+ messages in thread
From: Xu Yilun @ 2025-07-12 17:47 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: jgg, jgg, kevin.tian, will, aneesh.kumar, iommu, linux-kernel,
	joro, robin.murphy, shuah, aik, dan.j.williams, baolu.lu,
	yilun.xu

> [..]
> 
> >  out_put_idev:
> > -	iommufd_put_object(ucmd->ictx, &idev->obj);
> > +	if (rc)
> > +		iommufd_put_object(ucmd->ictx, &idev->obj);
> 
> So, this actually holds both the short term user and (covertly)
> the regular user too.
> 
> Though it doesn't hurt to do that, holding the regular one seems
> to be useless, because its refcount check is way behind this new
> pre_destroy op:
> 
> 183         if (flags & REMOVE_WAIT_SHORTTERM) {
> 184                 ret = iommufd_object_dec_wait_shortterm(ictx, to_destroy);
> ==>			/* pre_destroy op */
> ...
> 214         if (!refcount_dec_if_one(&obj->users)) {
> 215                 ret = -EBUSY;
> 216                 goto err_xa;
> 217         }
> 
> So, I think we could just do, exactly reflecting the comments:
> +	vdev->idev = idev;
> +	refcount_inc(&idev->obj.shortterm_users);

I think this makes things more clear, we have 3 types of refcounts:

1. shortterm_users + 1, users + 1, temporary refcount, can't be
   revoked by referenced object.
2. users + 1, long term refcount, can't be revoked either.
3. shortterm_users + 1, can be revoked by referenced object.
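As a sketch of those three flavours, the helpers below model them in userspace C. Field names follow struct iommufd_object, but the helpers and the failing obj_try_destroy() are assumptions for illustration (the real destroy path waits on shortterm_users rather than failing):

```c
#include <assert.h>

/* Hypothetical model of the three reference flavours listed above. */
struct obj {
	int users;
	int shortterm_users;
};

/* 1. Temporary hold (ioctl-scoped): both counts, dropped together,
 *    never revoked by the referenced object. */
static void obj_get_temporary(struct obj *o) { o->users++; o->shortterm_users++; }
static void obj_put_temporary(struct obj *o) { o->users--; o->shortterm_users--; }

/* 2. Long-term hold: users only; destroy fails while it is held. */
static void obj_get_longterm(struct obj *o) { o->users++; }
static void obj_put_longterm(struct obj *o) { o->users--; }

/* 3. Revocable hold: shortterm_users only; the referenced object can
 *    revoke it in pre_destroy() and then wait for the count to drain. */
static void obj_get_revocable(struct obj *o) { o->shortterm_users++; }
static void obj_revoke(struct obj *o) { o->shortterm_users--; }

/* Destroy succeeds only when nothing but the allocation reference
 * (users == 1) remains. */
static int obj_try_destroy(struct obj *o)
{
	return (o->shortterm_users == 0 && o->users == 1) ? 0 : -1;
}
```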

> 
> Then, keep the exit patch unchanged.
> out_put_idev:
> 	iommufd_put_object(ucmd->ictx, &idev->obj);

Yeah, this is the part I like the most.

Thanks,
Yilun

> 
> Thanks
> Nicolin

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 4/7] iommufd: Destroy vdevice on idevice destroy
  2025-07-12 17:47     ` Xu Yilun
@ 2025-07-14 16:40       ` Xu Yilun
  2025-07-14 16:53         ` Jason Gunthorpe
  0 siblings, 1 reply; 38+ messages in thread
From: Xu Yilun @ 2025-07-14 16:40 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: jgg, jgg, kevin.tian, will, aneesh.kumar, iommu, linux-kernel,
	joro, robin.murphy, shuah, aik, dan.j.williams, baolu.lu,
	yilun.xu

On Sun, Jul 13, 2025 at 01:47:45AM +0800, Xu Yilun wrote:
> > [..]
> > 
> > >  out_put_idev:
> > > -	iommufd_put_object(ucmd->ictx, &idev->obj);
> > > +	if (rc)
> > > +		iommufd_put_object(ucmd->ictx, &idev->obj);
> > 
> > So, this actually holds both the short term user and (covertly)
> > the regular user too.
> > 
> > Though it doesn't hurt to do that, holding the regular one seems
> > to be useless, because its refcount check is way behind this new
> > pre_destroy op:
> > 
> > 183         if (flags & REMOVE_WAIT_SHORTTERM) {
> > 184                 ret = iommufd_object_dec_wait_shortterm(ictx, to_destroy);
> > ==>			/* pre_destroy op */
> > ...
> > 214         if (!refcount_dec_if_one(&obj->users)) {
> > 215                 ret = -EBUSY;
> > 216                 goto err_xa;
> > 217         }
> > 
> > So, I think we could just do, exactly reflecting the comments:
> > +	vdev->idev = idev;
> > +	refcount_inc(&idev->obj.shortterm_users);
> 
> I think this makes things more clear, we have 3 types of refcounts:
> 
> 1. shortterm_users + 1, users + 1, temporary refcount, can't be
>    revoked by referenced object.
> 2. users + 1, long term refcount, can't be revoked either.
> 3. shortterm_users + 1, can be revoked by referenced object.
> 
> > 
> > Then, keep the exit patch unchanged.
> > out_put_idev:
> > 	iommufd_put_object(ucmd->ictx, &idev->obj);
> 
> Yeah, this is the part I like the most.

Sorry, I feel the actual coding is not as good as I imagined. The

  refcount_dec(&idev->obj.shortterm_users)

can't be called in iommufd_vdevice_abort(), because it may lead to the
idev being freed while we still go on to do

  mutex_unlock(idev->igroup->lock)

I think the following change makes things more complex...


diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
index 702ae248df17..bdd5a5227cbf 100644
--- a/drivers/iommu/iommufd/viommu.c
+++ b/drivers/iommu/iommufd/viommu.c
@@ -128,7 +128,8 @@ void iommufd_vdevice_destroy(struct iommufd_object *obj)
        mutex_lock(&idev->igroup->lock);
        iommufd_vdevice_abort(obj);
        mutex_unlock(&idev->igroup->lock);
-       iommufd_put_object(idev->ictx, &idev->obj);
+       refcount_dec(&idev->obj.shortterm_users);
+       wake_up_interruptible_all(&vdev->viommu->ictx->destroy_wait);
 }

 int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
@@ -185,6 +186,7 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
         * idev real destruction.
         */
        vdev->idev = idev;
+       refcount_inc(&idev->obj.shortterm_users);

        /*
         * iommufd_device_destroy() delays until idev->vdev is NULL before
@@ -207,12 +209,12 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
        goto out_unlock_igroup;

 out_abort:
+       refcount_dec(&idev->obj.shortterm_users);
        iommufd_object_abort_and_destroy(ucmd->ictx, &vdev->obj);
 out_unlock_igroup:
        mutex_unlock(&idev->igroup->lock);
 out_put_idev:
-       if (rc)
-               iommufd_put_object(ucmd->ictx, &idev->obj);
+       iommufd_put_object(ucmd->ictx, &idev->obj);
 out_put_viommu:
        iommufd_put_object(ucmd->ictx, &viommu->obj);
        return rc;


Thanks,
Yilun

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 4/7] iommufd: Destroy vdevice on idevice destroy
  2025-07-14 16:40       ` Xu Yilun
@ 2025-07-14 16:53         ` Jason Gunthorpe
  2025-07-14 17:34           ` Nicolin Chen
  0 siblings, 1 reply; 38+ messages in thread
From: Jason Gunthorpe @ 2025-07-14 16:53 UTC (permalink / raw)
  To: Xu Yilun
  Cc: Nicolin Chen, kevin.tian, will, aneesh.kumar, iommu, linux-kernel,
	joro, robin.murphy, shuah, aik, dan.j.williams, baolu.lu,
	yilun.xu

On Tue, Jul 15, 2025 at 12:40:40AM +0800, Xu Yilun wrote:
> diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
> index 702ae248df17..bdd5a5227cbf 100644
> --- a/drivers/iommu/iommufd/viommu.c
> +++ b/drivers/iommu/iommufd/viommu.c
> @@ -128,7 +128,8 @@ void iommufd_vdevice_destroy(struct iommufd_object *obj)
>         mutex_lock(&idev->igroup->lock);
>         iommufd_vdevice_abort(obj);
>         mutex_unlock(&idev->igroup->lock);
> -       iommufd_put_object(idev->ictx, &idev->obj);
> +       refcount_dec(&idev->obj.shortterm_users);
> +       wake_up_interruptible_all(&vdev->viommu->ictx->destroy_wait);
>  }

I think the main point of keeping both refcounts is to keep the above
hidden in the main functions and out of the object functions.

Jason


* Re: [PATCH v4 4/7] iommufd: Destroy vdevice on idevice destroy
  2025-07-14 16:53         ` Jason Gunthorpe
@ 2025-07-14 17:34           ` Nicolin Chen
  2025-07-14 18:03             ` Xu Yilun
  0 siblings, 1 reply; 38+ messages in thread
From: Nicolin Chen @ 2025-07-14 17:34 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Xu Yilun, kevin.tian, will, aneesh.kumar, iommu, linux-kernel,
	joro, robin.murphy, shuah, aik, dan.j.williams, baolu.lu,
	yilun.xu

On Mon, Jul 14, 2025 at 01:53:46PM -0300, Jason Gunthorpe wrote:
> On Tue, Jul 15, 2025 at 12:40:40AM +0800, Xu Yilun wrote:
> > diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
> > index 702ae248df17..bdd5a5227cbf 100644
> > --- a/drivers/iommu/iommufd/viommu.c
> > +++ b/drivers/iommu/iommufd/viommu.c
> > @@ -128,7 +128,8 @@ void iommufd_vdevice_destroy(struct iommufd_object *obj)
> >         mutex_lock(&idev->igroup->lock);
> >         iommufd_vdevice_abort(obj);
> >         mutex_unlock(&idev->igroup->lock);
> > -       iommufd_put_object(idev->ictx, &idev->obj);
> > +       refcount_dec(&idev->obj.shortterm_users);
> > +       wake_up_interruptible_all(&vdev->viommu->ictx->destroy_wait);
> >  }
> 
> I think the main point of keeping both refcounts is to keep the above
> hidden in the main functions and out of the object functions.

I see. Maybe we can just update the comments that we are keeping
both refcounts but using shortterm_users only to do the trick.

Otherwise, we'd need an iommufd_lock_obj_shortterm..

Thanks
Nicolin


* Re: [PATCH v4 4/7] iommufd: Destroy vdevice on idevice destroy
  2025-07-14 17:34           ` Nicolin Chen
@ 2025-07-14 18:03             ` Xu Yilun
  0 siblings, 0 replies; 38+ messages in thread
From: Xu Yilun @ 2025-07-14 18:03 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Jason Gunthorpe, kevin.tian, will, aneesh.kumar, iommu,
	linux-kernel, joro, robin.murphy, shuah, aik, dan.j.williams,
	baolu.lu, yilun.xu

On Mon, Jul 14, 2025 at 10:34:21AM -0700, Nicolin Chen wrote:
> On Mon, Jul 14, 2025 at 01:53:46PM -0300, Jason Gunthorpe wrote:
> > On Tue, Jul 15, 2025 at 12:40:40AM +0800, Xu Yilun wrote:
> > > diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
> > > index 702ae248df17..bdd5a5227cbf 100644
> > > --- a/drivers/iommu/iommufd/viommu.c
> > > +++ b/drivers/iommu/iommufd/viommu.c
> > > @@ -128,7 +128,8 @@ void iommufd_vdevice_destroy(struct iommufd_object *obj)
> > >         mutex_lock(&idev->igroup->lock);
> > >         iommufd_vdevice_abort(obj);
> > >         mutex_unlock(&idev->igroup->lock);
> > > -       iommufd_put_object(idev->ictx, &idev->obj);
> > > +       refcount_dec(&idev->obj.shortterm_users);
> > > +       wake_up_interruptible_all(&vdev->viommu->ictx->destroy_wait);
> > >  }
> > 
> > I think the main point of keeping both refcounts is to keep the above
> > hidden in the main functions and out of the object functions.
> 
> I see. Maybe we can just update the comments that we are keeping
> both refcounts but using shortterm_users only to do the trick.

I don't think we need special comments. The new usage is not a trick.
It follows the existing mechanism of iommufd_get/put_object().

https://lore.kernel.org/linux-iommu/20250707122502.GS1410929@nvidia.com/

Adding a patch to rename shortterm_users -> wait_cnt should make
things clear.

Thanks,
Yilun

> 
> Otherwise, we'd need an iommufd_lock_obj_shortterm..
> 
> Thanks
> Nicolin


end of thread, other threads:[~2025-07-14 18:12 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-07-09  4:02 [PATCH v4 0/7] iommufd: Destroy vdevice on device unbind Xu Yilun
2025-07-09  4:02 ` [PATCH v4 1/7] iommufd/viommu: Roll back to use iommufd_object_alloc() for vdevice Xu Yilun
2025-07-10  7:32   ` Tian, Kevin
2025-07-10 18:26   ` Nicolin Chen
2025-07-11 17:42   ` Jason Gunthorpe
2025-07-09  4:02 ` [PATCH v4 2/7] iommufd: Add iommufd_object_tombstone_user() helper Xu Yilun
2025-07-10  7:35   ` Tian, Kevin
2025-07-10 19:08   ` Nicolin Chen
2025-07-11 15:02     ` Xu Yilun
2025-07-11 19:39       ` Nicolin Chen
2025-07-11 21:33       ` Jason Gunthorpe
2025-07-11 22:04         ` Nicolin Chen
2025-07-11 17:44   ` Jason Gunthorpe
2025-07-09  4:02 ` [PATCH v4 3/7] iommufd: Add a pre_destroy() op for objects Xu Yilun
2025-07-10  7:40   ` Tian, Kevin
2025-07-10 17:15     ` Jason Gunthorpe
2025-07-11  3:16       ` Tian, Kevin
2025-07-11  9:38         ` Xu Yilun
2025-07-11 22:09   ` Nicolin Chen
2025-07-09  4:02 ` [PATCH v4 4/7] iommufd: Destroy vdevice on idevice destroy Xu Yilun
2025-07-10  8:03   ` Tian, Kevin
2025-07-12  2:22   ` Nicolin Chen
2025-07-12 17:47     ` Xu Yilun
2025-07-14 16:40       ` Xu Yilun
2025-07-14 16:53         ` Jason Gunthorpe
2025-07-14 17:34           ` Nicolin Chen
2025-07-14 18:03             ` Xu Yilun
2025-07-12  2:26   ` Nicolin Chen
2025-07-09  4:02 ` [PATCH v4 5/7] iommufd/vdevice: Remove struct device reference from struct vdevice Xu Yilun
2025-07-10  8:04   ` Tian, Kevin
2025-07-12  2:27   ` Nicolin Chen
2025-07-09  4:02 ` [PATCH v4 6/7] iommufd/selftest: Explicitly skip tests for inapplicable variant Xu Yilun
2025-07-10  8:06   ` Tian, Kevin
2025-07-12  2:39   ` Nicolin Chen
2025-07-09  4:02 ` [PATCH v4 7/7] iommufd/selftest: Add coverage for vdevice tombstone Xu Yilun
2025-07-10  8:10   ` Tian, Kevin
2025-07-10  9:25     ` Xu Yilun
2025-07-11  3:12       ` Tian, Kevin
