* [PATCH v3 0/5] iommufd: Destroy vdevice on device unbind
@ 2025-06-27 3:38 Xu Yilun
2025-06-27 3:38 ` [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper Xu Yilun
` (4 more replies)
0 siblings, 5 replies; 32+ messages in thread
From: Xu Yilun @ 2025-06-27 3:38 UTC (permalink / raw)
To: jgg, jgg, kevin.tian, will, aneesh.kumar
Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik,
dan.j.williams, baolu.lu, yilun.xu
This series solves the lifecycle issue that a vdevice may outlive its
idevice. It is a prerequisite for TIO, to ensure extra secure
configurations (e.g. TSM Bind/Unbind) against the vdevice can be rolled
back on idevice unbind, so that VFIO can still work on the physical
device without surprise.
Changelog:
v3:
- Don't bother cleaning each tombstone in iommufd_fops_release().
- Drop vdev->ictx initialization fix patch.
- Optimize control flow in iommufd_device_remove_vdev().
- Make iommufd_vdevice_abort() reentrant.
- Call iommufd_vdevice_abort() directly instead of waiting for it.
- Rephrase/fix some comments.
- A new patch to remove vdev->dev.
- A new patch to explicitly skip existing viommu tests for no_iommu.
- Also skip vdevice tombstone test for no_iommu.
- Allow me to add SoB from Aneesh.
v2: https://lore.kernel.org/linux-iommu/20250623094946.1714996-1-yilun.xu@linux.intel.com/
v1/rfc: https://lore.kernel.org/linux-iommu/20250610065146.1321816-1-aneesh.kumar@kernel.org/
The series is based on v6.16-rc1
Xu Yilun (5):
iommufd: Add iommufd_object_tombstone_user() helper
iommufd: Destroy vdevice on idevice destroy
iommufd/vdevice: Remove struct device reference from struct vdevice
iommufd/selftest: Explicitly skip tests for inapplicable variant
iommufd/selftest: Add coverage for vdevice tombstone
drivers/iommu/iommufd/device.c | 42 +++
drivers/iommu/iommufd/driver.c | 4 +-
drivers/iommu/iommufd/iommufd_private.h | 35 ++-
drivers/iommu/iommufd/main.c | 20 +-
drivers/iommu/iommufd/viommu.c | 47 ++-
tools/testing/selftests/iommu/iommufd.c | 388 ++++++++++++------------
6 files changed, 337 insertions(+), 199 deletions(-)
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
--
2.25.1
* [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper
2025-06-27 3:38 [PATCH v3 0/5] iommufd: Destroy vdevice on device unbind Xu Yilun
@ 2025-06-27 3:38 ` Xu Yilun
2025-06-30 3:08 ` Baolu Lu
2025-06-27 3:38 ` [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy Xu Yilun
From: Xu Yilun @ 2025-06-27 3:38 UTC (permalink / raw)
To: jgg, jgg, kevin.tian, will, aneesh.kumar
Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik,
dan.j.williams, baolu.lu, yilun.xu
Add the iommufd_object_tombstone_user() helper, which allows the caller
to destroy an iommufd object created by userspace.
This is useful on some destroy paths where the kernel caller finds the
object should have been removed by userspace but is still alive. With
this helper, the caller destroys the object but leaves the object ID
reserved (a so-called tombstone). The tombstone prevents the object ID
from being repurposed without the awareness of the original user.
Since this only happens on abnormal userspace behavior, for simplicity
the tombstoned object ID is permanently leaked until
iommufd_fops_release(). I.e. the original user gets an error when
calling ioctl(IOMMU_DESTROY) on that ID.
The first use case would be to ensure the iommufd_vdevice can't outlive
the associated iommufd_device.
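The reserved-ID semantics described above can be sketched with a small userspace model. All names below (slot_state, id_alloc(), id_tombstone()) are illustrative, not kernel API; the real implementation keeps the ID reserved by storing XA_ZERO_ENTRY in the objects xarray:

```c
#include <errno.h>

/* Toy model of an object ID table. In iommufd the table is an xarray
 * and a tombstone is an XA_ZERO_ENTRY; here a plain array suffices. */
enum slot_state { SLOT_FREE, SLOT_LIVE, SLOT_TOMBSTONE };

#define MAX_IDS 8
static enum slot_state id_table[MAX_IDS];

/* Allocate the lowest free ID; tombstoned IDs are never reused. */
static int id_alloc(void)
{
	for (int i = 0; i < MAX_IDS; i++) {
		if (id_table[i] == SLOT_FREE) {
			id_table[i] = SLOT_LIVE;
			return i;
		}
	}
	return -ENOSPC;
}

/* Normal userspace destroy: the ID becomes reusable. */
static int id_destroy(int id)
{
	if (id < 0 || id >= MAX_IDS || id_table[id] != SLOT_LIVE)
		return -ENOENT;
	id_table[id] = SLOT_FREE;
	return 0;
}

/* Kernel-initiated destroy: the object goes away but the ID stays
 * reserved, so a later destroy on the same ID fails with ENOENT. */
static int id_tombstone(int id)
{
	if (id < 0 || id >= MAX_IDS || id_table[id] != SLOT_LIVE)
		return -ENOENT;
	id_table[id] = SLOT_TOMBSTONE;
	return 0;
}
```

After tombstoning ID 0, id_destroy(0) fails with -ENOENT and a subsequent id_alloc() skips ID 0, matching what the original user observes through ioctl(IOMMU_DESTROY).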
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Co-developed-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
drivers/iommu/iommufd/iommufd_private.h | 23 ++++++++++++++++++++++-
drivers/iommu/iommufd/main.c | 19 ++++++++++++++++---
2 files changed, 38 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 9ccc83341f32..fbc9ef78d81f 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -187,7 +187,8 @@ void iommufd_object_finalize(struct iommufd_ctx *ictx,
struct iommufd_object *obj);
enum {
- REMOVE_WAIT_SHORTTERM = 1,
+ REMOVE_WAIT_SHORTTERM = BIT(0),
+ REMOVE_OBJ_TOMBSTONE = BIT(1),
};
int iommufd_object_remove(struct iommufd_ctx *ictx,
struct iommufd_object *to_destroy, u32 id,
@@ -213,6 +214,26 @@ static inline void iommufd_object_destroy_user(struct iommufd_ctx *ictx,
WARN_ON(ret);
}
+/*
+ * Similar to iommufd_object_destroy_user(), except that the object ID is left
+ * reserved/tombstoned.
+ */
+static inline void iommufd_object_tombstone_user(struct iommufd_ctx *ictx,
+ struct iommufd_object *obj)
+{
+ int ret;
+
+ ret = iommufd_object_remove(ictx, obj, obj->id,
+ REMOVE_WAIT_SHORTTERM | REMOVE_OBJ_TOMBSTONE);
+
+ /*
+ * If there is a bug and we couldn't destroy the object then we did put
+ * back the caller's users refcount and will eventually try to free it
+ * again during close.
+ */
+ WARN_ON(ret);
+}
+
/*
* The HWPT allocated by autodomains is used in possibly many devices and
* is automatically destroyed when its refcount reaches zero.
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 3df468f64e7d..620923669b42 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -167,7 +167,7 @@ int iommufd_object_remove(struct iommufd_ctx *ictx,
goto err_xa;
}
- xas_store(&xas, NULL);
+ xas_store(&xas, (flags & REMOVE_OBJ_TOMBSTONE) ? XA_ZERO_ENTRY : NULL);
if (ictx->vfio_ioas == container_of(obj, struct iommufd_ioas, obj))
ictx->vfio_ioas = NULL;
xa_unlock(&ictx->objects);
@@ -239,6 +239,7 @@ static int iommufd_fops_release(struct inode *inode, struct file *filp)
struct iommufd_sw_msi_map *next;
struct iommufd_sw_msi_map *cur;
struct iommufd_object *obj;
+ bool empty;
/*
* The objects in the xarray form a graph of "users" counts, and we have
@@ -249,23 +250,35 @@ static int iommufd_fops_release(struct inode *inode, struct file *filp)
* until the entire list is destroyed. If this can't progress then there
* is some bug related to object refcounting.
*/
- while (!xa_empty(&ictx->objects)) {
+ for (;;) {
unsigned int destroyed = 0;
unsigned long index;
+ empty = true;
xa_for_each(&ictx->objects, index, obj) {
+ empty = false;
if (!refcount_dec_if_one(&obj->users))
continue;
+
destroyed++;
xa_erase(&ictx->objects, index);
iommufd_object_ops[obj->type].destroy(obj);
kfree(obj);
}
+
+ if (empty)
+ break;
+
/* Bug related to users refcount */
if (WARN_ON(!destroyed))
break;
}
- WARN_ON(!xa_empty(&ictx->groups));
+
+ /*
+ * There may be some tombstones left over from
+ * iommufd_object_tombstone_user()
+ */
+ xa_destroy(&ictx->groups);
mutex_destroy(&ictx->sw_msi_lock);
list_for_each_entry_safe(cur, next, &ictx->sw_msi_list, sw_msi_item)
--
2.25.1
* [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy
2025-06-27 3:38 [PATCH v3 0/5] iommufd: Destroy vdevice on device unbind Xu Yilun
2025-06-27 3:38 ` [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper Xu Yilun
@ 2025-06-27 3:38 ` Xu Yilun
2025-06-30 5:08 ` Baolu Lu
2025-06-27 3:38 ` [PATCH v3 3/5] iommufd/vdevice: Remove struct device reference from struct vdevice Xu Yilun
From: Xu Yilun @ 2025-06-27 3:38 UTC (permalink / raw)
To: jgg, jgg, kevin.tian, will, aneesh.kumar
Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik,
dan.j.williams, baolu.lu, yilun.xu
Destroy iommufd_vdevice (vdev) on iommufd_idevice (idev) destroy so
that vdev can't outlive idev.
iommufd_device (idev) represents the physical device bound to iommufd,
while iommufd_vdevice (vdev) represents the virtual instance of the
physical device in the VM. The lifecycle of the vdev should not exceed
that of the idev. This doesn't cause a real problem for existing use
cases because the vdev doesn't affect the physical device, it only
provides virtualization information. But to extend the vdev for
Confidential Computing (CC), secure configurations need to be applied to
the vdev, e.g. TSM Bind/Unbind. These configurations should be rolled
back on idev destroy, or the external driver's (VFIO) functionality may
be impacted.
Building the association between idev & vdev requires the two objects to
point to each other, but not hold references to each other. This
requires proper locking. This is done by reviving some of Nicolin's
patch [1].
There are 3 cases on idev destroy:
1. vdev is already destroyed by userspace. No extra handling needed.
2. vdev is still alive. Use iommufd_object_tombstone_user() to
destroy vdev and tombstone the vdev ID.
3. vdev is being destroyed by userspace. The vdev ID is already freed,
   but the vdev destroy handler has not completed. Destroy the vdev
   immediately. To resolve the race with userspace destruction, make
   iommufd_vdevice_abort() reentrant.
[1]: https://lore.kernel.org/all/53025c827c44d68edb6469bfd940a8e8bc6147a5.1729897278.git.nicolinc@nvidia.com/
Originally-by: Nicolin Chen <nicolinc@nvidia.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Co-developed-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
drivers/iommu/iommufd/device.c | 42 +++++++++++++++++++++++
drivers/iommu/iommufd/iommufd_private.h | 11 +++++++
drivers/iommu/iommufd/main.c | 1 +
drivers/iommu/iommufd/viommu.c | 44 +++++++++++++++++++++++--
4 files changed, 95 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 86244403b532..0937d4989185 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -137,11 +137,53 @@ static struct iommufd_group *iommufd_get_group(struct iommufd_ctx *ictx,
}
}
+static void iommufd_device_remove_vdev(struct iommufd_device *idev)
+{
+ struct iommufd_vdevice *vdev;
+
+ mutex_lock(&idev->igroup->lock);
+ /* vdev has been completely destroyed by userspace */
+ if (!idev->vdev)
+ goto out_unlock;
+
+ vdev = iommufd_get_vdevice(idev->ictx, idev->vdev->obj.id);
+ if (IS_ERR(vdev)) {
+ /*
+ * vdev is removed from xarray by userspace, but is not
+ * destroyed/freed. Since iommufd_vdevice_abort() is reentrant,
+ * safe to destroy vdev here.
+ */
+ iommufd_vdevice_abort(&idev->vdev->obj);
+ goto out_unlock;
+ }
+
+ /* Should never happen */
+ if (WARN_ON(vdev != idev->vdev)) {
+ iommufd_put_object(idev->ictx, &vdev->obj);
+ goto out_unlock;
+ }
+
+ /*
+ * vdev is still alive. Hold a users refcount to prevent racing with
+ * userspace destruction, then use iommufd_object_tombstone_user() to
+ * destroy it and leave a tombstone.
+ */
+ refcount_inc(&vdev->obj.users);
+ iommufd_put_object(idev->ictx, &vdev->obj);
+ mutex_unlock(&idev->igroup->lock);
+ iommufd_object_tombstone_user(idev->ictx, &vdev->obj);
+ return;
+
+out_unlock:
+ mutex_unlock(&idev->igroup->lock);
+}
+
void iommufd_device_destroy(struct iommufd_object *obj)
{
struct iommufd_device *idev =
container_of(obj, struct iommufd_device, obj);
+ iommufd_device_remove_vdev(idev);
iommu_device_release_dma_owner(idev->dev);
iommufd_put_group(idev->igroup);
if (!iommufd_selftest_is_mock_dev(idev->dev))
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index fbc9ef78d81f..f58aa4439c53 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -446,6 +446,7 @@ struct iommufd_device {
/* always the physical device */
struct device *dev;
bool enforce_cache_coherency;
+ struct iommufd_vdevice *vdev;
};
static inline struct iommufd_device *
@@ -621,6 +622,7 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd);
void iommufd_viommu_destroy(struct iommufd_object *obj);
int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd);
void iommufd_vdevice_destroy(struct iommufd_object *obj);
+void iommufd_vdevice_abort(struct iommufd_object *obj);
struct iommufd_vdevice {
struct iommufd_object obj;
@@ -628,8 +630,17 @@ struct iommufd_vdevice {
struct iommufd_viommu *viommu;
struct device *dev;
u64 id; /* per-vIOMMU virtual ID */
+ struct iommufd_device *idev;
};
+static inline struct iommufd_vdevice *
+iommufd_get_vdevice(struct iommufd_ctx *ictx, u32 id)
+{
+ return container_of(iommufd_get_object(ictx, id,
+ IOMMUFD_OBJ_VDEVICE),
+ struct iommufd_vdevice, obj);
+}
+
#ifdef CONFIG_IOMMUFD_TEST
int iommufd_test(struct iommufd_ucmd *ucmd);
void iommufd_selftest_destroy(struct iommufd_object *obj);
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 620923669b42..64731b4fdbdf 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -529,6 +529,7 @@ static const struct iommufd_object_ops iommufd_object_ops[] = {
},
[IOMMUFD_OBJ_VDEVICE] = {
.destroy = iommufd_vdevice_destroy,
+ .abort = iommufd_vdevice_abort,
},
[IOMMUFD_OBJ_VEVENTQ] = {
.destroy = iommufd_veventq_destroy,
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
index 01df2b985f02..632d1d7b8fd8 100644
--- a/drivers/iommu/iommufd/viommu.c
+++ b/drivers/iommu/iommufd/viommu.c
@@ -84,16 +84,38 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
return rc;
}
-void iommufd_vdevice_destroy(struct iommufd_object *obj)
+void iommufd_vdevice_abort(struct iommufd_object *obj)
{
struct iommufd_vdevice *vdev =
container_of(obj, struct iommufd_vdevice, obj);
struct iommufd_viommu *viommu = vdev->viommu;
+ struct iommufd_device *idev = vdev->idev;
+
+ lockdep_assert_held(&idev->igroup->lock);
+
+ /*
+ * iommufd_vdevice_abort() could be reentrant, by
+ * iommufd_device_unbind() or by iommufd_destroy(). Cleanup only once.
+ */
+ if (!viommu)
+ return;
/* xa_cmpxchg is okay to fail if alloc failed xa_cmpxchg previously */
xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL);
refcount_dec(&viommu->obj.users);
put_device(vdev->dev);
+ vdev->viommu = NULL;
+ idev->vdev = NULL;
+}
+
+void iommufd_vdevice_destroy(struct iommufd_object *obj)
+{
+ struct iommufd_vdevice *vdev =
+ container_of(obj, struct iommufd_vdevice, obj);
+
+ mutex_lock(&vdev->idev->igroup->lock);
+ iommufd_vdevice_abort(obj);
+ mutex_unlock(&vdev->idev->igroup->lock);
}
int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
@@ -124,10 +146,16 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
goto out_put_idev;
}
+ mutex_lock(&idev->igroup->lock);
+ if (idev->vdev) {
+ rc = -EEXIST;
+ goto out_unlock_igroup;
+ }
+
vdev = iommufd_object_alloc(ucmd->ictx, vdev, IOMMUFD_OBJ_VDEVICE);
if (IS_ERR(vdev)) {
rc = PTR_ERR(vdev);
- goto out_put_idev;
+ goto out_unlock_igroup;
}
vdev->id = virt_id;
@@ -135,6 +163,14 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
get_device(idev->dev);
vdev->viommu = viommu;
refcount_inc(&viommu->obj.users);
+ /*
+ * iommufd_device_destroy() waits until idev->vdev is NULL before
+ * freeing the idev, which only happens once the vdev is finished
+ * destruction. Thus we do not need refcounting on either idev->vdev or
+ * vdev->idev.
+ */
+ vdev->idev = idev;
+ idev->vdev = vdev;
curr = xa_cmpxchg(&viommu->vdevs, virt_id, NULL, vdev, GFP_KERNEL);
if (curr) {
@@ -147,10 +183,12 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
if (rc)
goto out_abort;
iommufd_object_finalize(ucmd->ictx, &vdev->obj);
- goto out_put_idev;
+ goto out_unlock_igroup;
out_abort:
iommufd_object_abort_and_destroy(ucmd->ictx, &vdev->obj);
+out_unlock_igroup:
+ mutex_unlock(&idev->igroup->lock);
out_put_idev:
iommufd_put_object(ucmd->ictx, &idev->obj);
out_put_viommu:
--
2.25.1
* [PATCH v3 3/5] iommufd/vdevice: Remove struct device reference from struct vdevice
2025-06-27 3:38 [PATCH v3 0/5] iommufd: Destroy vdevice on device unbind Xu Yilun
2025-06-27 3:38 ` [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper Xu Yilun
2025-06-27 3:38 ` [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy Xu Yilun
@ 2025-06-27 3:38 ` Xu Yilun
2025-06-30 5:11 ` Baolu Lu
2025-07-04 15:06 ` Jason Gunthorpe
2025-06-27 3:38 ` [PATCH v3 4/5] iommufd/selftest: Explicitly skip tests for inapplicable variant Xu Yilun
2025-06-27 3:38 ` [PATCH v3 5/5] iommufd/selftest: Add coverage for vdevice tombstone Xu Yilun
From: Xu Yilun @ 2025-06-27 3:38 UTC (permalink / raw)
To: jgg, jgg, kevin.tian, will, aneesh.kumar
Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik,
dan.j.williams, baolu.lu, yilun.xu
Remove struct device *dev from struct vdevice.
The dev pointer was the fallback ("Plan B") for the vdevice to reference
the physical device. Now that vdev->idev is in place without refcounting
concerns, just use vdev->idev->dev when needed.
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
drivers/iommu/iommufd/driver.c | 4 ++--
drivers/iommu/iommufd/iommufd_private.h | 1 -
drivers/iommu/iommufd/viommu.c | 3 ---
3 files changed, 2 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/iommufd/driver.c b/drivers/iommu/iommufd/driver.c
index 922cd1fe7ec2..942d402bba36 100644
--- a/drivers/iommu/iommufd/driver.c
+++ b/drivers/iommu/iommufd/driver.c
@@ -45,7 +45,7 @@ struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu,
lockdep_assert_held(&viommu->vdevs.xa_lock);
vdev = xa_load(&viommu->vdevs, vdev_id);
- return vdev ? vdev->dev : NULL;
+ return vdev ? vdev->idev->dev : NULL;
}
EXPORT_SYMBOL_NS_GPL(iommufd_viommu_find_dev, "IOMMUFD");
@@ -62,7 +62,7 @@ int iommufd_viommu_get_vdev_id(struct iommufd_viommu *viommu,
xa_lock(&viommu->vdevs);
xa_for_each(&viommu->vdevs, index, vdev) {
- if (vdev->dev == dev) {
+ if (vdev->idev->dev == dev) {
*vdev_id = vdev->id;
rc = 0;
break;
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index f58aa4439c53..3700193471db 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -628,7 +628,6 @@ struct iommufd_vdevice {
struct iommufd_object obj;
struct iommufd_ctx *ictx;
struct iommufd_viommu *viommu;
- struct device *dev;
u64 id; /* per-vIOMMU virtual ID */
struct iommufd_device *idev;
};
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
index 632d1d7b8fd8..452a7a24d738 100644
--- a/drivers/iommu/iommufd/viommu.c
+++ b/drivers/iommu/iommufd/viommu.c
@@ -103,7 +103,6 @@ void iommufd_vdevice_abort(struct iommufd_object *obj)
/* xa_cmpxchg is okay to fail if alloc failed xa_cmpxchg previously */
xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL);
refcount_dec(&viommu->obj.users);
- put_device(vdev->dev);
vdev->viommu = NULL;
idev->vdev = NULL;
}
@@ -159,8 +158,6 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
}
vdev->id = virt_id;
- vdev->dev = idev->dev;
- get_device(idev->dev);
vdev->viommu = viommu;
refcount_inc(&viommu->obj.users);
/*
--
2.25.1
* [PATCH v3 4/5] iommufd/selftest: Explicitly skip tests for inapplicable variant
2025-06-27 3:38 [PATCH v3 0/5] iommufd: Destroy vdevice on device unbind Xu Yilun
2025-06-27 3:38 ` [PATCH v3 3/5] iommufd/vdevice: Remove struct device reference from struct vdevice Xu Yilun
@ 2025-06-27 3:38 ` Xu Yilun
2025-07-04 15:07 ` Jason Gunthorpe
2025-06-27 3:38 ` [PATCH v3 5/5] iommufd/selftest: Add coverage for vdevice tombstone Xu Yilun
From: Xu Yilun @ 2025-06-27 3:38 UTC (permalink / raw)
To: jgg, jgg, kevin.tian, will, aneesh.kumar
Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik,
dan.j.williams, baolu.lu, yilun.xu
Some viommu/vdevice tests are not applicable to the no_viommu variant.
Explicitly report the skip instead of doing it silently.
Only add the prints. No functional change intended.
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
tools/testing/selftests/iommu/iommufd.c | 378 ++++++++++++------------
1 file changed, 190 insertions(+), 188 deletions(-)
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 1a8e85afe9aa..4a9b6e3b37fa 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -2760,35 +2760,36 @@ TEST_F(iommufd_viommu, viommu_alloc_nested_iopf)
uint32_t fault_fd;
uint32_t vdev_id;
- if (self->device_id) {
- test_ioctl_fault_alloc(&fault_id, &fault_fd);
- test_err_hwpt_alloc_iopf(
- ENOENT, dev_id, viommu_id, UINT32_MAX,
- IOMMU_HWPT_FAULT_ID_VALID, &iopf_hwpt_id,
- IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data));
- test_err_hwpt_alloc_iopf(
- EOPNOTSUPP, dev_id, viommu_id, fault_id,
- IOMMU_HWPT_FAULT_ID_VALID | (1 << 31), &iopf_hwpt_id,
- IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data));
- test_cmd_hwpt_alloc_iopf(
- dev_id, viommu_id, fault_id, IOMMU_HWPT_FAULT_ID_VALID,
- &iopf_hwpt_id, IOMMU_HWPT_DATA_SELFTEST, &data,
- sizeof(data));
+ if (!dev_id)
+ SKIP(return, "Skipping test for variant no_viommu");
- /* Must allocate vdevice before attaching to a nested hwpt */
- test_err_mock_domain_replace(ENOENT, self->stdev_id,
- iopf_hwpt_id);
- test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id);
- test_cmd_mock_domain_replace(self->stdev_id, iopf_hwpt_id);
- EXPECT_ERRNO(EBUSY,
- _test_ioctl_destroy(self->fd, iopf_hwpt_id));
- test_cmd_trigger_iopf(dev_id, fault_fd);
+ test_ioctl_fault_alloc(&fault_id, &fault_fd);
+ test_err_hwpt_alloc_iopf(
+ ENOENT, dev_id, viommu_id, UINT32_MAX,
+ IOMMU_HWPT_FAULT_ID_VALID, &iopf_hwpt_id,
+ IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data));
+ test_err_hwpt_alloc_iopf(
+ EOPNOTSUPP, dev_id, viommu_id, fault_id,
+ IOMMU_HWPT_FAULT_ID_VALID | (1 << 31), &iopf_hwpt_id,
+ IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data));
+ test_cmd_hwpt_alloc_iopf(
+ dev_id, viommu_id, fault_id, IOMMU_HWPT_FAULT_ID_VALID,
+ &iopf_hwpt_id, IOMMU_HWPT_DATA_SELFTEST, &data,
+ sizeof(data));
+
+ /* Must allocate vdevice before attaching to a nested hwpt */
+ test_err_mock_domain_replace(ENOENT, self->stdev_id,
+ iopf_hwpt_id);
+ test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id);
+ test_cmd_mock_domain_replace(self->stdev_id, iopf_hwpt_id);
+ EXPECT_ERRNO(EBUSY,
+ _test_ioctl_destroy(self->fd, iopf_hwpt_id));
+ test_cmd_trigger_iopf(dev_id, fault_fd);
- test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id);
- test_ioctl_destroy(iopf_hwpt_id);
- close(fault_fd);
- test_ioctl_destroy(fault_id);
- }
+ test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id);
+ test_ioctl_destroy(iopf_hwpt_id);
+ close(fault_fd);
+ test_ioctl_destroy(fault_id);
}
TEST_F(iommufd_viommu, vdevice_alloc)
@@ -2849,169 +2850,170 @@ TEST_F(iommufd_viommu, vdevice_cache)
uint32_t vdev_id = 0;
uint32_t num_inv;
- if (dev_id) {
- test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id);
-
- test_cmd_dev_check_cache_all(dev_id,
- IOMMU_TEST_DEV_CACHE_DEFAULT);
-
- /* Check data_type by passing zero-length array */
- num_inv = 0;
- test_cmd_viommu_invalidate(viommu_id, inv_reqs,
- sizeof(*inv_reqs), &num_inv);
- assert(!num_inv);
-
- /* Negative test: Invalid data_type */
- num_inv = 1;
- test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
- IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST_INVALID,
- sizeof(*inv_reqs), &num_inv);
- assert(!num_inv);
-
- /* Negative test: structure size sanity */
- num_inv = 1;
- test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
- IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
- sizeof(*inv_reqs) + 1, &num_inv);
- assert(!num_inv);
-
- num_inv = 1;
- test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
- IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
- 1, &num_inv);
- assert(!num_inv);
-
- /* Negative test: invalid flag is passed */
- num_inv = 1;
- inv_reqs[0].flags = 0xffffffff;
- inv_reqs[0].vdev_id = 0x99;
- test_err_viommu_invalidate(EOPNOTSUPP, viommu_id, inv_reqs,
- IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
- sizeof(*inv_reqs), &num_inv);
- assert(!num_inv);
-
- /* Negative test: invalid data_uptr when array is not empty */
- num_inv = 1;
- inv_reqs[0].flags = 0;
- inv_reqs[0].vdev_id = 0x99;
- test_err_viommu_invalidate(EINVAL, viommu_id, NULL,
- IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
- sizeof(*inv_reqs), &num_inv);
- assert(!num_inv);
-
- /* Negative test: invalid entry_len when array is not empty */
- num_inv = 1;
- inv_reqs[0].flags = 0;
- inv_reqs[0].vdev_id = 0x99;
- test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
- IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
- 0, &num_inv);
- assert(!num_inv);
-
- /* Negative test: invalid cache_id */
- num_inv = 1;
- inv_reqs[0].flags = 0;
- inv_reqs[0].vdev_id = 0x99;
- inv_reqs[0].cache_id = MOCK_DEV_CACHE_ID_MAX + 1;
- test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
- IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
- sizeof(*inv_reqs), &num_inv);
- assert(!num_inv);
-
- /* Negative test: invalid vdev_id */
- num_inv = 1;
- inv_reqs[0].flags = 0;
- inv_reqs[0].vdev_id = 0x9;
- inv_reqs[0].cache_id = 0;
- test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
- IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
- sizeof(*inv_reqs), &num_inv);
- assert(!num_inv);
-
- /*
- * Invalidate the 1st cache entry but fail the 2nd request
- * due to invalid flags configuration in the 2nd request.
- */
- num_inv = 2;
- inv_reqs[0].flags = 0;
- inv_reqs[0].vdev_id = 0x99;
- inv_reqs[0].cache_id = 0;
- inv_reqs[1].flags = 0xffffffff;
- inv_reqs[1].vdev_id = 0x99;
- inv_reqs[1].cache_id = 1;
- test_err_viommu_invalidate(EOPNOTSUPP, viommu_id, inv_reqs,
- IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
- sizeof(*inv_reqs), &num_inv);
- assert(num_inv == 1);
- test_cmd_dev_check_cache(dev_id, 0, 0);
- test_cmd_dev_check_cache(dev_id, 1,
- IOMMU_TEST_DEV_CACHE_DEFAULT);
- test_cmd_dev_check_cache(dev_id, 2,
- IOMMU_TEST_DEV_CACHE_DEFAULT);
- test_cmd_dev_check_cache(dev_id, 3,
- IOMMU_TEST_DEV_CACHE_DEFAULT);
+ if (!dev_id)
+ SKIP(return, "Skipping test for variant no_viommu");
+
+ test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id);
+
+ test_cmd_dev_check_cache_all(dev_id,
+ IOMMU_TEST_DEV_CACHE_DEFAULT);
+
+ /* Check data_type by passing zero-length array */
+ num_inv = 0;
+ test_cmd_viommu_invalidate(viommu_id, inv_reqs,
+ sizeof(*inv_reqs), &num_inv);
+ assert(!num_inv);
+
+ /* Negative test: Invalid data_type */
+ num_inv = 1;
+ test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST_INVALID,
+ sizeof(*inv_reqs), &num_inv);
+ assert(!num_inv);
+
+ /* Negative test: structure size sanity */
+ num_inv = 1;
+ test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+ sizeof(*inv_reqs) + 1, &num_inv);
+ assert(!num_inv);
+
+ num_inv = 1;
+ test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+ 1, &num_inv);
+ assert(!num_inv);
+
+ /* Negative test: invalid flag is passed */
+ num_inv = 1;
+ inv_reqs[0].flags = 0xffffffff;
+ inv_reqs[0].vdev_id = 0x99;
+ test_err_viommu_invalidate(EOPNOTSUPP, viommu_id, inv_reqs,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+ sizeof(*inv_reqs), &num_inv);
+ assert(!num_inv);
+
+ /* Negative test: invalid data_uptr when array is not empty */
+ num_inv = 1;
+ inv_reqs[0].flags = 0;
+ inv_reqs[0].vdev_id = 0x99;
+ test_err_viommu_invalidate(EINVAL, viommu_id, NULL,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+ sizeof(*inv_reqs), &num_inv);
+ assert(!num_inv);
+
+ /* Negative test: invalid entry_len when array is not empty */
+ num_inv = 1;
+ inv_reqs[0].flags = 0;
+ inv_reqs[0].vdev_id = 0x99;
+ test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+ 0, &num_inv);
+ assert(!num_inv);
+
+ /* Negative test: invalid cache_id */
+ num_inv = 1;
+ inv_reqs[0].flags = 0;
+ inv_reqs[0].vdev_id = 0x99;
+ inv_reqs[0].cache_id = MOCK_DEV_CACHE_ID_MAX + 1;
+ test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+ sizeof(*inv_reqs), &num_inv);
+ assert(!num_inv);
+
+ /* Negative test: invalid vdev_id */
+ num_inv = 1;
+ inv_reqs[0].flags = 0;
+ inv_reqs[0].vdev_id = 0x9;
+ inv_reqs[0].cache_id = 0;
+ test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+ sizeof(*inv_reqs), &num_inv);
+ assert(!num_inv);
- /*
- * Invalidate the 1st cache entry but fail the 2nd request
- * due to invalid cache_id configuration in the 2nd request.
- */
- num_inv = 2;
- inv_reqs[0].flags = 0;
- inv_reqs[0].vdev_id = 0x99;
- inv_reqs[0].cache_id = 0;
- inv_reqs[1].flags = 0;
- inv_reqs[1].vdev_id = 0x99;
- inv_reqs[1].cache_id = MOCK_DEV_CACHE_ID_MAX + 1;
- test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
- IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
- sizeof(*inv_reqs), &num_inv);
- assert(num_inv == 1);
- test_cmd_dev_check_cache(dev_id, 0, 0);
- test_cmd_dev_check_cache(dev_id, 1,
- IOMMU_TEST_DEV_CACHE_DEFAULT);
- test_cmd_dev_check_cache(dev_id, 2,
- IOMMU_TEST_DEV_CACHE_DEFAULT);
- test_cmd_dev_check_cache(dev_id, 3,
- IOMMU_TEST_DEV_CACHE_DEFAULT);
-
- /* Invalidate the 2nd cache entry and verify */
- num_inv = 1;
- inv_reqs[0].flags = 0;
- inv_reqs[0].vdev_id = 0x99;
- inv_reqs[0].cache_id = 1;
- test_cmd_viommu_invalidate(viommu_id, inv_reqs,
- sizeof(*inv_reqs), &num_inv);
- assert(num_inv == 1);
- test_cmd_dev_check_cache(dev_id, 0, 0);
- test_cmd_dev_check_cache(dev_id, 1, 0);
- test_cmd_dev_check_cache(dev_id, 2,
- IOMMU_TEST_DEV_CACHE_DEFAULT);
- test_cmd_dev_check_cache(dev_id, 3,
- IOMMU_TEST_DEV_CACHE_DEFAULT);
-
- /* Invalidate the 3rd and 4th cache entries and verify */
- num_inv = 2;
- inv_reqs[0].flags = 0;
- inv_reqs[0].vdev_id = 0x99;
- inv_reqs[0].cache_id = 2;
- inv_reqs[1].flags = 0;
- inv_reqs[1].vdev_id = 0x99;
- inv_reqs[1].cache_id = 3;
- test_cmd_viommu_invalidate(viommu_id, inv_reqs,
- sizeof(*inv_reqs), &num_inv);
- assert(num_inv == 2);
- test_cmd_dev_check_cache_all(dev_id, 0);
+ /*
+ * Invalidate the 1st cache entry but fail the 2nd request
+ * due to invalid flags configuration in the 2nd request.
+ */
+ num_inv = 2;
+ inv_reqs[0].flags = 0;
+ inv_reqs[0].vdev_id = 0x99;
+ inv_reqs[0].cache_id = 0;
+ inv_reqs[1].flags = 0xffffffff;
+ inv_reqs[1].vdev_id = 0x99;
+ inv_reqs[1].cache_id = 1;
+ test_err_viommu_invalidate(EOPNOTSUPP, viommu_id, inv_reqs,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+ sizeof(*inv_reqs), &num_inv);
+ assert(num_inv == 1);
+ test_cmd_dev_check_cache(dev_id, 0, 0);
+ test_cmd_dev_check_cache(dev_id, 1,
+ IOMMU_TEST_DEV_CACHE_DEFAULT);
+ test_cmd_dev_check_cache(dev_id, 2,
+ IOMMU_TEST_DEV_CACHE_DEFAULT);
+ test_cmd_dev_check_cache(dev_id, 3,
+ IOMMU_TEST_DEV_CACHE_DEFAULT);
- /* Invalidate all cache entries for nested_dev_id[1] and verify */
- num_inv = 1;
- inv_reqs[0].vdev_id = 0x99;
- inv_reqs[0].flags = IOMMU_TEST_INVALIDATE_FLAG_ALL;
- test_cmd_viommu_invalidate(viommu_id, inv_reqs,
- sizeof(*inv_reqs), &num_inv);
- assert(num_inv == 1);
- test_cmd_dev_check_cache_all(dev_id, 0);
- test_ioctl_destroy(vdev_id);
- }
+ /*
+ * Invalidate the 1st cache entry but fail the 2nd request
+ * due to invalid cache_id configuration in the 2nd request.
+ */
+ num_inv = 2;
+ inv_reqs[0].flags = 0;
+ inv_reqs[0].vdev_id = 0x99;
+ inv_reqs[0].cache_id = 0;
+ inv_reqs[1].flags = 0;
+ inv_reqs[1].vdev_id = 0x99;
+ inv_reqs[1].cache_id = MOCK_DEV_CACHE_ID_MAX + 1;
+ test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+ sizeof(*inv_reqs), &num_inv);
+ assert(num_inv == 1);
+ test_cmd_dev_check_cache(dev_id, 0, 0);
+ test_cmd_dev_check_cache(dev_id, 1,
+ IOMMU_TEST_DEV_CACHE_DEFAULT);
+ test_cmd_dev_check_cache(dev_id, 2,
+ IOMMU_TEST_DEV_CACHE_DEFAULT);
+ test_cmd_dev_check_cache(dev_id, 3,
+ IOMMU_TEST_DEV_CACHE_DEFAULT);
+
+ /* Invalidate the 2nd cache entry and verify */
+ num_inv = 1;
+ inv_reqs[0].flags = 0;
+ inv_reqs[0].vdev_id = 0x99;
+ inv_reqs[0].cache_id = 1;
+ test_cmd_viommu_invalidate(viommu_id, inv_reqs,
+ sizeof(*inv_reqs), &num_inv);
+ assert(num_inv == 1);
+ test_cmd_dev_check_cache(dev_id, 0, 0);
+ test_cmd_dev_check_cache(dev_id, 1, 0);
+ test_cmd_dev_check_cache(dev_id, 2,
+ IOMMU_TEST_DEV_CACHE_DEFAULT);
+ test_cmd_dev_check_cache(dev_id, 3,
+ IOMMU_TEST_DEV_CACHE_DEFAULT);
+
+ /* Invalidate the 3rd and 4th cache entries and verify */
+ num_inv = 2;
+ inv_reqs[0].flags = 0;
+ inv_reqs[0].vdev_id = 0x99;
+ inv_reqs[0].cache_id = 2;
+ inv_reqs[1].flags = 0;
+ inv_reqs[1].vdev_id = 0x99;
+ inv_reqs[1].cache_id = 3;
+ test_cmd_viommu_invalidate(viommu_id, inv_reqs,
+ sizeof(*inv_reqs), &num_inv);
+ assert(num_inv == 2);
+ test_cmd_dev_check_cache_all(dev_id, 0);
+
+ /* Invalidate all cache entries for nested_dev_id[1] and verify */
+ num_inv = 1;
+ inv_reqs[0].vdev_id = 0x99;
+ inv_reqs[0].flags = IOMMU_TEST_INVALIDATE_FLAG_ALL;
+ test_cmd_viommu_invalidate(viommu_id, inv_reqs,
+ sizeof(*inv_reqs), &num_inv);
+ assert(num_inv == 1);
+ test_cmd_dev_check_cache_all(dev_id, 0);
+ test_ioctl_destroy(vdev_id);
}
FIXTURE(iommufd_device_pasid)
--
2.25.1
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v3 5/5] iommufd/selftest: Add coverage for vdevice tombstone
2025-06-27 3:38 [PATCH v3 0/5] iommufd: Destroy vdevice on device unbind Xu Yilun
` (3 preceding siblings ...)
2025-06-27 3:38 ` [PATCH v3 4/5] iommufd/selftest: Explicitly skip tests for inapplicable variant Xu Yilun
@ 2025-06-27 3:38 ` Xu Yilun
4 siblings, 0 replies; 32+ messages in thread
From: Xu Yilun @ 2025-06-27 3:38 UTC (permalink / raw)
To: jgg, jgg, kevin.tian, will, aneesh.kumar
Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik,
dan.j.williams, baolu.lu, yilun.xu
This tests the flow that tombstones the vdevice when the idevice is
unbound before the vdevice is destroyed. The expected results are:
- idevice unbinding tombstones the vdevice ID; the ID can't be repurposed
anymore.
- Even ioctl(IOMMU_DESTROY) can't free the tombstoned ID.
- iommufd_fops_release() can still free everything.
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
tools/testing/selftests/iommu/iommufd.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 4a9b6e3b37fa..e1470a7a42cd 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -3016,6 +3016,20 @@ TEST_F(iommufd_viommu, vdevice_cache)
test_ioctl_destroy(vdev_id);
}
+TEST_F(iommufd_viommu, vdevice_tombstone)
+{
+ uint32_t viommu_id = self->viommu_id;
+ uint32_t dev_id = self->device_id;
+ uint32_t vdev_id = 0;
+
+ if (!dev_id)
+ SKIP(return, "Skipping test for variant no_viommu");
+
+ test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id);
+ test_ioctl_destroy(self->stdev_id);
+ EXPECT_ERRNO(ENOENT, _test_ioctl_destroy(self->fd, vdev_id));
+}
+
FIXTURE(iommufd_device_pasid)
{
int fd;
--
2.25.1
* Re: [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper
2025-06-27 3:38 ` [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper Xu Yilun
@ 2025-06-30 3:08 ` Baolu Lu
2025-06-30 7:24 ` Xu Yilun
2025-06-30 5:52 ` Tian, Kevin
2025-06-30 19:50 ` Nicolin Chen
2 siblings, 1 reply; 32+ messages in thread
From: Baolu Lu @ 2025-06-30 3:08 UTC (permalink / raw)
To: Xu Yilun, jgg, jgg, kevin.tian, will, aneesh.kumar
Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik,
dan.j.williams, yilun.xu
On 6/27/25 11:38, Xu Yilun wrote:
> Add the iommufd_object_tombstone_user() helper, which allows the caller
> to destroy an iommufd object created by userspace.
>
> This is useful on some destroy paths when the kernel caller finds the
> object should have been removed by userspace but is still alive. With
> this helper, the caller destroys the object but leave the object ID
> reserved (so called tombstone). The tombstone prevents repurposing the
> object ID without awareness of the original user.
>
> Since this happens for abnomal userspace behavior, for simplicity, the
> tombstoned object ID would be permanently leaked until
> iommufd_fops_release(). I.e. the original user gets an error when
> calling ioctl(IOMMU_DESTROY) on that ID.
>
> The first use case would be to ensure the iommufd_vdevice can't outlive
> the associated iommufd_device.
>
> Suggested-by: Jason Gunthorpe<jgg@nvidia.com>
> Co-developed-by: Aneesh Kumar K.V (Arm)<aneesh.kumar@kernel.org>
> Signed-off-by: Aneesh Kumar K.V (Arm)<aneesh.kumar@kernel.org>
> Signed-off-by: Xu Yilun<yilun.xu@linux.intel.com>
> ---
> drivers/iommu/iommufd/iommufd_private.h | 23 ++++++++++++++++++++++-
> drivers/iommu/iommufd/main.c | 19 ++++++++++++++++---
> 2 files changed, 38 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
> index 9ccc83341f32..fbc9ef78d81f 100644
> --- a/drivers/iommu/iommufd/iommufd_private.h
> +++ b/drivers/iommu/iommufd/iommufd_private.h
> @@ -187,7 +187,8 @@ void iommufd_object_finalize(struct iommufd_ctx *ictx,
> struct iommufd_object *obj);
>
> enum {
> - REMOVE_WAIT_SHORTTERM = 1,
> + REMOVE_WAIT_SHORTTERM = BIT(0),
> + REMOVE_OBJ_TOMBSTONE = BIT(1),
> };
> int iommufd_object_remove(struct iommufd_ctx *ictx,
> struct iommufd_object *to_destroy, u32 id,
> @@ -213,6 +214,26 @@ static inline void iommufd_object_destroy_user(struct iommufd_ctx *ictx,
> WARN_ON(ret);
> }
>
> +/*
> + * Similar to iommufd_object_destroy_user(), except that the object ID is left
> + * reserved/tombstoned.
> + */
> +static inline void iommufd_object_tombstone_user(struct iommufd_ctx *ictx,
> + struct iommufd_object *obj)
> +{
> + int ret;
> +
> + ret = iommufd_object_remove(ictx, obj, obj->id,
> + REMOVE_WAIT_SHORTTERM | REMOVE_OBJ_TOMBSTONE);
> +
> + /*
> + * If there is a bug and we couldn't destroy the object then we did put
> + * back the caller's users refcount and will eventually try to free it
> + * again during close.
> + */
> + WARN_ON(ret);
> +}
> +
> /*
> * The HWPT allocated by autodomains is used in possibly many devices and
> * is automatically destroyed when its refcount reaches zero.
> diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
> index 3df468f64e7d..620923669b42 100644
> --- a/drivers/iommu/iommufd/main.c
> +++ b/drivers/iommu/iommufd/main.c
> @@ -167,7 +167,7 @@ int iommufd_object_remove(struct iommufd_ctx *ictx,
> goto err_xa;
> }
>
> - xas_store(&xas, NULL);
> + xas_store(&xas, (flags & REMOVE_OBJ_TOMBSTONE) ? XA_ZERO_ENTRY : NULL);
> if (ictx->vfio_ioas == container_of(obj, struct iommufd_ioas, obj))
> ictx->vfio_ioas = NULL;
> xa_unlock(&ictx->objects);
> @@ -239,6 +239,7 @@ static int iommufd_fops_release(struct inode *inode, struct file *filp)
> struct iommufd_sw_msi_map *next;
> struct iommufd_sw_msi_map *cur;
> struct iommufd_object *obj;
> + bool empty;
>
> /*
> * The objects in the xarray form a graph of "users" counts, and we have
> @@ -249,23 +250,35 @@ static int iommufd_fops_release(struct inode *inode, struct file *filp)
> * until the entire list is destroyed. If this can't progress then there
> * is some bug related to object refcounting.
> */
> - while (!xa_empty(&ictx->objects)) {
> + for (;;) {
> unsigned int destroyed = 0;
> unsigned long index;
>
> + empty = true;
> xa_for_each(&ictx->objects, index, obj) {
> + empty = false;
People like me, who don't know the details of how xa_for_each() and
xa_empty() work, will ask why "empty = xa_empty()" isn't used here. So
it's better to add some comments to make it more readable.
/* XA_ZERO_ENTRY is treated as a null entry by xa_for_each(). */
Others look good to me.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> if (!refcount_dec_if_one(&obj->users))
> continue;
> +
> destroyed++;
> xa_erase(&ictx->objects, index);
> iommufd_object_ops[obj->type].destroy(obj);
> kfree(obj);
> }
> +
> + if (empty)
> + break;
> +
> /* Bug related to users refcount */
> if (WARN_ON(!destroyed))
> break;
> }
Thanks,
baolu
* Re: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy
2025-06-27 3:38 ` [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy Xu Yilun
@ 2025-06-30 5:08 ` Baolu Lu
2025-07-08 8:34 ` Xu Yilun
2025-06-30 6:27 ` Tian, Kevin
2025-06-30 19:34 ` Nicolin Chen
2 siblings, 1 reply; 32+ messages in thread
From: Baolu Lu @ 2025-06-30 5:08 UTC (permalink / raw)
To: Xu Yilun, jgg, jgg, kevin.tian, will, aneesh.kumar
Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik,
dan.j.williams, yilun.xu
On 6/27/25 11:38, Xu Yilun wrote:
> Destroy iommufd_vdevice (vdev) on iommufd_idevice (idev) destroy so
> that vdev can't outlive idev.
>
> iommufd_device (idev) represents the physical device bound to iommufd,
> while the iommufd_vdevice (vdev) represents the virtual instance of the
> physical device in the VM. The lifecycle of the vdev should not be
> longer than idev. This doesn't cause real problem on existing use cases
> cause vdev doesn't impact the physical device, only provides
> virtualization information. But to extend vdev for Confidential
> Computing (CC), there are needs to do secure configuration for the vdev,
> e.g. TSM Bind/Unbind. These configurations should be rolled back on idev
> destroy, or the external driver (VFIO) functionality may be impact.
>
> Building the association between idev & vdev requires the two objects
> pointing each other, but not referencing each other. This requires
> proper locking. This is done by reviving some of Nicolin's patch [1].
>
> There are 3 cases on idev destroy:
>
> 1. vdev is already destroyed by userspace. No extra handling needed.
> 2. vdev is still alive. Use iommufd_object_tombstone_user() to
> destroy vdev and tombstone the vdev ID.
> 3. vdev is being destroyed by userspace. The vdev ID is already freed,
> but vdev destroy handler is not completed. Destroy the vdev
> immediately. To solve the racing with userspace destruction, make
> iommufd_vdevice_abort() reentrant.
>
> [1]:https://lore.kernel.org/
> all/53025c827c44d68edb6469bfd940a8e8bc6147a5.1729897278.git.nicolinc@nvidia.com/
>
> Originally-by: Nicolin Chen<nicolinc@nvidia.com>
> Suggested-by: Jason Gunthorpe<jgg@nvidia.com>
> Co-developed-by: Aneesh Kumar K.V (Arm)<aneesh.kumar@kernel.org>
> Signed-off-by: Aneesh Kumar K.V (Arm)<aneesh.kumar@kernel.org>
> Signed-off-by: Xu Yilun<yilun.xu@linux.intel.com>
> ---
> drivers/iommu/iommufd/device.c | 42 +++++++++++++++++++++++
> drivers/iommu/iommufd/iommufd_private.h | 11 +++++++
> drivers/iommu/iommufd/main.c | 1 +
> drivers/iommu/iommufd/viommu.c | 44 +++++++++++++++++++++++--
> 4 files changed, 95 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> index 86244403b532..0937d4989185 100644
> --- a/drivers/iommu/iommufd/device.c
> +++ b/drivers/iommu/iommufd/device.c
> @@ -137,11 +137,53 @@ static struct iommufd_group *iommufd_get_group(struct iommufd_ctx *ictx,
> }
> }
>
> +static void iommufd_device_remove_vdev(struct iommufd_device *idev)
> +{
> + struct iommufd_vdevice *vdev;
> +
> + mutex_lock(&idev->igroup->lock);
> + /* vdev has been completely destroyed by userspace */
> + if (!idev->vdev)
> + goto out_unlock;
> +
> + vdev = iommufd_get_vdevice(idev->ictx, idev->vdev->obj.id);
> + if (IS_ERR(vdev)) {
> + /*
> + * vdev is removed from xarray by userspace, but is not
> + * destroyed/freed. Since iommufd_vdevice_abort() is reentrant,
> + * safe to destroy vdev here.
> + */
> + iommufd_vdevice_abort(&idev->vdev->obj);
> + goto out_unlock;
> + }
> +
> + /* Should never happen */
> + if (WARN_ON(vdev != idev->vdev)) {
> + iommufd_put_object(idev->ictx, &vdev->obj);
> + goto out_unlock;
> + }
> +
> + /*
> + * vdev is still alive. Hold a users refcount to prevent racing with
> + * userspace destruction, then use iommufd_object_tombstone_user() to
> + * destroy it and leave a tombstone.
> + */
> + refcount_inc(&vdev->obj.users);
> + iommufd_put_object(idev->ictx, &vdev->obj);
> + mutex_unlock(&idev->igroup->lock);
> + iommufd_object_tombstone_user(idev->ictx, &vdev->obj);
> + return;
> +
> +out_unlock:
> + mutex_unlock(&idev->igroup->lock);
> +}
> +
> void iommufd_device_destroy(struct iommufd_object *obj)
> {
> struct iommufd_device *idev =
> container_of(obj, struct iommufd_device, obj);
>
> + iommufd_device_remove_vdev(idev);
> iommu_device_release_dma_owner(idev->dev);
> iommufd_put_group(idev->igroup);
> if (!iommufd_selftest_is_mock_dev(idev->dev))
> diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
> index fbc9ef78d81f..f58aa4439c53 100644
> --- a/drivers/iommu/iommufd/iommufd_private.h
> +++ b/drivers/iommu/iommufd/iommufd_private.h
> @@ -446,6 +446,7 @@ struct iommufd_device {
> /* always the physical device */
> struct device *dev;
> bool enforce_cache_coherency;
> + struct iommufd_vdevice *vdev;
> };
>
> static inline struct iommufd_device *
> @@ -621,6 +622,7 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd);
> void iommufd_viommu_destroy(struct iommufd_object *obj);
> int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd);
> void iommufd_vdevice_destroy(struct iommufd_object *obj);
> +void iommufd_vdevice_abort(struct iommufd_object *obj);
>
> struct iommufd_vdevice {
> struct iommufd_object obj;
> @@ -628,8 +630,17 @@ struct iommufd_vdevice {
> struct iommufd_viommu *viommu;
> struct device *dev;
> u64 id; /* per-vIOMMU virtual ID */
> + struct iommufd_device *idev;
> };
>
> +static inline struct iommufd_vdevice *
> +iommufd_get_vdevice(struct iommufd_ctx *ictx, u32 id)
> +{
> + return container_of(iommufd_get_object(ictx, id,
> + IOMMUFD_OBJ_VDEVICE),
> + struct iommufd_vdevice, obj);
> +}
> +
> #ifdef CONFIG_IOMMUFD_TEST
> int iommufd_test(struct iommufd_ucmd *ucmd);
> void iommufd_selftest_destroy(struct iommufd_object *obj);
> diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
> index 620923669b42..64731b4fdbdf 100644
> --- a/drivers/iommu/iommufd/main.c
> +++ b/drivers/iommu/iommufd/main.c
> @@ -529,6 +529,7 @@ static const struct iommufd_object_ops iommufd_object_ops[] = {
> },
> [IOMMUFD_OBJ_VDEVICE] = {
> .destroy = iommufd_vdevice_destroy,
> + .abort = iommufd_vdevice_abort,
> },
> [IOMMUFD_OBJ_VEVENTQ] = {
> .destroy = iommufd_veventq_destroy,
> diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
> index 01df2b985f02..632d1d7b8fd8 100644
> --- a/drivers/iommu/iommufd/viommu.c
> +++ b/drivers/iommu/iommufd/viommu.c
> @@ -84,16 +84,38 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
> return rc;
> }
>
> -void iommufd_vdevice_destroy(struct iommufd_object *obj)
> +void iommufd_vdevice_abort(struct iommufd_object *obj)
> {
> struct iommufd_vdevice *vdev =
> container_of(obj, struct iommufd_vdevice, obj);
> struct iommufd_viommu *viommu = vdev->viommu;
> + struct iommufd_device *idev = vdev->idev;
> +
> + lockdep_assert_held(&idev->igroup->lock);
> +
> + /*
> + * iommufd_vdevice_abort() could be reentrant, by
> + * iommufd_device_unbind() or by iommufd_destroy(). Cleanup only once.
> + */
> + if (!viommu)
> + return;
>
> /* xa_cmpxchg is okay to fail if alloc failed xa_cmpxchg previously */
> xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL);
> refcount_dec(&viommu->obj.users);
> put_device(vdev->dev);
> + vdev->viommu = NULL;
> + idev->vdev = NULL;
I feel it makes more sense to reorder the operations like this:
vdev->viommu = NULL;
vdev->idev = NULL;
idev->vdev = NULL;
put_device(vdev->dev);
put_device(vdev->dev) could potentially trigger other code paths that
might attempt to reference vdev or its associated pointers. Therefore,
it's safer to null all the relevant reference pointers before calling
put_device().
Others look good to me,
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Thanks,
baolu
* Re: [PATCH v3 3/5] iommufd/vdevice: Remove struct device reference from struct vdevice
2025-06-27 3:38 ` [PATCH v3 3/5] iommufd/vdevice: Remove struct device reference from struct vdevice Xu Yilun
@ 2025-06-30 5:11 ` Baolu Lu
2025-07-04 15:06 ` Jason Gunthorpe
1 sibling, 0 replies; 32+ messages in thread
From: Baolu Lu @ 2025-06-30 5:11 UTC (permalink / raw)
To: Xu Yilun, jgg, jgg, kevin.tian, will, aneesh.kumar
Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik,
dan.j.williams, yilun.xu
On 6/27/25 11:38, Xu Yilun wrote:
> Remove struct device *dev from struct vdevice.
>
> The dev pointer is the Plan B for vdevice to reference the physical
> device. As now vdev->idev is added without refcounting concern, just
> use vdev->idev->dev when needed.
>
> Signed-off-by: Xu Yilun<yilun.xu@linux.intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
* RE: [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper
2025-06-27 3:38 ` [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper Xu Yilun
2025-06-30 3:08 ` Baolu Lu
@ 2025-06-30 5:52 ` Tian, Kevin
2025-06-30 6:41 ` Xu Yilun
2025-06-30 19:50 ` Nicolin Chen
2 siblings, 1 reply; 32+ messages in thread
From: Tian, Kevin @ 2025-06-30 5:52 UTC (permalink / raw)
To: Xu Yilun, jgg@nvidia.com, jgg@ziepe.ca, will@kernel.org,
aneesh.kumar@kernel.org
Cc: iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org,
nicolinc@nvidia.com, aik@amd.com, Williams, Dan J,
baolu.lu@linux.intel.com, Xu, Yilun
> From: Xu Yilun <yilun.xu@linux.intel.com>
> Sent: Friday, June 27, 2025 11:38 AM
>
> @@ -239,6 +239,7 @@ static int iommufd_fops_release(struct inode *inode,
> struct file *filp)
> struct iommufd_sw_msi_map *next;
> struct iommufd_sw_msi_map *cur;
> struct iommufd_object *obj;
> + bool empty;
move into for(;;) loop
>
> /*
> * The objects in the xarray form a graph of "users" counts, and we
> have
> @@ -249,23 +250,35 @@ static int iommufd_fops_release(struct inode
> *inode, struct file *filp)
> * until the entire list is destroyed. If this can't progress then there
> * is some bug related to object refcounting.
> */
> - while (!xa_empty(&ictx->objects)) {
> + for (;;) {
> unsigned int destroyed = 0;
> unsigned long index;
>
> + empty = true;
> xa_for_each(&ictx->objects, index, obj) {
> + empty = false;
> if (!refcount_dec_if_one(&obj->users))
> continue;
> +
> destroyed++;
> xa_erase(&ictx->objects, index);
> iommufd_object_ops[obj->type].destroy(obj);
> kfree(obj);
> }
> +
> + if (empty)
> + break;
> +
> /* Bug related to users refcount */
> if (WARN_ON(!destroyed))
> break;
> }
> - WARN_ON(!xa_empty(&ictx->groups));
I didn't get the rationale of this change. A tombstone only means the
object ID is reserved, but all the destroy work for the object has been
done, e.g. after all idevice objects are destroyed there should be no
valid igroup entries in ictx->groups (and there is no tombstone
state for igroup).
Is it a wrong change by misreading ictx->groups as ictx->objects?
> +
> + /*
> + * There may be some tombstones left over from
> + * iommufd_object_tombstone_user()
> + */
> + xa_destroy(&ictx->groups);
>
And here should be ictx->objects?
* RE: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy
2025-06-27 3:38 ` [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy Xu Yilun
2025-06-30 5:08 ` Baolu Lu
@ 2025-06-30 6:27 ` Tian, Kevin
2025-06-30 10:18 ` Xu Yilun
2025-06-30 19:34 ` Nicolin Chen
2 siblings, 1 reply; 32+ messages in thread
From: Tian, Kevin @ 2025-06-30 6:27 UTC (permalink / raw)
To: Xu Yilun, jgg@nvidia.com, jgg@ziepe.ca, will@kernel.org,
aneesh.kumar@kernel.org
Cc: iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org,
nicolinc@nvidia.com, aik@amd.com, Williams, Dan J,
baolu.lu@linux.intel.com, Xu, Yilun
> From: Xu Yilun <yilun.xu@linux.intel.com>
> Sent: Friday, June 27, 2025 11:38 AM
>
> +static void iommufd_device_remove_vdev(struct iommufd_device *idev)
> +{
> + struct iommufd_vdevice *vdev;
> +
> + mutex_lock(&idev->igroup->lock);
> + /* vdev has been completely destroyed by userspace */
> + if (!idev->vdev)
> + goto out_unlock;
> +
> + vdev = iommufd_get_vdevice(idev->ictx, idev->vdev->obj.id);
> + if (IS_ERR(vdev)) {
> + /*
> + * vdev is removed from xarray by userspace, but is not
> + * destroyed/freed. Since iommufd_vdevice_abort() is
> reentrant,
> + * safe to destroy vdev here.
> + */
> + iommufd_vdevice_abort(&idev->vdev->obj);
> + goto out_unlock;
> + }
let's add a comment that vdev is still freed in iommufd_destroy()
in this situation.
> -void iommufd_vdevice_destroy(struct iommufd_object *obj)
> +void iommufd_vdevice_abort(struct iommufd_object *obj)
> {
> struct iommufd_vdevice *vdev =
> container_of(obj, struct iommufd_vdevice, obj);
> struct iommufd_viommu *viommu = vdev->viommu;
> + struct iommufd_device *idev = vdev->idev;
> +
> + lockdep_assert_held(&idev->igroup->lock);
> +
> + /*
> + * iommufd_vdevice_abort() could be reentrant, by
> + * iommufd_device_unbind() or by iommufd_destroy(). Cleanup only
> once.
> + */
> + if (!viommu)
> + return;
Just check idev->vdev, to be consistent with the other path.
* Re: [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper
2025-06-30 5:52 ` Tian, Kevin
@ 2025-06-30 6:41 ` Xu Yilun
0 siblings, 0 replies; 32+ messages in thread
From: Xu Yilun @ 2025-06-30 6:41 UTC (permalink / raw)
To: Tian, Kevin
Cc: jgg@nvidia.com, jgg@ziepe.ca, will@kernel.org,
aneesh.kumar@kernel.org, iommu@lists.linux.dev,
linux-kernel@vger.kernel.org, joro@8bytes.org,
robin.murphy@arm.com, shuah@kernel.org, nicolinc@nvidia.com,
aik@amd.com, Williams, Dan J, baolu.lu@linux.intel.com, Xu, Yilun
On Mon, Jun 30, 2025 at 05:52:26AM +0000, Tian, Kevin wrote:
> > From: Xu Yilun <yilun.xu@linux.intel.com>
> > Sent: Friday, June 27, 2025 11:38 AM
> >
> > @@ -239,6 +239,7 @@ static int iommufd_fops_release(struct inode *inode,
> > struct file *filp)
> > struct iommufd_sw_msi_map *next;
> > struct iommufd_sw_msi_map *cur;
> > struct iommufd_object *obj;
> > + bool empty;
>
> move into for(;;) loop
Yes.
>
> >
> > /*
> > * The objects in the xarray form a graph of "users" counts, and we
> > have
> > @@ -249,23 +250,35 @@ static int iommufd_fops_release(struct inode
> > *inode, struct file *filp)
> > * until the entire list is destroyed. If this can't progress then there
> > * is some bug related to object refcounting.
> > */
> > - while (!xa_empty(&ictx->objects)) {
> > + for (;;) {
> > unsigned int destroyed = 0;
> > unsigned long index;
> >
> > + empty = true;
> > xa_for_each(&ictx->objects, index, obj) {
> > + empty = false;
> > if (!refcount_dec_if_one(&obj->users))
> > continue;
> > +
> > destroyed++;
> > xa_erase(&ictx->objects, index);
> > iommufd_object_ops[obj->type].destroy(obj);
> > kfree(obj);
> > }
> > +
> > + if (empty)
> > + break;
> > +
> > /* Bug related to users refcount */
> > if (WARN_ON(!destroyed))
> > break;
> > }
> > - WARN_ON(!xa_empty(&ictx->groups));
>
> I didn't get the rationale of this change. tombstone only means the
> object ID reserved, but all the destroy work for the object has been
> done, e.g. after all idevice objects are destroyed there should be no
> valid igroup entries in ictx->groups (and there is no tombstone
> state for igroup).
>
> Is it a wrong change by misreading ictx->groups as ictx->objects?
Sorry, this is a true mistake.
>
> > +
> > + /*
> > + * There may be some tombstones left over from
> > + * iommufd_object_tombstone_user()
> > + */
> > + xa_destroy(&ictx->groups);
> >
>
> And here should be ictx->objects?
Yes, thanks for catching this.
- xa_destroy(&ictx->groups);
+ xa_destroy(&ictx->objects);
+
+ WARN_ON(!xa_empty(&ictx->groups));
Thanks,
Yilun
* Re: [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper
2025-06-30 3:08 ` Baolu Lu
@ 2025-06-30 7:24 ` Xu Yilun
0 siblings, 0 replies; 32+ messages in thread
From: Xu Yilun @ 2025-06-30 7:24 UTC (permalink / raw)
To: Baolu Lu
Cc: jgg, jgg, kevin.tian, will, aneesh.kumar, iommu, linux-kernel,
joro, robin.murphy, shuah, nicolinc, aik, dan.j.williams,
yilun.xu
> > - while (!xa_empty(&ictx->objects)) {
> > + for (;;) {
> > unsigned int destroyed = 0;
> > unsigned long index;
> > + empty = true;
> > xa_for_each(&ictx->objects, index, obj) {
> > + empty = false;
>
> People like me, who don't know the details of how xa_for_each() and
> xa_empty() work, will ask why "empty = xa_empty()" isn't used here. So
> it's better to add some comments to make it more readable.
>
> /* XA_ZERO_ENTRY is treated as a null entry by xa_for_each(). */
Yes. I'll try to make it more elaborate:
+ /*
+ * xa_for_each() will not return tombstones (zeroed entries),
+ * which prevent the xarray from becoming empty. So use an
+ * empty flag instead of xa_empty() to indicate all entries are
+ * either NULLed or tombstoned.
+ */
empty = true;
Thanks,
Yilun
>
> Others look good to me.
>
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
* Re: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy
2025-06-30 6:27 ` Tian, Kevin
@ 2025-06-30 10:18 ` Xu Yilun
2025-06-30 14:50 ` Jason Gunthorpe
0 siblings, 1 reply; 32+ messages in thread
From: Xu Yilun @ 2025-06-30 10:18 UTC (permalink / raw)
To: Tian, Kevin
Cc: jgg@nvidia.com, jgg@ziepe.ca, will@kernel.org,
aneesh.kumar@kernel.org, iommu@lists.linux.dev,
linux-kernel@vger.kernel.org, joro@8bytes.org,
robin.murphy@arm.com, shuah@kernel.org, nicolinc@nvidia.com,
aik@amd.com, Williams, Dan J, baolu.lu@linux.intel.com, Xu, Yilun
On Mon, Jun 30, 2025 at 06:27:51AM +0000, Tian, Kevin wrote:
> > From: Xu Yilun <yilun.xu@linux.intel.com>
> > Sent: Friday, June 27, 2025 11:38 AM
> >
> > +static void iommufd_device_remove_vdev(struct iommufd_device *idev)
> > +{
> > + struct iommufd_vdevice *vdev;
> > +
> > + mutex_lock(&idev->igroup->lock);
> > + /* vdev has been completely destroyed by userspace */
> > + if (!idev->vdev)
> > + goto out_unlock;
> > +
> > + vdev = iommufd_get_vdevice(idev->ictx, idev->vdev->obj.id);
> > + if (IS_ERR(vdev)) {
> > + /*
> > + * vdev is removed from xarray by userspace, but is not
> > + * destroyed/freed. Since iommufd_vdevice_abort() is
> > reentrant,
> > + * safe to destroy vdev here.
> > + */
> > + iommufd_vdevice_abort(&idev->vdev->obj);
> > + goto out_unlock;
> > + }
>
> let's add a comment that vdev is still freed in iommufd_destroy()
> in this situation.
Yes.
>
> > -void iommufd_vdevice_destroy(struct iommufd_object *obj)
> > +void iommufd_vdevice_abort(struct iommufd_object *obj)
> > {
> > struct iommufd_vdevice *vdev =
> > container_of(obj, struct iommufd_vdevice, obj);
> > struct iommufd_viommu *viommu = vdev->viommu;
> > + struct iommufd_device *idev = vdev->idev;
> > +
> > + lockdep_assert_held(&idev->igroup->lock);
> > +
> > + /*
> > + * iommufd_vdevice_abort() could be reentrant, by
> > + * iommufd_device_unbind() or by iommufd_destroy(). Cleanup only
> > once.
> > + */
> > + if (!viommu)
> > + return;
>
> Just check idev->vdev, to be consistent with the other path.
I think there is a problem here. From your comments above, vdev could be
aborted/destroyed by iommufd_destroy() again *after* idev is freed.
That means in iommufd_vdevice_abort/destroy(), usage of vdev->idev or
idev->vdev or vdev->idev->igroup->lock may be invalid.
I need to reconsider this, seems we need a dedicated vdev lock to
synchronize concurrent vdev abort/destroy.
Thanks,
Yilun
* Re: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy
2025-06-30 10:18 ` Xu Yilun
@ 2025-06-30 14:50 ` Jason Gunthorpe
2025-07-01 9:19 ` Xu Yilun
0 siblings, 1 reply; 32+ messages in thread
From: Jason Gunthorpe @ 2025-06-30 14:50 UTC (permalink / raw)
To: Xu Yilun
Cc: Tian, Kevin, will@kernel.org, aneesh.kumar@kernel.org,
iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org,
nicolinc@nvidia.com, aik@amd.com, Williams, Dan J,
baolu.lu@linux.intel.com, Xu, Yilun
On Mon, Jun 30, 2025 at 06:18:50PM +0800, Xu Yilun wrote:
> I need to reconsider this, seems we need a dedicated vdev lock to
> synchronize concurrent vdev abort/destroy.
It is not possible to be concurrent
destroy is only called once after it is no longer possible to call
abort.
Jason
* Re: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy
2025-06-27 3:38 ` [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy Xu Yilun
2025-06-30 5:08 ` Baolu Lu
2025-06-30 6:27 ` Tian, Kevin
@ 2025-06-30 19:34 ` Nicolin Chen
2025-07-08 8:12 ` Xu Yilun
2 siblings, 1 reply; 32+ messages in thread
From: Nicolin Chen @ 2025-06-30 19:34 UTC (permalink / raw)
To: Xu Yilun
Cc: jgg, jgg, kevin.tian, will, aneesh.kumar, iommu, linux-kernel,
joro, robin.murphy, shuah, aik, dan.j.williams, baolu.lu,
yilun.xu
On Fri, Jun 27, 2025 at 11:38:06AM +0800, Xu Yilun wrote:
> +static void iommufd_device_remove_vdev(struct iommufd_device *idev)
> +{
> + struct iommufd_vdevice *vdev;
> +
> + mutex_lock(&idev->igroup->lock);
> + /* vdev has been completely destroyed by userspace */
> + if (!idev->vdev)
> + goto out_unlock;
> +
> + vdev = iommufd_get_vdevice(idev->ictx, idev->vdev->obj.id);
> + if (IS_ERR(vdev)) {
> + /*
> + * vdev is removed from xarray by userspace, but is not
> + * destroyed/freed. Since iommufd_vdevice_abort() is reentrant,
> + * safe to destroy vdev here.
> + */
> + iommufd_vdevice_abort(&idev->vdev->obj);
> + goto out_unlock;
This is case #3, i.e. a racing vdev destroy, in the commit log?
I think it is worth clarifying that there is a concurrent destroy:
/*
* An ongoing vdev destroy ioctl has removed the vdev from the
* object xarray but has not finished iommufd_vdevice_destroy()
* yet, as it is holding the same mutex. Destroy the vdev here,
* i.e. the iommufd_vdevice_destroy() will be a NOP once it is
* unlocked.
*/
> @@ -147,10 +183,12 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
> if (rc)
> goto out_abort;
> iommufd_object_finalize(ucmd->ictx, &vdev->obj);
> - goto out_put_idev;
> + goto out_unlock_igroup;
>
> out_abort:
> iommufd_object_abort_and_destroy(ucmd->ictx, &vdev->obj);
> +out_unlock_igroup:
> + mutex_unlock(&idev->igroup->lock);
Looks like we will have to partially revert the _ucmd allocator,
in this function:
https://lore.kernel.org/all/107b24a3b791091bb09c92ffb0081c56c413b26d.1749882255.git.nicolinc@nvidia.com/
Please try fixing the conflicts on top of Jason's for-next tree.
Thanks
Nicolin
* Re: [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper
2025-06-27 3:38 ` [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper Xu Yilun
2025-06-30 3:08 ` Baolu Lu
2025-06-30 5:52 ` Tian, Kevin
@ 2025-06-30 19:50 ` Nicolin Chen
2025-07-08 8:45 ` Xu Yilun
2 siblings, 1 reply; 32+ messages in thread
From: Nicolin Chen @ 2025-06-30 19:50 UTC (permalink / raw)
To: Xu Yilun
Cc: jgg, jgg, kevin.tian, will, aneesh.kumar, iommu, linux-kernel,
joro, robin.murphy, shuah, aik, dan.j.williams, baolu.lu,
yilun.xu
On Fri, Jun 27, 2025 at 11:38:05AM +0800, Xu Yilun wrote:
> Add the iommufd_object_tombstone_user() helper, which allows the caller
> to destroy an iommufd object created by userspace.
Should we describe it as "partially destroy"?
> This is useful on some destroy paths when the kernel caller finds the
> object should have been removed by userspace but is still alive. With
> this helper, the caller destroys the object but leave the object ID
> reserved (so called tombstone). The tombstone prevents repurposing the
> object ID without awareness of the original user.
>
> Since this happens for abnomal userspace behavior, for simplicity, the
s/abnomal/abnormal
Nicolin
* Re: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy
2025-06-30 14:50 ` Jason Gunthorpe
@ 2025-07-01 9:19 ` Xu Yilun
2025-07-01 12:13 ` Jason Gunthorpe
0 siblings, 1 reply; 32+ messages in thread
From: Xu Yilun @ 2025-07-01 9:19 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Tian, Kevin, will@kernel.org, aneesh.kumar@kernel.org,
iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org,
nicolinc@nvidia.com, aik@amd.com, Williams, Dan J,
baolu.lu@linux.intel.com, Xu, Yilun
On Mon, Jun 30, 2025 at 11:50:51AM -0300, Jason Gunthorpe wrote:
> On Mon, Jun 30, 2025 at 06:18:50PM +0800, Xu Yilun wrote:
>
> > I need to reconsider this, seems we need a dedicated vdev lock to
> > synchronize concurrent vdev abort/destroy.
>
> It is not possible to be concurrent
>
> destroy is only called once after it is no longer possible to call
> abort.
I'm almost about to drop the "abort twice" idea. [1]
[1]: https://lore.kernel.org/linux-iommu/20250625123832.GF167785@nvidia.com/
See from the flow below,
T1. iommufd_device_unbind(idev)
      iommufd_device_destroy(obj)
        mutex_lock(&idev->igroup->lock)
        iommufd_vdevice_abort(idev->vdev.obj)
        mutex_unlock(&idev->igroup->lock)
      kfree(obj)

T2. iommufd_destroy(vdev_id)
      iommufd_vdevice_destroy(obj)
        mutex_lock(&vdev->idev->igroup->lock)
        iommufd_vdevice_abort(obj);
        mutex_unlock(&vdev->idev->igroup->lock)
      kfree(obj)
iommufd_vdevice_destroy() will access idev->igroup->lock, but it is
possible the idev is already freed at that time:
iommufd_destroy(vdev_id)
  iommufd_vdevice_destroy(obj)
                                          iommufd_device_unbind(idev)
                                            iommufd_device_destroy(obj)
                                              mutex_lock(&idev->igroup->lock)
    mutex_lock(&vdev->idev->igroup->lock) (wait)
                                              iommufd_vdevice_abort(idev->vdev.obj)
                                              mutex_unlock(&idev->igroup->lock)
                                            kfree(obj)
    mutex_lock(&vdev->idev->igroup->lock) (PANIC)
    iommufd_vdevice_abort(obj)
    ...
We also can't introduce a vdev-side lock instead of the idev-side lock,
since vdev could also be freed right after iommufd_vdevice_destroy().
I think the only simple way is to let idev destruction wait for vdev
destruction, which goes back to the wait_event idea ...
Thanks,
Yilun
* Re: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy
2025-07-01 9:19 ` Xu Yilun
@ 2025-07-01 12:13 ` Jason Gunthorpe
2025-07-02 2:23 ` Xu Yilun
0 siblings, 1 reply; 32+ messages in thread
From: Jason Gunthorpe @ 2025-07-01 12:13 UTC (permalink / raw)
To: Xu Yilun
Cc: Tian, Kevin, will@kernel.org, aneesh.kumar@kernel.org,
iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org,
nicolinc@nvidia.com, aik@amd.com, Williams, Dan J,
baolu.lu@linux.intel.com, Xu, Yilun
On Tue, Jul 01, 2025 at 05:19:05PM +0800, Xu Yilun wrote:
> On Mon, Jun 30, 2025 at 11:50:51AM -0300, Jason Gunthorpe wrote:
> > On Mon, Jun 30, 2025 at 06:18:50PM +0800, Xu Yilun wrote:
> >
> > > I need to reconsider this, seems we need a dedicated vdev lock to
> > > synchronize concurrent vdev abort/destroy.
> >
> > It is not possible to be concurrent
> >
> > destroy is only called once after it is no longer possible to call
> > abort.
>
> I'm almost about to drop the "abort twice" idea. [1]
>
> [1]: https://lore.kernel.org/linux-iommu/20250625123832.GF167785@nvidia.com/
>
> See from the flow below,
>
> T1. iommufd_device_unbind(idev)
> iommufd_device_destroy(obj)
> mutex_lock(&idev->igroup->lock)
> iommufd_vdevice_abort(idev->vdev.obj)
> mutex_unlock(&idev->igroup->lock)
> kfree(obj)
>
> T2. iommufd_destroy(vdev_id)
> iommufd_vdevice_destroy(obj)
> mutex_lock(&vdev->idev->igroup->lock)
> iommufd_vdevice_abort(obj);
> mutex_unlock(&vdev->idev->igroup->lock)
> kfree(obj)
>
> iommufd_vdevice_destroy() will access idev->igroup->lock, but it is
> possible the idev is already freed at that time:
>
> iommufd_destroy(vdev_id)
> iommufd_vdevice_destroy(obj)
> iommufd_device_unbind(idev)
> iommufd_device_destroy(obj)
> mutex_lock(&idev->igroup->lock)
> mutex_lock(&vdev->idev->igroup->lock) (wait)
> iommufd_vdevice_abort(idev->vdev.obj)
> mutex_unlock(&idev->igroup->lock)
> kfree(obj)
> mutex_lock(&vdev->idev->igroup->lock) (PANIC)
> iommufd_vdevice_abort(obj)
> ...
Yes, you can't touch idev inside the destroy function at all, under
any version. idev is only valid if you have a refcount on vdev.
But why are you touching this lock? Arrange things so abort doesn't
touch the idev??
Jason
* Re: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy
2025-07-01 12:13 ` Jason Gunthorpe
@ 2025-07-02 2:23 ` Xu Yilun
2025-07-02 9:13 ` Tian, Kevin
0 siblings, 1 reply; 32+ messages in thread
From: Xu Yilun @ 2025-07-02 2:23 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Tian, Kevin, will@kernel.org, aneesh.kumar@kernel.org,
iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org,
nicolinc@nvidia.com, aik@amd.com, Williams, Dan J,
baolu.lu@linux.intel.com, Xu, Yilun
On Tue, Jul 01, 2025 at 09:13:15AM -0300, Jason Gunthorpe wrote:
> On Tue, Jul 01, 2025 at 05:19:05PM +0800, Xu Yilun wrote:
> > On Mon, Jun 30, 2025 at 11:50:51AM -0300, Jason Gunthorpe wrote:
> > > On Mon, Jun 30, 2025 at 06:18:50PM +0800, Xu Yilun wrote:
> > >
> > > > I need to reconsider this, seems we need a dedicated vdev lock to
> > > > synchronize concurrent vdev abort/destroy.
> > >
> > > It is not possible to be concurrent
> > >
> > > destroy is only called once after it is no longer possible to call
> > > abort.
> >
> > I'm almost about to drop the "abort twice" idea. [1]
> >
> > [1]: https://lore.kernel.org/linux-iommu/20250625123832.GF167785@nvidia.com/
> >
> > See from the flow below,
> >
> > T1. iommufd_device_unbind(idev)
> > iommufd_device_destroy(obj)
> > mutex_lock(&idev->igroup->lock)
> > iommufd_vdevice_abort(idev->vdev.obj)
> > mutex_unlock(&idev->igroup->lock)
> > kfree(obj)
> >
> > T2. iommufd_destroy(vdev_id)
> > iommufd_vdevice_destroy(obj)
> > mutex_lock(&vdev->idev->igroup->lock)
> > iommufd_vdevice_abort(obj);
> > mutex_unlock(&vdev->idev->igroup->lock)
> > kfree(obj)
> >
> > iommufd_vdevice_destroy() will access idev->igroup->lock, but it is
> > possible the idev is already freed at that time:
> >
> > iommufd_destroy(vdev_id)
> > iommufd_vdevice_destroy(obj)
> > iommufd_device_unbind(idev)
> > iommufd_device_destroy(obj)
> > mutex_lock(&idev->igroup->lock)
> > mutex_lock(&vdev->idev->igroup->lock) (wait)
> > iommufd_vdevice_abort(idev->vdev.obj)
> > mutex_unlock(&idev->igroup->lock)
> > kfree(obj)
> > mutex_lock(&vdev->idev->igroup->lock) (PANIC)
> > iommufd_vdevice_abort(obj)
> > ...
>
> Yes, you can't touch idev inside the destroy function at all, under
> any version. idev is only valid if you have a refcount on vdev.
>
> But why are you touching this lock? Arrange things so abort doesn't
> touch the idev??
idev has a pointer idev->vdev to track the vdev's lifecycle.
idev->igroup->lock protects the pointer. At the end of
iommufd_vdevice_destroy() this pointer should be NULLed so that idev
knows vdev is really destroyed.
I haven't found a safer way for vdev to sync up its validness with idev
w/o touching idev.
I was thinking of using vdev->idev and some vdev lock for tracking
instead. Then iommufd_vdevice_abort() doesn't touch idev. But it is
still the same, just switch to put idev in risk:
iommufd_destroy(vdev_id)
  iommufd_vdevice_destroy(obj)
                                          iommufd_device_unbind(idev)
                                            iommufd_device_destroy(obj)
    mutex_lock(&vdev->some_lock)
                                              mutex_lock(&idev->vdev->some_lock) (wait)
    iommufd_vdevice_abort(obj)
    mutex_unlock(&vdev->some_lock)
  kfree(obj)
                                              mutex_lock(&idev->vdev->some_lock) (PANIC)
                                              iommufd_vdevice_abort(idev->vdev.obj)
                                              ...
Thanks,
Yilun
* RE: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy
2025-07-02 2:23 ` Xu Yilun
@ 2025-07-02 9:13 ` Tian, Kevin
2025-07-02 12:40 ` Jason Gunthorpe
0 siblings, 1 reply; 32+ messages in thread
From: Tian, Kevin @ 2025-07-02 9:13 UTC (permalink / raw)
To: Xu Yilun, Jason Gunthorpe
Cc: will@kernel.org, aneesh.kumar@kernel.org, iommu@lists.linux.dev,
linux-kernel@vger.kernel.org, joro@8bytes.org,
robin.murphy@arm.com, shuah@kernel.org, nicolinc@nvidia.com,
aik@amd.com, Williams, Dan J, baolu.lu@linux.intel.com, Xu, Yilun
> From: Xu Yilun <yilun.xu@linux.intel.com>
> Sent: Wednesday, July 2, 2025 10:24 AM
>
> On Tue, Jul 01, 2025 at 09:13:15AM -0300, Jason Gunthorpe wrote:
> > On Tue, Jul 01, 2025 at 05:19:05PM +0800, Xu Yilun wrote:
> > > On Mon, Jun 30, 2025 at 11:50:51AM -0300, Jason Gunthorpe wrote:
> > > > On Mon, Jun 30, 2025 at 06:18:50PM +0800, Xu Yilun wrote:
> > > >
> > > > > I need to reconsider this, seems we need a dedicated vdev lock to
> > > > > synchronize concurrent vdev abort/destroy.
> > > >
> > > > It is not possible to be concurrent
> > > >
> > > > destroy is only called once after it is no longer possible to call
> > > > abort.
> > >
> > > I'm almost about to drop the "abort twice" idea. [1]
> > >
> > > [1]: https://lore.kernel.org/linux-iommu/20250625123832.GF167785@nvidia.com/
> > >
> > > See from the flow below,
> > >
> > > T1. iommufd_device_unbind(idev)
> > > iommufd_device_destroy(obj)
> > > mutex_lock(&idev->igroup->lock)
> > > iommufd_vdevice_abort(idev->vdev.obj)
> > > mutex_unlock(&idev->igroup->lock)
> > > kfree(obj)
> > >
> > > T2. iommufd_destroy(vdev_id)
> > > iommufd_vdevice_destroy(obj)
> > > mutex_lock(&vdev->idev->igroup->lock)
> > > iommufd_vdevice_abort(obj);
> > > mutex_unlock(&vdev->idev->igroup->lock)
> > > kfree(obj)
> > >
> > > iommufd_vdevice_destroy() will access idev->igroup->lock, but it is
> > > possible the idev is already freed at that time:
> > >
> > > iommufd_destroy(vdev_id)
> > > iommufd_vdevice_destroy(obj)
> > > iommufd_device_unbind(idev)
> > > iommufd_device_destroy(obj)
> > > mutex_lock(&idev->igroup->lock)
> > > mutex_lock(&vdev->idev->igroup->lock) (wait)
> > > iommufd_vdevice_abort(idev->vdev.obj)
> > > mutex_unlock(&idev->igroup->lock)
> > > kfree(obj)
> > > mutex_lock(&vdev->idev->igroup->lock) (PANIC)
> > > iommufd_vdevice_abort(obj)
> > > ...
> >
> > Yes, you can't touch idev inside the destroy function at all, under
> > any version. idev is only valid if you have a refcount on vdev.
> >
> > But why are you touching this lock? Arrange things so abort doesn't
> > touch the idev??
>
> idev has a pointer idev->vdev to track the vdev's lifecycle.
> idev->igroup->lock protects the pointer. At the end of
> iommufd_vdevice_destroy() this pointer should be NULLed so that idev
> knows vdev is really destroyed.
>
> I haven't found a safer way for vdev to sync up its validness with idev
> w/o touching idev.
>
> I was thinking of using vdev->idev and some vdev lock for tracking
> instead. Then iommufd_vdevice_abort() doesn't touch idev. But it is
> still the same, just switch to put idev in risk:
>
>
> iommufd_destroy(vdev_id)
> iommufd_vdevice_destroy(obj)
> iommufd_device_unbind(idev)
> iommufd_device_destroy(obj)
> mutex_lock(&vdev->some_lock)
> mutex_lock(&idev->vdev->some_lock) (wait)
A panic could happen here too, between reading idev->vdev and
acquiring mutex_lock(&vdev->some_lock), as vdev might be
destroyed/freed in between.
> iommufd_vdevice_abort(obj)
> mutex_unlock(&vdev->some_lock)
> kfree(obj)
> mutex_lock(&idev->vdev->some_lock) (PANIC)
> iommufd_vdevice_abort(idev->vdev.obj)
> ...
>
I cannot find a safe way either, except using a certain global lock,
but compared to that I'd prefer the original wait approach...
* Re: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy
2025-07-02 9:13 ` Tian, Kevin
@ 2025-07-02 12:40 ` Jason Gunthorpe
2025-07-03 4:32 ` Tian, Kevin
2025-07-07 10:58 ` Xu Yilun
0 siblings, 2 replies; 32+ messages in thread
From: Jason Gunthorpe @ 2025-07-02 12:40 UTC (permalink / raw)
To: Tian, Kevin
Cc: Xu Yilun, will@kernel.org, aneesh.kumar@kernel.org,
iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org,
nicolinc@nvidia.com, aik@amd.com, Williams, Dan J,
baolu.lu@linux.intel.com, Xu, Yilun
On Wed, Jul 02, 2025 at 09:13:50AM +0000, Tian, Kevin wrote:
> > > Yes, you can't touch idev inside the destroy function at all, under
> > > any version. idev is only valid if you have a refcount on vdev.
> > >
> > > But why are you touching this lock? Arrange things so abort doesn't
> > > touch the idev??
> >
> > idev has a pointer idev->vdev to track the vdev's lifecycle.
> > idev->igroup->lock protects the pointer. At the end of
> > iommufd_vdevice_destroy() this pointer should be NULLed so that idev
> > knows vdev is really destroyed.
Well, that is destroy, not abort, but OK, there is an issue with
destroy.
> but comparing to that I'd prefer to the original wait approach...
Okay, but let's try to keep the wait hidden inside the refcounting..
The issue here is we don't hold a refcount on idev while working with
idev. Let's fix that and then things should work properly?
Maybe something like this:
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 4e781aa9fc6329..9174fa7c972b80 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -178,12 +178,20 @@ static void iommufd_device_remove_vdev(struct iommufd_device *idev)
mutex_unlock(&idev->igroup->lock);
}
+void iommufd_device_pre_destroy(struct iommufd_object *obj)
+{
+ struct iommufd_device *idev =
+ container_of(obj, struct iommufd_device, obj);
+
+ /* Release the short term users on this */
+ iommufd_device_remove_vdev(idev);
+}
+
void iommufd_device_destroy(struct iommufd_object *obj)
{
struct iommufd_device *idev =
container_of(obj, struct iommufd_device, obj);
- iommufd_device_remove_vdev(idev);
iommu_device_release_dma_owner(idev->dev);
iommufd_put_group(idev->igroup);
if (!iommufd_selftest_is_mock_dev(idev->dev))
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index b2e8e106c16158..387c630fdabfbd 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -151,6 +151,9 @@ static int iommufd_object_dec_wait_shortterm(struct iommufd_ctx *ictx,
if (refcount_dec_and_test(&to_destroy->shortterm_users))
return 0;
+ if (iommufd_object_ops[to_destroy->type].pre_destroy)
+ iommufd_object_ops[to_destroy->type].pre_destroy(to_destroy);
+
if (wait_event_timeout(ictx->destroy_wait,
refcount_read(&to_destroy->shortterm_users) == 0,
msecs_to_jiffies(60000)))
@@ -567,6 +570,7 @@ static const struct iommufd_object_ops iommufd_object_ops[] = {
.destroy = iommufd_access_destroy_object,
},
[IOMMUFD_OBJ_DEVICE] = {
+ .pre_destroy = iommufd_device_pre_destroy,
.destroy = iommufd_device_destroy,
},
[IOMMUFD_OBJ_FAULT] = {
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
index 9451a311745f7b..cbf99daa7dc25d 100644
--- a/drivers/iommu/iommufd/viommu.c
+++ b/drivers/iommu/iommufd/viommu.c
@@ -135,6 +135,7 @@ void iommufd_vdevice_destroy(struct iommufd_object *obj)
mutex_lock(&vdev->idev->igroup->lock);
iommufd_vdevice_abort(obj);
mutex_unlock(&vdev->idev->igroup->lock);
+ iommufd_put_object(vdev->viommu->ictx, &vdev->idev->obj);
}
int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
@@ -180,13 +181,19 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
vdev->id = virt_id;
vdev->viommu = viommu;
refcount_inc(&viommu->obj.users);
+
+ /*
+ * A reference is held on the idev so long as we have a pointer.
+ * iommufd_device_pre_destroy() will revoke it before the real
+ * destruction.
+ */
+ vdev->idev = idev;
+
/*
* iommufd_device_destroy() waits until idev->vdev is NULL before
* freeing the idev, which only happens once the vdev is finished
- * destruction. Thus we do not need refcounting on either idev->vdev or
- * vdev->idev.
+ * destruction.
*/
- vdev->idev = idev;
idev->vdev = vdev;
curr = xa_cmpxchg(&viommu->vdevs, virt_id, NULL, vdev, GFP_KERNEL);
@@ -207,7 +214,8 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
out_unlock_igroup:
mutex_unlock(&idev->igroup->lock);
out_put_idev:
- iommufd_put_object(ucmd->ictx, &idev->obj);
+ if (rc)
+ iommufd_put_object(ucmd->ictx, &idev->obj);
out_put_viommu:
iommufd_put_object(ucmd->ictx, &viommu->obj);
return rc;
* RE: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy
2025-07-02 12:40 ` Jason Gunthorpe
@ 2025-07-03 4:32 ` Tian, Kevin
2025-07-03 11:21 ` Jason Gunthorpe
2025-07-07 10:58 ` Xu Yilun
1 sibling, 1 reply; 32+ messages in thread
From: Tian, Kevin @ 2025-07-03 4:32 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Xu Yilun, will@kernel.org, aneesh.kumar@kernel.org,
iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org,
nicolinc@nvidia.com, aik@amd.com, Williams, Dan J,
baolu.lu@linux.intel.com, Xu, Yilun
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, July 2, 2025 8:41 PM
>
> On Wed, Jul 02, 2025 at 09:13:50AM +0000, Tian, Kevin wrote:
> > > > Yes, you can't touch idev inside the destroy function at all, under
> > > > any version. idev is only valid if you have a refcount on vdev.
> > > >
> > > > But why are you touching this lock? Arrange things so abort doesn't
> > > > touch the idev??
> > >
> > > idev has a pointer idev->vdev to track the vdev's lifecycle.
> > > idev->igroup->lock protects the pointer. At the end of
> > > iommufd_vdevice_destroy() this pointer should be NULLed so that idev
> > > knows vdev is really destroyed.
>
> Well, that is destroy, not abort, but OK, there is an issue with
> destroy.
>
> > but comparing to that I'd prefer to the original wait approach...
>
> Okay, but lets try to keep the wait hidden inside the refcounting..
>
> The issue here is we don't hold a refcount on idev while working with
> idev. Let's fix that and then things should work properly?
>
> Maybe something like this:
>
> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> index 4e781aa9fc6329..9174fa7c972b80 100644
> --- a/drivers/iommu/iommufd/device.c
> +++ b/drivers/iommu/iommufd/device.c
> @@ -178,12 +178,20 @@ static void iommufd_device_remove_vdev(struct iommufd_device *idev)
> mutex_unlock(&idev->igroup->lock);
> }
>
> +void iommufd_device_pre_destroy(struct iommufd_object *obj)
> +{
> + struct iommufd_device *idev =
> + container_of(obj, struct iommufd_device, obj);
> +
> + /* Release the short term users on this */
> + iommufd_device_remove_vdev(idev);
> +}
> +
> void iommufd_device_destroy(struct iommufd_object *obj)
> {
> struct iommufd_device *idev =
> container_of(obj, struct iommufd_device, obj);
>
> - iommufd_device_remove_vdev(idev);
> iommu_device_release_dma_owner(idev->dev);
> iommufd_put_group(idev->igroup);
> if (!iommufd_selftest_is_mock_dev(idev->dev))
> diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
> index b2e8e106c16158..387c630fdabfbd 100644
> --- a/drivers/iommu/iommufd/main.c
> +++ b/drivers/iommu/iommufd/main.c
> @@ -151,6 +151,9 @@ static int iommufd_object_dec_wait_shortterm(struct iommufd_ctx *ictx,
> if (refcount_dec_and_test(&to_destroy->shortterm_users))
> return 0;
>
> + if (iommufd_object_ops[to_destroy->type].pre_destroy)
> + iommufd_object_ops[to_destroy->type].pre_destroy(to_destroy);
> +
> if (wait_event_timeout(ictx->destroy_wait,
> refcount_read(&to_destroy->shortterm_users) == 0,
> msecs_to_jiffies(60000)))
> @@ -567,6 +570,7 @@ static const struct iommufd_object_ops iommufd_object_ops[] = {
> .destroy = iommufd_access_destroy_object,
> },
> [IOMMUFD_OBJ_DEVICE] = {
> + .pre_destroy = iommufd_device_pre_destroy,
> .destroy = iommufd_device_destroy,
> },
> [IOMMUFD_OBJ_FAULT] = {
> diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
> index 9451a311745f7b..cbf99daa7dc25d 100644
> --- a/drivers/iommu/iommufd/viommu.c
> +++ b/drivers/iommu/iommufd/viommu.c
> @@ -135,6 +135,7 @@ void iommufd_vdevice_destroy(struct iommufd_object *obj)
> mutex_lock(&vdev->idev->igroup->lock);
> iommufd_vdevice_abort(obj);
> mutex_unlock(&vdev->idev->igroup->lock);
> + iommufd_put_object(vdev->viommu->ictx, &vdev->idev->obj);
> }
>
> int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
> @@ -180,13 +181,19 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
> vdev->id = virt_id;
> vdev->viommu = viommu;
> refcount_inc(&viommu->obj.users);
> +
> + /*
> + * A reference is held on the idev so long as we have a pointer.
> + * iommufd_device_pre_destroy() will revoke it before the real
> + * destruction.
> + */
> + vdev->idev = idev;
> +
> /*
> * iommufd_device_destroy() waits until idev->vdev is NULL before
> * freeing the idev, which only happens once the vdev is finished
> - * destruction. Thus we do not need refcounting on either idev->vdev or
> - * vdev->idev.
> + * destruction.
> */
> - vdev->idev = idev;
> idev->vdev = vdev;
>
> curr = xa_cmpxchg(&viommu->vdevs, virt_id, NULL, vdev, GFP_KERNEL);
> @@ -207,7 +214,8 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
> out_unlock_igroup:
> mutex_unlock(&idev->igroup->lock);
> out_put_idev:
> - iommufd_put_object(ucmd->ictx, &idev->obj);
> + if (rc)
> + iommufd_put_object(ucmd->ictx, &idev->obj);
but this sounds like a misuse of shortterm_users which is not intended
to be held long beyond ioctl...
* Re: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy
2025-07-03 4:32 ` Tian, Kevin
@ 2025-07-03 11:21 ` Jason Gunthorpe
0 siblings, 0 replies; 32+ messages in thread
From: Jason Gunthorpe @ 2025-07-03 11:21 UTC (permalink / raw)
To: Tian, Kevin
Cc: Xu Yilun, will@kernel.org, aneesh.kumar@kernel.org,
iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org,
nicolinc@nvidia.com, aik@amd.com, Williams, Dan J,
baolu.lu@linux.intel.com, Xu, Yilun
On Thu, Jul 03, 2025 at 04:32:26AM +0000, Tian, Kevin wrote:
> but this sounds like a misuse of shortterm_users which is not intended
> to be held long beyond ioctl...
With the addition of pre_destroy, its purpose gets changed to
'beyond the ioctl or past pre_destroy'
Jason
* Re: [PATCH v3 3/5] iommufd/vdevice: Remove struct device reference from struct vdevice
2025-06-27 3:38 ` [PATCH v3 3/5] iommufd/vdevice: Remove struct device reference from struct vdevice Xu Yilun
2025-06-30 5:11 ` Baolu Lu
@ 2025-07-04 15:06 ` Jason Gunthorpe
1 sibling, 0 replies; 32+ messages in thread
From: Jason Gunthorpe @ 2025-07-04 15:06 UTC (permalink / raw)
To: Xu Yilun
Cc: kevin.tian, will, aneesh.kumar, iommu, linux-kernel, joro,
robin.murphy, shuah, nicolinc, aik, dan.j.williams, baolu.lu,
yilun.xu
On Fri, Jun 27, 2025 at 11:38:07AM +0800, Xu Yilun wrote:
> Remove struct device *dev from struct vdevice.
>
> The dev pointer is the Plan B for vdevice to reference the physical
> device. As now vdev->idev is added without refcounting concern, just
> use vdev->idev->dev when needed.
>
> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
> ---
> drivers/iommu/iommufd/driver.c | 4 ++--
> drivers/iommu/iommufd/iommufd_private.h | 1 -
> drivers/iommu/iommufd/viommu.c | 3 ---
> 3 files changed, 2 insertions(+), 6 deletions(-)
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
* Re: [PATCH v3 4/5] iommufd/selftest: Explicitly skip tests for inapplicable variant
2025-06-27 3:38 ` [PATCH v3 4/5] iommufd/selftest: Explicitly skip tests for inapplicable variant Xu Yilun
@ 2025-07-04 15:07 ` Jason Gunthorpe
0 siblings, 0 replies; 32+ messages in thread
From: Jason Gunthorpe @ 2025-07-04 15:07 UTC (permalink / raw)
To: Xu Yilun
Cc: kevin.tian, will, aneesh.kumar, iommu, linux-kernel, joro,
robin.murphy, shuah, nicolinc, aik, dan.j.williams, baolu.lu,
yilun.xu
On Fri, Jun 27, 2025 at 11:38:08AM +0800, Xu Yilun wrote:
> no_viommu is not applicable for some viommu/vdevice tests. Explicitly
> report the skipping, don't do it silently.
>
> Only add the prints. No functional change intended.
>
> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
> ---
> tools/testing/selftests/iommu/iommufd.c | 378 ++++++++++++------------
> 1 file changed, 190 insertions(+), 188 deletions(-)
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
* Re: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy
2025-07-02 12:40 ` Jason Gunthorpe
2025-07-03 4:32 ` Tian, Kevin
@ 2025-07-07 10:58 ` Xu Yilun
2025-07-07 12:25 ` Jason Gunthorpe
1 sibling, 1 reply; 32+ messages in thread
From: Xu Yilun @ 2025-07-07 10:58 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Tian, Kevin, will@kernel.org, aneesh.kumar@kernel.org,
iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org,
nicolinc@nvidia.com, aik@amd.com, Williams, Dan J,
baolu.lu@linux.intel.com, Xu, Yilun
On Wed, Jul 02, 2025 at 09:40:42AM -0300, Jason Gunthorpe wrote:
> On Wed, Jul 02, 2025 at 09:13:50AM +0000, Tian, Kevin wrote:
> > > > Yes, you can't touch idev inside the destroy function at all, under
> > > > any version. idev is only valid if you have a refcount on vdev.
> > > >
> > > > But why are you touching this lock? Arrange things so abort doesn't
> > > > touch the idev??
> > >
> > > idev has a pointer idev->vdev to track the vdev's lifecycle.
> > > idev->igroup->lock protects the pointer. At the end of
> > > iommufd_vdevice_destroy() this pointer should be NULLed so that idev
> > > knows vdev is really destroyed.
>
> Well, that is destroy, not abort, but OK, there is an issue with
> destroy.
>
> > but comparing to that I'd prefer to the original wait approach...
>
> Okay, but lets try to keep the wait hidden inside the refcounting..
>
> The issue here is we don't hold a refcount on idev while working with
> idev. Let's fix that and then things should work properly?
>
> Maybe something like this:
>
> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> index 4e781aa9fc6329..9174fa7c972b80 100644
> --- a/drivers/iommu/iommufd/device.c
> +++ b/drivers/iommu/iommufd/device.c
> @@ -178,12 +178,20 @@ static void iommufd_device_remove_vdev(struct iommufd_device *idev)
> mutex_unlock(&idev->igroup->lock);
> }
>
> +void iommufd_device_pre_destroy(struct iommufd_object *obj)
> +{
> + struct iommufd_device *idev =
> + container_of(obj, struct iommufd_device, obj);
> +
> + /* Release the short term users on this */
> + iommufd_device_remove_vdev(idev);
> +}
> +
> void iommufd_device_destroy(struct iommufd_object *obj)
> {
> struct iommufd_device *idev =
> container_of(obj, struct iommufd_device, obj);
>
> - iommufd_device_remove_vdev(idev);
> iommu_device_release_dma_owner(idev->dev);
> iommufd_put_group(idev->igroup);
> if (!iommufd_selftest_is_mock_dev(idev->dev))
> diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
> index b2e8e106c16158..387c630fdabfbd 100644
> --- a/drivers/iommu/iommufd/main.c
> +++ b/drivers/iommu/iommufd/main.c
> @@ -151,6 +151,9 @@ static int iommufd_object_dec_wait_shortterm(struct iommufd_ctx *ictx,
> if (refcount_dec_and_test(&to_destroy->shortterm_users))
> return 0;
>
> + if (iommufd_object_ops[to_destroy->type].pre_destroy)
> + iommufd_object_ops[to_destroy->type].pre_destroy(to_destroy);
> +
> if (wait_event_timeout(ictx->destroy_wait,
> refcount_read(&to_destroy->shortterm_users) == 0,
I still see concerns here. The pre_destroy() and wait_event happen
before the idev's users count has decreased to 0, which means a new
user could still get in. Worse, the new user could hold shortterm_users
until after pre_destroy(), but there isn't a second pre_destroy()...
I think extending the life of shortterm_users at runtime makes things
complex. My idea is to keep the original definition of shortterm_users
at runtime, and only repurpose it as a wait completion after the users
refcount reaches 0. I.e. idev increases its own shortterm_users,
waiting for vdev to decrease it.
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 9876bf70980a..49b787bc0688 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -147,8 +147,15 @@ static void iommufd_device_remove_vdev(struct iommufd_device *idev)
goto out_unlock;
vdev = iommufd_get_vdevice(idev->ictx, idev->vdev->obj.id);
- if (IS_ERR(vdev))
+ if (IS_ERR(vdev)) {
+ /*
+ * vdev is removed from xarray by userspace, but is not
+ * destroyed. We should wait for vdev real destruction before
+ * idev destruction.
+ */
+ iommufd_object_destroy_wait_prepare(&idev->obj);
goto out_unlock;
+ }
/* Should never happen */
if (WARN_ON(vdev != idev->vdev)) {
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 08f3c66366ea..6620006df8d1 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -180,6 +180,47 @@ static inline void iommufd_put_object(struct iommufd_ctx *ictx,
wake_up_interruptible_all(&ictx->destroy_wait);
}
+/*
+ * The iommufd_object_destroy_wait_xx() are used to synchronize the last minute
+ * destructions of interrelated objects. E.g. the vdev's destroy() should
+ * always be called before idev's destroy().
+ *
+ * iommufd_object_destroy_wait_prepare() should only be called by
+ * later-destroyer in its pre_destroy() ops, where the users & shortterm_users
+ * refcounts are all zeroed. I.e. no new user is possible. Use "users == 0 &&
+ * shortterm_users == 1" as the "destroy wait" state, as this combination
+ * is not used in runtime usecase and it also prevents new users.
+ *
+ * The later-destroyer should use their own locking to ensure the
+ * first-destroyer still exists when calling this function.
+ */
+static inline void
+iommufd_object_destroy_wait_prepare(struct iommufd_object *to_destroy)
+{
+ if (WARN_ON(refcount_read(&to_destroy->users) != 0 ||
+ refcount_read(&to_destroy->shortterm_users) != 0))
+ return;
+
+ refcount_set(&to_destroy->shortterm_users, 1);
+}
+
+/*
+ * iommufd_object_destroy_wait_complete() should be called by first-destroyer
+ * in its destroy() ops. The first-destroyer should not access the
+ * later-destroyer after calling this function.
+ */
+static inline void
+iommufd_object_destroy_wait_complete(struct iommufd_ctx *ictx,
+ struct iommufd_object *to_destroy)
+{
+ if (refcount_read(&to_destroy->users) != 0 ||
+ refcount_read(&to_destroy->shortterm_users) != 1)
+ return;
+
+ refcount_dec(&to_destroy->shortterm_users);
+ wake_up_interruptible_all(&ictx->destroy_wait);
+}
+
void iommufd_object_abort(struct iommufd_ctx *ictx, struct iommufd_object *obj);
void iommufd_object_abort_and_destroy(struct iommufd_ctx *ictx,
struct iommufd_object *obj);
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index fbccfbd8fb03..148218293f55 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -96,15 +96,37 @@ struct iommufd_object *iommufd_get_object(struct iommufd_ctx *ictx, u32 id,
         return obj;
 }
 
+static int iommufd_object_pre_destroy_wait(struct iommufd_ctx *ictx,
+                                           struct iommufd_object *to_destroy)
+{
+        if (!iommufd_object_ops[to_destroy->type].pre_destroy)
+                return 0;
+
+        /*
+         * pre_destroy() may put the object in destroy wait state (users == 0
+         * && shortterm_users == 1). Another thread completes the waiting by
+         * decreasing the shortterm_users back to 0. The helpers
+         * iommufd_object_destroy_wait_prepare() and
+         * iommufd_object_destroy_wait_complete() should be used for this
+         * purpose.
+         */
+        iommufd_object_ops[to_destroy->type].pre_destroy(to_destroy);
+        if (wait_event_timeout(ictx->destroy_wait,
+                               refcount_read(&to_destroy->shortterm_users) == 0,
+                               msecs_to_jiffies(60000)))
+                return 0;
+
+        pr_crit("Time out waiting for iommufd object to become free\n");
+        return -EBUSY;
+}
+
 static int iommufd_object_dec_wait_shortterm(struct iommufd_ctx *ictx,
                                              struct iommufd_object *to_destroy)
 {
         if (refcount_dec_and_test(&to_destroy->shortterm_users))
                 return 0;
-        if (iommufd_object_ops[to_destroy->type].pre_destroy)
-                iommufd_object_ops[to_destroy->type].pre_destroy(to_destroy);
-
         if (wait_event_timeout(ictx->destroy_wait,
                                refcount_read(&to_destroy->shortterm_users) == 0,
@@ -184,6 +206,10 @@ int iommufd_object_remove(struct iommufd_ctx *ictx,
                 ret = iommufd_object_dec_wait_shortterm(ictx, obj);
                 if (WARN_ON(ret))
                         return ret;
+        } else {
+                ret = iommufd_object_pre_destroy_wait(ictx, obj);
+                if (WARN_ON(ret))
+                        return ret;
         }
 
         iommufd_object_ops[obj->type].destroy(obj);
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
index d49e8f0c8bb3..a7feefd5e472 100644
--- a/drivers/iommu/iommufd/viommu.c
+++ b/drivers/iommu/iommufd/viommu.c
@@ -103,11 +103,12 @@ void iommufd_vdevice_destroy(struct iommufd_object *obj)
 {
         struct iommufd_vdevice *vdev =
                 container_of(obj, struct iommufd_vdevice, obj);
+        struct iommufd_device *idev = vdev->idev;
 
-        mutex_lock(&vdev->idev->igroup->lock);
+        mutex_lock(&idev->igroup->lock);
         iommufd_vdevice_abort(obj);
-        mutex_unlock(&vdev->idev->igroup->lock);
-        iommufd_put_object(vdev->viommu->ictx, &vdev->idev->obj);
+        mutex_unlock(&idev->igroup->lock);
+        iommufd_object_destroy_wait_complete(idev->ictx, &idev->obj);
 }
int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
@@ -185,8 +186,7 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
 out_unlock_igroup:
         mutex_unlock(&idev->igroup->lock);
 out_put_idev:
-        if (rc)
-                iommufd_put_object(ucmd->ictx, &idev->obj);
+        iommufd_put_object(ucmd->ictx, &idev->obj);
 out_put_viommu:
         iommufd_put_object(ucmd->ictx, &viommu->obj);
         return rc;
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy
2025-07-07 10:58 ` Xu Yilun
@ 2025-07-07 12:25 ` Jason Gunthorpe
2025-07-07 19:41 ` Xu Yilun
0 siblings, 1 reply; 32+ messages in thread
From: Jason Gunthorpe @ 2025-07-07 12:25 UTC (permalink / raw)
To: Xu Yilun
Cc: Tian, Kevin, will@kernel.org, aneesh.kumar@kernel.org,
iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org,
nicolinc@nvidia.com, aik@amd.com, Williams, Dan J,
baolu.lu@linux.intel.com, Xu, Yilun
On Mon, Jul 07, 2025 at 06:58:45PM +0800, Xu Yilun wrote:
> I still see concerns here. The pre_destroy() and wait_event are before
> the idev's users count decreases to 0, which means a new user could get
> in.
Hum, yes, it seems we could get a race with another vdevice being
created during idevice destruction, but I think this could be fixed in
a straightforward way with a 'being destroyed' flag in the idev under
the lock, if that is the only issue.
> The worse thing is the new user could hold the shortterm_users until
> pre_destroy(), but there isn't a second pre_destroy()...
This is OK, the point of pre_destroy is to remove the reference
cycle. So it needs to prevent new vdevs from establishing references
and remove the reference from any existing vdev.
refcounts outside of this cycle are handled by the existing
mechanisms - pre_destroy is only about the refcount loop.
> I think extending the life of shortterm_users at runtime makes things
> complex.
Well, it keeps the refcounting working the way it is supposed
to. Holding a pointer without a reference is nasty and is creating
a lot of these weird problems and strange flows. Hold the refcount
along with the pointer and things follow the normal patterns.
There is nothing inherently wrong with shortterm users for this
besides its name. Think of it as 'wait for users', and we clearly
want to wait for the vdev user to put back its idev reference.
I think there are too many gyrations with refcounts in this version
below. It is hard to reason about if zero refcounts are re-elevated to
non-zero.
>         vdev = iommufd_get_vdevice(idev->ictx, idev->vdev->obj.id);
> -       if (IS_ERR(vdev))
> +       if (IS_ERR(vdev)) {
> +               /*
> +                * vdev is removed from xarray by userspace, but is not
> +                * destroyed. We should wait for vdev real destruction before
> +                * idev destruction.
> +                */
> +               iommufd_object_destroy_wait_prepare(&idev->obj);
>                 goto out_unlock;
This is now really weird: we are mangling the refcounts, and the
reasoning about why it is safe with the concurrent vdev thread is
tricky...
> +/*
> + * iommufd_object_destroy_wait_complete() should be called by first-destroyer
> + * in its destroy() ops. The first-destroyer should not access the
> + * later-destroyer after calling this function.
> + */
> +static inline void
> +iommufd_object_destroy_wait_complete(struct iommufd_ctx *ictx,
> +                                     struct iommufd_object *to_destroy)
> +{
> +        if (refcount_read(&to_destroy->users) != 0 ||
> +            refcount_read(&to_destroy->shortterm_users) != 1)
> +                return;
> +
> +        refcount_dec(&to_destroy->shortterm_users);
> +        wake_up_interruptible_all(&ictx->destroy_wait);
> +}
This looks weird too; why is refcount_read() safe? That is usually not
how this kind of logic is written.
I'm not sure this is OK; it is not an obvious known-good pattern for
lifetime management.
Jason
* Re: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy
2025-07-07 12:25 ` Jason Gunthorpe
@ 2025-07-07 19:41 ` Xu Yilun
0 siblings, 0 replies; 32+ messages in thread
From: Xu Yilun @ 2025-07-07 19:41 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Tian, Kevin, will@kernel.org, aneesh.kumar@kernel.org,
iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org,
nicolinc@nvidia.com, aik@amd.com, Williams, Dan J,
baolu.lu@linux.intel.com, Xu, Yilun
On Mon, Jul 07, 2025 at 09:25:02AM -0300, Jason Gunthorpe wrote:
> On Mon, Jul 07, 2025 at 06:58:45PM +0800, Xu Yilun wrote:
>
> > I still see concerns here. The pre_destroy() and wait_event are before
> > the idev's users count decreases to 0, which means a new user could get
> > in.
>
> Hum, yes it seems we could get a race with another vdevice being
> created during idevice destruction, but I think this could be fixed in
> a straightforward way with a flag 'being destroyed' in the idev under
> the lock if it is the only issue
>
> > The worse thing is the new user could hold the shortterm_users until
> > pre_destroy(), but there isn't a second pre_destroy()...
>
> This is OK, the point of pre_destroy is to remove the reference
> cycle. So it needs to prevent new vdevs from establishing references
> and remove the reference from any existing vdev.
>
> refcounts outside of this cycle are handled by the existing
> mechanisms - pre_destroy is only about the refcount loop.
That works for me. I didn't prefer a 'being destroyed' flag because it
doesn't seem to align with the current refcounting mechanism, but I see
it makes sense since idev does deterministic destruction.
The only concern is that idev cannot actually prevent new references by
itself; it needs vdev to honor the flag. But let's keep it simple and
not over-engineer.
Thanks,
Yilun
>
> > I think extending the life of shortterm_users at runtime makes things
> > complex.
>
> Well, it keeps the refcounting working the way it is supposed
> to. Holding a pointer without a reference is nasty and is creating
> a lot of these weird problems and strange flows. Hold the refcount
> along with the pointer and things follow the normal patterns.
>
> There is nothing inherently wrong with shortterm users for this
> besides its name. Think of it as 'wait for users', and we clearly
> want to wait for the vdev user to put back its idev reference.
>
* Re: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy
2025-06-30 19:34 ` Nicolin Chen
@ 2025-07-08 8:12 ` Xu Yilun
0 siblings, 0 replies; 32+ messages in thread
From: Xu Yilun @ 2025-07-08 8:12 UTC (permalink / raw)
To: Nicolin Chen
Cc: jgg, jgg, kevin.tian, will, aneesh.kumar, iommu, linux-kernel,
joro, robin.murphy, shuah, aik, dan.j.williams, baolu.lu,
yilun.xu
On Mon, Jun 30, 2025 at 12:34:03PM -0700, Nicolin Chen wrote:
> On Fri, Jun 27, 2025 at 11:38:06AM +0800, Xu Yilun wrote:
> > +static void iommufd_device_remove_vdev(struct iommufd_device *idev)
> > +{
> > +        struct iommufd_vdevice *vdev;
> > +
> > +        mutex_lock(&idev->igroup->lock);
> > +        /* vdev has been completely destroyed by userspace */
> > +        if (!idev->vdev)
> > +                goto out_unlock;
> > +
> > +        vdev = iommufd_get_vdevice(idev->ictx, idev->vdev->obj.id);
> > +        if (IS_ERR(vdev)) {
> > +                /*
> > +                 * vdev is removed from xarray by userspace, but is not
> > +                 * destroyed/freed. Since iommufd_vdevice_abort() is reentrant,
> > +                 * safe to destroy vdev here.
> > +                 */
> > +                iommufd_vdevice_abort(&idev->vdev->obj);
> > +                goto out_unlock;
>
> This is case #3, i.e. a racing vdev destroy, in the commit log?
>
> I think it is worth clarifying that there is a concurrent destroy:
> /*
> * An ongoing vdev destroy ioctl has removed the vdev from the
> * object xarray but has not finished iommufd_vdevice_destroy()
> * yet, as it is holding the same mutex.
Applied this part.
> * Destroy the vdev here,
> * i.e. the iommufd_vdevice_destroy() will be a NOP once it is
> * unlocked.
> */
>
> > @@ -147,10 +183,12 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
> >          if (rc)
> >                  goto out_abort;
> >          iommufd_object_finalize(ucmd->ictx, &vdev->obj);
> > -        goto out_put_idev;
> > +        goto out_unlock_igroup;
> >
> >  out_abort:
> >          iommufd_object_abort_and_destroy(ucmd->ictx, &vdev->obj);
> > +out_unlock_igroup:
> > +        mutex_unlock(&idev->igroup->lock);
>
> Looks like we will have to partially revert the _ucmd allocator,
> in this function:
> https://lore.kernel.org/all/107b24a3b791091bb09c92ffb0081c56c413b26d.1749882255.git.nicolinc@nvidia.com/
>
> Please try fixing the conflicts on top of Jason's for-next tree.
Yes, will rebase for next version.
Thanks,
Yilun
>
> Thanks
> Nicolin
* Re: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy
2025-06-30 5:08 ` Baolu Lu
@ 2025-07-08 8:34 ` Xu Yilun
0 siblings, 0 replies; 32+ messages in thread
From: Xu Yilun @ 2025-07-08 8:34 UTC (permalink / raw)
To: Baolu Lu
Cc: jgg, jgg, kevin.tian, will, aneesh.kumar, iommu, linux-kernel,
joro, robin.murphy, shuah, nicolinc, aik, dan.j.williams,
yilun.xu
> > -void iommufd_vdevice_destroy(struct iommufd_object *obj)
> > +void iommufd_vdevice_abort(struct iommufd_object *obj)
> >  {
> >          struct iommufd_vdevice *vdev =
> >                  container_of(obj, struct iommufd_vdevice, obj);
> >          struct iommufd_viommu *viommu = vdev->viommu;
> > +        struct iommufd_device *idev = vdev->idev;
> > +
> > +        lockdep_assert_held(&idev->igroup->lock);
> > +
> > +        /*
> > +         * iommufd_vdevice_abort() could be reentrant, by
> > +         * iommufd_device_unbind() or by iommufd_destroy(). Cleanup only once.
> > +         */
> > +        if (!viommu)
> > +                return;
> >
> >          /* xa_cmpxchg is okay to fail if alloc failed xa_cmpxchg previously */
> >          xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL);
> >          refcount_dec(&viommu->obj.users);
> >          put_device(vdev->dev);
> > +        vdev->viommu = NULL;
> > +        idev->vdev = NULL;
>
> I feel it makes more sense to reorder the operations like this:
>
> vdev->viommu = NULL;
> vdev->idev = NULL;
> idev->vdev = NULL;
> put_device(vdev->dev);
>
> put_device(vdev->dev) could potentially trigger other code paths that
> might attempt to reference vdev or its associated pointers. Therefore,
> it's safer to null all the relevant reference pointers before calling
> put_device().
Yep. Due to other changes, I now keep:
xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL);
refcount_dec(&viommu->obj.users);
+ idev->vdev = NULL;
put_device(vdev->dev);
Anyway, vdev->dev will be completely dropped in the next patch, so it
doesn't matter much.
Thanks,
Yilun
>
> Others look good to me,
>
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
>
> Thanks,
> baolu
* Re: [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper
2025-06-30 19:50 ` Nicolin Chen
@ 2025-07-08 8:45 ` Xu Yilun
0 siblings, 0 replies; 32+ messages in thread
From: Xu Yilun @ 2025-07-08 8:45 UTC (permalink / raw)
To: Nicolin Chen
Cc: jgg, jgg, kevin.tian, will, aneesh.kumar, iommu, linux-kernel,
joro, robin.murphy, shuah, aik, dan.j.williams, baolu.lu,
yilun.xu
On Mon, Jun 30, 2025 at 12:50:57PM -0700, Nicolin Chen wrote:
> On Fri, Jun 27, 2025 at 11:38:05AM +0800, Xu Yilun wrote:
> > Add the iommufd_object_tombstone_user() helper, which allows the caller
> > to destroy an iommufd object created by userspace.
>
> Should we describe it "partially destroy"?
I'd rather not say "partially destroy"; it gives the impression the
object is still in memory, when in fact the helper totally destroys and
frees it. Only the ID is reserved, and that ID no longer has anything to
do with the object.
>
> > This is useful on some destroy paths when the kernel caller finds the
> > object should have been removed by userspace but is still alive. With
> > this helper, the caller destroys the object but leaves the object ID
> > reserved (a so-called tombstone). The tombstone prevents repurposing the
> > object ID without awareness of the original user.
> >
> > Since this happens for abnomal userspace behavior, for simplicity, the
>
> s/abnomal/abnormal
Applied, thanks for catching.
>
> Nicolin
end of thread, other threads: [~2025-07-08 8:54 UTC | newest]
Thread overview: 32+ messages
2025-06-27 3:38 [PATCH v3 0/5] iommufd: Destroy vdevice on device unbind Xu Yilun
2025-06-27 3:38 ` [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper Xu Yilun
2025-06-30 3:08 ` Baolu Lu
2025-06-30 7:24 ` Xu Yilun
2025-06-30 5:52 ` Tian, Kevin
2025-06-30 6:41 ` Xu Yilun
2025-06-30 19:50 ` Nicolin Chen
2025-07-08 8:45 ` Xu Yilun
2025-06-27 3:38 ` [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy Xu Yilun
2025-06-30 5:08 ` Baolu Lu
2025-07-08 8:34 ` Xu Yilun
2025-06-30 6:27 ` Tian, Kevin
2025-06-30 10:18 ` Xu Yilun
2025-06-30 14:50 ` Jason Gunthorpe
2025-07-01 9:19 ` Xu Yilun
2025-07-01 12:13 ` Jason Gunthorpe
2025-07-02 2:23 ` Xu Yilun
2025-07-02 9:13 ` Tian, Kevin
2025-07-02 12:40 ` Jason Gunthorpe
2025-07-03 4:32 ` Tian, Kevin
2025-07-03 11:21 ` Jason Gunthorpe
2025-07-07 10:58 ` Xu Yilun
2025-07-07 12:25 ` Jason Gunthorpe
2025-07-07 19:41 ` Xu Yilun
2025-06-30 19:34 ` Nicolin Chen
2025-07-08 8:12 ` Xu Yilun
2025-06-27 3:38 ` [PATCH v3 3/5] iommufd/vdevice: Remove struct device reference from struct vdevice Xu Yilun
2025-06-30 5:11 ` Baolu Lu
2025-07-04 15:06 ` Jason Gunthorpe
2025-06-27 3:38 ` [PATCH v3 4/5] iommufd/selftest: Explicitly skip tests for inapplicable variant Xu Yilun
2025-07-04 15:07 ` Jason Gunthorpe
2025-06-27 3:38 ` [PATCH v3 5/5] iommufd/selftest: Add coverage for vdevice tombstone Xu Yilun