* [PATCH v6 0/8] iommufd: Destroy vdevice on device unbind
@ 2025-07-16 7:03 Xu Yilun
2025-07-16 7:03 ` [PATCH v6 1/8] iommufd/viommu: Roll back to use iommufd_object_alloc() for vdevice Xu Yilun
` (9 more replies)
0 siblings, 10 replies; 12+ messages in thread
From: Xu Yilun @ 2025-07-16 7:03 UTC (permalink / raw)
To: jgg, jgg, kevin.tian, will, aneesh.kumar
Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik,
dan.j.williams, baolu.lu, yilun.xu
This series solves the lifecycle issue that a vdevice may outlive its
idevice. It is a prerequisite for TIO, ensuring that extra secure
configurations (e.g. TSM Bind/Unbind) applied against a vdevice can be
rolled back on idevice unbind, so that VFIO can keep working on the
physical device without surprises.
Changelog:
v6:
- Fix compile error for ARM platform in Patch 5
- Adjust some more line wrappings in Patch 6
- Add review tags.
v5: https://lore.kernel.org/linux-iommu/aHdFWV9k9M7tRpD0@yilunxu-OptiPlex-7050/
- Further rebase to iommufd for-next 601b1d0d9395
- Keep the xa_empty() check in iommufd_fops_release(), update comments
- Move the *idev next to *viommu for struct iommufd_vdevice
- Update the description about IOMMUFD_CMD_VDEVICE_ALLOC for lifecycle
- Remove Baolu's tag for patch 4 because of big changes since v3
- Add changelog about idev->destroying
- Adjust line wrappings for tools/testing/selftests/iommu/iommufd.c
- Clarify that there is no test for tombstoned ID repurposing.
- Add review tags.
v4: https://lore.kernel.org/linux-iommu/20250709040234.1773573-1-yilun.xu@linux.intel.com/
- Rebase to iommufd for-next.
- A new patch to roll back iommufd_object_alloc_ucmd() for vdevice.
- Fix the mistake trying to xa_destroy ictx->groups on
iommufd_fops_release().
- Move 'empty' flag inside destroy loop for iommufd_fops_release().
- Refactor vdev/idev destroy syncing.
- Drop the iommufd_vdevice_abort() reentrant idea.
- A new patch that adds pre_destroy() op.
- Hold short term reference during the whole vdevice's lifecycle.
- Wait on short term reference on idev's pre_destroy().
- Add a 'destroying' flag for idev to prevent new reference after
pre_destroy().
- Rephrase/fix some comments.
- Add review tags.
v3: https://lore.kernel.org/linux-iommu/20250627033809.1730752-1-yilun.xu@linux.intel.com/
- Don't bother cleaning each tombstone in iommufd_fops_release().
- Drop vdev->ictx initialization fix patch.
- Optimize control flow in iommufd_device_remove_vdev().
- Make iommufd_vdevice_abort() reentrant.
- Call iommufd_vdevice_abort() directly instead of waiting for it.
- Rephrase/fix some comments.
- A new patch to remove vdev->dev.
- A new patch to explicitly skip existing viommu tests for no_iommu.
- Also skip vdevice tombstone test for no_iommu.
- Allow me to add SoB from Aneesh.
v2: https://lore.kernel.org/linux-iommu/20250623094946.1714996-1-yilun.xu@linux.intel.com/
v1/rfc: https://lore.kernel.org/linux-iommu/20250610065146.1321816-1-aneesh.kumar@kernel.org/
The series is based on iommufd for-next
Xu Yilun (8):
iommufd/viommu: Roll back to use iommufd_object_alloc() for vdevice
iommufd: Add iommufd_object_tombstone_user() helper
iommufd: Add a pre_destroy() op for objects
iommufd: Destroy vdevice on idevice destroy
iommufd/vdevice: Remove struct device reference from struct vdevice
iommufd/selftest: Explicitly skip tests for inapplicable variant
iommufd/selftest: Add coverage for vdevice tombstone
iommufd: Rename some shortterm-related identifiers
.../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 3 +-
drivers/iommu/iommufd/device.c | 51 +++
drivers/iommu/iommufd/driver.c | 10 +-
drivers/iommu/iommufd/iommufd_private.h | 49 ++-
drivers/iommu/iommufd/main.c | 69 +++-
drivers/iommu/iommufd/viommu.c | 69 +++-
include/linux/iommufd.h | 17 +-
include/uapi/linux/iommufd.h | 5 +
tools/testing/selftests/iommu/iommufd.c | 377 +++++++++---------
9 files changed, 419 insertions(+), 231 deletions(-)
--
2.25.1
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v6 1/8] iommufd/viommu: Roll back to use iommufd_object_alloc() for vdevice
2025-07-16 7:03 [PATCH v6 0/8] iommufd: Destroy vdevice on device unbind Xu Yilun
@ 2025-07-16 7:03 ` Xu Yilun
2025-07-16 7:03 ` [PATCH v6 2/8] iommufd: Add iommufd_object_tombstone_user() helper Xu Yilun
` (8 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Xu Yilun @ 2025-07-16 7:03 UTC (permalink / raw)
To: jgg, jgg, kevin.tian, will, aneesh.kumar
Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik,
dan.j.williams, baolu.lu, yilun.xu
To solve the vdevice lifecycle issue, later patches in this series
protect the vdevice allocation with a lock. That makes
_iommufd_object_alloc_ucmd() unsuitable for vdevice, so roll back to
_iommufd_object_alloc() in preparation.
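For context, a rough sketch of the manual pattern that
_iommufd_object_alloc() requires, versus the automatic finalize/abort
handling done by the ucmd variant (the surrounding names are
illustrative; the real conversion is in the hunk below):

	obj = _iommufd_object_alloc(ictx, size, IOMMUFD_OBJ_VDEVICE);
	if (IS_ERR(obj))
		return PTR_ERR(obj);

	/* ... initialize the object, possibly under a lock ... */

	if (failed)
		/* tear down a partially constructed object and its ID */
		iommufd_object_abort_and_destroy(ictx, obj);
	else
		/* publish the ID to userspace */
		iommufd_object_finalize(ictx, obj);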
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
drivers/iommu/iommufd/viommu.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
index 91339f799916..dcf8a85b9f6e 100644
--- a/drivers/iommu/iommufd/viommu.c
+++ b/drivers/iommu/iommufd/viommu.c
@@ -167,8 +167,8 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
vdev_size = viommu->ops->vdevice_size;
}
- vdev = (struct iommufd_vdevice *)_iommufd_object_alloc_ucmd(
- ucmd, vdev_size, IOMMUFD_OBJ_VDEVICE);
+ vdev = (struct iommufd_vdevice *)_iommufd_object_alloc(
+ ucmd->ictx, vdev_size, IOMMUFD_OBJ_VDEVICE);
if (IS_ERR(vdev)) {
rc = PTR_ERR(vdev);
goto out_put_idev;
@@ -183,18 +183,24 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
curr = xa_cmpxchg(&viommu->vdevs, virt_id, NULL, vdev, GFP_KERNEL);
if (curr) {
rc = xa_err(curr) ?: -EEXIST;
- goto out_put_idev;
+ goto out_abort;
}
if (viommu->ops && viommu->ops->vdevice_init) {
rc = viommu->ops->vdevice_init(vdev);
if (rc)
- goto out_put_idev;
+ goto out_abort;
}
cmd->out_vdevice_id = vdev->obj.id;
rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+ if (rc)
+ goto out_abort;
+ iommufd_object_finalize(ucmd->ictx, &vdev->obj);
+ goto out_put_idev;
+out_abort:
+ iommufd_object_abort_and_destroy(ucmd->ictx, &vdev->obj);
out_put_idev:
iommufd_put_object(ucmd->ictx, &idev->obj);
out_put_viommu:
--
2.25.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v6 2/8] iommufd: Add iommufd_object_tombstone_user() helper
2025-07-16 7:03 [PATCH v6 0/8] iommufd: Destroy vdevice on device unbind Xu Yilun
2025-07-16 7:03 ` [PATCH v6 1/8] iommufd/viommu: Roll back to use iommufd_object_alloc() for vdevice Xu Yilun
@ 2025-07-16 7:03 ` Xu Yilun
2025-07-16 7:03 ` [PATCH v6 3/8] iommufd: Add a pre_destroy() op for objects Xu Yilun
` (7 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Xu Yilun @ 2025-07-16 7:03 UTC (permalink / raw)
To: jgg, jgg, kevin.tian, will, aneesh.kumar
Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik,
dan.j.williams, baolu.lu, yilun.xu
Add the iommufd_object_tombstone_user() helper, which allows the caller
to destroy an iommufd object created by userspace.
This is useful on some destroy paths when the kernel caller finds the
object should have been removed by userspace but is still alive. With
this helper, the caller destroys the object but leaves the object ID
reserved (a so-called tombstone). The tombstone prevents repurposing the
object ID without the original user's awareness.
Since this only happens on abnormal userspace behavior, for simplicity
the tombstoned object ID is permanently leaked until
iommufd_fops_release(). I.e. the original user gets an error when
calling ioctl(IOMMU_DESTROY) on that ID.
The first use case would be to ensure the iommufd_vdevice can't outlive
the associated iommufd_device.
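As a hedged userspace-side illustration (a sketch, not part of this
patch; the function name and fd variable are placeholders), a destroy
attempt on a tombstoned ID keeps failing until the iommufd is closed -
the selftest added later in this series expects ENOENT:

	#include <sys/ioctl.h>
	#include <linux/iommufd.h>

	static int destroy_tombstoned_id(int iommufd_fd, __u32 tombstoned_id)
	{
		struct iommu_destroy destroy = {
			.size = sizeof(destroy),
			.id = tombstoned_id,
		};

		/*
		 * The object is already gone but its ID slot stays
		 * reserved, so this fails instead of silently hitting a
		 * repurposed ID.
		 */
		return ioctl(iommufd_fd, IOMMU_DESTROY, &destroy);
	}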
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Co-developed-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
drivers/iommu/iommufd/iommufd_private.h | 23 ++++++++++++++++++++++-
drivers/iommu/iommufd/main.c | 24 +++++++++++++++++++++++-
2 files changed, 45 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index cd14163abdd1..149545060029 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -202,7 +202,8 @@ void iommufd_object_finalize(struct iommufd_ctx *ictx,
struct iommufd_object *obj);
enum {
- REMOVE_WAIT_SHORTTERM = 1,
+ REMOVE_WAIT_SHORTTERM = BIT(0),
+ REMOVE_OBJ_TOMBSTONE = BIT(1),
};
int iommufd_object_remove(struct iommufd_ctx *ictx,
struct iommufd_object *to_destroy, u32 id,
@@ -228,6 +229,26 @@ static inline void iommufd_object_destroy_user(struct iommufd_ctx *ictx,
WARN_ON(ret);
}
+/*
+ * Similar to iommufd_object_destroy_user(), except that the object ID is left
+ * reserved/tombstoned.
+ */
+static inline void iommufd_object_tombstone_user(struct iommufd_ctx *ictx,
+ struct iommufd_object *obj)
+{
+ int ret;
+
+ ret = iommufd_object_remove(ictx, obj, obj->id,
+ REMOVE_WAIT_SHORTTERM | REMOVE_OBJ_TOMBSTONE);
+
+ /*
+ * If there is a bug and we couldn't destroy the object then we did put
+ * back the caller's users refcount and will eventually try to free it
+ * again during close.
+ */
+ WARN_ON(ret);
+}
+
/*
* The HWPT allocated by autodomains is used in possibly many devices and
* is automatically destroyed when its refcount reaches zero.
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 69c2195e77ca..71135f0ec72d 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -225,7 +225,7 @@ int iommufd_object_remove(struct iommufd_ctx *ictx,
goto err_xa;
}
- xas_store(&xas, NULL);
+ xas_store(&xas, (flags & REMOVE_OBJ_TOMBSTONE) ? XA_ZERO_ENTRY : NULL);
if (ictx->vfio_ioas == container_of(obj, struct iommufd_ioas, obj))
ictx->vfio_ioas = NULL;
xa_unlock(&ictx->objects);
@@ -311,19 +311,41 @@ static int iommufd_fops_release(struct inode *inode, struct file *filp)
while (!xa_empty(&ictx->objects)) {
unsigned int destroyed = 0;
unsigned long index;
+ bool empty = true;
+ /*
+ * We can't use xa_empty() to end the loop as the tombstones
+ * are stored as XA_ZERO_ENTRY in the xarray. However
+ * xa_for_each() automatically converts them to NULL and skips
+ * them causing xa_empty() to be kept false. Thus once
+ * xa_for_each() finds no further !NULL entries the loop is
+ * done.
+ */
xa_for_each(&ictx->objects, index, obj) {
+ empty = false;
if (!refcount_dec_if_one(&obj->users))
continue;
+
destroyed++;
xa_erase(&ictx->objects, index);
iommufd_object_ops[obj->type].destroy(obj);
kfree(obj);
}
+
+ if (empty)
+ break;
+
/* Bug related to users refcount */
if (WARN_ON(!destroyed))
break;
}
+
+ /*
+ * There may be some tombstones left over from
+ * iommufd_object_tombstone_user()
+ */
+ xa_destroy(&ictx->objects);
+
WARN_ON(!xa_empty(&ictx->groups));
mutex_destroy(&ictx->sw_msi_lock);
--
2.25.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v6 3/8] iommufd: Add a pre_destroy() op for objects
2025-07-16 7:03 [PATCH v6 0/8] iommufd: Destroy vdevice on device unbind Xu Yilun
2025-07-16 7:03 ` [PATCH v6 1/8] iommufd/viommu: Roll back to use iommufd_object_alloc() for vdevice Xu Yilun
2025-07-16 7:03 ` [PATCH v6 2/8] iommufd: Add iommufd_object_tombstone_user() helper Xu Yilun
@ 2025-07-16 7:03 ` Xu Yilun
2025-07-16 7:03 ` [PATCH v6 4/8] iommufd: Destroy vdevice on idevice destroy Xu Yilun
` (6 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Xu Yilun @ 2025-07-16 7:03 UTC (permalink / raw)
To: jgg, jgg, kevin.tian, will, aneesh.kumar
Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik,
dan.j.williams, baolu.lu, yilun.xu
Add a pre_destroy() op which gives objects a chance to clear their
short term users references before destruction. This op is intended for
objects created by external drivers (e.g. idev) whose destruction must
be deterministic.
In order to manage the lifecycle of interrelated objects as well as
deterministic destruction (e.g. a vdev can't outlive its idev, and idev
destruction can't fail), short term users references are allowed to
outlive an ioctl execution. An immediate use case: the vdev holds the
idev's short term users reference until vdev destruction completes, and
the idev leverages the existing wait_shortterm mechanism to ensure it is
destroyed after the vdev.
This extended usage means the referenced object can no longer simply
wait for its references to go away. It needs to actively trigger the
reference removal, as well as prevent new references before waiting.
Implement this work in pre_destroy().
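As a hedged sketch only (struct example_obj, its 'destroying' field and
example_remove_dependent() are illustrative names; the concrete
idev/vdev wiring lands in the next patch), a pre_destroy()
implementation is expected to take roughly this shape:

	static void example_pre_destroy(struct iommufd_object *obj)
	{
		struct example_obj *eobj =
			container_of(obj, struct example_obj, obj);

		/* Stop handing out new long-lived short term references */
		eobj->destroying = true;

		/*
		 * Actively remove the dependent object that still holds a
		 * short term users reference, so the wait in the destroy
		 * path can see shortterm_users drop to zero.
		 */
		example_remove_dependent(eobj);
	}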
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
drivers/iommu/iommufd/main.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 71135f0ec72d..53085d24ce4a 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -23,6 +23,7 @@
#include "iommufd_test.h"
struct iommufd_object_ops {
+ void (*pre_destroy)(struct iommufd_object *obj);
void (*destroy)(struct iommufd_object *obj);
void (*abort)(struct iommufd_object *obj);
};
@@ -160,6 +161,9 @@ static int iommufd_object_dec_wait_shortterm(struct iommufd_ctx *ictx,
if (refcount_dec_and_test(&to_destroy->shortterm_users))
return 0;
+ if (iommufd_object_ops[to_destroy->type].pre_destroy)
+ iommufd_object_ops[to_destroy->type].pre_destroy(to_destroy);
+
if (wait_event_timeout(ictx->destroy_wait,
refcount_read(&to_destroy->shortterm_users) == 0,
msecs_to_jiffies(60000)))
--
2.25.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v6 4/8] iommufd: Destroy vdevice on idevice destroy
2025-07-16 7:03 [PATCH v6 0/8] iommufd: Destroy vdevice on device unbind Xu Yilun
` (2 preceding siblings ...)
2025-07-16 7:03 ` [PATCH v6 3/8] iommufd: Add a pre_destroy() op for objects Xu Yilun
@ 2025-07-16 7:03 ` Xu Yilun
2025-07-16 7:03 ` [PATCH v6 5/8] iommufd/vdevice: Remove struct device reference from struct vdevice Xu Yilun
` (5 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Xu Yilun @ 2025-07-16 7:03 UTC (permalink / raw)
To: jgg, jgg, kevin.tian, will, aneesh.kumar
Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik,
dan.j.williams, baolu.lu, yilun.xu
Destroy iommufd_vdevice (vdev) on iommufd_idevice (idev) destruction so
that vdev can't outlive idev.
The idev represents the physical device bound to iommufd, while the vdev
represents the virtual instance of the physical device in the VM. The
vdev's lifecycle should not be longer than the idev's. This doesn't
cause a real problem in existing use cases because the vdev doesn't
impact the physical device; it only provides virtualization information.
But to extend the vdev for Confidential Computing (CC), secure
configuration needs to be done against the vdev, e.g. TSM Bind/Unbind.
These configurations should be rolled back on idev destroy, or the
external driver's (VFIO) functionality may be impacted.
The idev is created by an external driver so its destruction can't fail.
The idev implements the pre_destroy() op to actively remove its
associated vdev before destroying itself. There are 3 cases on idev
pre_destroy():
1. vdev is already destroyed by userspace. No extra handling needed.
2. vdev is still alive. Use iommufd_object_tombstone_user() to
destroy vdev and tombstone the vdev ID.
3. vdev is being destroyed by userspace. The vdev ID is already
freed, but the vdev destroy handler has not completed. This requires
multi-thread syncing - the vdev holds the idev's short term users
reference until vdev destruction completes, and the idev leverages
the existing wait_shortterm mechanism for syncing.
The idev should also block any new references to it after pre_destroy(),
or the following wait on shortterm would time out. Introduce a
'destroying' flag and set it to true in idev pre_destroy(). Any attempt
to reference the idev must honor this flag under the protection of
idev->igroup->lock.
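From userspace, the intended teardown ordering looks roughly like the
following sketch (hedged illustration; iommufd_fd, vfio_cdev_fd and
vdev_id are placeholders):

	struct iommu_destroy destroy = {
		.size = sizeof(destroy),
		.id = vdev_id,
	};

	ioctl(iommufd_fd, IOMMU_DESTROY, &destroy); /* destroy the vdevice first */
	close(vfio_cdev_fd);                        /* then unbind the physical device */

	/*
	 * If the close() happens first instead, pre_destroy() forcibly
	 * destroys the vdev and vdev_id stays tombstoned until the iommufd
	 * itself is closed.
	 */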
Originally-by: Nicolin Chen <nicolinc@nvidia.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Co-developed-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
drivers/iommu/iommufd/device.c | 51 ++++++++++++++++++++++++
drivers/iommu/iommufd/iommufd_private.h | 12 ++++++
drivers/iommu/iommufd/main.c | 2 +
drivers/iommu/iommufd/viommu.c | 52 +++++++++++++++++++++++--
include/linux/iommufd.h | 1 +
include/uapi/linux/iommufd.h | 5 +++
6 files changed, 119 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index e2ba21c43ad2..ee6ff4caf398 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -137,6 +137,57 @@ static struct iommufd_group *iommufd_get_group(struct iommufd_ctx *ictx,
}
}
+static void iommufd_device_remove_vdev(struct iommufd_device *idev)
+{
+ struct iommufd_vdevice *vdev;
+
+ mutex_lock(&idev->igroup->lock);
+ /* prevent new references from vdev */
+ idev->destroying = true;
+ /* vdev has been completely destroyed by userspace */
+ if (!idev->vdev)
+ goto out_unlock;
+
+ vdev = iommufd_get_vdevice(idev->ictx, idev->vdev->obj.id);
+ /*
+ * An ongoing vdev destroy ioctl has removed the vdev from the object
+ * xarray, but has not finished iommufd_vdevice_destroy() yet as it
+ * needs the same mutex. We exit the locking then wait on short term
+ * users for the vdev destruction.
+ */
+ if (IS_ERR(vdev))
+ goto out_unlock;
+
+ /* Should never happen */
+ if (WARN_ON(vdev != idev->vdev)) {
+ iommufd_put_object(idev->ictx, &vdev->obj);
+ goto out_unlock;
+ }
+
+ /*
+ * vdev is still alive. Hold a users refcount to prevent racing with
+ * userspace destruction, then use iommufd_object_tombstone_user() to
+ * destroy it and leave a tombstone.
+ */
+ refcount_inc(&vdev->obj.users);
+ iommufd_put_object(idev->ictx, &vdev->obj);
+ mutex_unlock(&idev->igroup->lock);
+ iommufd_object_tombstone_user(idev->ictx, &vdev->obj);
+ return;
+
+out_unlock:
+ mutex_unlock(&idev->igroup->lock);
+}
+
+void iommufd_device_pre_destroy(struct iommufd_object *obj)
+{
+ struct iommufd_device *idev =
+ container_of(obj, struct iommufd_device, obj);
+
+ /* Release the short term users on this */
+ iommufd_device_remove_vdev(idev);
+}
+
void iommufd_device_destroy(struct iommufd_object *obj)
{
struct iommufd_device *idev =
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 149545060029..5d6ea5395cfe 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -489,6 +489,8 @@ struct iommufd_device {
/* always the physical device */
struct device *dev;
bool enforce_cache_coherency;
+ struct iommufd_vdevice *vdev;
+ bool destroying;
};
static inline struct iommufd_device *
@@ -499,6 +501,7 @@ iommufd_get_device(struct iommufd_ucmd *ucmd, u32 id)
struct iommufd_device, obj);
}
+void iommufd_device_pre_destroy(struct iommufd_object *obj);
void iommufd_device_destroy(struct iommufd_object *obj);
int iommufd_get_hw_info(struct iommufd_ucmd *ucmd);
@@ -687,9 +690,18 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd);
void iommufd_viommu_destroy(struct iommufd_object *obj);
int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd);
void iommufd_vdevice_destroy(struct iommufd_object *obj);
+void iommufd_vdevice_abort(struct iommufd_object *obj);
int iommufd_hw_queue_alloc_ioctl(struct iommufd_ucmd *ucmd);
void iommufd_hw_queue_destroy(struct iommufd_object *obj);
+static inline struct iommufd_vdevice *
+iommufd_get_vdevice(struct iommufd_ctx *ictx, u32 id)
+{
+ return container_of(iommufd_get_object(ictx, id,
+ IOMMUFD_OBJ_VDEVICE),
+ struct iommufd_vdevice, obj);
+}
+
#ifdef CONFIG_IOMMUFD_TEST
int iommufd_test(struct iommufd_ucmd *ucmd);
void iommufd_selftest_destroy(struct iommufd_object *obj);
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 53085d24ce4a..99c1aab3d396 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -655,6 +655,7 @@ static const struct iommufd_object_ops iommufd_object_ops[] = {
.destroy = iommufd_access_destroy_object,
},
[IOMMUFD_OBJ_DEVICE] = {
+ .pre_destroy = iommufd_device_pre_destroy,
.destroy = iommufd_device_destroy,
},
[IOMMUFD_OBJ_FAULT] = {
@@ -676,6 +677,7 @@ static const struct iommufd_object_ops iommufd_object_ops[] = {
},
[IOMMUFD_OBJ_VDEVICE] = {
.destroy = iommufd_vdevice_destroy,
+ .abort = iommufd_vdevice_abort,
},
[IOMMUFD_OBJ_VEVENTQ] = {
.destroy = iommufd_veventq_destroy,
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
index dcf8a85b9f6e..ecbae5091ffe 100644
--- a/drivers/iommu/iommufd/viommu.c
+++ b/drivers/iommu/iommufd/viommu.c
@@ -110,20 +110,37 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
return rc;
}
-void iommufd_vdevice_destroy(struct iommufd_object *obj)
+void iommufd_vdevice_abort(struct iommufd_object *obj)
{
struct iommufd_vdevice *vdev =
container_of(obj, struct iommufd_vdevice, obj);
struct iommufd_viommu *viommu = vdev->viommu;
+ struct iommufd_device *idev = vdev->idev;
+
+ lockdep_assert_held(&idev->igroup->lock);
if (vdev->destroy)
vdev->destroy(vdev);
/* xa_cmpxchg is okay to fail if alloc failed xa_cmpxchg previously */
xa_cmpxchg(&viommu->vdevs, vdev->virt_id, vdev, NULL, GFP_KERNEL);
refcount_dec(&viommu->obj.users);
+ idev->vdev = NULL;
put_device(vdev->dev);
}
+void iommufd_vdevice_destroy(struct iommufd_object *obj)
+{
+ struct iommufd_vdevice *vdev =
+ container_of(obj, struct iommufd_vdevice, obj);
+ struct iommufd_device *idev = vdev->idev;
+ struct iommufd_ctx *ictx = idev->ictx;
+
+ mutex_lock(&idev->igroup->lock);
+ iommufd_vdevice_abort(obj);
+ mutex_unlock(&idev->igroup->lock);
+ iommufd_put_object(ictx, &idev->obj);
+}
+
int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
{
struct iommu_vdevice_alloc *cmd = ucmd->cmd;
@@ -153,6 +170,17 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
goto out_put_idev;
}
+ mutex_lock(&idev->igroup->lock);
+ if (idev->destroying) {
+ rc = -ENOENT;
+ goto out_unlock_igroup;
+ }
+
+ if (idev->vdev) {
+ rc = -EEXIST;
+ goto out_unlock_igroup;
+ }
+
if (viommu->ops && viommu->ops->vdevice_size) {
/*
* It is a driver bug for:
@@ -171,7 +199,7 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
ucmd->ictx, vdev_size, IOMMUFD_OBJ_VDEVICE);
if (IS_ERR(vdev)) {
rc = PTR_ERR(vdev);
- goto out_put_idev;
+ goto out_unlock_igroup;
}
vdev->virt_id = virt_id;
@@ -179,6 +207,19 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
get_device(idev->dev);
vdev->viommu = viommu;
refcount_inc(&viommu->obj.users);
+ /*
+ * A short term users reference is held on the idev so long as we have
+ * the pointer. iommufd_device_pre_destroy() will revoke it before the
+ * idev real destruction.
+ */
+ vdev->idev = idev;
+
+ /*
+ * iommufd_device_destroy() delays until idev->vdev is NULL before
+ * freeing the idev, which only happens once the vdev is finished
+ * destruction.
+ */
+ idev->vdev = vdev;
curr = xa_cmpxchg(&viommu->vdevs, virt_id, NULL, vdev, GFP_KERNEL);
if (curr) {
@@ -197,12 +238,15 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
if (rc)
goto out_abort;
iommufd_object_finalize(ucmd->ictx, &vdev->obj);
- goto out_put_idev;
+ goto out_unlock_igroup;
out_abort:
iommufd_object_abort_and_destroy(ucmd->ictx, &vdev->obj);
+out_unlock_igroup:
+ mutex_unlock(&idev->igroup->lock);
out_put_idev:
- iommufd_put_object(ucmd->ictx, &idev->obj);
+ if (rc)
+ iommufd_put_object(ucmd->ictx, &idev->obj);
out_put_viommu:
iommufd_put_object(ucmd->ictx, &viommu->obj);
return rc;
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index e3a0cd47384d..b88911026bc4 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -108,6 +108,7 @@ struct iommufd_viommu {
struct iommufd_vdevice {
struct iommufd_object obj;
struct iommufd_viommu *viommu;
+ struct iommufd_device *idev;
struct device *dev;
/*
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 554aacf89ea7..c218c89e0e2e 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -1070,6 +1070,11 @@ struct iommu_viommu_alloc {
*
* Allocate a virtual device instance (for a physical device) against a vIOMMU.
* This instance holds the device's information (related to its vIOMMU) in a VM.
+ * User should use IOMMU_DESTROY to destroy the virtual device before
+ * destroying the physical device (by closing vfio_cdev fd). Otherwise the
+ * virtual device would be forcibly destroyed on physical device destruction,
+ * its vdevice_id would be permanently leaked (unremovable & unreusable) until
+ * iommu fd closed.
*/
struct iommu_vdevice_alloc {
__u32 size;
--
2.25.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v6 5/8] iommufd/vdevice: Remove struct device reference from struct vdevice
2025-07-16 7:03 [PATCH v6 0/8] iommufd: Destroy vdevice on device unbind Xu Yilun
` (3 preceding siblings ...)
2025-07-16 7:03 ` [PATCH v6 4/8] iommufd: Destroy vdevice on idevice destroy Xu Yilun
@ 2025-07-16 7:03 ` Xu Yilun
2025-07-16 7:03 ` [PATCH v6 6/8] iommufd/selftest: Explicitly skip tests for inapplicable variant Xu Yilun
` (4 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Xu Yilun @ 2025-07-16 7:03 UTC (permalink / raw)
To: jgg, jgg, kevin.tian, will, aneesh.kumar
Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik,
dan.j.williams, baolu.lu, yilun.xu
Remove struct device *dev from struct vdevice.
The dev pointer was the Plan B for the vdevice to reference the physical
device. Now that vdev->idev is added without refcounting concerns, just
use vdev->idev->dev when needed. To avoid exposing
struct iommufd_device in the public header, export an
iommufd_vdevice_to_device() helper.
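A minimal sketch of how a vIOMMU driver consumes the helper
(example_vdevice_init() is an illustrative name; the tegra241-cmdqv hunk
below is the real in-tree user):

	static int example_vdevice_init(struct iommufd_vdevice *vdev)
	{
		/* Resolves to vdev->idev->dev without exposing struct iommufd_device */
		struct device *dev = iommufd_vdevice_to_device(vdev);

		dev_dbg(dev, "vdevice initialized\n");
		return 0;
	}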
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 3 ++-
drivers/iommu/iommufd/driver.c | 10 ++++++++--
drivers/iommu/iommufd/viommu.c | 3 ---
include/linux/iommufd.h | 8 +++++++-
4 files changed, 17 insertions(+), 7 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
index eb90af5093d8..4c86eacd36b1 100644
--- a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
+++ b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
@@ -1218,7 +1218,8 @@ static void tegra241_vintf_destroy_vsid(struct iommufd_vdevice *vdev)
static int tegra241_vintf_init_vsid(struct iommufd_vdevice *vdev)
{
- struct arm_smmu_master *master = dev_iommu_priv_get(vdev->dev);
+ struct device *dev = iommufd_vdevice_to_device(vdev);
+ struct arm_smmu_master *master = dev_iommu_priv_get(dev);
struct tegra241_vintf *vintf = viommu_to_vintf(vdev->viommu);
struct tegra241_vintf_sid *vsid = vdev_to_vsid(vdev);
struct arm_smmu_stream *stream = &master->streams[0];
diff --git a/drivers/iommu/iommufd/driver.c b/drivers/iommu/iommufd/driver.c
index e4eae20bcd4e..6f1010da221c 100644
--- a/drivers/iommu/iommufd/driver.c
+++ b/drivers/iommu/iommufd/driver.c
@@ -83,6 +83,12 @@ void _iommufd_destroy_mmap(struct iommufd_ctx *ictx,
}
EXPORT_SYMBOL_NS_GPL(_iommufd_destroy_mmap, "IOMMUFD");
+struct device *iommufd_vdevice_to_device(struct iommufd_vdevice *vdev)
+{
+ return vdev->idev->dev;
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_vdevice_to_device, "IOMMUFD");
+
/* Caller should xa_lock(&viommu->vdevs) to protect the return value */
struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu,
unsigned long vdev_id)
@@ -92,7 +98,7 @@ struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu,
lockdep_assert_held(&viommu->vdevs.xa_lock);
vdev = xa_load(&viommu->vdevs, vdev_id);
- return vdev ? vdev->dev : NULL;
+ return vdev ? iommufd_vdevice_to_device(vdev) : NULL;
}
EXPORT_SYMBOL_NS_GPL(iommufd_viommu_find_dev, "IOMMUFD");
@@ -109,7 +115,7 @@ int iommufd_viommu_get_vdev_id(struct iommufd_viommu *viommu,
xa_lock(&viommu->vdevs);
xa_for_each(&viommu->vdevs, index, vdev) {
- if (vdev->dev == dev) {
+ if (iommufd_vdevice_to_device(vdev) == dev) {
*vdev_id = vdev->virt_id;
rc = 0;
break;
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
index ecbae5091ffe..6cf0bd5d8f08 100644
--- a/drivers/iommu/iommufd/viommu.c
+++ b/drivers/iommu/iommufd/viommu.c
@@ -125,7 +125,6 @@ void iommufd_vdevice_abort(struct iommufd_object *obj)
xa_cmpxchg(&viommu->vdevs, vdev->virt_id, vdev, NULL, GFP_KERNEL);
refcount_dec(&viommu->obj.users);
idev->vdev = NULL;
- put_device(vdev->dev);
}
void iommufd_vdevice_destroy(struct iommufd_object *obj)
@@ -203,8 +202,6 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
}
vdev->virt_id = virt_id;
- vdev->dev = idev->dev;
- get_device(idev->dev);
vdev->viommu = viommu;
refcount_inc(&viommu->obj.users);
/*
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index b88911026bc4..810e4d8ac912 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -109,7 +109,6 @@ struct iommufd_vdevice {
struct iommufd_object obj;
struct iommufd_viommu *viommu;
struct iommufd_device *idev;
- struct device *dev;
/*
* Virtual device ID per vIOMMU, e.g. vSID of ARM SMMUv3, vDeviceID of
@@ -261,6 +260,7 @@ int _iommufd_alloc_mmap(struct iommufd_ctx *ictx, struct iommufd_object *owner,
unsigned long *offset);
void _iommufd_destroy_mmap(struct iommufd_ctx *ictx,
struct iommufd_object *owner, unsigned long offset);
+struct device *iommufd_vdevice_to_device(struct iommufd_vdevice *vdev);
struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu,
unsigned long vdev_id);
int iommufd_viommu_get_vdev_id(struct iommufd_viommu *viommu,
@@ -295,6 +295,12 @@ static inline void _iommufd_destroy_mmap(struct iommufd_ctx *ictx,
{
}
+static inline struct device *
+iommufd_vdevice_to_device(struct iommufd_vdevice *vdev)
+{
+ return NULL;
+}
+
static inline struct device *
iommufd_viommu_find_dev(struct iommufd_viommu *viommu, unsigned long vdev_id)
{
--
2.25.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v6 6/8] iommufd/selftest: Explicitly skip tests for inapplicable variant
2025-07-16 7:03 [PATCH v6 0/8] iommufd: Destroy vdevice on device unbind Xu Yilun
` (4 preceding siblings ...)
2025-07-16 7:03 ` [PATCH v6 5/8] iommufd/vdevice: Remove struct device reference from struct vdevice Xu Yilun
@ 2025-07-16 7:03 ` Xu Yilun
2025-07-16 7:03 ` [PATCH v6 7/8] iommufd/selftest: Add coverage for vdevice tombstone Xu Yilun
` (3 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Xu Yilun @ 2025-07-16 7:03 UTC (permalink / raw)
To: jgg, jgg, kevin.tian, will, aneesh.kumar
Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik,
dan.j.williams, baolu.lu, yilun.xu
no_viommu is not applicable to some viommu/vdevice tests. Explicitly
report the skip instead of doing it silently.
Opportunistically adjust the line wrappings after the indentation
changes using git clang-format.
Only add the prints. No functional change intended.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
tools/testing/selftests/iommu/iommufd.c | 363 ++++++++++++------------
1 file changed, 176 insertions(+), 187 deletions(-)
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index d59d48022a24..1b629bedeb1c 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -2779,35 +2779,32 @@ TEST_F(iommufd_viommu, viommu_alloc_nested_iopf)
uint32_t fault_fd;
uint32_t vdev_id;
- if (self->device_id) {
- test_ioctl_fault_alloc(&fault_id, &fault_fd);
- test_err_hwpt_alloc_iopf(
- ENOENT, dev_id, viommu_id, UINT32_MAX,
- IOMMU_HWPT_FAULT_ID_VALID, &iopf_hwpt_id,
- IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data));
- test_err_hwpt_alloc_iopf(
- EOPNOTSUPP, dev_id, viommu_id, fault_id,
- IOMMU_HWPT_FAULT_ID_VALID | (1 << 31), &iopf_hwpt_id,
- IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data));
- test_cmd_hwpt_alloc_iopf(
- dev_id, viommu_id, fault_id, IOMMU_HWPT_FAULT_ID_VALID,
- &iopf_hwpt_id, IOMMU_HWPT_DATA_SELFTEST, &data,
- sizeof(data));
+ if (!dev_id)
+ SKIP(return, "Skipping test for variant no_viommu");
- /* Must allocate vdevice before attaching to a nested hwpt */
- test_err_mock_domain_replace(ENOENT, self->stdev_id,
- iopf_hwpt_id);
- test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id);
- test_cmd_mock_domain_replace(self->stdev_id, iopf_hwpt_id);
- EXPECT_ERRNO(EBUSY,
- _test_ioctl_destroy(self->fd, iopf_hwpt_id));
- test_cmd_trigger_iopf(dev_id, fault_fd);
+ test_ioctl_fault_alloc(&fault_id, &fault_fd);
+ test_err_hwpt_alloc_iopf(ENOENT, dev_id, viommu_id, UINT32_MAX,
+ IOMMU_HWPT_FAULT_ID_VALID, &iopf_hwpt_id,
+ IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data));
+ test_err_hwpt_alloc_iopf(EOPNOTSUPP, dev_id, viommu_id, fault_id,
+ IOMMU_HWPT_FAULT_ID_VALID | (1 << 31),
+ &iopf_hwpt_id, IOMMU_HWPT_DATA_SELFTEST, &data,
+ sizeof(data));
+ test_cmd_hwpt_alloc_iopf(dev_id, viommu_id, fault_id,
+ IOMMU_HWPT_FAULT_ID_VALID, &iopf_hwpt_id,
+ IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data));
- test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id);
- test_ioctl_destroy(iopf_hwpt_id);
- close(fault_fd);
- test_ioctl_destroy(fault_id);
- }
+ /* Must allocate vdevice before attaching to a nested hwpt */
+ test_err_mock_domain_replace(ENOENT, self->stdev_id, iopf_hwpt_id);
+ test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id);
+ test_cmd_mock_domain_replace(self->stdev_id, iopf_hwpt_id);
+ EXPECT_ERRNO(EBUSY, _test_ioctl_destroy(self->fd, iopf_hwpt_id));
+ test_cmd_trigger_iopf(dev_id, fault_fd);
+
+ test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id);
+ test_ioctl_destroy(iopf_hwpt_id);
+ close(fault_fd);
+ test_ioctl_destroy(fault_id);
}
TEST_F(iommufd_viommu, viommu_alloc_with_data)
@@ -2902,169 +2899,161 @@ TEST_F(iommufd_viommu, vdevice_cache)
uint32_t vdev_id = 0;
uint32_t num_inv;
- if (dev_id) {
- test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id);
-
- test_cmd_dev_check_cache_all(dev_id,
- IOMMU_TEST_DEV_CACHE_DEFAULT);
-
- /* Check data_type by passing zero-length array */
- num_inv = 0;
- test_cmd_viommu_invalidate(viommu_id, inv_reqs,
- sizeof(*inv_reqs), &num_inv);
- assert(!num_inv);
-
- /* Negative test: Invalid data_type */
- num_inv = 1;
- test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
- IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST_INVALID,
- sizeof(*inv_reqs), &num_inv);
- assert(!num_inv);
-
- /* Negative test: structure size sanity */
- num_inv = 1;
- test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
- IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
- sizeof(*inv_reqs) + 1, &num_inv);
- assert(!num_inv);
-
- num_inv = 1;
- test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
- IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
- 1, &num_inv);
- assert(!num_inv);
-
- /* Negative test: invalid flag is passed */
- num_inv = 1;
- inv_reqs[0].flags = 0xffffffff;
- inv_reqs[0].vdev_id = 0x99;
- test_err_viommu_invalidate(EOPNOTSUPP, viommu_id, inv_reqs,
- IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
- sizeof(*inv_reqs), &num_inv);
- assert(!num_inv);
-
- /* Negative test: invalid data_uptr when array is not empty */
- num_inv = 1;
- inv_reqs[0].flags = 0;
- inv_reqs[0].vdev_id = 0x99;
- test_err_viommu_invalidate(EINVAL, viommu_id, NULL,
- IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
- sizeof(*inv_reqs), &num_inv);
- assert(!num_inv);
-
- /* Negative test: invalid entry_len when array is not empty */
- num_inv = 1;
- inv_reqs[0].flags = 0;
- inv_reqs[0].vdev_id = 0x99;
- test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
- IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
- 0, &num_inv);
- assert(!num_inv);
-
- /* Negative test: invalid cache_id */
- num_inv = 1;
- inv_reqs[0].flags = 0;
- inv_reqs[0].vdev_id = 0x99;
- inv_reqs[0].cache_id = MOCK_DEV_CACHE_ID_MAX + 1;
- test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
- IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
- sizeof(*inv_reqs), &num_inv);
- assert(!num_inv);
+ if (!dev_id)
+ SKIP(return, "Skipping test for variant no_viommu");
- /* Negative test: invalid vdev_id */
- num_inv = 1;
- inv_reqs[0].flags = 0;
- inv_reqs[0].vdev_id = 0x9;
- inv_reqs[0].cache_id = 0;
- test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
- IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
- sizeof(*inv_reqs), &num_inv);
- assert(!num_inv);
+ test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id);
+
+ test_cmd_dev_check_cache_all(dev_id, IOMMU_TEST_DEV_CACHE_DEFAULT);
+
+ /* Check data_type by passing zero-length array */
+ num_inv = 0;
+ test_cmd_viommu_invalidate(viommu_id, inv_reqs, sizeof(*inv_reqs),
+ &num_inv);
+ assert(!num_inv);
+
+ /* Negative test: Invalid data_type */
+ num_inv = 1;
+ test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST_INVALID,
+ sizeof(*inv_reqs), &num_inv);
+ assert(!num_inv);
+
+ /* Negative test: structure size sanity */
+ num_inv = 1;
+ test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+ sizeof(*inv_reqs) + 1, &num_inv);
+ assert(!num_inv);
+
+ num_inv = 1;
+ test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, 1,
+ &num_inv);
+ assert(!num_inv);
+
+ /* Negative test: invalid flag is passed */
+ num_inv = 1;
+ inv_reqs[0].flags = 0xffffffff;
+ inv_reqs[0].vdev_id = 0x99;
+ test_err_viommu_invalidate(EOPNOTSUPP, viommu_id, inv_reqs,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+ sizeof(*inv_reqs), &num_inv);
+ assert(!num_inv);
+
+ /* Negative test: invalid data_uptr when array is not empty */
+ num_inv = 1;
+ inv_reqs[0].flags = 0;
+ inv_reqs[0].vdev_id = 0x99;
+ test_err_viommu_invalidate(EINVAL, viommu_id, NULL,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+ sizeof(*inv_reqs), &num_inv);
+ assert(!num_inv);
+
+ /* Negative test: invalid entry_len when array is not empty */
+ num_inv = 1;
+ inv_reqs[0].flags = 0;
+ inv_reqs[0].vdev_id = 0x99;
+ test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, 0,
+ &num_inv);
+ assert(!num_inv);
+
+ /* Negative test: invalid cache_id */
+ num_inv = 1;
+ inv_reqs[0].flags = 0;
+ inv_reqs[0].vdev_id = 0x99;
+ inv_reqs[0].cache_id = MOCK_DEV_CACHE_ID_MAX + 1;
+ test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+ sizeof(*inv_reqs), &num_inv);
+ assert(!num_inv);
+
+ /* Negative test: invalid vdev_id */
+ num_inv = 1;
+ inv_reqs[0].flags = 0;
+ inv_reqs[0].vdev_id = 0x9;
+ inv_reqs[0].cache_id = 0;
+ test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+ sizeof(*inv_reqs), &num_inv);
+ assert(!num_inv);
- /*
- * Invalidate the 1st cache entry but fail the 2nd request
- * due to invalid flags configuration in the 2nd request.
- */
- num_inv = 2;
- inv_reqs[0].flags = 0;
- inv_reqs[0].vdev_id = 0x99;
- inv_reqs[0].cache_id = 0;
- inv_reqs[1].flags = 0xffffffff;
- inv_reqs[1].vdev_id = 0x99;
- inv_reqs[1].cache_id = 1;
- test_err_viommu_invalidate(EOPNOTSUPP, viommu_id, inv_reqs,
- IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
- sizeof(*inv_reqs), &num_inv);
- assert(num_inv == 1);
- test_cmd_dev_check_cache(dev_id, 0, 0);
- test_cmd_dev_check_cache(dev_id, 1,
- IOMMU_TEST_DEV_CACHE_DEFAULT);
- test_cmd_dev_check_cache(dev_id, 2,
- IOMMU_TEST_DEV_CACHE_DEFAULT);
- test_cmd_dev_check_cache(dev_id, 3,
- IOMMU_TEST_DEV_CACHE_DEFAULT);
-
- /*
- * Invalidate the 1st cache entry but fail the 2nd request
- * due to invalid cache_id configuration in the 2nd request.
- */
- num_inv = 2;
- inv_reqs[0].flags = 0;
- inv_reqs[0].vdev_id = 0x99;
- inv_reqs[0].cache_id = 0;
- inv_reqs[1].flags = 0;
- inv_reqs[1].vdev_id = 0x99;
- inv_reqs[1].cache_id = MOCK_DEV_CACHE_ID_MAX + 1;
- test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
- IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
- sizeof(*inv_reqs), &num_inv);
- assert(num_inv == 1);
- test_cmd_dev_check_cache(dev_id, 0, 0);
- test_cmd_dev_check_cache(dev_id, 1,
- IOMMU_TEST_DEV_CACHE_DEFAULT);
- test_cmd_dev_check_cache(dev_id, 2,
- IOMMU_TEST_DEV_CACHE_DEFAULT);
- test_cmd_dev_check_cache(dev_id, 3,
- IOMMU_TEST_DEV_CACHE_DEFAULT);
-
- /* Invalidate the 2nd cache entry and verify */
- num_inv = 1;
- inv_reqs[0].flags = 0;
- inv_reqs[0].vdev_id = 0x99;
- inv_reqs[0].cache_id = 1;
- test_cmd_viommu_invalidate(viommu_id, inv_reqs,
- sizeof(*inv_reqs), &num_inv);
- assert(num_inv == 1);
- test_cmd_dev_check_cache(dev_id, 0, 0);
- test_cmd_dev_check_cache(dev_id, 1, 0);
- test_cmd_dev_check_cache(dev_id, 2,
- IOMMU_TEST_DEV_CACHE_DEFAULT);
- test_cmd_dev_check_cache(dev_id, 3,
- IOMMU_TEST_DEV_CACHE_DEFAULT);
-
- /* Invalidate the 3rd and 4th cache entries and verify */
- num_inv = 2;
- inv_reqs[0].flags = 0;
- inv_reqs[0].vdev_id = 0x99;
- inv_reqs[0].cache_id = 2;
- inv_reqs[1].flags = 0;
- inv_reqs[1].vdev_id = 0x99;
- inv_reqs[1].cache_id = 3;
- test_cmd_viommu_invalidate(viommu_id, inv_reqs,
- sizeof(*inv_reqs), &num_inv);
- assert(num_inv == 2);
- test_cmd_dev_check_cache_all(dev_id, 0);
+ /*
+ * Invalidate the 1st cache entry but fail the 2nd request
+ * due to invalid flags configuration in the 2nd request.
+ */
+ num_inv = 2;
+ inv_reqs[0].flags = 0;
+ inv_reqs[0].vdev_id = 0x99;
+ inv_reqs[0].cache_id = 0;
+ inv_reqs[1].flags = 0xffffffff;
+ inv_reqs[1].vdev_id = 0x99;
+ inv_reqs[1].cache_id = 1;
+ test_err_viommu_invalidate(EOPNOTSUPP, viommu_id, inv_reqs,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+ sizeof(*inv_reqs), &num_inv);
+ assert(num_inv == 1);
+ test_cmd_dev_check_cache(dev_id, 0, 0);
+ test_cmd_dev_check_cache(dev_id, 1, IOMMU_TEST_DEV_CACHE_DEFAULT);
+ test_cmd_dev_check_cache(dev_id, 2, IOMMU_TEST_DEV_CACHE_DEFAULT);
+ test_cmd_dev_check_cache(dev_id, 3, IOMMU_TEST_DEV_CACHE_DEFAULT);
- /* Invalidate all cache entries for nested_dev_id[1] and verify */
- num_inv = 1;
- inv_reqs[0].vdev_id = 0x99;
- inv_reqs[0].flags = IOMMU_TEST_INVALIDATE_FLAG_ALL;
- test_cmd_viommu_invalidate(viommu_id, inv_reqs,
- sizeof(*inv_reqs), &num_inv);
- assert(num_inv == 1);
- test_cmd_dev_check_cache_all(dev_id, 0);
- test_ioctl_destroy(vdev_id);
- }
+ /*
+ * Invalidate the 1st cache entry but fail the 2nd request
+ * due to invalid cache_id configuration in the 2nd request.
+ */
+ num_inv = 2;
+ inv_reqs[0].flags = 0;
+ inv_reqs[0].vdev_id = 0x99;
+ inv_reqs[0].cache_id = 0;
+ inv_reqs[1].flags = 0;
+ inv_reqs[1].vdev_id = 0x99;
+ inv_reqs[1].cache_id = MOCK_DEV_CACHE_ID_MAX + 1;
+ test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+ sizeof(*inv_reqs), &num_inv);
+ assert(num_inv == 1);
+ test_cmd_dev_check_cache(dev_id, 0, 0);
+ test_cmd_dev_check_cache(dev_id, 1, IOMMU_TEST_DEV_CACHE_DEFAULT);
+ test_cmd_dev_check_cache(dev_id, 2, IOMMU_TEST_DEV_CACHE_DEFAULT);
+ test_cmd_dev_check_cache(dev_id, 3, IOMMU_TEST_DEV_CACHE_DEFAULT);
+
+ /* Invalidate the 2nd cache entry and verify */
+ num_inv = 1;
+ inv_reqs[0].flags = 0;
+ inv_reqs[0].vdev_id = 0x99;
+ inv_reqs[0].cache_id = 1;
+ test_cmd_viommu_invalidate(viommu_id, inv_reqs, sizeof(*inv_reqs),
+ &num_inv);
+ assert(num_inv == 1);
+ test_cmd_dev_check_cache(dev_id, 0, 0);
+ test_cmd_dev_check_cache(dev_id, 1, 0);
+ test_cmd_dev_check_cache(dev_id, 2, IOMMU_TEST_DEV_CACHE_DEFAULT);
+ test_cmd_dev_check_cache(dev_id, 3, IOMMU_TEST_DEV_CACHE_DEFAULT);
+
+ /* Invalidate the 3rd and 4th cache entries and verify */
+ num_inv = 2;
+ inv_reqs[0].flags = 0;
+ inv_reqs[0].vdev_id = 0x99;
+ inv_reqs[0].cache_id = 2;
+ inv_reqs[1].flags = 0;
+ inv_reqs[1].vdev_id = 0x99;
+ inv_reqs[1].cache_id = 3;
+ test_cmd_viommu_invalidate(viommu_id, inv_reqs, sizeof(*inv_reqs),
+ &num_inv);
+ assert(num_inv == 2);
+ test_cmd_dev_check_cache_all(dev_id, 0);
+
+ /* Invalidate all cache entries for nested_dev_id[1] and verify */
+ num_inv = 1;
+ inv_reqs[0].vdev_id = 0x99;
+ inv_reqs[0].flags = IOMMU_TEST_INVALIDATE_FLAG_ALL;
+ test_cmd_viommu_invalidate(viommu_id, inv_reqs, sizeof(*inv_reqs),
+ &num_inv);
+ assert(num_inv == 1);
+ test_cmd_dev_check_cache_all(dev_id, 0);
+ test_ioctl_destroy(vdev_id);
}
TEST_F(iommufd_viommu, hw_queue)
--
2.25.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v6 7/8] iommufd/selftest: Add coverage for vdevice tombstone
2025-07-16 7:03 [PATCH v6 0/8] iommufd: Destroy vdevice on device unbind Xu Yilun
` (5 preceding siblings ...)
2025-07-16 7:03 ` [PATCH v6 6/8] iommufd/selftest: Explicitly skip tests for inapplicable variant Xu Yilun
@ 2025-07-16 7:03 ` Xu Yilun
2025-07-16 7:03 ` [PATCH v6 8/8] iommufd: Rename some shortterm-related identifiers Xu Yilun
` (2 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Xu Yilun @ 2025-07-16 7:03 UTC (permalink / raw)
To: jgg, jgg, kevin.tian, will, aneesh.kumar
Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik,
dan.j.williams, baolu.lu, yilun.xu
This tests the flow that tombstones the vdevice when the idevice is
unbound before vdevice destruction. The expected results of the
tombstone are:
- The vdevice ID can't be reused anymore (not tested in this patch).
- Even ioctl(IOMMU_DESTROY) can't free the vdevice ID.
- iommufd_fops_release() can still free everything.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
tools/testing/selftests/iommu/iommufd.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 1b629bedeb1c..26fe912c62ae 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -3115,6 +3115,20 @@ TEST_F(iommufd_viommu, hw_queue)
test_ioctl_ioas_unmap(iova, PAGE_SIZE);
}
+TEST_F(iommufd_viommu, vdevice_tombstone)
+{
+ uint32_t viommu_id = self->viommu_id;
+ uint32_t dev_id = self->device_id;
+ uint32_t vdev_id = 0;
+
+ if (!dev_id)
+ SKIP(return, "Skipping test for variant no_viommu");
+
+ test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id);
+ test_ioctl_destroy(self->stdev_id);
+ EXPECT_ERRNO(ENOENT, _test_ioctl_destroy(self->fd, vdev_id));
+}
+
FIXTURE(iommufd_device_pasid)
{
int fd;
--
2.25.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v6 8/8] iommufd: Rename some shortterm-related identifiers
2025-07-16 7:03 [PATCH v6 0/8] iommufd: Destroy vdevice on device unbind Xu Yilun
` (6 preceding siblings ...)
2025-07-16 7:03 ` [PATCH v6 7/8] iommufd/selftest: Add coverage for vdevice tombstone Xu Yilun
@ 2025-07-16 7:03 ` Xu Yilun
2025-07-18 9:13 ` [PATCH v6 0/8] iommufd: Destroy vdevice on device unbind Aneesh Kumar K.V
2025-07-18 17:30 ` Jason Gunthorpe
9 siblings, 0 replies; 12+ messages in thread
From: Xu Yilun @ 2025-07-16 7:03 UTC (permalink / raw)
To: jgg, jgg, kevin.tian, will, aneesh.kumar
Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik,
dan.j.williams, baolu.lu, yilun.xu
Rename the shortterm-related identifiers to wait-related ones.
The usage of the shortterm_users refcount now goes beyond its name: it
is also used for references which live longer than an ioctl execution.
E.g. the vdev holds the idev's shortterm_users refcount on vdev
allocation and releases it during the idev's pre_destroy(). Rename the
refcount to wait_cnt, since it is always used to sync the referencing
and the destruction of the object by waiting for it to go to zero.
List all changed identifiers:
iommufd_object::shortterm_users -> iommufd_object::wait_cnt
REMOVE_WAIT_SHORTTERM -> REMOVE_WAIT
iommufd_object_dec_wait_shortterm() -> iommufd_object_dec_wait()
zerod_shortterm -> zerod_wait_cnt
No functional change intended.
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
drivers/iommu/iommufd/device.c | 6 ++--
drivers/iommu/iommufd/iommufd_private.h | 18 ++++++------
drivers/iommu/iommufd/main.c | 39 +++++++++++++------------
drivers/iommu/iommufd/viommu.c | 4 +--
include/linux/iommufd.h | 8 ++++-
5 files changed, 41 insertions(+), 34 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index ee6ff4caf398..65fbd098f9e9 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -152,8 +152,8 @@ static void iommufd_device_remove_vdev(struct iommufd_device *idev)
/*
* An ongoing vdev destroy ioctl has removed the vdev from the object
* xarray, but has not finished iommufd_vdevice_destroy() yet as it
- * needs the same mutex. We exit the locking then wait on short term
- * users for the vdev destruction.
+ * needs the same mutex. We exit the locking then wait on wait_cnt
+ * reference for the vdev destruction.
*/
if (IS_ERR(vdev))
goto out_unlock;
@@ -184,7 +184,7 @@ void iommufd_device_pre_destroy(struct iommufd_object *obj)
struct iommufd_device *idev =
container_of(obj, struct iommufd_device, obj);
- /* Release the short term users on this */
+ /* Release the wait_cnt reference on this */
iommufd_device_remove_vdev(idev);
}
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 5d6ea5395cfe..0da2a81eedfa 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -169,7 +169,7 @@ static inline bool iommufd_lock_obj(struct iommufd_object *obj)
{
if (!refcount_inc_not_zero(&obj->users))
return false;
- if (!refcount_inc_not_zero(&obj->shortterm_users)) {
+ if (!refcount_inc_not_zero(&obj->wait_cnt)) {
/*
* If the caller doesn't already have a ref on obj this must be
* called under the xa_lock. Otherwise the caller is holding a
@@ -187,11 +187,11 @@ static inline void iommufd_put_object(struct iommufd_ctx *ictx,
struct iommufd_object *obj)
{
/*
- * Users first, then shortterm so that REMOVE_WAIT_SHORTTERM never sees
- * a spurious !0 users with a 0 shortterm_users.
+ * Users first, then wait_cnt so that REMOVE_WAIT never sees a spurious
+ * !0 users with a 0 wait_cnt.
*/
refcount_dec(&obj->users);
- if (refcount_dec_and_test(&obj->shortterm_users))
+ if (refcount_dec_and_test(&obj->wait_cnt))
wake_up_interruptible_all(&ictx->destroy_wait);
}
@@ -202,7 +202,7 @@ void iommufd_object_finalize(struct iommufd_ctx *ictx,
struct iommufd_object *obj);
enum {
- REMOVE_WAIT_SHORTTERM = BIT(0),
+ REMOVE_WAIT = BIT(0),
REMOVE_OBJ_TOMBSTONE = BIT(1),
};
int iommufd_object_remove(struct iommufd_ctx *ictx,
@@ -211,15 +211,15 @@ int iommufd_object_remove(struct iommufd_ctx *ictx,
/*
* The caller holds a users refcount and wants to destroy the object. At this
- * point the caller has no shortterm_users reference and at least the xarray
- * will be holding one.
+ * point the caller has no wait_cnt reference and at least the xarray will be
+ * holding one.
*/
static inline void iommufd_object_destroy_user(struct iommufd_ctx *ictx,
struct iommufd_object *obj)
{
int ret;
- ret = iommufd_object_remove(ictx, obj, obj->id, REMOVE_WAIT_SHORTTERM);
+ ret = iommufd_object_remove(ictx, obj, obj->id, REMOVE_WAIT);
/*
* If there is a bug and we couldn't destroy the object then we did put
@@ -239,7 +239,7 @@ static inline void iommufd_object_tombstone_user(struct iommufd_ctx *ictx,
int ret;
ret = iommufd_object_remove(ictx, obj, obj->id,
- REMOVE_WAIT_SHORTTERM | REMOVE_OBJ_TOMBSTONE);
+ REMOVE_WAIT | REMOVE_OBJ_TOMBSTONE);
/*
* If there is a bug and we couldn't destroy the object then we did put
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 99c1aab3d396..15af7ced0501 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -42,7 +42,7 @@ struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
return ERR_PTR(-ENOMEM);
obj->type = type;
/* Starts out bias'd by 1 until it is removed from the xarray */
- refcount_set(&obj->shortterm_users, 1);
+ refcount_set(&obj->wait_cnt, 1);
refcount_set(&obj->users, 1);
/*
@@ -155,22 +155,22 @@ struct iommufd_object *iommufd_get_object(struct iommufd_ctx *ictx, u32 id,
return obj;
}
-static int iommufd_object_dec_wait_shortterm(struct iommufd_ctx *ictx,
- struct iommufd_object *to_destroy)
+static int iommufd_object_dec_wait(struct iommufd_ctx *ictx,
+ struct iommufd_object *to_destroy)
{
- if (refcount_dec_and_test(&to_destroy->shortterm_users))
+ if (refcount_dec_and_test(&to_destroy->wait_cnt))
return 0;
if (iommufd_object_ops[to_destroy->type].pre_destroy)
iommufd_object_ops[to_destroy->type].pre_destroy(to_destroy);
if (wait_event_timeout(ictx->destroy_wait,
- refcount_read(&to_destroy->shortterm_users) == 0,
+ refcount_read(&to_destroy->wait_cnt) == 0,
msecs_to_jiffies(60000)))
return 0;
pr_crit("Time out waiting for iommufd object to become free\n");
- refcount_inc(&to_destroy->shortterm_users);
+ refcount_inc(&to_destroy->wait_cnt);
return -EBUSY;
}
@@ -184,17 +184,18 @@ int iommufd_object_remove(struct iommufd_ctx *ictx,
{
struct iommufd_object *obj;
XA_STATE(xas, &ictx->objects, id);
- bool zerod_shortterm = false;
+ bool zerod_wait_cnt = false;
int ret;
/*
- * The purpose of the shortterm_users is to ensure deterministic
- * destruction of objects used by external drivers and destroyed by this
- * function. Any temporary increment of the refcount must increment
- * shortterm_users, such as during ioctl execution.
+ * The purpose of the wait_cnt is to ensure deterministic destruction
+ * of objects used by external drivers and destroyed by this function.
+ * Incrementing this wait_cnt should either be short lived, such as
+ * during ioctl execution, or be revoked and blocked during
+ * pre_destroy(), such as vdev holding the idev's refcount.
*/
- if (flags & REMOVE_WAIT_SHORTTERM) {
- ret = iommufd_object_dec_wait_shortterm(ictx, to_destroy);
+ if (flags & REMOVE_WAIT) {
+ ret = iommufd_object_dec_wait(ictx, to_destroy);
if (ret) {
/*
* We have a bug. Put back the callers reference and
@@ -203,7 +204,7 @@ int iommufd_object_remove(struct iommufd_ctx *ictx,
refcount_dec(&to_destroy->users);
return ret;
}
- zerod_shortterm = true;
+ zerod_wait_cnt = true;
}
xa_lock(&ictx->objects);
@@ -235,11 +236,11 @@ int iommufd_object_remove(struct iommufd_ctx *ictx,
xa_unlock(&ictx->objects);
/*
- * Since users is zero any positive users_shortterm must be racing
+ * Since users is zero any positive wait_cnt must be racing
* iommufd_put_object(), or we have a bug.
*/
- if (!zerod_shortterm) {
- ret = iommufd_object_dec_wait_shortterm(ictx, obj);
+ if (!zerod_wait_cnt) {
+ ret = iommufd_object_dec_wait(ictx, obj);
if (WARN_ON(ret))
return ret;
}
@@ -249,9 +250,9 @@ int iommufd_object_remove(struct iommufd_ctx *ictx,
return 0;
err_xa:
- if (zerod_shortterm) {
+ if (zerod_wait_cnt) {
/* Restore the xarray owned reference */
- refcount_set(&obj->shortterm_users, 1);
+ refcount_set(&obj->wait_cnt, 1);
}
xa_unlock(&ictx->objects);
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
index 6cf0bd5d8f08..2ca5809b238b 100644
--- a/drivers/iommu/iommufd/viommu.c
+++ b/drivers/iommu/iommufd/viommu.c
@@ -205,8 +205,8 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
vdev->viommu = viommu;
refcount_inc(&viommu->obj.users);
/*
- * A short term users reference is held on the idev so long as we have
- * the pointer. iommufd_device_pre_destroy() will revoke it before the
+ * A wait_cnt reference is held on the idev so long as we have the
+ * pointer. iommufd_device_pre_destroy() will revoke it before the
* idev real destruction.
*/
vdev->idev = idev;
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 810e4d8ac912..6e7efe83bc5d 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -46,7 +46,13 @@ enum iommufd_object_type {
/* Base struct for all objects with a userspace ID handle. */
struct iommufd_object {
- refcount_t shortterm_users;
+ /*
+ * Destroy will sleep and wait for wait_cnt to go to zero. This allows
+ * concurrent users of the ID to reliably avoid causing a spurious
+ * destroy failure. Incrementing this count should either be short
+ * lived or be revoked and blocked during pre_destroy().
+ */
+ refcount_t wait_cnt;
refcount_t users;
enum iommufd_object_type type;
unsigned int id;
--
2.25.1
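
To make the wait_cnt semantics above concrete, the following is a minimal
userspace sketch of the pattern the patch implements: destroy drops its own
reference, and if other holders remain it invokes a pre_destroy() hook to
revoke long-lived references and then sleeps until the count reaches zero.
This is only an illustration built on pthreads with made-up names
(obj_destroy, obj_pre_destroy, holder); it is not the iommufd kernel API,
and it omits the kernel's 60-second timeout/-EBUSY path.

/*
 * Userspace model of the wait_cnt destroy pattern (illustrative only).
 * One thread holds a long-lived reference; obj_destroy() revokes it via
 * obj_pre_destroy() and then waits for the count to drop to zero.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

struct object {
	pthread_mutex_t lock;
	pthread_cond_t destroy_wait;	/* models ictx->destroy_wait */
	int wait_cnt;			/* models obj->wait_cnt */
	int revoked;			/* set by the pre_destroy() hook */
};

static void obj_get(struct object *obj)
{
	pthread_mutex_lock(&obj->lock);
	obj->wait_cnt++;
	pthread_mutex_unlock(&obj->lock);
}

static void obj_put(struct object *obj)
{
	pthread_mutex_lock(&obj->lock);
	if (--obj->wait_cnt == 0)
		pthread_cond_broadcast(&obj->destroy_wait);
	pthread_mutex_unlock(&obj->lock);
}

/* pre_destroy(): tell long-lived holders to drop their reference */
static void obj_pre_destroy(struct object *obj)
{
	pthread_mutex_lock(&obj->lock);
	obj->revoked = 1;
	pthread_mutex_unlock(&obj->lock);
}

/* drop our own reference, then wait for the remaining holders */
static void obj_destroy(struct object *obj)
{
	pthread_mutex_lock(&obj->lock);
	if (--obj->wait_cnt != 0) {
		pthread_mutex_unlock(&obj->lock);
		obj_pre_destroy(obj);
		pthread_mutex_lock(&obj->lock);
		while (obj->wait_cnt != 0)
			pthread_cond_wait(&obj->destroy_wait, &obj->lock);
	}
	pthread_mutex_unlock(&obj->lock);
	printf("object free, safe to destroy\n");
}

/* a holder that keeps its reference until it is revoked */
static void *holder(void *arg)
{
	struct object *obj = arg;
	int revoked = 0;

	while (!revoked) {
		pthread_mutex_lock(&obj->lock);
		revoked = obj->revoked;
		pthread_mutex_unlock(&obj->lock);
		usleep(1000);
	}
	obj_put(obj);
	return NULL;
}

int main(void)
{
	struct object obj = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.destroy_wait = PTHREAD_COND_INITIALIZER,
		.wait_cnt = 1,	/* the "xarray owned" reference */
	};
	pthread_t t;

	obj_get(&obj);				/* long-lived holder */
	pthread_create(&t, NULL, holder, &obj);
	obj_destroy(&obj);			/* blocks until the holder puts */
	pthread_join(t, NULL);
	return 0;
}

In the patch itself the same ordering holds: the long-lived wait_cnt
reference taken by the vdevice on its idev is dropped from
iommufd_device_pre_destroy(), so the idev's destroy path can always make
forward progress.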
* Re: [PATCH v6 0/8] iommufd: Destroy vdevice on device unbind
2025-07-16 7:03 [PATCH v6 0/8] iommufd: Destroy vdevice on device unbind Xu Yilun
` (7 preceding siblings ...)
2025-07-16 7:03 ` [PATCH v6 8/8] iommufd: Rename some shortterm-related identifiers Xu Yilun
@ 2025-07-18 9:13 ` Aneesh Kumar K.V
2025-07-19 15:33 ` Xu Yilun
2025-07-18 17:30 ` Jason Gunthorpe
9 siblings, 1 reply; 12+ messages in thread
From: Aneesh Kumar K.V @ 2025-07-18 9:13 UTC (permalink / raw)
To: Xu Yilun, jgg, jgg, kevin.tian, will
Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik,
dan.j.williams, baolu.lu, yilun.xu
Xu Yilun <yilun.xu@linux.intel.com> writes:
> It is to solve the lifecycle issue that vdevice may outlive idevice. It
> is a prerequisite for TIO, to ensure extra secure configurations (e.g.
> TSM Bind/Unbind) against vdevice could be rolled back on idevice unbind,
> so that VFIO could still work on the physical device without surprise.
>
> Changelog:
> v6:
> - Fix compile error for ARM platform in Patch 5
> - Adjust some more line wrappings in Patch 6
> - Add review tags.
>
> v5: https://lore.kernel.org/linux-iommu/aHdFWV9k9M7tRpD0@yilunxu-OptiPlex-7050/
> - Further rebase to iommufd for-next 601b1d0d9395
> - Keep the xa_empty() check in iommufd_fops_release(), update comments
> - Move the *idev next to *viommu for struct iommufd_vdevice
> - Update the description about IOMMUFD_CMD_VDEVICE_ALLOC for lifecycle
> - Remove Baolu's tag for patch 4 because of big changes since v3
> - Add changelog about idev->destroying
> - Adjust line wrappings for tools/testing/selftests/iommu/iommufd.c
> - Clarify that no testing for tombstoned ID repurposing.
> - Add review tags.
>
> v4: https://lore.kernel.org/linux-iommu/20250709040234.1773573-1-yilun.xu@linux.intel.com/
> - Rebase to iommufd for-next.
> - A new patch to roll back iommufd_object_alloc_ucmd() for vdevice.
> - Fix the mistake trying to xa_destroy ictx->groups on
> iommufd_fops_release().
> - Move 'empty' flag inside destroy loop for iommufd_fops_release().
> - Refactor vdev/idev destroy syncing.
> - Drop the iommufd_vdevice_abort() reentrant idea.
> - A new patch that adds pre_destroy() op.
> - Hold short term reference during the whole vdevice's lifecycle.
> - Wait on short term reference on idev's pre_destroy().
> - Add a 'destroying' flag for idev to prevent new reference after
> pre_destroy().
> - Rephrase/fix some comments.
> - Add review tags.
>
> v3: https://lore.kernel.org/linux-iommu/20250627033809.1730752-1-yilun.xu@linux.intel.com/
> - No bother clean each tombstone in iommufd_fops_release().
> - Drop vdev->ictx initialization fix patch.
> - Optimize control flow in iommufd_device_remove_vdev().
> - Make iommufd_vdevice_abort() reentrant.
> - Call iommufd_vdevice_abort() directly instead of waiting for it.
> - Rephrase/fix some comments.
> - A new patch to remove vdev->dev.
> - A new patch to explicitly skip existing viommu tests for no_iommu.
> - Also skip vdevice tombstone test for no_iommu.
> - Allow me to add SoB from Aneesh.
>
> v2: https://lore.kernel.org/linux-iommu/20250623094946.1714996-1-yilun.xu@linux.intel.com/
>
> v1/rfc: https://lore.kernel.org/linux-iommu/20250610065146.1321816-1-aneesh.kumar@kernel.org/
>
> The series is based on iommufd for-next
>
>
> Xu Yilun (8):
> iommufd/viommu: Roll back to use iommufd_object_alloc() for vdevice
> iommufd: Add iommufd_object_tombstone_user() helper
> iommufd: Add a pre_destroy() op for objects
> iommufd: Destroy vdevice on idevice destroy
> iommufd/vdevice: Remove struct device reference from struct vdevice
> iommufd/selftest: Explicitly skip tests for inapplicable variant
> iommufd/selftest: Add coverage for vdevice tombstone
> iommufd: Rename some shortterm-related identifiers
>
> .../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 3 +-
> drivers/iommu/iommufd/device.c | 51 +++
> drivers/iommu/iommufd/driver.c | 10 +-
> drivers/iommu/iommufd/iommufd_private.h | 49 ++-
> drivers/iommu/iommufd/main.c | 69 +++-
> drivers/iommu/iommufd/viommu.c | 69 +++-
> include/linux/iommufd.h | 17 +-
> include/uapi/linux/iommufd.h | 5 +
> tools/testing/selftests/iommu/iommufd.c | 377 +++++++++---------
> 9 files changed, 419 insertions(+), 231 deletions(-)
>
Can you share the commit id these patches are against?
-aneesh
* Re: [PATCH v6 0/8] iommufd: Destroy vdevice on device unbind
2025-07-16 7:03 [PATCH v6 0/8] iommufd: Destroy vdevice on device unbind Xu Yilun
` (8 preceding siblings ...)
2025-07-18 9:13 ` [PATCH v6 0/8] iommufd: Destroy vdevice on device unbind Aneesh Kumar K.V
@ 2025-07-18 17:30 ` Jason Gunthorpe
9 siblings, 0 replies; 12+ messages in thread
From: Jason Gunthorpe @ 2025-07-18 17:30 UTC (permalink / raw)
To: Xu Yilun
Cc: kevin.tian, will, aneesh.kumar, iommu, linux-kernel, joro,
robin.murphy, shuah, nicolinc, aik, dan.j.williams, baolu.lu,
yilun.xu
On Wed, Jul 16, 2025 at 03:03:41PM +0800, Xu Yilun wrote:
> It is to solve the lifecycle issue that vdevice may outlive idevice. It
> is a prerequisite for TIO, to ensure extra secure configurations (e.g.
> TSM Bind/Unbind) against vdevice could be rolled back on idevice unbind,
> so that VFIO could still work on the physical device without surprise.
> Xu Yilun (8):
> iommufd/viommu: Roll back to use iommufd_object_alloc() for vdevice
> iommufd: Add iommufd_object_tombstone_user() helper
> iommufd: Add a pre_destroy() op for objects
> iommufd: Destroy vdevice on idevice destroy
> iommufd/vdevice: Remove struct device reference from struct vdevice
> iommufd/selftest: Explicitly skip tests for inapplicable variant
> iommufd/selftest: Add coverage for vdevice tombstone
> iommufd: Rename some shortterm-related identifiers
Applied to for-next, thanks
Jason
* Re: [PATCH v6 0/8] iommufd: Destroy vdevice on device unbind
2025-07-18 9:13 ` [PATCH v6 0/8] iommufd: Destroy vdevice on device unbind Aneesh Kumar K.V
@ 2025-07-19 15:33 ` Xu Yilun
0 siblings, 0 replies; 12+ messages in thread
From: Xu Yilun @ 2025-07-19 15:33 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: jgg, jgg, kevin.tian, will, iommu, linux-kernel, joro,
robin.murphy, shuah, nicolinc, aik, dan.j.williams, baolu.lu,
yilun.xu
> > 9 files changed, 419 insertions(+), 231 deletions(-)
> >
>
> Can you share the commit id these patches are against.
Sorry, somehow I forgot to use --base
base-commit: 601b1d0d9395c711383452bd0d47037afbbb4bcf
>
> -aneesh
end of thread, other threads:[~2025-07-19 15:42 UTC | newest]
Thread overview: 12+ messages
2025-07-16 7:03 [PATCH v6 0/8] iommufd: Destroy vdevice on device unbind Xu Yilun
2025-07-16 7:03 ` [PATCH v6 1/8] iommufd/viommu: Roll back to use iommufd_object_alloc() for vdevice Xu Yilun
2025-07-16 7:03 ` [PATCH v6 2/8] iommufd: Add iommufd_object_tombstone_user() helper Xu Yilun
2025-07-16 7:03 ` [PATCH v6 3/8] iommufd: Add a pre_destroy() op for objects Xu Yilun
2025-07-16 7:03 ` [PATCH v6 4/8] iommufd: Destroy vdevice on idevice destroy Xu Yilun
2025-07-16 7:03 ` [PATCH v6 5/8] iommufd/vdevice: Remove struct device reference from struct vdevice Xu Yilun
2025-07-16 7:03 ` [PATCH v6 6/8] iommufd/selftest: Explicitly skip tests for inapplicable variant Xu Yilun
2025-07-16 7:03 ` [PATCH v6 7/8] iommufd/selftest: Add coverage for vdevice tombstone Xu Yilun
2025-07-16 7:03 ` [PATCH v6 8/8] iommufd: Rename some shortterm-related identifiers Xu Yilun
2025-07-18 9:13 ` [PATCH v6 0/8] iommufd: Destroy vdevice on device unbind Aneesh Kumar K.V
2025-07-19 15:33 ` Xu Yilun
2025-07-18 17:30 ` Jason Gunthorpe