* [PATCH v3 0/5] iommufd: Destroy vdevice on device unbind
@ 2025-06-27 3:38 Xu Yilun
2025-06-27 3:38 ` [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper Xu Yilun
` (4 more replies)
0 siblings, 5 replies; 32+ messages in thread
From: Xu Yilun @ 2025-06-27 3:38 UTC (permalink / raw)
To: jgg, jgg, kevin.tian, will, aneesh.kumar
Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik,
dan.j.williams, baolu.lu, yilun.xu
It is to solve the lifecycle issue that vdevice may outlive idevice. It
is a prerequisite for TIO, to ensure extra secure configurations (e.g.
TSM Bind/Unbind) against vdevice could be rolled back on idevice unbind,
so that VFIO could still work on the physical device without surprise.
Changelog:
v3:
- No bother clean each tombstone in iommufd_fops_release().
- Drop vdev->ictx initialization fix patch.
- Optimize control flow in iommufd_device_remove_vdev().
- Make iommufd_vdevice_abort() reentrant.
- Call iommufd_vdevice_abort() directly instead of waiting for it.
- Rephrase/fix some comments.
- A new patch to remove vdev->dev.
- A new patch to explicitly skip existing viommu tests for no_iommu.
- Also skip vdevice tombstone test for no_iommu.
- Allow me to add SoB from Aneesh.
v2: https://lore.kernel.org/linux-iommu/20250623094946.1714996-1-yilun.xu@linux.intel.com/
v1/rfc: https://lore.kernel.org/linux-iommu/20250610065146.1321816-1-aneesh.kumar@kernel.org/
The series is based on v6.16-rc1
Xu Yilun (5):
iommufd: Add iommufd_object_tombstone_user() helper
iommufd: Destroy vdevice on idevice destroy
iommufd/vdevice: Remove struct device reference from struct vdevice
iommufd/selftest: Explicitly skip tests for inapplicable variant
iommufd/selftest: Add coverage for vdevice tombstone
drivers/iommu/iommufd/device.c | 42 +++
drivers/iommu/iommufd/driver.c | 4 +-
drivers/iommu/iommufd/iommufd_private.h | 35 ++-
drivers/iommu/iommufd/main.c | 20 +-
drivers/iommu/iommufd/viommu.c | 47 ++-
tools/testing/selftests/iommu/iommufd.c | 388 ++++++++++++------------
6 files changed, 337 insertions(+), 199 deletions(-)
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
--
2.25.1
^ permalink raw reply [flat|nested] 32+ messages in thread* [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper 2025-06-27 3:38 [PATCH v3 0/5] iommufd: Destroy vdevice on device unbind Xu Yilun @ 2025-06-27 3:38 ` Xu Yilun 2025-06-30 3:08 ` Baolu Lu ` (2 more replies) 2025-06-27 3:38 ` [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy Xu Yilun ` (3 subsequent siblings) 4 siblings, 3 replies; 32+ messages in thread From: Xu Yilun @ 2025-06-27 3:38 UTC (permalink / raw) To: jgg, jgg, kevin.tian, will, aneesh.kumar Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik, dan.j.williams, baolu.lu, yilun.xu Add the iommufd_object_tombstone_user() helper, which allows the caller to destroy an iommufd object created by userspace. This is useful on some destroy paths when the kernel caller finds the object should have been removed by userspace but is still alive. With this helper, the caller destroys the object but leave the object ID reserved (so called tombstone). The tombstone prevents repurposing the object ID without awareness of the original user. Since this happens for abnomal userspace behavior, for simplicity, the tombstoned object ID would be permanently leaked until iommufd_fops_release(). I.e. the original user gets an error when calling ioctl(IOMMU_DESTROY) on that ID. The first use case would be to ensure the iommufd_vdevice can't outlive the associated iommufd_device. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Co-developed-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com> --- drivers/iommu/iommufd/iommufd_private.h | 23 ++++++++++++++++++++++- drivers/iommu/iommufd/main.c | 19 ++++++++++++++++--- 2 files changed, 38 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 9ccc83341f32..fbc9ef78d81f 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -187,7 +187,8 @@ void iommufd_object_finalize(struct iommufd_ctx *ictx, struct iommufd_object *obj); enum { - REMOVE_WAIT_SHORTTERM = 1, + REMOVE_WAIT_SHORTTERM = BIT(0), + REMOVE_OBJ_TOMBSTONE = BIT(1), }; int iommufd_object_remove(struct iommufd_ctx *ictx, struct iommufd_object *to_destroy, u32 id, @@ -213,6 +214,26 @@ static inline void iommufd_object_destroy_user(struct iommufd_ctx *ictx, WARN_ON(ret); } +/* + * Similar to iommufd_object_destroy_user(), except that the object ID is left + * reserved/tombstoned. + */ +static inline void iommufd_object_tombstone_user(struct iommufd_ctx *ictx, + struct iommufd_object *obj) +{ + int ret; + + ret = iommufd_object_remove(ictx, obj, obj->id, + REMOVE_WAIT_SHORTTERM | REMOVE_OBJ_TOMBSTONE); + + /* + * If there is a bug and we couldn't destroy the object then we did put + * back the caller's users refcount and will eventually try to free it + * again during close. + */ + WARN_ON(ret); +} + /* * The HWPT allocated by autodomains is used in possibly many devices and * is automatically destroyed when its refcount reaches zero. diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index 3df468f64e7d..620923669b42 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -167,7 +167,7 @@ int iommufd_object_remove(struct iommufd_ctx *ictx, goto err_xa; } - xas_store(&xas, NULL); + xas_store(&xas, (flags & REMOVE_OBJ_TOMBSTONE) ? XA_ZERO_ENTRY : NULL); if (ictx->vfio_ioas == container_of(obj, struct iommufd_ioas, obj)) ictx->vfio_ioas = NULL; xa_unlock(&ictx->objects); @@ -239,6 +239,7 @@ static int iommufd_fops_release(struct inode *inode, struct file *filp) struct iommufd_sw_msi_map *next; struct iommufd_sw_msi_map *cur; struct iommufd_object *obj; + bool empty; /* * The objects in the xarray form a graph of "users" counts, and we have @@ -249,23 +250,35 @@ static int iommufd_fops_release(struct inode *inode, struct file *filp) * until the entire list is destroyed. If this can't progress then there * is some bug related to object refcounting. */ - while (!xa_empty(&ictx->objects)) { + for (;;) { unsigned int destroyed = 0; unsigned long index; + empty = true; xa_for_each(&ictx->objects, index, obj) { + empty = false; if (!refcount_dec_if_one(&obj->users)) continue; + destroyed++; xa_erase(&ictx->objects, index); iommufd_object_ops[obj->type].destroy(obj); kfree(obj); } + + if (empty) + break; + /* Bug related to users refcount */ if (WARN_ON(!destroyed)) break; } - WARN_ON(!xa_empty(&ictx->groups)); + + /* + * There may be some tombstones left over from + * iommufd_object_tombstone_user() + */ + xa_destroy(&ictx->groups); mutex_destroy(&ictx->sw_msi_lock); list_for_each_entry_safe(cur, next, &ictx->sw_msi_list, sw_msi_item) -- 2.25.1 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper 2025-06-27 3:38 ` [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper Xu Yilun @ 2025-06-30 3:08 ` Baolu Lu 2025-06-30 7:24 ` Xu Yilun 2025-06-30 5:52 ` Tian, Kevin 2025-06-30 19:50 ` Nicolin Chen 2 siblings, 1 reply; 32+ messages in thread From: Baolu Lu @ 2025-06-30 3:08 UTC (permalink / raw) To: Xu Yilun, jgg, jgg, kevin.tian, will, aneesh.kumar Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik, dan.j.williams, yilun.xu On 6/27/25 11:38, Xu Yilun wrote: > Add the iommufd_object_tombstone_user() helper, which allows the caller > to destroy an iommufd object created by userspace. > > This is useful on some destroy paths when the kernel caller finds the > object should have been removed by userspace but is still alive. With > this helper, the caller destroys the object but leave the object ID > reserved (so called tombstone). The tombstone prevents repurposing the > object ID without awareness of the original user. > > Since this happens for abnomal userspace behavior, for simplicity, the > tombstoned object ID would be permanently leaked until > iommufd_fops_release(). I.e. the original user gets an error when > calling ioctl(IOMMU_DESTROY) on that ID. > > The first use case would be to ensure the iommufd_vdevice can't outlive > the associated iommufd_device. > > Suggested-by: Jason Gunthorpe<jgg@nvidia.com> > Co-developed-by: Aneesh Kumar K.V (Arm)<aneesh.kumar@kernel.org> > Signed-off-by: Aneesh Kumar K.V (Arm)<aneesh.kumar@kernel.org> > Signed-off-by: Xu Yilun<yilun.xu@linux.intel.com> > --- > drivers/iommu/iommufd/iommufd_private.h | 23 ++++++++++++++++++++++- > drivers/iommu/iommufd/main.c | 19 ++++++++++++++++--- > 2 files changed, 38 insertions(+), 4 deletions(-) > > diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h > index 9ccc83341f32..fbc9ef78d81f 100644 > --- a/drivers/iommu/iommufd/iommufd_private.h > +++ b/drivers/iommu/iommufd/iommufd_private.h > @@ -187,7 +187,8 @@ void iommufd_object_finalize(struct iommufd_ctx *ictx, > struct iommufd_object *obj); > > enum { > - REMOVE_WAIT_SHORTTERM = 1, > + REMOVE_WAIT_SHORTTERM = BIT(0), > + REMOVE_OBJ_TOMBSTONE = BIT(1), > }; > int iommufd_object_remove(struct iommufd_ctx *ictx, > struct iommufd_object *to_destroy, u32 id, > @@ -213,6 +214,26 @@ static inline void iommufd_object_destroy_user(struct iommufd_ctx *ictx, > WARN_ON(ret); > } > > +/* > + * Similar to iommufd_object_destroy_user(), except that the object ID is left > + * reserved/tombstoned. > + */ > +static inline void iommufd_object_tombstone_user(struct iommufd_ctx *ictx, > + struct iommufd_object *obj) > +{ > + int ret; > + > + ret = iommufd_object_remove(ictx, obj, obj->id, > + REMOVE_WAIT_SHORTTERM | REMOVE_OBJ_TOMBSTONE); > + > + /* > + * If there is a bug and we couldn't destroy the object then we did put > + * back the caller's users refcount and will eventually try to free it > + * again during close. > + */ > + WARN_ON(ret); > +} > + > /* > * The HWPT allocated by autodomains is used in possibly many devices and > * is automatically destroyed when its refcount reaches zero. > diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c > index 3df468f64e7d..620923669b42 100644 > --- a/drivers/iommu/iommufd/main.c > +++ b/drivers/iommu/iommufd/main.c > @@ -167,7 +167,7 @@ int iommufd_object_remove(struct iommufd_ctx *ictx, > goto err_xa; > } > > - xas_store(&xas, NULL); > + xas_store(&xas, (flags & REMOVE_OBJ_TOMBSTONE) ? XA_ZERO_ENTRY : NULL); > if (ictx->vfio_ioas == container_of(obj, struct iommufd_ioas, obj)) > ictx->vfio_ioas = NULL; > xa_unlock(&ictx->objects); > @@ -239,6 +239,7 @@ static int iommufd_fops_release(struct inode *inode, struct file *filp) > struct iommufd_sw_msi_map *next; > struct iommufd_sw_msi_map *cur; > struct iommufd_object *obj; > + bool empty; > > /* > * The objects in the xarray form a graph of "users" counts, and we have > @@ -249,23 +250,35 @@ static int iommufd_fops_release(struct inode *inode, struct file *filp) > * until the entire list is destroyed. If this can't progress then there > * is some bug related to object refcounting. > */ > - while (!xa_empty(&ictx->objects)) { > + for (;;) { > unsigned int destroyed = 0; > unsigned long index; > > + empty = true; > xa_for_each(&ictx->objects, index, obj) { > + empty = false; People like me, who don't know the details of how xa_for_each() and xa_empty() work, will ask why "empty = xa_empty()" isn't used here. So it's better to add some comments to make it more readable. /* XA_ZERO_ENTRY is treated as a null entry by xa_for_each(). */ Others look good to me. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> > if (!refcount_dec_if_one(&obj->users)) > continue; > + > destroyed++; > xa_erase(&ictx->objects, index); > iommufd_object_ops[obj->type].destroy(obj); > kfree(obj); > } > + > + if (empty) > + break; > + > /* Bug related to users refcount */ > if (WARN_ON(!destroyed)) > break; > } Thanks, baolu ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper 2025-06-30 3:08 ` Baolu Lu @ 2025-06-30 7:24 ` Xu Yilun 0 siblings, 0 replies; 32+ messages in thread From: Xu Yilun @ 2025-06-30 7:24 UTC (permalink / raw) To: Baolu Lu Cc: jgg, jgg, kevin.tian, will, aneesh.kumar, iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik, dan.j.williams, yilun.xu > > - while (!xa_empty(&ictx->objects)) { > > + for (;;) { > > unsigned int destroyed = 0; > > unsigned long index; > > + empty = true; > > xa_for_each(&ictx->objects, index, obj) { > > + empty = false; > > People like me, who don't know the details of how xa_for_each() and > xa_empty() work, will ask why "empty = xa_empty()" isn't used here. So > it's better to add some comments to make it more readable. > > /* XA_ZERO_ENTRY is treated as a null entry by xa_for_each(). */ Yes. I try to make it more elaborate: + /* + * xa_for_each() will not return tomestones (zeroed entries), + * which prevent the xarray being empty. So use an empty flag + * instead of xa_empty() to indicate all entries are either + * NULLed or tomestoned. + */ empty = true; Thanks, Yilun > > Others look good to me. > > Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> ^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper 2025-06-27 3:38 ` [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper Xu Yilun 2025-06-30 3:08 ` Baolu Lu @ 2025-06-30 5:52 ` Tian, Kevin 2025-06-30 6:41 ` Xu Yilun 2025-06-30 19:50 ` Nicolin Chen 2 siblings, 1 reply; 32+ messages in thread From: Tian, Kevin @ 2025-06-30 5:52 UTC (permalink / raw) To: Xu Yilun, jgg@nvidia.com, jgg@ziepe.ca, will@kernel.org, aneesh.kumar@kernel.org Cc: iommu@lists.linux.dev, linux-kernel@vger.kernel.org, joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org, nicolinc@nvidia.com, aik@amd.com, Williams, Dan J, baolu.lu@linux.intel.com, Xu, Yilun > From: Xu Yilun <yilun.xu@linux.intel.com> > Sent: Friday, June 27, 2025 11:38 AM > > @@ -239,6 +239,7 @@ static int iommufd_fops_release(struct inode *inode, > struct file *filp) > struct iommufd_sw_msi_map *next; > struct iommufd_sw_msi_map *cur; > struct iommufd_object *obj; > + bool empty; move into for(;;) loop > > /* > * The objects in the xarray form a graph of "users" counts, and we > have > @@ -249,23 +250,35 @@ static int iommufd_fops_release(struct inode > *inode, struct file *filp) > * until the entire list is destroyed. If this can't progress then there > * is some bug related to object refcounting. > */ > - while (!xa_empty(&ictx->objects)) { > + for (;;) { > unsigned int destroyed = 0; > unsigned long index; > > + empty = true; > xa_for_each(&ictx->objects, index, obj) { > + empty = false; > if (!refcount_dec_if_one(&obj->users)) > continue; > + > destroyed++; > xa_erase(&ictx->objects, index); > iommufd_object_ops[obj->type].destroy(obj); > kfree(obj); > } > + > + if (empty) > + break; > + > /* Bug related to users refcount */ > if (WARN_ON(!destroyed)) > break; > } > - WARN_ON(!xa_empty(&ictx->groups)); I didn't get the rationale of this change. tombstone only means the object ID reserved, but all the destroy work for the object has been done, e.g. after all idevice objects are destroyed there should be no valid igroup entries in ictx->groups (and there is no tombstone state for igroup). Is it a wrong change by misreading ictx->groups as ictx->objects? > + > + /* > + * There may be some tombstones left over from > + * iommufd_object_tombstone_user() > + */ > + xa_destroy(&ictx->groups); > And here should be ictx->objects? ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper 2025-06-30 5:52 ` Tian, Kevin @ 2025-06-30 6:41 ` Xu Yilun 0 siblings, 0 replies; 32+ messages in thread From: Xu Yilun @ 2025-06-30 6:41 UTC (permalink / raw) To: Tian, Kevin Cc: jgg@nvidia.com, jgg@ziepe.ca, will@kernel.org, aneesh.kumar@kernel.org, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org, nicolinc@nvidia.com, aik@amd.com, Williams, Dan J, baolu.lu@linux.intel.com, Xu, Yilun On Mon, Jun 30, 2025 at 05:52:26AM +0000, Tian, Kevin wrote: > > From: Xu Yilun <yilun.xu@linux.intel.com> > > Sent: Friday, June 27, 2025 11:38 AM > > > > @@ -239,6 +239,7 @@ static int iommufd_fops_release(struct inode *inode, > > struct file *filp) > > struct iommufd_sw_msi_map *next; > > struct iommufd_sw_msi_map *cur; > > struct iommufd_object *obj; > > + bool empty; > > move into for(;;) loop Yes. > > > > > /* > > * The objects in the xarray form a graph of "users" counts, and we > > have > > @@ -249,23 +250,35 @@ static int iommufd_fops_release(struct inode > > *inode, struct file *filp) > > * until the entire list is destroyed. If this can't progress then there > > * is some bug related to object refcounting. > > */ > > - while (!xa_empty(&ictx->objects)) { > > + for (;;) { > > unsigned int destroyed = 0; > > unsigned long index; > > > > + empty = true; > > xa_for_each(&ictx->objects, index, obj) { > > + empty = false; > > if (!refcount_dec_if_one(&obj->users)) > > continue; > > + > > destroyed++; > > xa_erase(&ictx->objects, index); > > iommufd_object_ops[obj->type].destroy(obj); > > kfree(obj); > > } > > + > > + if (empty) > > + break; > > + > > /* Bug related to users refcount */ > > if (WARN_ON(!destroyed)) > > break; > > } > > - WARN_ON(!xa_empty(&ictx->groups)); > > I didn't get the rationale of this change. tombstone only means the > object ID reserved, but all the destroy work for the object has been > done, e.g. after all idevice objects are destroyed there should be no > valid igroup entries in ictx->groups (and there is no tombstone > state for igroup). > > Is it a wrong change by misreading ictx->groups as ictx->objects? Sorry, this is a true mistake. > > > + > > + /* > > + * There may be some tombstones left over from > > + * iommufd_object_tombstone_user() > > + */ > > + xa_destroy(&ictx->groups); > > > > And here should be ictx->objects? Yes, thanks for catching up. - xa_destroy(&ictx->groups); + xa_destroy(&ictx->objects); + + WARN_ON(!xa_empty(&ictx->groups)); Thanks, Yilun ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper 2025-06-27 3:38 ` [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper Xu Yilun 2025-06-30 3:08 ` Baolu Lu 2025-06-30 5:52 ` Tian, Kevin @ 2025-06-30 19:50 ` Nicolin Chen 2025-07-08 8:45 ` Xu Yilun 2 siblings, 1 reply; 32+ messages in thread From: Nicolin Chen @ 2025-06-30 19:50 UTC (permalink / raw) To: Xu Yilun Cc: jgg, jgg, kevin.tian, will, aneesh.kumar, iommu, linux-kernel, joro, robin.murphy, shuah, aik, dan.j.williams, baolu.lu, yilun.xu On Fri, Jun 27, 2025 at 11:38:05AM +0800, Xu Yilun wrote: > Add the iommufd_object_tombstone_user() helper, which allows the caller > to destroy an iommufd object created by userspace. Should we describe it "partially destroy"? > This is useful on some destroy paths when the kernel caller finds the > object should have been removed by userspace but is still alive. With > this helper, the caller destroys the object but leave the object ID > reserved (so called tombstone). The tombstone prevents repurposing the > object ID without awareness of the original user. > > Since this happens for abnomal userspace behavior, for simplicity, the s/abnomal/abnormal Nicolin ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper 2025-06-30 19:50 ` Nicolin Chen @ 2025-07-08 8:45 ` Xu Yilun 0 siblings, 0 replies; 32+ messages in thread From: Xu Yilun @ 2025-07-08 8:45 UTC (permalink / raw) To: Nicolin Chen Cc: jgg, jgg, kevin.tian, will, aneesh.kumar, iommu, linux-kernel, joro, robin.murphy, shuah, aik, dan.j.williams, baolu.lu, yilun.xu On Mon, Jun 30, 2025 at 12:50:57PM -0700, Nicolin Chen wrote: > On Fri, Jun 27, 2025 at 11:38:05AM +0800, Xu Yilun wrote: > > Add the iommufd_object_tombstone_user() helper, which allows the caller > > to destroy an iommufd object created by userspace. > > Should we describe it "partially destroy"? I don't prefer "partially destroy", it gives the impression the object is still in memory, in fact with the helper it is totally destroyed and freed. Only the ID is reserved, but the ID actually has nothing to do with the object anymore. > > > This is useful on some destroy paths when the kernel caller finds the > > object should have been removed by userspace but is still alive. With > > this helper, the caller destroys the object but leave the object ID > > reserved (so called tombstone). The tombstone prevents repurposing the > > object ID without awareness of the original user. > > > > Since this happens for abnomal userspace behavior, for simplicity, the > > s/abnomal/abnormal Applied, thanks for catching. > > Nicolin ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy 2025-06-27 3:38 [PATCH v3 0/5] iommufd: Destroy vdevice on device unbind Xu Yilun 2025-06-27 3:38 ` [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper Xu Yilun @ 2025-06-27 3:38 ` Xu Yilun 2025-06-30 5:08 ` Baolu Lu ` (2 more replies) 2025-06-27 3:38 ` [PATCH v3 3/5] iommufd/vdevice: Remove struct device reference from struct vdevice Xu Yilun ` (2 subsequent siblings) 4 siblings, 3 replies; 32+ messages in thread From: Xu Yilun @ 2025-06-27 3:38 UTC (permalink / raw) To: jgg, jgg, kevin.tian, will, aneesh.kumar Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik, dan.j.williams, baolu.lu, yilun.xu Destroy iommufd_vdevice (vdev) on iommufd_idevice (idev) destroy so that vdev can't outlive idev. iommufd_device (idev) represents the physical device bound to iommufd, while the iommufd_vdevice (vdev) represents the virtual instance of the physical device in the VM. The lifecycle of the vdev should not be longer than idev. This doesn't cause real problem on existing use cases cause vdev doesn't impact the physical device, only provides virtualization information. But to extend vdev for Confidential Computing (CC), there are needs to do secure configuration for the vdev, e.g. TSM Bind/Unbind. These configurations should be rolled back on idev destroy, or the external driver (VFIO) functionality may be impact. Building the association between idev & vdev requires the two objects pointing each other, but not referencing each other. This requires proper locking. This is done by reviving some of Nicolin's patch [1]. There are 3 cases on idev destroy: 1. vdev is already destroyed by userspace. No extra handling needed. 2. vdev is still alive. Use iommufd_object_tombstone_user() to destroy vdev and tombstone the vdev ID. 3. vdev is being destroyed by userspace. The vdev ID is already freed, but vdev destroy handler is not completed. Destroy the vdev immediately. To solve the racing with userspace destruction, make iommufd_vdevice_abort() reentrant. [1]: https://lore.kernel.org/all/53025c827c44d68edb6469bfd940a8e8bc6147a5.1729897278.git.nicolinc@nvidia.com/ Originally-by: Nicolin Chen <nicolinc@nvidia.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Co-developed-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com> --- drivers/iommu/iommufd/device.c | 42 +++++++++++++++++++++++ drivers/iommu/iommufd/iommufd_private.h | 11 +++++++ drivers/iommu/iommufd/main.c | 1 + drivers/iommu/iommufd/viommu.c | 44 +++++++++++++++++++++++-- 4 files changed, 95 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 86244403b532..0937d4989185 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -137,11 +137,53 @@ static struct iommufd_group *iommufd_get_group(struct iommufd_ctx *ictx, } } +static void iommufd_device_remove_vdev(struct iommufd_device *idev) +{ + struct iommufd_vdevice *vdev; + + mutex_lock(&idev->igroup->lock); + /* vdev has been completely destroyed by userspace */ + if (!idev->vdev) + goto out_unlock; + + vdev = iommufd_get_vdevice(idev->ictx, idev->vdev->obj.id); + if (IS_ERR(vdev)) { + /* + * vdev is removed from xarray by userspace, but is not + * destroyed/freed. Since iommufd_vdevice_abort() is reentrant, + * safe to destroy vdev here. + */ + iommufd_vdevice_abort(&idev->vdev->obj); + goto out_unlock; + } + + /* Should never happen */ + if (WARN_ON(vdev != idev->vdev)) { + iommufd_put_object(idev->ictx, &vdev->obj); + goto out_unlock; + } + + /* + * vdev is still alive. Hold a users refcount to prevent racing with + * userspace destruction, then use iommufd_object_tombstone_user() to + * destroy it and leave a tombstone. + */ + refcount_inc(&vdev->obj.users); + iommufd_put_object(idev->ictx, &vdev->obj); + mutex_unlock(&idev->igroup->lock); + iommufd_object_tombstone_user(idev->ictx, &vdev->obj); + return; + +out_unlock: + mutex_unlock(&idev->igroup->lock); +} + void iommufd_device_destroy(struct iommufd_object *obj) { struct iommufd_device *idev = container_of(obj, struct iommufd_device, obj); + iommufd_device_remove_vdev(idev); iommu_device_release_dma_owner(idev->dev); iommufd_put_group(idev->igroup); if (!iommufd_selftest_is_mock_dev(idev->dev)) diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index fbc9ef78d81f..f58aa4439c53 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -446,6 +446,7 @@ struct iommufd_device { /* always the physical device */ struct device *dev; bool enforce_cache_coherency; + struct iommufd_vdevice *vdev; }; static inline struct iommufd_device * @@ -621,6 +622,7 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd); void iommufd_viommu_destroy(struct iommufd_object *obj); int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd); void iommufd_vdevice_destroy(struct iommufd_object *obj); +void iommufd_vdevice_abort(struct iommufd_object *obj); struct iommufd_vdevice { struct iommufd_object obj; @@ -628,8 +630,17 @@ struct iommufd_vdevice { struct iommufd_viommu *viommu; struct device *dev; u64 id; /* per-vIOMMU virtual ID */ + struct iommufd_device *idev; }; +static inline struct iommufd_vdevice * +iommufd_get_vdevice(struct iommufd_ctx *ictx, u32 id) +{ + return container_of(iommufd_get_object(ictx, id, + IOMMUFD_OBJ_VDEVICE), + struct iommufd_vdevice, obj); +} + #ifdef CONFIG_IOMMUFD_TEST int iommufd_test(struct iommufd_ucmd *ucmd); void iommufd_selftest_destroy(struct iommufd_object *obj); diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index 620923669b42..64731b4fdbdf 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -529,6 +529,7 @@ static const struct iommufd_object_ops iommufd_object_ops[] = { }, [IOMMUFD_OBJ_VDEVICE] = { .destroy = iommufd_vdevice_destroy, + .abort = iommufd_vdevice_abort, }, [IOMMUFD_OBJ_VEVENTQ] = { .destroy = iommufd_veventq_destroy, diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c index 01df2b985f02..632d1d7b8fd8 100644 --- a/drivers/iommu/iommufd/viommu.c +++ b/drivers/iommu/iommufd/viommu.c @@ -84,16 +84,38 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd) return rc; } -void iommufd_vdevice_destroy(struct iommufd_object *obj) +void iommufd_vdevice_abort(struct iommufd_object *obj) { struct iommufd_vdevice *vdev = container_of(obj, struct iommufd_vdevice, obj); struct iommufd_viommu *viommu = vdev->viommu; + struct iommufd_device *idev = vdev->idev; + + lockdep_assert_held(&idev->igroup->lock); + + /* + * iommufd_vdevice_abort() could be reentrant, by + * iommufd_device_unbind() or by iommufd_destroy(). Cleanup only once. + */ + if (!viommu) + return; /* xa_cmpxchg is okay to fail if alloc failed xa_cmpxchg previously */ xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL); refcount_dec(&viommu->obj.users); put_device(vdev->dev); + vdev->viommu = NULL; + idev->vdev = NULL; +} + +void iommufd_vdevice_destroy(struct iommufd_object *obj) +{ + struct iommufd_vdevice *vdev = + container_of(obj, struct iommufd_vdevice, obj); + + mutex_lock(&vdev->idev->igroup->lock); + iommufd_vdevice_abort(obj); + mutex_unlock(&vdev->idev->igroup->lock); } int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd) @@ -124,10 +146,16 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd) goto out_put_idev; } + mutex_lock(&idev->igroup->lock); + if (idev->vdev) { + rc = -EEXIST; + goto out_unlock_igroup; + } + vdev = iommufd_object_alloc(ucmd->ictx, vdev, IOMMUFD_OBJ_VDEVICE); if (IS_ERR(vdev)) { rc = PTR_ERR(vdev); - goto out_put_idev; + goto out_unlock_igroup; } vdev->id = virt_id; @@ -135,6 +163,14 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd) get_device(idev->dev); vdev->viommu = viommu; refcount_inc(&viommu->obj.users); + /* + * iommufd_device_destroy() waits until idev->vdev is NULL before + * freeing the idev, which only happens once the vdev is finished + * destruction. Thus we do not need refcounting on either idev->vdev or + * vdev->idev. + */ + vdev->idev = idev; + idev->vdev = vdev; curr = xa_cmpxchg(&viommu->vdevs, virt_id, NULL, vdev, GFP_KERNEL); if (curr) { @@ -147,10 +183,12 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd) if (rc) goto out_abort; iommufd_object_finalize(ucmd->ictx, &vdev->obj); - goto out_put_idev; + goto out_unlock_igroup; out_abort: iommufd_object_abort_and_destroy(ucmd->ictx, &vdev->obj); +out_unlock_igroup: + mutex_unlock(&idev->igroup->lock); out_put_idev: iommufd_put_object(ucmd->ictx, &idev->obj); out_put_viommu: -- 2.25.1 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy 2025-06-27 3:38 ` [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy Xu Yilun @ 2025-06-30 5:08 ` Baolu Lu 2025-07-08 8:34 ` Xu Yilun 2025-06-30 6:27 ` Tian, Kevin 2025-06-30 19:34 ` Nicolin Chen 2 siblings, 1 reply; 32+ messages in thread From: Baolu Lu @ 2025-06-30 5:08 UTC (permalink / raw) To: Xu Yilun, jgg, jgg, kevin.tian, will, aneesh.kumar Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik, dan.j.williams, yilun.xu On 6/27/25 11:38, Xu Yilun wrote: > Destroy iommufd_vdevice (vdev) on iommufd_idevice (idev) destroy so > that vdev can't outlive idev. > > iommufd_device (idev) represents the physical device bound to iommufd, > while the iommufd_vdevice (vdev) represents the virtual instance of the > physical device in the VM. The lifecycle of the vdev should not be > longer than idev. This doesn't cause real problem on existing use cases > cause vdev doesn't impact the physical device, only provides > virtualization information. But to extend vdev for Confidential > Computing (CC), there are needs to do secure configuration for the vdev, > e.g. TSM Bind/Unbind. These configurations should be rolled back on idev > destroy, or the external driver (VFIO) functionality may be impact. > > Building the association between idev & vdev requires the two objects > pointing each other, but not referencing each other. This requires > proper locking. This is done by reviving some of Nicolin's patch [1]. > > There are 3 cases on idev destroy: > > 1. vdev is already destroyed by userspace. No extra handling needed. > 2. vdev is still alive. Use iommufd_object_tombstone_user() to > destroy vdev and tombstone the vdev ID. > 3. vdev is being destroyed by userspace. The vdev ID is already freed, > but vdev destroy handler is not completed. Destroy the vdev > immediately. To solve the racing with userspace destruction, make > iommufd_vdevice_abort() reentrant. > > [1]:https://lore.kernel.org/ > all/53025c827c44d68edb6469bfd940a8e8bc6147a5.1729897278.git.nicolinc@nvidia.com/ > > Originally-by: Nicolin Chen<nicolinc@nvidia.com> > Suggested-by: Jason Gunthorpe<jgg@nvidia.com> > Co-developed-by: Aneesh Kumar K.V (Arm)<aneesh.kumar@kernel.org> > Signed-off-by: Aneesh Kumar K.V (Arm)<aneesh.kumar@kernel.org> > Signed-off-by: Xu Yilun<yilun.xu@linux.intel.com> > --- > drivers/iommu/iommufd/device.c | 42 +++++++++++++++++++++++ > drivers/iommu/iommufd/iommufd_private.h | 11 +++++++ > drivers/iommu/iommufd/main.c | 1 + > drivers/iommu/iommufd/viommu.c | 44 +++++++++++++++++++++++-- > 4 files changed, 95 insertions(+), 3 deletions(-) > > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c > index 86244403b532..0937d4989185 100644 > --- a/drivers/iommu/iommufd/device.c > +++ b/drivers/iommu/iommufd/device.c > @@ -137,11 +137,53 @@ static struct iommufd_group *iommufd_get_group(struct iommufd_ctx *ictx, > } > } > > +static void iommufd_device_remove_vdev(struct iommufd_device *idev) > +{ > + struct iommufd_vdevice *vdev; > + > + mutex_lock(&idev->igroup->lock); > + /* vdev has been completely destroyed by userspace */ > + if (!idev->vdev) > + goto out_unlock; > + > + vdev = iommufd_get_vdevice(idev->ictx, idev->vdev->obj.id); > + if (IS_ERR(vdev)) { > + /* > + * vdev is removed from xarray by userspace, but is not > + * destroyed/freed. Since iommufd_vdevice_abort() is reentrant, > + * safe to destroy vdev here. > + */ > + iommufd_vdevice_abort(&idev->vdev->obj); > + goto out_unlock; > + } > + > + /* Should never happen */ > + if (WARN_ON(vdev != idev->vdev)) { > + iommufd_put_object(idev->ictx, &vdev->obj); > + goto out_unlock; > + } > + > + /* > + * vdev is still alive. Hold a users refcount to prevent racing with > + * userspace destruction, then use iommufd_object_tombstone_user() to > + * destroy it and leave a tombstone. > + */ > + refcount_inc(&vdev->obj.users); > + iommufd_put_object(idev->ictx, &vdev->obj); > + mutex_unlock(&idev->igroup->lock); > + iommufd_object_tombstone_user(idev->ictx, &vdev->obj); > + return; > + > +out_unlock: > + mutex_unlock(&idev->igroup->lock); > +} > + > void iommufd_device_destroy(struct iommufd_object *obj) > { > struct iommufd_device *idev = > container_of(obj, struct iommufd_device, obj); > > + iommufd_device_remove_vdev(idev); > iommu_device_release_dma_owner(idev->dev); > iommufd_put_group(idev->igroup); > if (!iommufd_selftest_is_mock_dev(idev->dev)) > diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h > index fbc9ef78d81f..f58aa4439c53 100644 > --- a/drivers/iommu/iommufd/iommufd_private.h > +++ b/drivers/iommu/iommufd/iommufd_private.h > @@ -446,6 +446,7 @@ struct iommufd_device { > /* always the physical device */ > struct device *dev; > bool enforce_cache_coherency; > + struct iommufd_vdevice *vdev; > }; > > static inline struct iommufd_device * > @@ -621,6 +622,7 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd); > void iommufd_viommu_destroy(struct iommufd_object *obj); > int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd); > void iommufd_vdevice_destroy(struct iommufd_object *obj); > +void iommufd_vdevice_abort(struct iommufd_object *obj); > > struct iommufd_vdevice { > struct iommufd_object obj; > @@ -628,8 +630,17 @@ struct iommufd_vdevice { > struct iommufd_viommu *viommu; > struct device *dev; > u64 id; /* per-vIOMMU virtual ID */ > + struct iommufd_device *idev; > }; > > +static inline struct iommufd_vdevice * > +iommufd_get_vdevice(struct iommufd_ctx *ictx, u32 id) > +{ > + return container_of(iommufd_get_object(ictx, id, > + IOMMUFD_OBJ_VDEVICE), > + struct iommufd_vdevice, obj); > +} > + > #ifdef CONFIG_IOMMUFD_TEST > int iommufd_test(struct iommufd_ucmd *ucmd); > void iommufd_selftest_destroy(struct iommufd_object *obj); > diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c > index 620923669b42..64731b4fdbdf 100644 > --- a/drivers/iommu/iommufd/main.c > +++ b/drivers/iommu/iommufd/main.c > @@ -529,6 +529,7 @@ static const struct iommufd_object_ops iommufd_object_ops[] = { > }, > [IOMMUFD_OBJ_VDEVICE] = { > .destroy = iommufd_vdevice_destroy, > + .abort = iommufd_vdevice_abort, > }, > [IOMMUFD_OBJ_VEVENTQ] = { > .destroy = iommufd_veventq_destroy, > diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c > index 01df2b985f02..632d1d7b8fd8 100644 > --- a/drivers/iommu/iommufd/viommu.c > +++ b/drivers/iommu/iommufd/viommu.c > @@ -84,16 +84,38 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd) > return rc; > } > > -void iommufd_vdevice_destroy(struct iommufd_object *obj) > +void iommufd_vdevice_abort(struct iommufd_object *obj) > { > struct iommufd_vdevice *vdev = > container_of(obj, struct iommufd_vdevice, obj); > struct iommufd_viommu *viommu = vdev->viommu; > + struct iommufd_device *idev = vdev->idev; > + > + lockdep_assert_held(&idev->igroup->lock); > + > + /* > + * iommufd_vdevice_abort() could be reentrant, by > + * iommufd_device_unbind() or by iommufd_destroy(). Cleanup only once. > + */ > + if (!viommu) > + return; > > /* xa_cmpxchg is okay to fail if alloc failed xa_cmpxchg previously */ > xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL); > refcount_dec(&viommu->obj.users); > put_device(vdev->dev); > + vdev->viommu = NULL; > + idev->vdev = NULL; I feel it makes more sense to reorder the operations like this: vdev->viommu = NULL; vdev->idev = NULL; idev->vdev = NULL; put_device(vdev->dev); put_device(vdev->dev) could potentially trigger other code paths that might attempt to reference vdev or its associated pointers. Therefore, it's safer to null all the relevant reference pointers before calling put_device(). Others look good to me, Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Thanks, baolu ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy 2025-06-30 5:08 ` Baolu Lu @ 2025-07-08 8:34 ` Xu Yilun 0 siblings, 0 replies; 32+ messages in thread From: Xu Yilun @ 2025-07-08 8:34 UTC (permalink / raw) To: Baolu Lu Cc: jgg, jgg, kevin.tian, will, aneesh.kumar, iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik, dan.j.williams, yilun.xu > > -void iommufd_vdevice_destroy(struct iommufd_object *obj) > > +void iommufd_vdevice_abort(struct iommufd_object *obj) > > { > > struct iommufd_vdevice *vdev = > > container_of(obj, struct iommufd_vdevice, obj); > > struct iommufd_viommu *viommu = vdev->viommu; > > + struct iommufd_device *idev = vdev->idev; > > + > > + lockdep_assert_held(&idev->igroup->lock); > > + > > + /* > > + * iommufd_vdevice_abort() could be reentrant, by > > + * iommufd_device_unbind() or by iommufd_destroy(). Cleanup only once. > > + */ > > + if (!viommu) > > + return; > > /* xa_cmpxchg is okay to fail if alloc failed xa_cmpxchg previously */ > > xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL); > > refcount_dec(&viommu->obj.users); > > put_device(vdev->dev); > > + vdev->viommu = NULL; > > + idev->vdev = NULL; > > I feel it makes more sense to reorder the operations like this: > > vdev->viommu = NULL; > vdev->idev = NULL; > idev->vdev = NULL; > put_device(vdev->dev); > > put_device(vdev->dev) could potentially trigger other code paths that > might attempt to reference vdev or its associated pointers. Therefore, > it's safer to null all the relevant reference pointers before calling > put_device(). Yep. due to other changes, now I keep: xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL); refcount_dec(&viommu->obj.users); + idev->vdev = NULL; put_device(vdev->dev); Anyway, vdev->dev would be completely dropped in next patch, so it doesn't matter much. Thanks, Yilun > > Others look good to me, > > Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> > > Thanks, > baolu ^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy 2025-06-27 3:38 ` [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy Xu Yilun 2025-06-30 5:08 ` Baolu Lu @ 2025-06-30 6:27 ` Tian, Kevin 2025-06-30 10:18 ` Xu Yilun 2025-06-30 19:34 ` Nicolin Chen 2 siblings, 1 reply; 32+ messages in thread From: Tian, Kevin @ 2025-06-30 6:27 UTC (permalink / raw) To: Xu Yilun, jgg@nvidia.com, jgg@ziepe.ca, will@kernel.org, aneesh.kumar@kernel.org Cc: iommu@lists.linux.dev, linux-kernel@vger.kernel.org, joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org, nicolinc@nvidia.com, aik@amd.com, Williams, Dan J, baolu.lu@linux.intel.com, Xu, Yilun > From: Xu Yilun <yilun.xu@linux.intel.com> > Sent: Friday, June 27, 2025 11:38 AM > > +static void iommufd_device_remove_vdev(struct iommufd_device *idev) > +{ > + struct iommufd_vdevice *vdev; > + > + mutex_lock(&idev->igroup->lock); > + /* vdev has been completely destroyed by userspace */ > + if (!idev->vdev) > + goto out_unlock; > + > + vdev = iommufd_get_vdevice(idev->ictx, idev->vdev->obj.id); > + if (IS_ERR(vdev)) { > + /* > + * vdev is removed from xarray by userspace, but is not > + * destroyed/freed. Since iommufd_vdevice_abort() is > reentrant, > + * safe to destroy vdev here. > + */ > + iommufd_vdevice_abort(&idev->vdev->obj); > + goto out_unlock; > + } let's add a comment that vdev is still freed in iommufd_destroy() in this situation. > -void iommufd_vdevice_destroy(struct iommufd_object *obj) > +void iommufd_vdevice_abort(struct iommufd_object *obj) > { > struct iommufd_vdevice *vdev = > container_of(obj, struct iommufd_vdevice, obj); > struct iommufd_viommu *viommu = vdev->viommu; > + struct iommufd_device *idev = vdev->idev; > + > + lockdep_assert_held(&idev->igroup->lock); > + > + /* > + * iommufd_vdevice_abort() could be reentrant, by > + * iommufd_device_unbind() or by iommufd_destroy(). Cleanup only > once. > + */ > + if (!viommu) > + return; Just check idev->vdev, to be consistent with the other path. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy 2025-06-30 6:27 ` Tian, Kevin @ 2025-06-30 10:18 ` Xu Yilun 2025-06-30 14:50 ` Jason Gunthorpe 0 siblings, 1 reply; 32+ messages in thread From: Xu Yilun @ 2025-06-30 10:18 UTC (permalink / raw) To: Tian, Kevin Cc: jgg@nvidia.com, jgg@ziepe.ca, will@kernel.org, aneesh.kumar@kernel.org, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org, nicolinc@nvidia.com, aik@amd.com, Williams, Dan J, baolu.lu@linux.intel.com, Xu, Yilun On Mon, Jun 30, 2025 at 06:27:51AM +0000, Tian, Kevin wrote: > > From: Xu Yilun <yilun.xu@linux.intel.com> > > Sent: Friday, June 27, 2025 11:38 AM > > > > +static void iommufd_device_remove_vdev(struct iommufd_device *idev) > > +{ > > + struct iommufd_vdevice *vdev; > > + > > + mutex_lock(&idev->igroup->lock); > > + /* vdev has been completely destroyed by userspace */ > > + if (!idev->vdev) > > + goto out_unlock; > > + > > + vdev = iommufd_get_vdevice(idev->ictx, idev->vdev->obj.id); > > + if (IS_ERR(vdev)) { > > + /* > > + * vdev is removed from xarray by userspace, but is not > > + * destroyed/freed. Since iommufd_vdevice_abort() is > > reentrant, > > + * safe to destroy vdev here. > > + */ > > + iommufd_vdevice_abort(&idev->vdev->obj); > > + goto out_unlock; > > + } > > let's add a comment that vdev is still freed in iommufd_destroy() > in this situation. Yes. > > > -void iommufd_vdevice_destroy(struct iommufd_object *obj) > > +void iommufd_vdevice_abort(struct iommufd_object *obj) > > { > > struct iommufd_vdevice *vdev = > > container_of(obj, struct iommufd_vdevice, obj); > > struct iommufd_viommu *viommu = vdev->viommu; > > + struct iommufd_device *idev = vdev->idev; > > + > > + lockdep_assert_held(&idev->igroup->lock); > > + > > + /* > > + * iommufd_vdevice_abort() could be reentrant, by > > + * iommufd_device_unbind() or by iommufd_destroy(). Cleanup only > > once. > > + */ > > + if (!viommu) > > + return; > > Just check idev->vdev, to be consistent with the other path. I think there is problem here. From your comments above, vdev could be aborted/destroyed by iommufd_destroy() again *after* idev is freed. That means in iommufd_vdevice_abort/destroy(), usage of vdev->idev or idev->vdev or vdev->idev->igroup->lock may be invalid. I need to reconsider this, seems we need a dedicated vdev lock to synchronize concurrent vdev abort/destroy. Thanks, Yilun ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy 2025-06-30 10:18 ` Xu Yilun @ 2025-06-30 14:50 ` Jason Gunthorpe 2025-07-01 9:19 ` Xu Yilun 0 siblings, 1 reply; 32+ messages in thread From: Jason Gunthorpe @ 2025-06-30 14:50 UTC (permalink / raw) To: Xu Yilun Cc: Tian, Kevin, will@kernel.org, aneesh.kumar@kernel.org, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org, nicolinc@nvidia.com, aik@amd.com, Williams, Dan J, baolu.lu@linux.intel.com, Xu, Yilun On Mon, Jun 30, 2025 at 06:18:50PM +0800, Xu Yilun wrote: > I need to reconsider this, seems we need a dedicated vdev lock to > synchronize concurrent vdev abort/destroy. It is not possible to be concurrent destroy is only called once after it is no longer possible to call abort. Jason ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy 2025-06-30 14:50 ` Jason Gunthorpe @ 2025-07-01 9:19 ` Xu Yilun 2025-07-01 12:13 ` Jason Gunthorpe 0 siblings, 1 reply; 32+ messages in thread From: Xu Yilun @ 2025-07-01 9:19 UTC (permalink / raw) To: Jason Gunthorpe Cc: Tian, Kevin, will@kernel.org, aneesh.kumar@kernel.org, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org, nicolinc@nvidia.com, aik@amd.com, Williams, Dan J, baolu.lu@linux.intel.com, Xu, Yilun On Mon, Jun 30, 2025 at 11:50:51AM -0300, Jason Gunthorpe wrote: > On Mon, Jun 30, 2025 at 06:18:50PM +0800, Xu Yilun wrote: > > > I need to reconsider this, seems we need a dedicated vdev lock to > > synchronize concurrent vdev abort/destroy. > > It is not possible to be concurrent > > destroy is only called once after it is no longer possible to call > abort. I'm almost about to drop the "abort twice" idea. [1] [1]: https://lore.kernel.org/linux-iommu/20250625123832.GF167785@nvidia.com/ See from the flow below, T1. iommufd_device_unbind(idev) iommufd_device_destroy(obj) mutex_lock(&idev->igroup->lock) iommufd_vdevice_abort(idev->vdev.obj) mutex_unlock(&idev->igroup->lock) kfree(obj) T2. iommufd_destroy(vdev_id) iommufd_vdevice_destroy(obj) mutex_lock(&vdev->idev->igroup->lock) iommufd_vdevice_abort(obj); mutex_unlock(&vdev->idev->igroup->lock) kfree(obj) iommufd_vdevice_destroy() will access idev->igroup->lock, but it is possible the idev is already freed at that time: iommufd_destroy(vdev_id) iommufd_vdevice_destroy(obj) iommufd_device_unbind(idev) iommufd_device_destroy(obj) mutex_lock(&idev->igroup->lock) mutex_lock(&vdev->idev->igroup->lock) (wait) iommufd_vdevice_abort(idev->vdev.obj) mutex_unlock(&idev->igroup->lock) kfree(obj) mutex_lock(&vdev->idev->igroup->lock) (PANIC) iommufd_vdevice_abort(obj) ... We also can't introduce some vdev side lock instead of idev side lock, vdev could also be freed right after iommufd_vdevice_destroy(). I think the only simple way is to let idev destruction wait for vdev destruction, that go back to the wait_event idea ... Thanks, Yilun ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy 2025-07-01 9:19 ` Xu Yilun @ 2025-07-01 12:13 ` Jason Gunthorpe 2025-07-02 2:23 ` Xu Yilun 0 siblings, 1 reply; 32+ messages in thread From: Jason Gunthorpe @ 2025-07-01 12:13 UTC (permalink / raw) To: Xu Yilun Cc: Tian, Kevin, will@kernel.org, aneesh.kumar@kernel.org, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org, nicolinc@nvidia.com, aik@amd.com, Williams, Dan J, baolu.lu@linux.intel.com, Xu, Yilun On Tue, Jul 01, 2025 at 05:19:05PM +0800, Xu Yilun wrote: > On Mon, Jun 30, 2025 at 11:50:51AM -0300, Jason Gunthorpe wrote: > > On Mon, Jun 30, 2025 at 06:18:50PM +0800, Xu Yilun wrote: > > > > > I need to reconsider this, seems we need a dedicated vdev lock to > > > synchronize concurrent vdev abort/destroy. > > > > It is not possible to be concurrent > > > > destroy is only called once after it is no longer possible to call > > abort. > > I'm almost about to drop the "abort twice" idea. [1] > > [1]: https://lore.kernel.org/linux-iommu/20250625123832.GF167785@nvidia.com/ > > See from the flow below, > > T1. iommufd_device_unbind(idev) > iommufd_device_destroy(obj) > mutex_lock(&idev->igroup->lock) > iommufd_vdevice_abort(idev->vdev.obj) > mutex_unlock(&idev->igroup->lock) > kfree(obj) > > T2. iommufd_destroy(vdev_id) > iommufd_vdevice_destroy(obj) > mutex_lock(&vdev->idev->igroup->lock) > iommufd_vdevice_abort(obj); > mutex_unlock(&vdev->idev->igroup->lock) > kfree(obj) > > iommufd_vdevice_destroy() will access idev->igroup->lock, but it is > possible the idev is already freed at that time: > > iommufd_destroy(vdev_id) > iommufd_vdevice_destroy(obj) > iommufd_device_unbind(idev) > iommufd_device_destroy(obj) > mutex_lock(&idev->igroup->lock) > mutex_lock(&vdev->idev->igroup->lock) (wait) > iommufd_vdevice_abort(idev->vdev.obj) > mutex_unlock(&idev->igroup->lock) > kfree(obj) > mutex_lock(&vdev->idev->igroup->lock) (PANIC) > iommufd_vdevice_abort(obj) > ... Yes, you can't touch idev inside the destroy function at all, under any version. idev is only valid if you have a refcount on vdev. But why are you touching this lock? Arrange things so abort doesn't touch the idev?? Jason ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy 2025-07-01 12:13 ` Jason Gunthorpe @ 2025-07-02 2:23 ` Xu Yilun 2025-07-02 9:13 ` Tian, Kevin 0 siblings, 1 reply; 32+ messages in thread From: Xu Yilun @ 2025-07-02 2:23 UTC (permalink / raw) To: Jason Gunthorpe Cc: Tian, Kevin, will@kernel.org, aneesh.kumar@kernel.org, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org, nicolinc@nvidia.com, aik@amd.com, Williams, Dan J, baolu.lu@linux.intel.com, Xu, Yilun On Tue, Jul 01, 2025 at 09:13:15AM -0300, Jason Gunthorpe wrote: > On Tue, Jul 01, 2025 at 05:19:05PM +0800, Xu Yilun wrote: > > On Mon, Jun 30, 2025 at 11:50:51AM -0300, Jason Gunthorpe wrote: > > > On Mon, Jun 30, 2025 at 06:18:50PM +0800, Xu Yilun wrote: > > > > > > > I need to reconsider this, seems we need a dedicated vdev lock to > > > > synchronize concurrent vdev abort/destroy. > > > > > > It is not possible to be concurrent > > > > > > destroy is only called once after it is no longer possible to call > > > abort. > > > > I'm almost about to drop the "abort twice" idea. [1] > > > > [1]: https://lore.kernel.org/linux-iommu/20250625123832.GF167785@nvidia.com/ > > > > See from the flow below, > > > > T1. iommufd_device_unbind(idev) > > iommufd_device_destroy(obj) > > mutex_lock(&idev->igroup->lock) > > iommufd_vdevice_abort(idev->vdev.obj) > > mutex_unlock(&idev->igroup->lock) > > kfree(obj) > > > > T2. iommufd_destroy(vdev_id) > > iommufd_vdevice_destroy(obj) > > mutex_lock(&vdev->idev->igroup->lock) > > iommufd_vdevice_abort(obj); > > mutex_unlock(&vdev->idev->igroup->lock) > > kfree(obj) > > > > iommufd_vdevice_destroy() will access idev->igroup->lock, but it is > > possible the idev is already freed at that time: > > > > iommufd_destroy(vdev_id) > > iommufd_vdevice_destroy(obj) > > iommufd_device_unbind(idev) > > iommufd_device_destroy(obj) > > mutex_lock(&idev->igroup->lock) > > mutex_lock(&vdev->idev->igroup->lock) (wait) > > iommufd_vdevice_abort(idev->vdev.obj) > > mutex_unlock(&idev->igroup->lock) > > kfree(obj) > > mutex_lock(&vdev->idev->igroup->lock) (PANIC) > > iommufd_vdevice_abort(obj) > > ... > > Yes, you can't touch idev inside the destroy function at all, under > any version. idev is only valid if you have a refcount on vdev. > > But why are you touching this lock? Arrange things so abort doesn't > touch the idev?? idev has a pointer idev->vdev to track the vdev's lifecycle. idev->igroup->lock protects the pointer. At the end of iommufd_vdevice_destroy() this pointer should be NULLed so that idev knows vdev is really destroyed. I haven't found a safer way for vdev to sync up its validness with idev w/o touching idev. I was thinking of using vdev->idev and some vdev lock for tracking instead. Then iommufd_vdevice_abort() doesn't touch idev. But it is still the same, just switch to put idev in risk: iommufd_destroy(vdev_id) iommufd_vdevice_destroy(obj) iommufd_device_unbind(idev) iommufd_device_destroy(obj) mutex_lock(&vdev->some_lock) mutex_lock(&idev->vdev->some_lock) (wait) iommufd_vdevice_abort(obj) mutex_unlock(&vdev->some_lock) kfree(obj) mutex_lock(&idev->vdev->some_lock) (PANIC) iommufd_vdevice_abort(idev->vdev.obj) ... Thanks, Yilun ^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy 2025-07-02 2:23 ` Xu Yilun @ 2025-07-02 9:13 ` Tian, Kevin 2025-07-02 12:40 ` Jason Gunthorpe 0 siblings, 1 reply; 32+ messages in thread From: Tian, Kevin @ 2025-07-02 9:13 UTC (permalink / raw) To: Xu Yilun, Jason Gunthorpe Cc: will@kernel.org, aneesh.kumar@kernel.org, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org, nicolinc@nvidia.com, aik@amd.com, Williams, Dan J, baolu.lu@linux.intel.com, Xu, Yilun > From: Xu Yilun <yilun.xu@linux.intel.com> > Sent: Wednesday, July 2, 2025 10:24 AM > > On Tue, Jul 01, 2025 at 09:13:15AM -0300, Jason Gunthorpe wrote: > > On Tue, Jul 01, 2025 at 05:19:05PM +0800, Xu Yilun wrote: > > > On Mon, Jun 30, 2025 at 11:50:51AM -0300, Jason Gunthorpe wrote: > > > > On Mon, Jun 30, 2025 at 06:18:50PM +0800, Xu Yilun wrote: > > > > > > > > > I need to reconsider this, seems we need a dedicated vdev lock to > > > > > synchronize concurrent vdev abort/destroy. > > > > > > > > It is not possible to be concurrent > > > > > > > > destroy is only called once after it is no longer possible to call > > > > abort. > > > > > > I'm almost about to drop the "abort twice" idea. [1] > > > > > > [1]: https://lore.kernel.org/linux- > iommu/20250625123832.GF167785@nvidia.com/ > > > > > > See from the flow below, > > > > > > T1. iommufd_device_unbind(idev) > > > iommufd_device_destroy(obj) > > > mutex_lock(&idev->igroup->lock) > > > iommufd_vdevice_abort(idev->vdev.obj) > > > mutex_unlock(&idev->igroup->lock) > > > kfree(obj) > > > > > > T2. iommufd_destroy(vdev_id) > > > iommufd_vdevice_destroy(obj) > > > mutex_lock(&vdev->idev->igroup->lock) > > > iommufd_vdevice_abort(obj); > > > mutex_unlock(&vdev->idev->igroup->lock) > > > kfree(obj) > > > > > > iommufd_vdevice_destroy() will access idev->igroup->lock, but it is > > > possible the idev is already freed at that time: > > > > > > iommufd_destroy(vdev_id) > > > iommufd_vdevice_destroy(obj) > > > iommufd_device_unbind(idev) > > > iommufd_device_destroy(obj) > > > mutex_lock(&idev->igroup->lock) > > > mutex_lock(&vdev->idev->igroup->lock) (wait) > > > iommufd_vdevice_abort(idev->vdev.obj) > > > mutex_unlock(&idev->igroup->lock) > > > kfree(obj) > > > mutex_lock(&vdev->idev->igroup->lock) (PANIC) > > > iommufd_vdevice_abort(obj) > > > ... > > > > Yes, you can't touch idev inside the destroy function at all, under > > any version. idev is only valid if you have a refcount on vdev. > > > > But why are you touching this lock? Arrange things so abort doesn't > > touch the idev?? > > idev has a pointer idev->vdev to track the vdev's lifecycle. > idev->igroup->lock protects the pointer. At the end of > iommufd_vdevice_destroy() this pointer should be NULLed so that idev > knows vdev is really destroyed. > > I haven't found a safer way for vdev to sync up its validness with idev > w/o touching idev. > > I was thinking of using vdev->idev and some vdev lock for tracking > instead. Then iommufd_vdevice_abort() doesn't touch idev. But it is > still the same, just switch to put idev in risk: > > > iommufd_destroy(vdev_id) > iommufd_vdevice_destroy(obj) > iommufd_device_unbind(idev) > iommufd_device_destroy(obj) > mutex_lock(&vdev->some_lock) > mutex_lock(&idev->vdev->some_lock) (wait) panic could happen here, between acquiring idev->vdev and mutex(vdev->some_lock) as vdev might be destroyed/freed in-between. > iommufd_vdevice_abort(obj) > mutex_unlock(&vdev->some_lock) > kfree(obj) > mutex_lock(&idev->vdev->some_lock) (PANIC) > iommufd_vdevice_abort(idev->vdev.obj) > ... > I cannot find a safe way either, except using certain global lock. but comparing to that I'd prefer to the original wait approach... ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy 2025-07-02 9:13 ` Tian, Kevin @ 2025-07-02 12:40 ` Jason Gunthorpe 2025-07-03 4:32 ` Tian, Kevin 2025-07-07 10:58 ` Xu Yilun 0 siblings, 2 replies; 32+ messages in thread From: Jason Gunthorpe @ 2025-07-02 12:40 UTC (permalink / raw) To: Tian, Kevin Cc: Xu Yilun, will@kernel.org, aneesh.kumar@kernel.org, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org, nicolinc@nvidia.com, aik@amd.com, Williams, Dan J, baolu.lu@linux.intel.com, Xu, Yilun On Wed, Jul 02, 2025 at 09:13:50AM +0000, Tian, Kevin wrote: > > > Yes, you can't touch idev inside the destroy function at all, under > > > any version. idev is only valid if you have a refcount on vdev. > > > > > > But why are you touching this lock? Arrange things so abort doesn't > > > touch the idev?? > > > > idev has a pointer idev->vdev to track the vdev's lifecycle. > > idev->igroup->lock protects the pointer. At the end of > > iommufd_vdevice_destroy() this pointer should be NULLed so that idev > > knows vdev is really destroyed. Well, that is destroy, not abort, but OK, there is an issue with destroy. > but comparing to that I'd prefer to the original wait approach... Okay, but lets try to keep the wait hidden inside the refcounting.. The issue here is we don't hold a refcount on idev while working with idev. Let's fix that and then things should work properly? Maybe something like this: diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 4e781aa9fc6329..9174fa7c972b80 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -178,12 +178,20 @@ static void iommufd_device_remove_vdev(struct iommufd_device *idev) mutex_unlock(&idev->igroup->lock); } +void iommufd_device_pre_destroy(struct iommufd_object *obj) +{ + struct iommufd_device *idev = + container_of(obj, struct iommufd_device, obj); + + /* Release the short term users on this */ + iommufd_device_remove_vdev(idev); +} + void iommufd_device_destroy(struct iommufd_object *obj) { struct iommufd_device *idev = container_of(obj, struct iommufd_device, obj); - iommufd_device_remove_vdev(idev); iommu_device_release_dma_owner(idev->dev); iommufd_put_group(idev->igroup); if (!iommufd_selftest_is_mock_dev(idev->dev)) diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index b2e8e106c16158..387c630fdabfbd 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -151,6 +151,9 @@ static int iommufd_object_dec_wait_shortterm(struct iommufd_ctx *ictx, if (refcount_dec_and_test(&to_destroy->shortterm_users)) return 0; + if (iommufd_object_ops[to_destroy->type].pre_destroy) + iommufd_object_ops[to_destroy->type].pre_destroy(to_destroy); + if (wait_event_timeout(ictx->destroy_wait, refcount_read(&to_destroy->shortterm_users) == 0, msecs_to_jiffies(60000))) @@ -567,6 +570,7 @@ static const struct iommufd_object_ops iommufd_object_ops[] = { .destroy = iommufd_access_destroy_object, }, [IOMMUFD_OBJ_DEVICE] = { + .pre_destroy = iommufd_device_pre_destroy, .destroy = iommufd_device_destroy, }, [IOMMUFD_OBJ_FAULT] = { diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c index 9451a311745f7b..cbf99daa7dc25d 100644 --- a/drivers/iommu/iommufd/viommu.c +++ b/drivers/iommu/iommufd/viommu.c @@ -135,6 +135,7 @@ void iommufd_vdevice_destroy(struct iommufd_object *obj) mutex_lock(&vdev->idev->igroup->lock); iommufd_vdevice_abort(obj); mutex_unlock(&vdev->idev->igroup->lock); + iommufd_put_object(vdev->viommu->ictx, &vdev->idev->obj); } int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd) @@ -180,13 +181,19 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd) vdev->id = virt_id; vdev->viommu = viommu; refcount_inc(&viommu->obj.users); + + /* + * A reference is held on the idev so long as we have a pointer. + * iommufd_device_pre_destroy() will revoke it before the real + * destruction. + */ + vdev->idev = idev; + /* * iommufd_device_destroy() waits until idev->vdev is NULL before * freeing the idev, which only happens once the vdev is finished - * destruction. Thus we do not need refcounting on either idev->vdev or - * vdev->idev. + * destruction. */ - vdev->idev = idev; idev->vdev = vdev; curr = xa_cmpxchg(&viommu->vdevs, virt_id, NULL, vdev, GFP_KERNEL); @@ -207,7 +214,8 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd) out_unlock_igroup: mutex_unlock(&idev->igroup->lock); out_put_idev: - iommufd_put_object(ucmd->ictx, &idev->obj); + if (rc) + iommufd_put_object(ucmd->ictx, &idev->obj); out_put_viommu: iommufd_put_object(ucmd->ictx, &viommu->obj); return rc; ^ permalink raw reply related [flat|nested] 32+ messages in thread
* RE: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy 2025-07-02 12:40 ` Jason Gunthorpe @ 2025-07-03 4:32 ` Tian, Kevin 2025-07-03 11:21 ` Jason Gunthorpe 2025-07-07 10:58 ` Xu Yilun 1 sibling, 1 reply; 32+ messages in thread From: Tian, Kevin @ 2025-07-03 4:32 UTC (permalink / raw) To: Jason Gunthorpe Cc: Xu Yilun, will@kernel.org, aneesh.kumar@kernel.org, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org, nicolinc@nvidia.com, aik@amd.com, Williams, Dan J, baolu.lu@linux.intel.com, Xu, Yilun > From: Jason Gunthorpe <jgg@nvidia.com> > Sent: Wednesday, July 2, 2025 8:41 PM > > On Wed, Jul 02, 2025 at 09:13:50AM +0000, Tian, Kevin wrote: > > > > Yes, you can't touch idev inside the destroy function at all, under > > > > any version. idev is only valid if you have a refcount on vdev. > > > > > > > > But why are you touching this lock? Arrange things so abort doesn't > > > > touch the idev?? > > > > > > idev has a pointer idev->vdev to track the vdev's lifecycle. > > > idev->igroup->lock protects the pointer. At the end of > > > iommufd_vdevice_destroy() this pointer should be NULLed so that idev > > > knows vdev is really destroyed. > > Well, that is destroy, not abort, but OK, there is an issue with > destroy. > > > but comparing to that I'd prefer to the original wait approach... > > Okay, but lets try to keep the wait hidden inside the refcounting.. > > The issue here is we don't hold a refcount on idev while working with > idev. Let's fix that and then things should work properly? > > Maybe something like this: > > diff --git a/drivers/iommu/iommufd/device.c > b/drivers/iommu/iommufd/device.c > index 4e781aa9fc6329..9174fa7c972b80 100644 > --- a/drivers/iommu/iommufd/device.c > +++ b/drivers/iommu/iommufd/device.c > @@ -178,12 +178,20 @@ static void iommufd_device_remove_vdev(struct > iommufd_device *idev) > mutex_unlock(&idev->igroup->lock); > } > > +void iommufd_device_pre_destroy(struct iommufd_object *obj) > +{ > + struct iommufd_device *idev = > + container_of(obj, struct iommufd_device, obj); > + > + /* Release the short term users on this */ > + iommufd_device_remove_vdev(idev); > +} > + > void iommufd_device_destroy(struct iommufd_object *obj) > { > struct iommufd_device *idev = > container_of(obj, struct iommufd_device, obj); > > - iommufd_device_remove_vdev(idev); > iommu_device_release_dma_owner(idev->dev); > iommufd_put_group(idev->igroup); > if (!iommufd_selftest_is_mock_dev(idev->dev)) > diff --git a/drivers/iommu/iommufd/main.c > b/drivers/iommu/iommufd/main.c > index b2e8e106c16158..387c630fdabfbd 100644 > --- a/drivers/iommu/iommufd/main.c > +++ b/drivers/iommu/iommufd/main.c > @@ -151,6 +151,9 @@ static int > iommufd_object_dec_wait_shortterm(struct iommufd_ctx *ictx, > if (refcount_dec_and_test(&to_destroy->shortterm_users)) > return 0; > > + if (iommufd_object_ops[to_destroy->type].pre_destroy) > + iommufd_object_ops[to_destroy- > >type].pre_destroy(to_destroy); > + > if (wait_event_timeout(ictx->destroy_wait, > refcount_read(&to_destroy->shortterm_users) > == 0, > msecs_to_jiffies(60000))) > @@ -567,6 +570,7 @@ static const struct iommufd_object_ops > iommufd_object_ops[] = { > .destroy = iommufd_access_destroy_object, > }, > [IOMMUFD_OBJ_DEVICE] = { > + .pre_destroy = iommufd_device_pre_destroy, > .destroy = iommufd_device_destroy, > }, > [IOMMUFD_OBJ_FAULT] = { > diff --git a/drivers/iommu/iommufd/viommu.c > b/drivers/iommu/iommufd/viommu.c > index 9451a311745f7b..cbf99daa7dc25d 100644 > --- a/drivers/iommu/iommufd/viommu.c > +++ b/drivers/iommu/iommufd/viommu.c > @@ -135,6 +135,7 @@ void iommufd_vdevice_destroy(struct > iommufd_object *obj) > mutex_lock(&vdev->idev->igroup->lock); > iommufd_vdevice_abort(obj); > mutex_unlock(&vdev->idev->igroup->lock); > + iommufd_put_object(vdev->viommu->ictx, &vdev->idev->obj); > } > > int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd) > @@ -180,13 +181,19 @@ int iommufd_vdevice_alloc_ioctl(struct > iommufd_ucmd *ucmd) > vdev->id = virt_id; > vdev->viommu = viommu; > refcount_inc(&viommu->obj.users); > + > + /* > + * A reference is held on the idev so long as we have a pointer. > + * iommufd_device_pre_destroy() will revoke it before the real > + * destruction. > + */ > + vdev->idev = idev; > + > /* > * iommufd_device_destroy() waits until idev->vdev is NULL before > * freeing the idev, which only happens once the vdev is finished > - * destruction. Thus we do not need refcounting on either idev->vdev > or > - * vdev->idev. > + * destruction. > */ > - vdev->idev = idev; > idev->vdev = vdev; > > curr = xa_cmpxchg(&viommu->vdevs, virt_id, NULL, vdev, > GFP_KERNEL); > @@ -207,7 +214,8 @@ int iommufd_vdevice_alloc_ioctl(struct > iommufd_ucmd *ucmd) > out_unlock_igroup: > mutex_unlock(&idev->igroup->lock); > out_put_idev: > - iommufd_put_object(ucmd->ictx, &idev->obj); > + if (rc) > + iommufd_put_object(ucmd->ictx, &idev->obj); but this sounds like a misuse of shortterm_users which is not intended to be held long beyond ioctl... ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy 2025-07-03 4:32 ` Tian, Kevin @ 2025-07-03 11:21 ` Jason Gunthorpe 0 siblings, 0 replies; 32+ messages in thread From: Jason Gunthorpe @ 2025-07-03 11:21 UTC (permalink / raw) To: Tian, Kevin Cc: Xu Yilun, will@kernel.org, aneesh.kumar@kernel.org, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org, nicolinc@nvidia.com, aik@amd.com, Williams, Dan J, baolu.lu@linux.intel.com, Xu, Yilun On Thu, Jul 03, 2025 at 04:32:26AM +0000, Tian, Kevin wrote: > but this sounds like a misuse of shortterm_users which is not intended > to be held long beyond ioctl... With the addition of the pre_destroy it purpose gets changed to 'beyond the ioctl or past pre_destroy' Jason ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy 2025-07-02 12:40 ` Jason Gunthorpe 2025-07-03 4:32 ` Tian, Kevin @ 2025-07-07 10:58 ` Xu Yilun 2025-07-07 12:25 ` Jason Gunthorpe 1 sibling, 1 reply; 32+ messages in thread From: Xu Yilun @ 2025-07-07 10:58 UTC (permalink / raw) To: Jason Gunthorpe Cc: Tian, Kevin, will@kernel.org, aneesh.kumar@kernel.org, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org, nicolinc@nvidia.com, aik@amd.com, Williams, Dan J, baolu.lu@linux.intel.com, Xu, Yilun On Wed, Jul 02, 2025 at 09:40:42AM -0300, Jason Gunthorpe wrote: > On Wed, Jul 02, 2025 at 09:13:50AM +0000, Tian, Kevin wrote: > > > > Yes, you can't touch idev inside the destroy function at all, under > > > > any version. idev is only valid if you have a refcount on vdev. > > > > > > > > But why are you touching this lock? Arrange things so abort doesn't > > > > touch the idev?? > > > > > > idev has a pointer idev->vdev to track the vdev's lifecycle. > > > idev->igroup->lock protects the pointer. At the end of > > > iommufd_vdevice_destroy() this pointer should be NULLed so that idev > > > knows vdev is really destroyed. > > Well, that is destroy, not abort, but OK, there is an issue with > destroy. > > > but comparing to that I'd prefer to the original wait approach... > > Okay, but lets try to keep the wait hidden inside the refcounting.. > > The issue here is we don't hold a refcount on idev while working with > idev. Let's fix that and then things should work properly? > > Maybe something like this: > > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c > index 4e781aa9fc6329..9174fa7c972b80 100644 > --- a/drivers/iommu/iommufd/device.c > +++ b/drivers/iommu/iommufd/device.c > @@ -178,12 +178,20 @@ static void iommufd_device_remove_vdev(struct iommufd_device *idev) > mutex_unlock(&idev->igroup->lock); > } > > +void iommufd_device_pre_destroy(struct iommufd_object *obj) > +{ > + struct iommufd_device *idev = > + container_of(obj, struct iommufd_device, obj); > + > + /* Release the short term users on this */ > + iommufd_device_remove_vdev(idev); > +} > + > void iommufd_device_destroy(struct iommufd_object *obj) > { > struct iommufd_device *idev = > container_of(obj, struct iommufd_device, obj); > > - iommufd_device_remove_vdev(idev); > iommu_device_release_dma_owner(idev->dev); > iommufd_put_group(idev->igroup); > if (!iommufd_selftest_is_mock_dev(idev->dev)) > diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c > index b2e8e106c16158..387c630fdabfbd 100644 > --- a/drivers/iommu/iommufd/main.c > +++ b/drivers/iommu/iommufd/main.c > @@ -151,6 +151,9 @@ static int iommufd_object_dec_wait_shortterm(struct iommufd_ctx *ictx, > if (refcount_dec_and_test(&to_destroy->shortterm_users)) > return 0; > > + if (iommufd_object_ops[to_destroy->type].pre_destroy) > + iommufd_object_ops[to_destroy->type].pre_destroy(to_destroy); > + > if (wait_event_timeout(ictx->destroy_wait, > refcount_read(&to_destroy->shortterm_users) ==i 0, I still see concerns here. The pre_destroy() and wait_event is before the idev's users count decreasing to 0, which means a new user could get in. The worse thing is the new user could hold the shortterm_users until pre_destroy(), but there isn't a second pre_destroy()... I think extending the life of shortterm_users at runtime makes things complex. My idea is keep the original definition of shortterm_users at runtime, only repurpose it as a wait completion after users refcount == 0. I.e. idev increase its own shortterm_users, waiting for vdev to decrease it. diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 9876bf70980a..49b787bc0688 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -147,8 +147,15 @@ static void iommufd_device_remove_vdev(struct iommufd_device *idev) goto out_unlock; vdev = iommufd_get_vdevice(idev->ictx, idev->vdev->obj.id); - if (IS_ERR(vdev)) + if (IS_ERR(vdev)) { + /* + * vdev is removed from xarray by userspace, but is not + * destroyed. We should wait for vdev real destruction before + * idev destruction. + */ + iommufd_object_destroy_wait_prepare(&idev->obj); goto out_unlock; + } /* Should never happen */ if (WARN_ON(vdev != idev->vdev)) { diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 08f3c66366ea..6620006df8d1 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -180,6 +180,47 @@ static inline void iommufd_put_object(struct iommufd_ctx *ictx, wake_up_interruptible_all(&ictx->destroy_wait); } +/* + * The iommufd_object_destroy_wait_xx() are used to synchronize the last minute + * destructions of interrelated objects. E.g. the vdev's destroy() should + * always be called before idev's destroy(). + * + * iommufd_object_destroy_wait_prepare() should only be called by + * later-destroyer in its pre_destroy() ops, where the users & shortterm_users + * refcounts are all zeroed. I.e. no new user is possible. Use "users == 0 && + * shortterm_users == 1" as the "destroy wait" state, as this combination + * is not used in runtime usecase and it also prevents new users. + * + * The later-destroyer should use their own locking to ensure the + * first-destroyer still exists when calling this function. + */ +static inline void +iommufd_object_destroy_wait_prepare(struct iommufd_object *to_destroy) +{ + if (WARN_ON(refcount_read(&to_destroy->users) != 0 || + refcount_read(&to_destroy->shortterm_users) != 0)) + return; + + refcount_set(&to_destroy->shortterm_users, 1); +} + +/* + * iommufd_object_destroy_wait_complete() should be called by first-destroyer + * in its destroy() ops. The first-destroyer should not access the + * later-destroyer after calling this function. + */ +static inline void +iommufd_object_destroy_wait_complete(struct iommufd_ctx *ictx, + struct iommufd_object *to_destroy) +{ + if (refcount_read(&to_destroy->users) != 0 || + refcount_read(&to_destroy->shortterm_users) != 1) + return; + + refcount_dec(&to_destroy->shortterm_users); + wake_up_interruptible_all(&ictx->destroy_wait); +} + void iommufd_object_abort(struct iommufd_ctx *ictx, struct iommufd_object *obj); void iommufd_object_abort_and_destroy(struct iommufd_ctx *ictx, struct iommufd_object *obj); diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index fbccfbd8fb03..148218293f55 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -96,15 +96,37 @@ struct iommufd_object *iommufd_get_object(struct iommufd_ctx *ictx, u32 id, return obj; } +static int iommufd_object_pre_destroy_wait(struct iommufd_ctx *ictx, + struct iommufd_object *to_destroy) +{ + if (!iommufd_object_ops[to_destroy->type].pre_destroy) + return 0; + + /* + * pre_destroy() may put the object in destroy wait state (users == 0 + * && shortterm_users == 1). Another thread completes the waiting by + * decreasing the shortterm_users back to 0. The helpers + * iommufd_object_destroy_wait_prepare() and + * iommufd_object_destroy_wait_complete() should be used for this + * purpose. + */ + iommufd_object_ops[to_destroy->type].pre_destroy(to_destroy); + if (wait_event_timeout(ictx->destroy_wait, + refcount_read(&to_destroy->shortterm_users) == + 0, + msecs_to_jiffies(60000))) + return 0; + + pr_crit("Time out waiting for iommufd object to become free\n"); + return -EBUSY; +} + static int iommufd_object_dec_wait_shortterm(struct iommufd_ctx *ictx, struct iommufd_object *to_destroy) { if (refcount_dec_and_test(&to_destroy->shortterm_users)) return 0; - if (iommufd_object_ops[to_destroy->type].pre_destroy) - iommufd_object_ops[to_destroy->type].pre_destroy(to_destroy); - if (wait_event_timeout(ictx->destroy_wait, refcount_read(&to_destroy->shortterm_users) == 0, @@ -184,6 +206,10 @@ int iommufd_object_remove(struct iommufd_ctx *ictx, ret = iommufd_object_dec_wait_shortterm(ictx, obj); if (WARN_ON(ret)) return ret; + } else { + ret = iommufd_object_pre_destroy_wait(ictx, obj); + if (WARN_ON(ret)) + return ret; } iommufd_object_ops[obj->type].destroy(obj); diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c index d49e8f0c8bb3..a7feefd5e472 100644 --- a/drivers/iommu/iommufd/viommu.c +++ b/drivers/iommu/iommufd/viommu.c @@ -103,11 +103,12 @@ void iommufd_vdevice_destroy(struct iommufd_object *obj) { struct iommufd_vdevice *vdev = container_of(obj, struct iommufd_vdevice, obj); + struct iommufd_device *idev = vdev->idev; - mutex_lock(&vdev->idev->igroup->lock); + mutex_lock(&idev->igroup->lock); iommufd_vdevice_abort(obj); - mutex_unlock(&vdev->idev->igroup->lock); - iommufd_put_object(vdev->viommu->ictx, &vdev->idev->obj); + mutex_unlock(&idev->igroup->lock); + iommufd_object_destroy_wait_complete(idev->ictx, &idev->obj); } int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd) @@ -185,8 +186,7 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd) out_unlock_igroup: mutex_unlock(&idev->igroup->lock); out_put_idev: - if (rc) - iommufd_put_object(ucmd->ictx, &idev->obj); + iommufd_put_object(ucmd->ictx, &idev->obj); out_put_viommu: iommufd_put_object(ucmd->ictx, &viommu->obj); return rc; ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy 2025-07-07 10:58 ` Xu Yilun @ 2025-07-07 12:25 ` Jason Gunthorpe 2025-07-07 19:41 ` Xu Yilun 0 siblings, 1 reply; 32+ messages in thread From: Jason Gunthorpe @ 2025-07-07 12:25 UTC (permalink / raw) To: Xu Yilun Cc: Tian, Kevin, will@kernel.org, aneesh.kumar@kernel.org, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org, nicolinc@nvidia.com, aik@amd.com, Williams, Dan J, baolu.lu@linux.intel.com, Xu, Yilun On Mon, Jul 07, 2025 at 06:58:45PM +0800, Xu Yilun wrote: > I still see concerns here. The pre_destroy() and wait_event is before the > idev's users count decreasing to 0, which means a new user could get > in. Hum, yes it seems we could get a race with another vdevice being created during idevice destruction, but I think this could be fixed in a straightforward way with a flag 'being destroyed' in the idev under the lock if it is the only issue > The worse thing is the new user could hold the shortterm_users until > pre_destroy(), but there isn't a second pre_destroy()... This is OK, the point of pre_destroy is to remove the reference cycle. So it needs to prevent new vdevs from establishing references and remove the reference from any existing vdev. refcounts outside of this cycle are handled by the existing mechanisms - pre_destroy is only about the refcount loop. > I think extending the life of shortterm_users at runtime makes things > complex. Well, it keeps the refcounting working the way it is supposed to. Holding a pointer without a reference is nasty and is creating alot of these weird problems and strange flows. Hold the refcount along with the pointer and things follow the normal patterns. There is nothing inherently wrong with shortterm users for this besides its name. Think of it is as 'wait for users', and we clearly want to wait for the vdev user to put back its idev reference. I think there are too many gyrations with refcounts in this version below. It is hard to reason about if zero refcounts are re-elevated to non-zero. > vdev = iommufd_get_vdevice(idev->ictx, idev->vdev->obj.id); > - if (IS_ERR(vdev)) > + if (IS_ERR(vdev)) { > + /* > + * vdev is removed from xarray by userspace, but is not > + * destroyed. We should wait for vdev real destruction before > + * idev destruction. > + */ > + iommufd_object_destroy_wait_prepare(&idev->obj); > goto out_unlock; This is now really weird, we are mangling the refcounts and the reasoning why it is safe with the concurrent vdev thread is tricky.. > +/* > + * iommufd_object_destroy_wait_complete() should be called by first-destroyer > + * in its destroy() ops. The first-destroyer should not access the > + * later-destroyer after calling this function. > + */ > +static inline void > +iommufd_object_destroy_wait_complete(struct iommufd_ctx *ictx, > + struct iommufd_object *to_destroy) > +{ > + if (refcount_read(&to_destroy->users) != 0 || > + refcount_read(&to_destroy->shortterm_users) != 1) > + return; > + > + refcount_dec(&to_destroy->shortterm_users); > + wake_up_interruptible_all(&ictx->destroy_wait); > +} This looks weird too, why is refcount_read safe? That is usually not how this kind of logic is written. I'm not sure this is OK, it is not an obvious known-good pattern for lifetime management. Jason ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy 2025-07-07 12:25 ` Jason Gunthorpe @ 2025-07-07 19:41 ` Xu Yilun 0 siblings, 0 replies; 32+ messages in thread From: Xu Yilun @ 2025-07-07 19:41 UTC (permalink / raw) To: Jason Gunthorpe Cc: Tian, Kevin, will@kernel.org, aneesh.kumar@kernel.org, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org, nicolinc@nvidia.com, aik@amd.com, Williams, Dan J, baolu.lu@linux.intel.com, Xu, Yilun On Mon, Jul 07, 2025 at 09:25:02AM -0300, Jason Gunthorpe wrote: > On Mon, Jul 07, 2025 at 06:58:45PM +0800, Xu Yilun wrote: > > > I still see concerns here. The pre_destroy() and wait_event is before the > > idev's users count decreasing to 0, which means a new user could get > > in. > > Hum, yes it seems we could get a race with another vdevice being > created during idevice destruction, but I think this could be fixed in > a straightforward way with a flag 'being destroyed' in the idev under > the lock if it is the only issue > > > The worse thing is the new user could hold the shortterm_users until > > pre_destroy(), but there isn't a second pre_destroy()... > > This is OK, the point of pre_destroy is to remove the reference > cycle. So it needs to prevent new vdevs from establishing references > and remove the reference from any existing vdev. > > refcounts outside of this cycle are handled by the existing > mechanisms - pre_destroy is only about the refcount loop. That works for me. I didn't prefer a being destroyed flag cause it seems not align with the current refcounting mechanism. But I see it makes sense cause idev does deterministic destruction. The only concern is idev cannot actually prevent new reference by itself, need vdev to honor the flag. But just keep simple, don't over-engineering. Thanks, Yilun > > > I think extending the life of shortterm_users at runtime makes things > > complex. > > Well, it keeps the refcounting working the way it is supposed > to. Holding a pointer without a reference is nasty and is creating > alot of these weird problems and strange flows. Hold the refcount > along with the pointer and things follow the normal patterns. > > There is nothing inherently wrong with shortterm users for this > besides its name. Think of it is as 'wait for users', and we clearly > want to wait for the vdev user to put back its idev reference. > ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy 2025-06-27 3:38 ` [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy Xu Yilun 2025-06-30 5:08 ` Baolu Lu 2025-06-30 6:27 ` Tian, Kevin @ 2025-06-30 19:34 ` Nicolin Chen 2025-07-08 8:12 ` Xu Yilun 2 siblings, 1 reply; 32+ messages in thread From: Nicolin Chen @ 2025-06-30 19:34 UTC (permalink / raw) To: Xu Yilun Cc: jgg, jgg, kevin.tian, will, aneesh.kumar, iommu, linux-kernel, joro, robin.murphy, shuah, aik, dan.j.williams, baolu.lu, yilun.xu On Fri, Jun 27, 2025 at 11:38:06AM +0800, Xu Yilun wrote: > +static void iommufd_device_remove_vdev(struct iommufd_device *idev) > +{ > + struct iommufd_vdevice *vdev; > + > + mutex_lock(&idev->igroup->lock); > + /* vdev has been completely destroyed by userspace */ > + if (!idev->vdev) > + goto out_unlock; > + > + vdev = iommufd_get_vdevice(idev->ictx, idev->vdev->obj.id); > + if (IS_ERR(vdev)) { > + /* > + * vdev is removed from xarray by userspace, but is not > + * destroyed/freed. Since iommufd_vdevice_abort() is reentrant, > + * safe to destroy vdev here. > + */ > + iommufd_vdevice_abort(&idev->vdev->obj); > + goto out_unlock; This is the case #3, i.e. a racing vdev destory, in the commit log? I think it is worth clarifying that there is a concurrent destroy: /* * An ongoing vdev destroy ioctl has removed the vdev from the * object xarray but has not finished iommufd_vdevice_destroy() * yet, as it is holding the same mutex. Destroy the vdev here, * i.e. the iommufd_vdevice_destroy() will be a NOP once it is * unlocked. */ > @@ -147,10 +183,12 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd) > if (rc) > goto out_abort; > iommufd_object_finalize(ucmd->ictx, &vdev->obj); > - goto out_put_idev; > + goto out_unlock_igroup; > > out_abort: > iommufd_object_abort_and_destroy(ucmd->ictx, &vdev->obj); > +out_unlock_igroup: > + mutex_unlock(&idev->igroup->lock); Looks like we will have to partially revert the _ucmd allocator, in this function: https://lore.kernel.org/all/107b24a3b791091bb09c92ffb0081c56c413b26d.1749882255.git.nicolinc@nvidia.com/ Please try fixing the conflicts on top of Jason's for-next tree. Thanks Nicolin ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy 2025-06-30 19:34 ` Nicolin Chen @ 2025-07-08 8:12 ` Xu Yilun 0 siblings, 0 replies; 32+ messages in thread From: Xu Yilun @ 2025-07-08 8:12 UTC (permalink / raw) To: Nicolin Chen Cc: jgg, jgg, kevin.tian, will, aneesh.kumar, iommu, linux-kernel, joro, robin.murphy, shuah, aik, dan.j.williams, baolu.lu, yilun.xu On Mon, Jun 30, 2025 at 12:34:03PM -0700, Nicolin Chen wrote: > On Fri, Jun 27, 2025 at 11:38:06AM +0800, Xu Yilun wrote: > > +static void iommufd_device_remove_vdev(struct iommufd_device *idev) > > +{ > > + struct iommufd_vdevice *vdev; > > + > > + mutex_lock(&idev->igroup->lock); > > + /* vdev has been completely destroyed by userspace */ > > + if (!idev->vdev) > > + goto out_unlock; > > + > > + vdev = iommufd_get_vdevice(idev->ictx, idev->vdev->obj.id); > > + if (IS_ERR(vdev)) { > > + /* > > + * vdev is removed from xarray by userspace, but is not > > + * destroyed/freed. Since iommufd_vdevice_abort() is reentrant, > > + * safe to destroy vdev here. > > + */ > > + iommufd_vdevice_abort(&idev->vdev->obj); > > + goto out_unlock; > > This is the case #3, i.e. a racing vdev destory, in the commit log? > > I think it is worth clarifying that there is a concurrent destroy: > /* > * An ongoing vdev destroy ioctl has removed the vdev from the > * object xarray but has not finished iommufd_vdevice_destroy() > * yet, as it is holding the same mutex. Applied this part. > * Destroy the vdev here, > * i.e. the iommufd_vdevice_destroy() will be a NOP once it is > * unlocked. > */ > > > @@ -147,10 +183,12 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd) > > if (rc) > > goto out_abort; > > iommufd_object_finalize(ucmd->ictx, &vdev->obj); > > - goto out_put_idev; > > + goto out_unlock_igroup; > > > > out_abort: > > iommufd_object_abort_and_destroy(ucmd->ictx, &vdev->obj); > > +out_unlock_igroup: > > + mutex_unlock(&idev->igroup->lock); > > Looks like we will have to partially revert the _ucmd allocator, > in this function: > https://lore.kernel.org/all/107b24a3b791091bb09c92ffb0081c56c413b26d.1749882255.git.nicolinc@nvidia.com/ > > Please try fixing the conflicts on top of Jason's for-next tree. Yes, will rebase for next version. Thanks, Yilun > > Thanks > Nicolin ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v3 3/5] iommufd/vdevice: Remove struct device reference from struct vdevice 2025-06-27 3:38 [PATCH v3 0/5] iommufd: Destroy vdevice on device unbind Xu Yilun 2025-06-27 3:38 ` [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper Xu Yilun 2025-06-27 3:38 ` [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy Xu Yilun @ 2025-06-27 3:38 ` Xu Yilun 2025-06-30 5:11 ` Baolu Lu 2025-07-04 15:06 ` Jason Gunthorpe 2025-06-27 3:38 ` [PATCH v3 4/5] iommufd/selftest: Explicitly skip tests for inapplicable variant Xu Yilun 2025-06-27 3:38 ` [PATCH v3 5/5] iommufd/selftest: Add coverage for vdevice tombstone Xu Yilun 4 siblings, 2 replies; 32+ messages in thread From: Xu Yilun @ 2025-06-27 3:38 UTC (permalink / raw) To: jgg, jgg, kevin.tian, will, aneesh.kumar Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik, dan.j.williams, baolu.lu, yilun.xu Remove struct device *dev from struct vdevice. The dev pointer is the Plan B for vdevice to reference the physical device. As now vdev->idev is added without refcounting concern, just use vdev->idev->dev when needed. Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com> --- drivers/iommu/iommufd/driver.c | 4 ++-- drivers/iommu/iommufd/iommufd_private.h | 1 - drivers/iommu/iommufd/viommu.c | 3 --- 3 files changed, 2 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/iommufd/driver.c b/drivers/iommu/iommufd/driver.c index 922cd1fe7ec2..942d402bba36 100644 --- a/drivers/iommu/iommufd/driver.c +++ b/drivers/iommu/iommufd/driver.c @@ -45,7 +45,7 @@ struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu, lockdep_assert_held(&viommu->vdevs.xa_lock); vdev = xa_load(&viommu->vdevs, vdev_id); - return vdev ? vdev->dev : NULL; + return vdev ? vdev->idev->dev : NULL; } EXPORT_SYMBOL_NS_GPL(iommufd_viommu_find_dev, "IOMMUFD"); @@ -62,7 +62,7 @@ int iommufd_viommu_get_vdev_id(struct iommufd_viommu *viommu, xa_lock(&viommu->vdevs); xa_for_each(&viommu->vdevs, index, vdev) { - if (vdev->dev == dev) { + if (vdev->idev->dev == dev) { *vdev_id = vdev->id; rc = 0; break; diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index f58aa4439c53..3700193471db 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -628,7 +628,6 @@ struct iommufd_vdevice { struct iommufd_object obj; struct iommufd_ctx *ictx; struct iommufd_viommu *viommu; - struct device *dev; u64 id; /* per-vIOMMU virtual ID */ struct iommufd_device *idev; }; diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c index 632d1d7b8fd8..452a7a24d738 100644 --- a/drivers/iommu/iommufd/viommu.c +++ b/drivers/iommu/iommufd/viommu.c @@ -103,7 +103,6 @@ void iommufd_vdevice_abort(struct iommufd_object *obj) /* xa_cmpxchg is okay to fail if alloc failed xa_cmpxchg previously */ xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL); refcount_dec(&viommu->obj.users); - put_device(vdev->dev); vdev->viommu = NULL; idev->vdev = NULL; } @@ -159,8 +158,6 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd) } vdev->id = virt_id; - vdev->dev = idev->dev; - get_device(idev->dev); vdev->viommu = viommu; refcount_inc(&viommu->obj.users); /* -- 2.25.1 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v3 3/5] iommufd/vdevice: Remove struct device reference from struct vdevice 2025-06-27 3:38 ` [PATCH v3 3/5] iommufd/vdevice: Remove struct device reference from struct vdevice Xu Yilun @ 2025-06-30 5:11 ` Baolu Lu 2025-07-04 15:06 ` Jason Gunthorpe 1 sibling, 0 replies; 32+ messages in thread From: Baolu Lu @ 2025-06-30 5:11 UTC (permalink / raw) To: Xu Yilun, jgg, jgg, kevin.tian, will, aneesh.kumar Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik, dan.j.williams, yilun.xu On 6/27/25 11:38, Xu Yilun wrote: > Remove struct device *dev from struct vdevice. > > The dev pointer is the Plan B for vdevice to reference the physical > device. As now vdev->idev is added without refcounting concern, just > use vdev->idev->dev when needed. > > Signed-off-by: Xu Yilun<yilun.xu@linux.intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 3/5] iommufd/vdevice: Remove struct device reference from struct vdevice 2025-06-27 3:38 ` [PATCH v3 3/5] iommufd/vdevice: Remove struct device reference from struct vdevice Xu Yilun 2025-06-30 5:11 ` Baolu Lu @ 2025-07-04 15:06 ` Jason Gunthorpe 1 sibling, 0 replies; 32+ messages in thread From: Jason Gunthorpe @ 2025-07-04 15:06 UTC (permalink / raw) To: Xu Yilun Cc: kevin.tian, will, aneesh.kumar, iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik, dan.j.williams, baolu.lu, yilun.xu On Fri, Jun 27, 2025 at 11:38:07AM +0800, Xu Yilun wrote: > Remove struct device *dev from struct vdevice. > > The dev pointer is the Plan B for vdevice to reference the physical > device. As now vdev->idev is added without refcounting concern, just > use vdev->idev->dev when needed. > > Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com> > --- > drivers/iommu/iommufd/driver.c | 4 ++-- > drivers/iommu/iommufd/iommufd_private.h | 1 - > drivers/iommu/iommufd/viommu.c | 3 --- > 3 files changed, 2 insertions(+), 6 deletions(-) Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Jason ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v3 4/5] iommufd/selftest: Explicitly skip tests for inapplicable variant 2025-06-27 3:38 [PATCH v3 0/5] iommufd: Destroy vdevice on device unbind Xu Yilun ` (2 preceding siblings ...) 2025-06-27 3:38 ` [PATCH v3 3/5] iommufd/vdevice: Remove struct device reference from struct vdevice Xu Yilun @ 2025-06-27 3:38 ` Xu Yilun 2025-07-04 15:07 ` Jason Gunthorpe 2025-06-27 3:38 ` [PATCH v3 5/5] iommufd/selftest: Add coverage for vdevice tombstone Xu Yilun 4 siblings, 1 reply; 32+ messages in thread From: Xu Yilun @ 2025-06-27 3:38 UTC (permalink / raw) To: jgg, jgg, kevin.tian, will, aneesh.kumar Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik, dan.j.williams, baolu.lu, yilun.xu no_viommu is not applicable for some viommu/vdevice tests. Explicitly report the skipping, don't do it silently. Only add the prints. No functional change intended. Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com> --- tools/testing/selftests/iommu/iommufd.c | 378 ++++++++++++------------ 1 file changed, 190 insertions(+), 188 deletions(-) diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index 1a8e85afe9aa..4a9b6e3b37fa 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -2760,35 +2760,36 @@ TEST_F(iommufd_viommu, viommu_alloc_nested_iopf) uint32_t fault_fd; uint32_t vdev_id; - if (self->device_id) { - test_ioctl_fault_alloc(&fault_id, &fault_fd); - test_err_hwpt_alloc_iopf( - ENOENT, dev_id, viommu_id, UINT32_MAX, - IOMMU_HWPT_FAULT_ID_VALID, &iopf_hwpt_id, - IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data)); - test_err_hwpt_alloc_iopf( - EOPNOTSUPP, dev_id, viommu_id, fault_id, - IOMMU_HWPT_FAULT_ID_VALID | (1 << 31), &iopf_hwpt_id, - IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data)); - test_cmd_hwpt_alloc_iopf( - dev_id, viommu_id, fault_id, IOMMU_HWPT_FAULT_ID_VALID, - &iopf_hwpt_id, IOMMU_HWPT_DATA_SELFTEST, &data, - sizeof(data)); + if (!dev_id) + SKIP(return, "Skipping test for variant no_viommu"); - /* Must allocate vdevice before attaching to a nested hwpt */ - test_err_mock_domain_replace(ENOENT, self->stdev_id, - iopf_hwpt_id); - test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id); - test_cmd_mock_domain_replace(self->stdev_id, iopf_hwpt_id); - EXPECT_ERRNO(EBUSY, - _test_ioctl_destroy(self->fd, iopf_hwpt_id)); - test_cmd_trigger_iopf(dev_id, fault_fd); + test_ioctl_fault_alloc(&fault_id, &fault_fd); + test_err_hwpt_alloc_iopf( + ENOENT, dev_id, viommu_id, UINT32_MAX, + IOMMU_HWPT_FAULT_ID_VALID, &iopf_hwpt_id, + IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data)); + test_err_hwpt_alloc_iopf( + EOPNOTSUPP, dev_id, viommu_id, fault_id, + IOMMU_HWPT_FAULT_ID_VALID | (1 << 31), &iopf_hwpt_id, + IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data)); + test_cmd_hwpt_alloc_iopf( + dev_id, viommu_id, fault_id, IOMMU_HWPT_FAULT_ID_VALID, + &iopf_hwpt_id, IOMMU_HWPT_DATA_SELFTEST, &data, + sizeof(data)); + + /* Must allocate vdevice before attaching to a nested hwpt */ + test_err_mock_domain_replace(ENOENT, self->stdev_id, + iopf_hwpt_id); + test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id); + test_cmd_mock_domain_replace(self->stdev_id, iopf_hwpt_id); + EXPECT_ERRNO(EBUSY, + _test_ioctl_destroy(self->fd, iopf_hwpt_id)); + test_cmd_trigger_iopf(dev_id, fault_fd); - test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id); - test_ioctl_destroy(iopf_hwpt_id); - close(fault_fd); - test_ioctl_destroy(fault_id); - } + test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id); + test_ioctl_destroy(iopf_hwpt_id); + close(fault_fd); + test_ioctl_destroy(fault_id); } TEST_F(iommufd_viommu, vdevice_alloc) @@ -2849,169 +2850,170 @@ TEST_F(iommufd_viommu, vdevice_cache) uint32_t vdev_id = 0; uint32_t num_inv; - if (dev_id) { - test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id); - - test_cmd_dev_check_cache_all(dev_id, - IOMMU_TEST_DEV_CACHE_DEFAULT); - - /* Check data_type by passing zero-length array */ - num_inv = 0; - test_cmd_viommu_invalidate(viommu_id, inv_reqs, - sizeof(*inv_reqs), &num_inv); - assert(!num_inv); - - /* Negative test: Invalid data_type */ - num_inv = 1; - test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, - IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST_INVALID, - sizeof(*inv_reqs), &num_inv); - assert(!num_inv); - - /* Negative test: structure size sanity */ - num_inv = 1; - test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, - IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, - sizeof(*inv_reqs) + 1, &num_inv); - assert(!num_inv); - - num_inv = 1; - test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, - IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, - 1, &num_inv); - assert(!num_inv); - - /* Negative test: invalid flag is passed */ - num_inv = 1; - inv_reqs[0].flags = 0xffffffff; - inv_reqs[0].vdev_id = 0x99; - test_err_viommu_invalidate(EOPNOTSUPP, viommu_id, inv_reqs, - IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, - sizeof(*inv_reqs), &num_inv); - assert(!num_inv); - - /* Negative test: invalid data_uptr when array is not empty */ - num_inv = 1; - inv_reqs[0].flags = 0; - inv_reqs[0].vdev_id = 0x99; - test_err_viommu_invalidate(EINVAL, viommu_id, NULL, - IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, - sizeof(*inv_reqs), &num_inv); - assert(!num_inv); - - /* Negative test: invalid entry_len when array is not empty */ - num_inv = 1; - inv_reqs[0].flags = 0; - inv_reqs[0].vdev_id = 0x99; - test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, - IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, - 0, &num_inv); - assert(!num_inv); - - /* Negative test: invalid cache_id */ - num_inv = 1; - inv_reqs[0].flags = 0; - inv_reqs[0].vdev_id = 0x99; - inv_reqs[0].cache_id = MOCK_DEV_CACHE_ID_MAX + 1; - test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, - IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, - sizeof(*inv_reqs), &num_inv); - assert(!num_inv); - - /* Negative test: invalid vdev_id */ - num_inv = 1; - inv_reqs[0].flags = 0; - inv_reqs[0].vdev_id = 0x9; - inv_reqs[0].cache_id = 0; - test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, - IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, - sizeof(*inv_reqs), &num_inv); - assert(!num_inv); - - /* - * Invalidate the 1st cache entry but fail the 2nd request - * due to invalid flags configuration in the 2nd request. - */ - num_inv = 2; - inv_reqs[0].flags = 0; - inv_reqs[0].vdev_id = 0x99; - inv_reqs[0].cache_id = 0; - inv_reqs[1].flags = 0xffffffff; - inv_reqs[1].vdev_id = 0x99; - inv_reqs[1].cache_id = 1; - test_err_viommu_invalidate(EOPNOTSUPP, viommu_id, inv_reqs, - IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, - sizeof(*inv_reqs), &num_inv); - assert(num_inv == 1); - test_cmd_dev_check_cache(dev_id, 0, 0); - test_cmd_dev_check_cache(dev_id, 1, - IOMMU_TEST_DEV_CACHE_DEFAULT); - test_cmd_dev_check_cache(dev_id, 2, - IOMMU_TEST_DEV_CACHE_DEFAULT); - test_cmd_dev_check_cache(dev_id, 3, - IOMMU_TEST_DEV_CACHE_DEFAULT); + if (!dev_id) + SKIP(return, "Skipping test for variant no_viommu"); + + test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id); + + test_cmd_dev_check_cache_all(dev_id, + IOMMU_TEST_DEV_CACHE_DEFAULT); + + /* Check data_type by passing zero-length array */ + num_inv = 0; + test_cmd_viommu_invalidate(viommu_id, inv_reqs, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + /* Negative test: Invalid data_type */ + num_inv = 1; + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST_INVALID, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + /* Negative test: structure size sanity */ + num_inv = 1; + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs) + 1, &num_inv); + assert(!num_inv); + + num_inv = 1; + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + 1, &num_inv); + assert(!num_inv); + + /* Negative test: invalid flag is passed */ + num_inv = 1; + inv_reqs[0].flags = 0xffffffff; + inv_reqs[0].vdev_id = 0x99; + test_err_viommu_invalidate(EOPNOTSUPP, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + /* Negative test: invalid data_uptr when array is not empty */ + num_inv = 1; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x99; + test_err_viommu_invalidate(EINVAL, viommu_id, NULL, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + /* Negative test: invalid entry_len when array is not empty */ + num_inv = 1; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x99; + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + 0, &num_inv); + assert(!num_inv); + + /* Negative test: invalid cache_id */ + num_inv = 1; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x99; + inv_reqs[0].cache_id = MOCK_DEV_CACHE_ID_MAX + 1; + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); + + /* Negative test: invalid vdev_id */ + num_inv = 1; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x9; + inv_reqs[0].cache_id = 0; + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(!num_inv); - /* - * Invalidate the 1st cache entry but fail the 2nd request - * due to invalid cache_id configuration in the 2nd request. - */ - num_inv = 2; - inv_reqs[0].flags = 0; - inv_reqs[0].vdev_id = 0x99; - inv_reqs[0].cache_id = 0; - inv_reqs[1].flags = 0; - inv_reqs[1].vdev_id = 0x99; - inv_reqs[1].cache_id = MOCK_DEV_CACHE_ID_MAX + 1; - test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, - IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, - sizeof(*inv_reqs), &num_inv); - assert(num_inv == 1); - test_cmd_dev_check_cache(dev_id, 0, 0); - test_cmd_dev_check_cache(dev_id, 1, - IOMMU_TEST_DEV_CACHE_DEFAULT); - test_cmd_dev_check_cache(dev_id, 2, - IOMMU_TEST_DEV_CACHE_DEFAULT); - test_cmd_dev_check_cache(dev_id, 3, - IOMMU_TEST_DEV_CACHE_DEFAULT); - - /* Invalidate the 2nd cache entry and verify */ - num_inv = 1; - inv_reqs[0].flags = 0; - inv_reqs[0].vdev_id = 0x99; - inv_reqs[0].cache_id = 1; - test_cmd_viommu_invalidate(viommu_id, inv_reqs, - sizeof(*inv_reqs), &num_inv); - assert(num_inv == 1); - test_cmd_dev_check_cache(dev_id, 0, 0); - test_cmd_dev_check_cache(dev_id, 1, 0); - test_cmd_dev_check_cache(dev_id, 2, - IOMMU_TEST_DEV_CACHE_DEFAULT); - test_cmd_dev_check_cache(dev_id, 3, - IOMMU_TEST_DEV_CACHE_DEFAULT); - - /* Invalidate the 3rd and 4th cache entries and verify */ - num_inv = 2; - inv_reqs[0].flags = 0; - inv_reqs[0].vdev_id = 0x99; - inv_reqs[0].cache_id = 2; - inv_reqs[1].flags = 0; - inv_reqs[1].vdev_id = 0x99; - inv_reqs[1].cache_id = 3; - test_cmd_viommu_invalidate(viommu_id, inv_reqs, - sizeof(*inv_reqs), &num_inv); - assert(num_inv == 2); - test_cmd_dev_check_cache_all(dev_id, 0); + /* + * Invalidate the 1st cache entry but fail the 2nd request + * due to invalid flags configuration in the 2nd request. + */ + num_inv = 2; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x99; + inv_reqs[0].cache_id = 0; + inv_reqs[1].flags = 0xffffffff; + inv_reqs[1].vdev_id = 0x99; + inv_reqs[1].cache_id = 1; + test_err_viommu_invalidate(EOPNOTSUPP, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(num_inv == 1); + test_cmd_dev_check_cache(dev_id, 0, 0); + test_cmd_dev_check_cache(dev_id, 1, + IOMMU_TEST_DEV_CACHE_DEFAULT); + test_cmd_dev_check_cache(dev_id, 2, + IOMMU_TEST_DEV_CACHE_DEFAULT); + test_cmd_dev_check_cache(dev_id, 3, + IOMMU_TEST_DEV_CACHE_DEFAULT); - /* Invalidate all cache entries for nested_dev_id[1] and verify */ - num_inv = 1; - inv_reqs[0].vdev_id = 0x99; - inv_reqs[0].flags = IOMMU_TEST_INVALIDATE_FLAG_ALL; - test_cmd_viommu_invalidate(viommu_id, inv_reqs, - sizeof(*inv_reqs), &num_inv); - assert(num_inv == 1); - test_cmd_dev_check_cache_all(dev_id, 0); - test_ioctl_destroy(vdev_id); - } + /* + * Invalidate the 1st cache entry but fail the 2nd request + * due to invalid cache_id configuration in the 2nd request. + */ + num_inv = 2; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x99; + inv_reqs[0].cache_id = 0; + inv_reqs[1].flags = 0; + inv_reqs[1].vdev_id = 0x99; + inv_reqs[1].cache_id = MOCK_DEV_CACHE_ID_MAX + 1; + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, + sizeof(*inv_reqs), &num_inv); + assert(num_inv == 1); + test_cmd_dev_check_cache(dev_id, 0, 0); + test_cmd_dev_check_cache(dev_id, 1, + IOMMU_TEST_DEV_CACHE_DEFAULT); + test_cmd_dev_check_cache(dev_id, 2, + IOMMU_TEST_DEV_CACHE_DEFAULT); + test_cmd_dev_check_cache(dev_id, 3, + IOMMU_TEST_DEV_CACHE_DEFAULT); + + /* Invalidate the 2nd cache entry and verify */ + num_inv = 1; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x99; + inv_reqs[0].cache_id = 1; + test_cmd_viommu_invalidate(viommu_id, inv_reqs, + sizeof(*inv_reqs), &num_inv); + assert(num_inv == 1); + test_cmd_dev_check_cache(dev_id, 0, 0); + test_cmd_dev_check_cache(dev_id, 1, 0); + test_cmd_dev_check_cache(dev_id, 2, + IOMMU_TEST_DEV_CACHE_DEFAULT); + test_cmd_dev_check_cache(dev_id, 3, + IOMMU_TEST_DEV_CACHE_DEFAULT); + + /* Invalidate the 3rd and 4th cache entries and verify */ + num_inv = 2; + inv_reqs[0].flags = 0; + inv_reqs[0].vdev_id = 0x99; + inv_reqs[0].cache_id = 2; + inv_reqs[1].flags = 0; + inv_reqs[1].vdev_id = 0x99; + inv_reqs[1].cache_id = 3; + test_cmd_viommu_invalidate(viommu_id, inv_reqs, + sizeof(*inv_reqs), &num_inv); + assert(num_inv == 2); + test_cmd_dev_check_cache_all(dev_id, 0); + + /* Invalidate all cache entries for nested_dev_id[1] and verify */ + num_inv = 1; + inv_reqs[0].vdev_id = 0x99; + inv_reqs[0].flags = IOMMU_TEST_INVALIDATE_FLAG_ALL; + test_cmd_viommu_invalidate(viommu_id, inv_reqs, + sizeof(*inv_reqs), &num_inv); + assert(num_inv == 1); + test_cmd_dev_check_cache_all(dev_id, 0); + test_ioctl_destroy(vdev_id); } FIXTURE(iommufd_device_pasid) -- 2.25.1 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v3 4/5] iommufd/selftest: Explicitly skip tests for inapplicable variant 2025-06-27 3:38 ` [PATCH v3 4/5] iommufd/selftest: Explicitly skip tests for inapplicable variant Xu Yilun @ 2025-07-04 15:07 ` Jason Gunthorpe 0 siblings, 0 replies; 32+ messages in thread From: Jason Gunthorpe @ 2025-07-04 15:07 UTC (permalink / raw) To: Xu Yilun Cc: kevin.tian, will, aneesh.kumar, iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik, dan.j.williams, baolu.lu, yilun.xu On Fri, Jun 27, 2025 at 11:38:08AM +0800, Xu Yilun wrote: > no_viommu is not applicable for some viommu/vdevice tests. Explicitly > report the skipping, don't do it silently. > > Only add the prints. No functional change intended. > > Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com> > --- > tools/testing/selftests/iommu/iommufd.c | 378 ++++++++++++------------ > 1 file changed, 190 insertions(+), 188 deletions(-) Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Jason ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v3 5/5] iommufd/selftest: Add coverage for vdevice tombstone 2025-06-27 3:38 [PATCH v3 0/5] iommufd: Destroy vdevice on device unbind Xu Yilun ` (3 preceding siblings ...) 2025-06-27 3:38 ` [PATCH v3 4/5] iommufd/selftest: Explicitly skip tests for inapplicable variant Xu Yilun @ 2025-06-27 3:38 ` Xu Yilun 4 siblings, 0 replies; 32+ messages in thread From: Xu Yilun @ 2025-06-27 3:38 UTC (permalink / raw) To: jgg, jgg, kevin.tian, will, aneesh.kumar Cc: iommu, linux-kernel, joro, robin.murphy, shuah, nicolinc, aik, dan.j.williams, baolu.lu, yilun.xu This tests the flow to tombstone vdevice when idevice is to be unbound before vdevice destruction. The expected result is: - idevice unbinding tombstones vdevice ID, the ID can't be repurposed anymore. - Even ioctl(IOMMU_DESTROY) can't free the tombstoned ID. - iommufd_fops_release() can still free everything. Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com> --- tools/testing/selftests/iommu/iommufd.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index 4a9b6e3b37fa..e1470a7a42cd 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -3016,6 +3016,20 @@ TEST_F(iommufd_viommu, vdevice_cache) test_ioctl_destroy(vdev_id); } +TEST_F(iommufd_viommu, vdevice_tombstone) +{ + uint32_t viommu_id = self->viommu_id; + uint32_t dev_id = self->device_id; + uint32_t vdev_id = 0; + + if (!dev_id) + SKIP(return, "Skipping test for variant no_viommu"); + + test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id); + test_ioctl_destroy(self->stdev_id); + EXPECT_ERRNO(ENOENT, _test_ioctl_destroy(self->fd, vdev_id)); +} + FIXTURE(iommufd_device_pasid) { int fd; -- 2.25.1 ^ permalink raw reply related [flat|nested] 32+ messages in thread
end of thread, other threads:[~2025-07-08 8:54 UTC | newest] Thread overview: 32+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-06-27 3:38 [PATCH v3 0/5] iommufd: Destroy vdevice on device unbind Xu Yilun 2025-06-27 3:38 ` [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper Xu Yilun 2025-06-30 3:08 ` Baolu Lu 2025-06-30 7:24 ` Xu Yilun 2025-06-30 5:52 ` Tian, Kevin 2025-06-30 6:41 ` Xu Yilun 2025-06-30 19:50 ` Nicolin Chen 2025-07-08 8:45 ` Xu Yilun 2025-06-27 3:38 ` [PATCH v3 2/5] iommufd: Destroy vdevice on idevice destroy Xu Yilun 2025-06-30 5:08 ` Baolu Lu 2025-07-08 8:34 ` Xu Yilun 2025-06-30 6:27 ` Tian, Kevin 2025-06-30 10:18 ` Xu Yilun 2025-06-30 14:50 ` Jason Gunthorpe 2025-07-01 9:19 ` Xu Yilun 2025-07-01 12:13 ` Jason Gunthorpe 2025-07-02 2:23 ` Xu Yilun 2025-07-02 9:13 ` Tian, Kevin 2025-07-02 12:40 ` Jason Gunthorpe 2025-07-03 4:32 ` Tian, Kevin 2025-07-03 11:21 ` Jason Gunthorpe 2025-07-07 10:58 ` Xu Yilun 2025-07-07 12:25 ` Jason Gunthorpe 2025-07-07 19:41 ` Xu Yilun 2025-06-30 19:34 ` Nicolin Chen 2025-07-08 8:12 ` Xu Yilun 2025-06-27 3:38 ` [PATCH v3 3/5] iommufd/vdevice: Remove struct device reference from struct vdevice Xu Yilun 2025-06-30 5:11 ` Baolu Lu 2025-07-04 15:06 ` Jason Gunthorpe 2025-06-27 3:38 ` [PATCH v3 4/5] iommufd/selftest: Explicitly skip tests for inapplicable variant Xu Yilun 2025-07-04 15:07 ` Jason Gunthorpe 2025-06-27 3:38 ` [PATCH v3 5/5] iommufd/selftest: Add coverage for vdevice tombstone Xu Yilun
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.