Re: [PATCH v2 3/4] iommufd: Destroy vdevice on idevice destroy

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jason Gunthorpe <jgg@nvidia.com>
To: Xu Yilun <yilun.xu@linux.intel.com>
Cc: kevin.tian@intel.com, will@kernel.org, aneesh.kumar@kernel.org,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	joro@8bytes.org, robin.murphy@arm.com, shuah@kernel.org,
	nicolinc@nvidia.com, aik@amd.com, dan.j.williams@intel.com,
	baolu.lu@linux.intel.com, yilun.xu@intel.com
Subject: Re: [PATCH v2 3/4] iommufd: Destroy vdevice on idevice destroy
Date: Tue, 24 Jun 2025 11:53:46 -0300	[thread overview]
Message-ID: <20250624145346.GC150753@nvidia.com> (raw)
In-Reply-To: <20250623094946.1714996-4-yilun.xu@linux.intel.com>

On Mon, Jun 23, 2025 at 05:49:45PM +0800, Xu Yilun wrote:
> +static void iommufd_device_remove_vdev(struct iommufd_device *idev)
> +{
> +	bool vdev_removing = false;
> +
> +	mutex_lock(&idev->igroup->lock);
> +	if (idev->vdev) {
> +		struct iommufd_vdevice *vdev;
> +
> +		vdev = iommufd_get_vdevice(idev->ictx, idev->vdev->obj.id);
> +		if (IS_ERR(vdev)) {

This incrs obj.users which will cause a concurrent
iommufd_object_remove() to fail with -EBUSY, which we are trying to
avoid.

Also you can hit a race where the tombstone has NULL'd the entry but
the racing destroy will then load the NULL with xas_load() and hit this:

		if (WARN_ON(obj != to_destroy)) {

So, this doesn't look like it will work right to me..

You want somewhat different destroy logic:

/*
 * The caller must directly obtain a shortterm_users reference without a users
 * reference using its own locking to protect the pointer. This function always
 * puts back the shortterm_users reference.
 */
int iommufd_object_remove_tombstone(struct iommufd_ctx *ictx,
				    struct iommufd_object *to_destroy)
{
	XA_STATE(xas, &ictx->objects, to_destroy->id);
	struct iommufd_object *obj;
	int ret;

	xa_lock(&ictx->objects);
	obj = xas_load(&xas);
	if (xa_is_zero(obj) || obj == NULL) {
		/*
		 * Another thread is racing to destroy this, since we have the
		 * shortterm_users refcount the other thread has xa_unlocked()
		 * but not passed iommufd_object_dec_wait_shortterm().
		 */
		if (refcount_dec_and_test(&to_destroy->shortterm_users))
			wake_up_interruptible_all(&ictx->destroy_wait);
		ret = 0;
		goto err_xa;
	} else if (WARN_ON(obj != to_destroy)) {
		refcount_dec(&obj->shortterm_users);
		ret = -ENOENT;
		goto err_xa;
	}

	/*
	 * The object is still in the xarray, so this thread will try to destroy
	 * it. Put back the callers shortterm_users.
	 */
	refcount_dec(&obj->shortterm_users);

	if (!refcount_dec_if_one(&obj->users)) {
		ret = -EBUSY;
		goto err_xa;
	}

	/* Leave behind a tombstone to prevent re-use of this entry */
	xas_store(&xas, XA_ZERO_ENTRY);
	xa_unlock(&ictx->objects);

	/*
	 * Since users is zero any positive users_shortterm must be racing
	 * iommufd_put_object(), or we have a bug.
	 */
	ret = iommufd_object_dec_wait_shortterm(ictx, obj);
	if (WARN_ON(ret))
		return ret;

	iommufd_object_ops[obj->type].destroy(obj);
	kfree(obj);
	return 0;

err_xa:
	xa_unlock(&ictx->objects);

	/* The returned object reference count is zero */
	return ret;
}

Then you'd call it by doing something like:

static void iommufd_device_remove_vdev(struct iommufd_device *idev)
{
	struct iommufd_object *to_destroy = NULL;
	int ret;

	mutex_lock(&idev->igroup->lock);
	if (!idev->vdev) {
		mutex_unlock(&idev->igroup->lock);
		return;
	}
	if (refcount_inc_not_zero(&idev->vdev->obj.shortterm_users))
		to_destroy = &idev->vdev->obj;
	mutex_unlock(&idev->igroup->lock);

	if (to_destroy) {
		ret = iommufd_object_remove_tombstone(idev->ictx, to_destroy);
		if (WARN_ON(ret))
			return;
	}

	/*
	 * We don't know what thread is actually going to destroy the vdev, but
	 * once the vdev is destroyed the pointer is NULL'd. At this
	 * point idev->users is 0 so no other thread can set a new vdev.
	 */
	if (!wait_event_timeout(idev->ictx->destroy_wait,
				!READ_ONCE(idev->vdev),
				msecs_to_jiffies(60000)))
		pr_crit("Time out waiting for iommufd vdevice removed\n");
}

Though there is a cleaner option here, you could do:

	mutex_lock(&idev->igroup->lock);
	if (idev->vdev)
		iommufd_vdevice_abort(&idev->vdev->obj);
	mutex_unlock(&idev->igroup->lock);

And make it safe to call abort twice, eg by setting dev to NULL and
checking for that. First thread to get to the igroup lock, either via
iommufd_vdevice_destroy() or via the above will do the actual abort
synchronously without any wait_event_timeout. That seems better??

> +	/* vdev can't outlive idev, vdev->idev is always valid, need no refcnt */
> +	vdev->idev = idev;

So this means a soon as 'idev->vdev = NULL;' happens idev is an
invalid pointer. Need a WRITE_ONCE there.

I would rephrase the comment as 
 iommufd_device_destroy() waits until idev->vdev is NULL before
 freeing the idev, which only happens once the vdev is finished
 destruction. Thus we do not need refcounting on either idev->vdev or
 vdev->idev.

and group both assignments together.

>  	vdev->ictx = ucmd->ictx;
>  	vdev->id = virt_id;
>  	vdev->dev = idev->dev;
>  	get_device(idev->dev);
>  	vdev->viommu = viommu;
>  	refcount_inc(&viommu->obj.users);
> +	/* idev->vdev is protected by idev->igroup->lock, need no refcnt */
> +	idev->vdev = vdev;

This can be WRITE_ONCE too

Jason

next prev parent reply	other threads:[~2025-06-24 14:53 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-23  9:49 [PATCH v2 0/4] iommufd: Destroy vdevice on device unbind Xu Yilun
2025-06-23  9:49 ` [PATCH v2 1/4] iommufd: Add iommufd_object_tombstone_user() helper Xu Yilun
2025-06-24 13:35   ` Jason Gunthorpe
2025-06-25  7:24     ` Xu Yilun
2025-06-25  5:51   ` Aneesh Kumar K.V
2025-06-25  8:40     ` Xu Yilun
2025-06-23  9:49 ` [PATCH v2 2/4] iommufd/viommu: Fix the uninitialized iommufd_vdevice::ictx Xu Yilun
2025-06-24  3:24   ` Baolu Lu
2025-06-24  6:35     ` Xu Yilun
2025-06-23  9:49 ` [PATCH v2 3/4] iommufd: Destroy vdevice on idevice destroy Xu Yilun
2025-06-24  3:32   ` Baolu Lu
2025-06-24  8:11     ` Xu Yilun
2025-06-24  8:28       ` Tian, Kevin
2025-06-24  8:12     ` Tian, Kevin
2025-06-25  1:55       ` Baolu Lu
2025-06-24  6:47   ` Xu Yilun
2025-06-24  8:22   ` Tian, Kevin
2025-06-26  4:59     ` Xu Yilun
2025-06-24 14:53   ` Jason Gunthorpe [this message]
2025-06-24 23:57     ` Tian, Kevin
2025-06-25  1:36       ` Jason Gunthorpe
2025-06-25  2:11         ` Tian, Kevin
2025-06-25 12:33           ` Jason Gunthorpe
2025-06-25 10:06     ` Xu Yilun
2025-06-25 12:38       ` Jason Gunthorpe
2025-06-26  3:31         ` Xu Yilun
2025-06-26 14:36           ` Jason Gunthorpe
2025-06-25  6:40   ` Aneesh Kumar K.V
2025-06-25  9:38     ` Xu Yilun
2025-06-23  9:49 ` [PATCH v2 4/4] iommufd/selftest: Add coverage for vdevice tombstone Xu Yilun
2025-06-24 13:41   ` Jason Gunthorpe
2025-06-25  8:29     ` Xu Yilun

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250624145346.GC150753@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=aik@amd.com \
    --cc=aneesh.kumar@kernel.org \
    --cc=baolu.lu@linux.intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=iommu@lists.linux.dev \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nicolinc@nvidia.com \
    --cc=robin.murphy@arm.com \
    --cc=shuah@kernel.org \
    --cc=will@kernel.org \
    --cc=yilun.xu@intel.com \
    --cc=yilun.xu@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.