From: Nicolin Chen <nicolinc@nvidia.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: <kevin.tian@intel.com>, <will@kernel.org>, <joro@8bytes.org>,
<suravee.suthikulpanit@amd.com>, <robin.murphy@arm.com>,
<dwmw2@infradead.org>, <baolu.lu@linux.intel.com>,
<shuah@kernel.org>, <linux-kernel@vger.kernel.org>,
<iommu@lists.linux.dev>, <linux-arm-kernel@lists.infradead.org>,
<linux-kselftest@vger.kernel.org>, <eric.auger@redhat.com>,
<jean-philippe@linaro.org>, <mdf@kernel.org>,
<mshavit@google.com>, <shameerali.kolothum.thodi@huawei.com>,
<smostafa@google.com>, <yi.l.liu@intel.com>, <aik@amd.com>,
<zhangfei.gao@linaro.org>, <patches@lists.linux.dev>
Subject: Re: [PATCH v5 01/13] iommufd/viommu: Add IOMMUFD_OBJ_VDEVICE and IOMMU_VDEVICE_ALLOC ioctl
Date: Tue, 29 Oct 2024 12:30:00 -0700
Message-ID: <ZyE3uGyVx9ivJeHI@Asurada-Nvidia>
In-Reply-To: <20241029184801.GW6956@nvidia.com>
On Tue, Oct 29, 2024 at 03:48:01PM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 29, 2024 at 10:29:56AM -0700, Nicolin Chen wrote:
> > On Tue, Oct 29, 2024 at 12:58:24PM -0300, Jason Gunthorpe wrote:
> > > On Fri, Oct 25, 2024 at 04:50:30PM -0700, Nicolin Chen wrote:
> > > > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> > > > index 5fd3dd420290..e50113305a9c 100644
> > > > --- a/drivers/iommu/iommufd/device.c
> > > > +++ b/drivers/iommu/iommufd/device.c
> > > > @@ -277,6 +277,17 @@ EXPORT_SYMBOL_NS_GPL(iommufd_ctx_has_group, IOMMUFD);
> > > > */
> > > > void iommufd_device_unbind(struct iommufd_device *idev)
> > > > {
> > > > + u32 vdev_id = 0;
> > > > +
> > > > + /* idev->vdev object should be destroyed prior, yet just in case.. */
> > > > + mutex_lock(&idev->igroup->lock);
> > > > + if (idev->vdev)
> > >
> > > Then should it have a WARN_ON here?
> >
> > It'd be a user space mistake, forgetting to call the destroy ioctl
> > on the object, in which case I recall the kernel shouldn't WARN_ON?
>
> But you can't get here because:
>
> refcount_inc(&idev->obj.users);
>
> And kernel doesn't destroy objects with elevated ref counts?
Hmm, this is not a ->destroy() but iommufd_device_unbind(), called
by VFIO. And we actually ran into this routine when QEMU didn't
destroy the vdev. So, I added this chunk.
The iommufd_object_remove(vdev_id) here would destroy the vdev,
whose destroy() does refcount_dec(&idev->obj.users). Then, the
following iommufd_object_destroy_user(.., &idev->obj) will
succeed.
With that said, let's just mandate that userspace destroys the vdev.
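To make the ordering above concrete, here is a minimal userspace sketch in C. It is a toy model and not kernel code: the toy_* types and functions are hypothetical, and the counter starts at 0 instead of following the kernel's refcount_t conventions. It only illustrates that the idev cannot be destroyed while the vdev holds a reference on it, and that destroying the vdev first releases that reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_vdev;

/* Toy stand-in for an iommufd object; not the kernel structure. */
struct toy_obj {
	int users;       /* long-term references held by other objects */
	bool destroyed;
};

struct toy_idev {
	struct toy_obj obj;
	struct toy_vdev *vdev;
};

struct toy_vdev {
	struct toy_obj obj;
	struct toy_idev *idev;
};

/* Mirrors "idev->vdev = vdev; refcount_inc(&idev->obj.users);" */
static void toy_vdev_alloc(struct toy_vdev *vdev, struct toy_idev *idev)
{
	vdev->obj.users = 0;
	vdev->obj.destroyed = false;
	vdev->idev = idev;
	idev->vdev = vdev;
	idev->obj.users++;
}

/* Destroying the vdev drops the reference it held on the idev. */
static void toy_vdev_destroy(struct toy_vdev *vdev)
{
	assert(!vdev->obj.destroyed);
	vdev->idev->vdev = NULL;
	vdev->idev->obj.users--;
	vdev->obj.destroyed = true;
}

/* Returns 0 on success, -1 if the idev is still referenced. */
static int toy_idev_destroy(struct toy_idev *idev)
{
	if (idev->obj.users > 0)
		return -1;
	idev->obj.destroyed = true;
	return 0;
}
```

In this model, calling toy_idev_destroy() right after toy_vdev_alloc() fails, and it succeeds once toy_vdev_destroy() has run, which is the sequence the removed chunk was forcing from inside unbind.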
> > > > + vdev_id = idev->vdev->obj.id;
> > > > + mutex_unlock(&idev->igroup->lock);
> > > > + /* Relying on xa_lock against a race with iommufd_destroy() */
> > > > + if (vdev_id)
> > > > + iommufd_object_remove(idev->ictx, NULL, vdev_id, 0);
> > >
> > > That doesn't seem right, iommufd_object_remove() should never be used
> > > to destroy an object that userspace created with an IOCTL, in fact
> > > that just isn't allowed.
> >
> > It was for our auto destroy feature.
>
> auto domains are "hidden" hwpts that are kernel managed. They are not
> "userspace created".
>
> "Userspace created" objects are ones that userspace is expected to call
> destroy on.
OK. I misunderstood that.
> If you destroy them behind the scenes in the kernel then the object ID
> can be reallocated for something else and when userspace does DESTROY
> on the ID it thought was still allocated it will malfunction.
>
> So, only userspace can destroy objects that userspace created.
I see. That makes sense.
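A small userspace sketch in C of the ID-reuse hazard described above. Everything here is hypothetical: the toy_* names, the fixed-size table, and the lowest-free-ID allocation policy are assumptions made for illustration, not the iommufd allocator.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_IDS 8

enum toy_type { TOY_FREE = 0, TOY_VDEVICE, TOY_HWPT };

/* Toy object table indexed by ID; slot 0 is never handed out. */
struct toy_ctx {
	enum toy_type slots[TOY_MAX_IDS];
};

/* Allocate the lowest free ID for an object; returns 0 if the table is full. */
static unsigned int toy_alloc(struct toy_ctx *ctx, enum toy_type type)
{
	for (unsigned int id = 1; id < TOY_MAX_IDS; id++) {
		if (ctx->slots[id] == TOY_FREE) {
			ctx->slots[id] = type;
			return id;
		}
	}
	return 0;
}

/* Destroy whatever lives at this ID and report what type it was. */
static enum toy_type toy_destroy(struct toy_ctx *ctx, unsigned int id)
{
	enum toy_type old;

	if (id == 0 || id >= TOY_MAX_IDS)
		return TOY_FREE;
	old = ctx->slots[id];
	ctx->slots[id] = TOY_FREE;
	return old;
}
```

If the "kernel" side of this model calls toy_destroy() on a vDEVICE ID behind userspace's back and another object is then allocated, the new object can receive the same ID. A later toy_destroy() from userspace with its stale ID then tears down the unrelated object.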
> > If user space forgot to destroy the object while trying to unplug
> > the device from the VM, this saves the day.
>
> No, it should/does fail destroy of the VIOMMU object because the users
> refcount is elevated.
The vIOMMU object's refcount is also decremented when unbind() calls
remove(). But anyway, we agreed that userspace should destroy it
explicitly.
> > > Ugh, there is worse here, we can't hold a long term reference on a
> > > kernel owned object:
> > >
> > > idev->vdev = vdev;
> > > refcount_inc(&idev->obj.users);
> > >
> > > As it prevents the kernel from disconnecting it.
> >
> > Hmm, mind elaborating? I think the iommufd_fops_release() would
> > xa_for_each the object list that destroys the vdev object first
> > then this idev (and viommu too)?
>
> iommufd_device_unbind() can't fail, and if the object can't be
> destroyed because it has an elevated long term refcount it WARN's:
>
>
> ret = iommufd_object_remove(ictx, obj, obj->id, REMOVE_WAIT_SHORTTERM);
>
> /*
> * If there is a bug and we couldn't destroy the object then we did put
> * back the caller's users refcount and will eventually try to free it
> * again during close.
> */
> WARN_ON(ret);
>
> So you cannot take long term references on kernel owned objects. Only
> userspace owned objects.
OK. I think I got this part. Gao ran into this WARN_ON in v3,
so I added iommufd_object_remove(vdev_id) in unbind() prior to
this iommufd_object_destroy_user(idev->ictx, &idev->obj).
> > OK. If user space forgot to destroy its vdev while unplugging the
> > device, it would not be allowed to hotplug another device (or the
> > same device) back to the same slot having the same RID, since the
> > RID on the vIOMMU would be occupied by the undestroyed vdev.
>
> Yes, that seems correct and obvious to me. Until the vdev is
> explicitly destroyed the ID is in-use.
>
> Good userspace should destroy the iommufd vDEVICE object before
> closing the VFIO file descriptor.
>
> If it doesn't, then the VDEVICE object remains even though the VFIO it
> was linked to is gone.
I see.
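For illustration, a minimal userspace sketch in C of the RID occupancy rule agreed on above. It is a toy model under stated assumptions: the toy_* names, the fixed-size table, and the specific errno values are hypothetical and do not describe the actual IOMMU_VDEVICE_ALLOC implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_MAX_RIDS 4

/* Toy vIOMMU: tracks which virtual RIDs are claimed by a vDEVICE. */
struct toy_viommu {
	bool rid_in_use[TOY_MAX_RIDS];
};

/* Claim a RID; fails if an undestroyed vDEVICE still holds it. */
static int toy_vdevice_alloc(struct toy_viommu *v, unsigned int rid)
{
	if (rid >= TOY_MAX_RIDS)
		return -EINVAL;
	if (v->rid_in_use[rid])
		return -EEXIST;
	v->rid_in_use[rid] = true;
	return 0;
}

/* Explicit destroy from userspace is the only way to release the RID. */
static void toy_vdevice_destroy(struct toy_viommu *v, unsigned int rid)
{
	if (rid < TOY_MAX_RIDS)
		v->rid_in_use[rid] = false;
}
```

In this model, unplugging a device without calling toy_vdevice_destroy() leaves its RID marked in use, so a second toy_vdevice_alloc() for the same slot fails until userspace issues the destroy.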
Thanks
Nicolin